Because we can?

May 9, 2018 at 12:54am on Privacy and Technology

In 2016, the Productivity Commission held an inquiry into ways to improve the availability and use of public and private sector data. We didn't have the capacity at the time to make a submission, although I did write a post on my personal blog in response to the Digital Health strategy, discussing my thoughts on Professor Fiona Stanley's submission on behalf of the Telethon Kids Institute (especially this article).

The report was finalised a little over a year ago; the Government's response was published on May 1st. I'd commend the excellent "too long, Justin reads" summary by Justin Warren (of the EFA Board) on Twitter, which begins here.

The first thing I'm struck by is the framing of the assisting Minister's introduction - "Australia's data is a major national resource". This is the catch-cry of big data enthusiasts everywhere, and the implication is that because we now generate an enormous amount of data through what we do every day - individually, and as a nation - it would almost be unethical not to harness it.

I'll summarise the issues that jump out at me in this push for mining our data.

Speculative benefits

The problem with big data analysis is that - in many cases - you don't know what you'll find until you go looking. This can be good - going into an analysis without expectations can sometimes turn up a gem that nobody anticipated. However, it can also mean that your data may be analysed and nothing much will come of it. I think this is probably less likely, but we just can't know prospectively.

The other problem with this is the sort of not-clinically-significant association trawling that seems to make up so many of the science stories in the media ("Big data confirms* dark chocolate is associated with an [increased | decreased] risk of [insert random disease here]"). 

Uncertain benefits mean (to me as a doctor, at least) that you need to be more mindful of the risks. When the benefits are great, you may be less concerned about the risks (broken ribs from CPR that saves your life?), but when the benefits are marginal, there is a real possibility you will do more harm than good.

Minimisation of the risks


Screenshot from Trent's presentation


"If, like Fiona Stanley, you believe there are no privacy risks from in health, then you don't understand the technology well enough, and you probably shouldn't be doing it."

-- Trent presenting to medical colleagues about linkage research

A point I've made numerous times before is that healthcare workers, in general, are terrible at computers. That's mostly fine - you'd probably prefer they were experts in health rather than in tech. But it also means that when the tech gets complicated, some of the technical complexities aren't well understood, and things get missed.

"A significant proportion of the [data linkage] literature has been published outside of Computer Science and Information Security venues, with a particular prevalence for publication within the medical domain. This has resulted in proposals not being subjected to the normal level of rigour and analysis associated with information security."

-- Culnane, Rubinstein & Teague casting shade on public health linkage research

People who say there's no risk are either completely clueless, or feeding you PR spin. (I disagreed with the consultant running my hospital's media training, who was trying to get me to say "there's no risk" about something.)

You only have to look at the Government's woeful record on IT projects (#censusfail, #notmydebt, breaches of confidential data on asylum seekers, politicians' phone numbers, public servants' payroll details, incorrectly de-identified Medicare data - I'm sure there will be more to come) to be sure that the inevitable myHealthRecord data breach is a "when", not an "if".

Health data is not Open Data

The government would have you think that they are enormous fans of openness and transparency - after all, we've joined the Open Government Partnership, and there are more than 28,000 datasets on

(please ignore the facts that the Government hates FOI, persecutes whistleblowers (and their lawyers), and still hasn't appointed a permanent replacement to the Office of the Information Commissioner; other than that, transparency is tops!)

There's certainly a strong argument for data on the spending of public money being made available, and it's one I agree with.

But this does not include people's personal medical information. Such privacy protection as we have in Australia is provided by the Privacy Act and the Australian Privacy Principles (APPs, which replaced the older National Privacy Principles in 2014). The APPs specifically classify health information as "sensitive information" and require extra measures to be taken to keep this data safe. This data should not be considered open, and even aggregate data, if not handled with sufficient care, has the potential to breach patients' confidentiality.
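To make the aggregate-data point concrete, here's a minimal sketch (entirely made-up numbers, and a deliberately simplistic safeguard) of why released count tables need care: a very small cell can identify an individual when combined with outside knowledge, so agencies commonly suppress cells below some threshold before release. The threshold and table here are illustrative assumptions, not any agency's actual policy.

```python
# Hypothetical sketch: small-cell suppression for an aggregate health table.
# A cell like "2 patients in postcode X with diagnosis Y" can be identifying
# when combined with outside knowledge, so it is withheld before release.

MIN_CELL_SIZE = 5  # illustrative threshold; real release policies vary


def suppress_small_cells(table):
    """Replace counts below MIN_CELL_SIZE with None prior to publication."""
    return {
        cell: (count if count >= MIN_CELL_SIZE else None)
        for cell, count in table.items()
    }


# Made-up aggregate counts: (postcode, diagnosis) -> number of patients
raw = {
    ("4000", "influenza"): 120,
    ("4000", "HIV"): 2,        # small cell: potentially identifying
    ("4870", "influenza"): 43,
}

released = suppress_small_cells(raw)
```

Suppression like this is only a first line of defence - the Melbourne team's work shows that determined re-identification attacks need much stronger protections - but it illustrates that "aggregate" is not automatically "safe".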

In his introduction to CSIRO's De-Identification Decision-Making Framework [pdf link], former Information Commissioner Timothy Pilgrim notes:

" data environments are really only appropriate for data that is either not derived from personal information, or data that through an extremely robust de-identification process that ensures with a very high degree of confidence that no individuals are reasonably identifiable"

The work of the University of Melbourne team who looked at the MBS/PBS dataset has shown that we are not yet at this high degree of confidence.

Data as a natural resource

As mentioned above, there's lots of framing of this issue as "we create so much data, it would be unethical not to use it". It's almost like our personal data is coal in the ground, desperate for us to dig it up and burn it with abandon. I rather wonder why we aren't gripped by the same ethical imperative to build more solar panels to catch the gigajoules of energy the sun casts onto our wide, brown land, but that's another story.

We do what we must, because we can. 

-- "Still Alive", Portal, Valve Software

The important difference is that data has the potential to cause harm to individuals when it is misused. This can be through the loss of privacy that occurs with a data breach, but it can also come about through the use of aggregate data. How this can occur is a bit beyond the scope of (an already too-long) blog post, but the book Weapons of Math Destruction - by a hedge-fund quant turned data scientist - is well worth a read.

It is for this reason we believe that there should be an informed consent process for data sharing. The issue, of course, is how to obtain truly informed consent, given that the knowledge of the risks seems to be so poor, even among those who make use of the data. Certainly in the health field, the principles of the Declaration of Helsinki also explicitly state that they apply to research on data derived from human subjects, and we think this is a good model for reuse of data for research.

Good citizens share their data

This one's right before the contents page of the Government's response:

“Some 91 per cent of Australians would be willing to share their de-identified medical data if it went towards research purposes.”

-- Research Australia, 2016

"De-identified" is of course the rub, because the Digital Health Strategy is equally clear on patient expectations:

The consultation process made it clear that Australians expect strong safeguards to ensure their health information is safe and secure, and that their data is used only when necessary and when they choose.

-- Australian Digital Health Strategy, p 17

There's indirect pressure on people here - from the Government, who ask whether you want your health services to be as efficient as possible, but also from researchers (see the article I linked above for the most egregious example). You have concerns about sharing your data, even after we've (incorrectly) explained to you that it's completely safe? Why do you hate good healthcare?

Garbage in, Garbage out

Even when you accept that big data will have benefits for health - and I believe this is the case - how useful is it going to be?

I'll illustrate this one with a brief example from my practice.

Healthcare-associated bloodstream infections (bacteria in your blood) are a marker of quality of healthcare. Some of the potentially avoidable causes include having intravenous drips left in for too long, unnecessary tubes in other places (like urinary catheters) or the failure to recognise or correctly treat a more simple infection in a timely manner.

Everyone agrees that HA-BSIs are bad, and we should do what we can to prevent them. The government has introduced measures under which preventable complications (including HA-BSIs) attract a funding penalty for hospitals, as an incentive to prevent them.

(Ignore for a moment the counter-intuitive logic of having something bad happen and then needing to improve your systems while having less money with which to do it.)

My hospital got a report back from Queensland Health's corporate office estimating the penalty will be of the order of a couple of hundred thousand dollars a year. The amount is based on the number of HA-BSIs, which are extracted from the medical record by staff in the clinical coding team.

Our infection control nurses asked for the patient record numbers of the bloodstream infections for the financial year, and did a mini audit.

There were 21 patients coded as having a bloodstream infection.

  • 13 had no positive blood culture during their admission (this is the test that is diagnostic of a BSI)
  • 4 had positive blood cultures, but were not considered to have BSIs because the organisms were not pathogenic
  • 1 had a BSI which was community-onset, not healthcare-associated
  • 3 did actually have HA-BSIs.

So only 3/21 (or 14%) of patients were considered by the subject-area experts to have the diagnosis that they'd been coded as having.
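The audit arithmetic above can be sanity-checked in a few lines. This is just a back-of-the-envelope sketch using the figures from the post; in screening terms, the 14% is the positive predictive value of the coding data for a true HA-BSI.

```python
# Audit figures from the mini audit described above.
coded_as_bsi = 21            # admissions coded as bloodstream infection
no_positive_culture = 13     # no positive blood culture during admission
non_pathogenic = 4           # positive cultures, but non-pathogenic organisms
community_onset = 1          # genuine BSI, but not healthcare-associated
true_ha_bsi = 3              # confirmed healthcare-associated BSIs

# The four audit categories should account for every coded case.
assert no_positive_culture + non_pathogenic + community_onset + true_ha_bsi == coded_as_bsi

# Positive predictive value of the coding data for HA-BSI.
ppv = true_ha_bsi / coded_as_bsi
print(f"Coding accuracy (PPV): {ppv:.0%}")  # prints "Coding accuracy (PPV): 14%"
```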

The population health statistics that will be taken from myHealthRecord will also be based on coding data.

What on Earth does big data on a dataset with only 14% accuracy mean? Is this really acceptable because "it's the best we've got"? And given the accuracy is this poor, how does this affect our framing of the risks vs the benefits of sharing data?


While Future Wise welcomes the creation of a Data Commissioner, it would be far preferable for this to be backed by stronger legislative protections - ideally something along the lines of the EU's GDPR, or at the very least a significant strengthening of the Privacy Act.

We support Open Data, and while we are cautiously in favour of using data to improve policy and outcomes, we strongly believe it should be done properly: using the best data in the best ways, with the best protection of the data subjects.

Future Wise is concerned that the secondary use of healthcare data seems to be assumed to be happening regardless of the results of the consultation we submitted to. We believe the principles of truly informed consent apply to all consumer data, not just that related to healthcare.

This nexus between technology and healthcare - how best to integrate complex systems with competing demands in a way that achieves the best outcome for everyone - is where we see our role here at Future Wise, so you (and the Data Commissioner) will be hearing lots more from us about this.

We'd love to hear from you; contact us via Twitter

Image credit: "Big data is watching you" - by ev - via Unsplash - CC0