How can AI help safely increase the secondary use of health data?


Artificial intelligence (AI) tools have generated significant enthusiasm in recent times, particularly within healthcare. Generative AI tools like OpenAI’s ChatGPT seem to be constantly in the news, and generative AI’s potential to tackle difficult problems in healthcare, such as predicting patient outcomes and navigating health records, leaves it poised to transform patient experiences.

Healthcare is considered a data-rich sector, but the secondary use of health data is often obstructed by privacy concerns and regulation. These regulations vary from country to country, although they have recently been partly harmonized in Europe by the General Data Protection Regulation (GDPR). In Finland, research and innovation have been spurred by forward-thinking legislation on the secondary use of health data, and similar legislation envisioned in the European Health Data Space (EHDS) is on the horizon.


AI-enhanced anonymization enables safe use of health data

One way AI has advanced the secondary use of health data is by enabling new technologies. AI-enhanced next-generation anonymization technology opens up new possibilities for utilizing sensitive health data safely: all identifiers of an individual are irreversibly removed from a dataset while very high data quality is maintained. Traditionally, anonymization has not been widely used because legacy anonymization methods degrade data quality too much. With next-generation methods, however, artificial intelligence chooses the optimal solution among the practically infinite number of ways to anonymize a dataset. This optimal solution satisfies the privacy requirements while the data retains the statistical properties of the original and remains suitable for downstream uses.
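To make the idea concrete, the sketch below frames anonymization as an optimization problem: among several candidate ways to generalize a dataset, pick the one that satisfies a privacy requirement (here, plain k-anonymity) with the least loss of information. This is a deliberately simplified illustration with hypothetical column names, not VEIL.AI’s actual algorithm.

```python
# Simplified sketch: anonymization as an optimization problem.
# NOTE: this illustrates the general idea (k-anonymity via generalization,
# choosing the candidate with the least information loss). It is NOT
# VEIL.AI's actual method; column names and parameters are hypothetical.
import pandas as pd

def is_k_anonymous(df, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    return df.groupby(quasi_identifiers).size().min() >= k

def generalize_age(df, bin_width):
    """Replace exact age with an age band of the given width (one form of generalization)."""
    out = df.copy()
    out["age"] = (out["age"] // bin_width) * bin_width
    return out

def anonymize(df, quasi_identifiers, k=5, candidate_bins=(1, 5, 10, 20)):
    """Pick the weakest generalization (best data quality) that still satisfies k-anonymity."""
    for bin_width in candidate_bins:  # ordered from least to most information loss
        candidate = generalize_age(df, bin_width)
        if is_k_anonymous(candidate, quasi_identifiers, k):
            return candidate, bin_width
    raise ValueError("No candidate satisfied the privacy requirement")

# Toy usage with synthetic data
data = pd.DataFrame({
    "age": [23, 24, 25, 47, 48, 49, 51, 52],
    "postcode": ["001", "001", "001", "002", "002", "002", "002", "002"],
    "lab_value": [1.2, 0.9, 1.1, 2.3, 2.1, 2.4, 2.0, 2.2],
})
anonymized, chosen_bin = anonymize(data, quasi_identifiers=["age", "postcode"], k=3)
print(f"Chosen age bin width: {chosen_bin}")
print(anonymized)
```

In practice the search space is far larger than a handful of bin widths, which is where AI-driven optimization adds value; the principle of balancing a formal privacy requirement against data quality stays the same.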

High-quality anonymous data is the key to the secondary use of health data in many different use cases, such as enabling cross-border data collaborations for researchers, re-using legacy data, generating more real-world evidence, investigating rare diseases with larger datasets, and speeding up clinical trials and reducing their cost by using external control arms.


Study shows high-quality anonymized data to be as useful as original data

In a study conducted by Bayer, MedEngine and VEIL.AI and published in BMC Medical Research Methodology, data anonymized with a next-generation anonymization method was found to be just as useful as the original, sensitive data when used as an external control arm for a randomized clinical trial.

As Jussi Leinonen, Strategic Project Lead at Bayer, states: “This is a significant achievement. In our study, we could draw the same conclusions from anonymized data as from traditional pseudonymized, individual-level research data.”

Before an authority or a data protection officer can consent to releasing anonymized data, the anonymity of the datasets produced by the anonymization process must also be verified. Findata, the Finnish social and health data permit authority that authorizes the secondary use of health data, has accepted an anonymization verification protocol created by VEIL.AI. This protocol allowed Bayer to export its anonymized data across borders from the Findata secure environment to Bayer’s own secure environment, opening up new possibilities for further enriching and utilizing the data, such as merging the anonymized data with legacy randomized controlled trial (RCT) data and obtaining more anonymized real-world data (RWD) and RCT data.
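For intuition, the sketch below shows one common type of check that can feed into such verification: summarizing equivalence-class sizes over quasi-identifiers and flagging records that remain unique or nearly unique after anonymization. It is an illustrative example only, not the protocol accepted by Findata or the method used by VEIL.AI, and the column names are hypothetical.

```python
# Illustrative anonymity check: equivalence-class sizes over quasi-identifiers.
# NOTE: this is NOT the verification protocol accepted by Findata or used by
# VEIL.AI; it only demonstrates the kind of metric such checks may rely on.
import pandas as pd

def equivalence_class_report(df, quasi_identifiers):
    """Summarize equivalence-class sizes over the chosen quasi-identifiers."""
    sizes = df.groupby(quasi_identifiers).size().rename("class_size")
    joined = df.merge(sizes, left_on=quasi_identifiers, right_index=True)
    return {
        "smallest_class": int(sizes.min()),
        "unique_records": int((sizes == 1).sum()),
        "share_in_classes_below_5": float((joined["class_size"] < 5).mean()),
    }

# Toy usage on a small synthetic "anonymized" table
anonymized_table = pd.DataFrame({
    "age_band": ["20-39", "20-39", "40-59", "40-59", "40-59", "60-79"],
    "region":   ["south", "south", "north", "north", "north", "north"],
})
print(equivalence_class_report(anonymized_table, ["age_band", "region"]))
```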

Researchers and data scientists need high-quality data to succeed in their work. Because sensitive data is regulated and access to it can be difficult, the secondary use of health data relies on high-quality anonymized data. AI-enhanced next-generation anonymization technology can unleash the potential of the secondary use of health data, benefiting patients, healthcare, and society as a whole.

This article was first published on pharmaca.fi.
