Bayer’s project Future Clinical Trials is utilizing next-generation anonymization technology

Drug development is a laborious, time-consuming process. On average, developing a new drug takes 10 years – about seven of which are spent on clinical research. On top of that come the considerable costs of developing a new prescription drug, which average around two billion euros.

Bayer has an ongoing project which, if successful, is anticipated to affect Bayer’s clinical trial practices globally.

VEIL.AI interviewed Bayer’s Strategic Project Lead Jussi Leinonen on “Future Clinical Trials” – the project that utilizes VEIL.AI’s next-generation anonymization technology, which has attracted attention among clinical trial and data science experts.

Can you tell us briefly about the FCT project and how it all started?

“Future Clinical Trials” is a three-year, multi-million-euro project aimed at speeding up clinical trials and increasing patient safety.

The project includes two larger parts: customer centricity, which aims to simplify the patient journey and reduce the additional burden on research sites, and data science, which focuses on integrated data sets, pattern recognition, and external control arms.

Even before the planning phase of the project, a trend could be seen where the importance of data-centricity was increasing in drug development. The use of external real-world data (RWD) and the reuse of our own clinical trial data (RCT) played a very important role in our thinking. Integrating RWD with RCT data is a challenge, though, and it relates not only to differences in data collection and data models, but also to data quality, privacy, and ownership.

So this was easier said than done. Anonymization of data was one idea – if we had high-quality anonymized data, it would make data usability and sharing much easier.

At the beginning of the project, you ordered a concept for anonymizing your own internal data from VEIL.AI – can you tell us more about the benefits of the anonymization concept?

It was easy to justify why we needed advanced anonymization of external data – it is a potential way to get good-quality external RWD into our own analytics environment. But justifying why the company’s own data should be anonymized in a more advanced way was tougher.

The concept we created helps us explain that RCT data is often pseudonymous data with limitations, and that it cannot be freely reused. Once the internal dataset is anonymized, the data falls outside the scope of the GDPR and becomes reusable for the organization.

In the big picture it is all about how we want to improve the organization’s capabilities in utilizing its own sensitive data. From a data scientist’s point of view, one could say that the greatest value of enabling efficient data sharing is that collaboration and co-development with external collaborators is much easier.

This can lead to an interesting reflection on how to measure the value or benefit that is created within the organization, but not necessarily for yourself – someone else will get the insight of the dataset that was anonymized, and you may not even know about it.

This is also a matter of company culture – how to value such things. Therefore, it is also important that senior management understands the concept of anonymization and its possible value-generating scenarios.

In our international organization it was also important to show the differences in regulations. Many may be surprised that anonymized data is different from HIPAA-compliant data. From a European point of view, HIPAA-compliant data is not necessarily anonymized data; it may instead be pseudonymized data that is subject to GDPR. That is why we in Europe need to go beyond the HIPAA regulation.

During the process it became evident that the reuse of our own data is limited if it is not anonymized. Creating the concept also included interviews and discussions with the organization’s various stakeholders – it was a good learning process overall.

What kinds of goals do you have regarding data science in the project?

Concerning integrated datasets, we are building the capabilities of embedding RWD into clinical development programs. The idea is to anonymize the pseudonymized external RWD, which enables the data to be transferred to our own environment, where we can merge it with our own RCT data.

When talking about external control arms, we want to create a methodology for providing virtual or synthetic controls for clinical trials based on RWD and legacy clinical trial data.

Can you open that up a bit more – what does an external control arm mean?

Patient recruitment and retention in clinical trials are major challenges in the development of new drugs. If there is high-quality data available, part of the control patient group could be replaced in some instances with an external control arm – i.e., with a group formed from external health data, most commonly real-world data such as electronic health records and registries, or data from previous clinical trials. This can have many positive impacts, like increasing efficiency, reducing delays, and lowering costs in the evaluation of new therapies. Advanced anonymization has the potential to ease the difficulties of using sensitive subject-level health data for this purpose.

What kind of savings could be achieved?

This is a somewhat complex question, and the answer depends on the specific trial. A preliminary, conservative estimate is somewhere between 10 and 20%. That figure does not apply to every trial; it is a rough average.

However, it is also good to note that we are not only talking about cost savings here, but also about shortening the drug’s time to market. Patients can get the new drug faster.

How do the regulators – e.g., FDA and EMA – feel about using the external control arms?

They are guiding the industry in this direction, and there are very interesting use cases already ongoing. Using health data from multiple countries and data sources makes a lot of sense in the creation of external control arms, as late-phase clinical trials have patients from multiple countries as well.

What kind of a team of people from Bayer do you have onboard this project?

We have a great team in this project. Role-wise we have pretty much the same type of setup as in a normal clinical study – with some exceptions. Our core project team members work in Bayer’s global roles: clinical data science & analytics, clinical development & operations, oncology development, R&D, IT, and medical affairs. Half of us come from outside Finland.

Finally, one last question: What kinds of things do you see in the future regarding the utilization of next-generation anonymization technology?

Health data can also become outdated, just as treatment paradigms do. If you anonymize one dataset, it cannot be used indefinitely. It needs to be updated from time to time so that the data remains applicable to a patient today. That also implies that anonymizing and sharing data does not decrease the value of the original data, even if some datasets are exported in a (limited) anonymized or synthesized format.

It would also be great if an anonymized dataset for research purposes could be acquired from a data controller, e.g., a university hospital, and transferred to the pharma company’s own analytics environment in a transparent and repeatable manner.

High-quality health data is valuable and not easily collected – there is a huge need to find ways to utilize it better for secondary use on a global scale.
