Study on 'fragile' AI predictive models provides 'cautionary tale' about use in medicine


Published on 19/01/2024 - 07:00 • Updated 01/02/2024 - 01:44

A new study showing that machine learning models are study-specific and difficult to generalise provides a "cautionary tale" about using AI in medicine, experts say.

There is hope that artificial intelligence (AI) could improve medical treatment by predicting patient outcomes, yet a new study warns that AI-powered models may not hold up beyond the data they were trained on.

Researchers at the University of Cologne in Germany and Yale University in the US analysed a machine learning model to see how well it predicted schizophrenia patients’ responses to antipsychotic medications.

This type of prediction could be very useful in medicine, and in particular in psychiatry, as patients respond differently to treatment.

“Some people respond very well to medication and some do not, and this makes it sometimes difficult to relieve people of their symptoms as quickly as possible,” Joseph Kambeitz, a professor of biological psychiatry at the University of Cologne and co-author of the study, told Euronews Next.

“Often medical doctors and healthcare professionals don’t have a good way to predict which patient will respond well to which medication,” he said.

As a result, it can take a long time to find the best course of treatment, something some hope AI models will change.

But the new findings, published in the journal Science, show that while AI statistical models were highly accurate when trained and tested on a specific trial’s dataset, they did not generalise to other studies.

This suggests that machine learning “predictive models are fragile and that excellent performance in one clinical context is not a strong indicator of performance on future patients,” the authors write.

Dessislava Pachamanova, a professor at Babson College in the US who studies predictive analytics and machine learning, said the study points out several “important limitations” for using these models for patient treatment and “provides a cautionary tale about the application of AI in medicine more generally”.

“One of the core problems is the nature of medical data. To perform well, AI models need massive amounts of high-quality data – yet, patient treatment data are expensive to collect,” Pachamanova, who was not involved in the study, added.

“Patients often come in and out of the system, receive treatments at different institutions, and sometimes provide faulty data on how they follow physician recommendations. Their complete journeys are difficult to track and link to specific outcomes,” she said.

How did the researchers test the models?

In one test, the researchers trained and tested the AI models on the same data. In another, they used “cross-validation”, in which they split the data into subsets, training an AI model on one subset and testing it on another, and repeated this multiple times.

Even at this stage, simply splitting the data into different sets, they found that the model performed “quite okay but already” worse than it did without breaking up the data.

They also tested the models more rigorously, training them on one study’s data and testing them on another study’s data.

They found that the “AI models work well when you use them within a given study, but when you take them out of that study context, they perform very poorly,” Kambeitz said.
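The difference between these evaluation setups can be sketched in a few lines of Python. This is an illustration only, not the study's code: the function names `k_fold_splits` and `cross_study_pairs` and the trial labels are hypothetical, and the interleaved fold assignment is a simplification chosen for brevity.

```python
from collections import defaultdict


def k_fold_splits(n_samples, k):
    """Cross-validation within one dataset.

    Yields (train_idx, test_idx) pairs so that every sample is used
    for testing exactly once and never appears in its own training set.
    """
    indices = list(range(n_samples))
    for fold in range(k):
        test_idx = indices[fold::k]  # every k-th sample, offset by the fold number
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        yield train_idx, test_idx


def cross_study_pairs(study_labels):
    """Cross-study evaluation: train on one study, test on a different one.

    Yields (train_study, test_study, train_idx, test_idx) for every
    ordered pair of distinct studies.
    """
    by_study = defaultdict(list)
    for i, label in enumerate(study_labels):
        by_study[label].append(i)
    for train_study, train_idx in by_study.items():
        for test_study, test_idx in by_study.items():
            if train_study != test_study:
                yield train_study, test_study, train_idx, test_idx


# Hypothetical example: ten patients drawn from three trials.
labels = ["trial_A"] * 4 + ["trial_B"] * 3 + ["trial_C"] * 3

# Setup 1 (train and test on the same data) needs no split at all,
# which is why it gives the most optimistic picture.

# Setup 2: cross-validation, here with 5 folds over all ten patients.
for train_idx, test_idx in k_fold_splits(len(labels), k=5):
    print("cross-validation: train on", len(train_idx), "test on", len(test_idx))

# Setup 3: train on one trial, test on another.
for train_study, test_study, train_idx, test_idx in cross_study_pairs(labels):
    print("train on", train_study, "-> test on", test_study)
```

In the first two setups the test patients come from the same trial as the training patients. Only the third asks whether a model carries over to a trial it has never seen, which is the setting in which the researchers found performance dropped sharply.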

What does this mean for the future use of AI to predict health outcomes?

Pachamanova said there is a need for more research on how to improve data collection and the reliability of predictive models for medical treatment.

“In the long term, AI is going to be essential for generating the next level of medical advancement. However, realising that potential will require an industry-wide shift in the way medical data are acquired, processed, stored, and analysed by advanced models,” she added.

Kambeitz, meanwhile, expects that these predictive models, especially in psychiatry, will not be used clinically for a long time.

In areas of medicine where data is more structured with less reliance on patient questionnaires, the clinical application of AI is perhaps not as far away, Kambeitz said, but in psychiatry, “it’s further away meaning more than ten years probably”.

But he does not doubt that “it eventually will have an impact”.
