One in five UK doctors use a generative artificial intelligence (GenAI) tool, such as OpenAI’s ChatGPT or Google’s Gemini, to help with clinical practice, according to a recent survey of around 1,000 general practitioners.
Doctors reported using GenAI to generate documentation after appointments, help with clinical decision-making, and provide information to patients, such as understandable discharge summaries and treatment plans.
Given the hype surrounding artificial intelligence combined with the challenges healthcare systems face, it is no surprise that physicians and policymakers alike see AI as the key to modernizing and transforming our healthcare services.
But GenAI is a recent innovation that fundamentally challenges the way we think about patient safety. There is still much we need to know about GenAI before it can be used safely in daily clinical practice.
The problems with GenAI
Traditionally, AI applications are developed to perform a very specific task. For example, deep learning neural networks have been used for classification in imaging and diagnostics. Such systems appear to be effective in analyzing mammograms to support breast cancer screening.
But GenAI isn’t trained to perform a narrowly defined task. These technologies are based on so-called foundation models, which have broad, general-purpose capabilities. This means they can generate text, pixels, audio or even a combination of these.
These capabilities are then tailored to different applications, such as answering user questions, producing code, or creating images. The possibilities for interacting with this kind of AI seem to be limited only by the user’s imagination.
Crucially, because the technology was not developed for use in a specific context or for a specific purpose, we don’t actually know how doctors can use it safely. This is just one reason why GenAI is not yet suitable for widespread use in healthcare.
Another problem with the use of GenAI in healthcare is the well-documented phenomenon of ‘hallucinations’. Hallucinations are nonsensical or untruthful statements based on the input given.
Hallucinations have been investigated in the context of having GenAI create summaries of text. One study found that various GenAI tools produced summaries that drew incorrect links from what was said in the text, or included information that was never referenced in the text at all.
Hallucinations occur because GenAI works based on probability – such as predicting which word will follow in a given context – rather than being based on ‘understanding’ in the human sense. This means that results produced by GenAI are plausible rather than necessarily truthful.
This plausibility is another reason why it is too early to safely use GenAI in routine medical practice.
Imagine a GenAI tool that listens in on a patient’s consultation and then produces an electronic summary note. On the one hand, this frees the GP or nurse to engage more fully with their patient. But on the other hand, the GenAI could potentially write its notes based on what it deems plausible.
For example, the GenAI summary may change the frequency or severity of the patient’s symptoms, add symptoms that the patient never complained about, or include information that the patient or doctor never mentioned.
Doctors and nurses would have to proofread all AI-generated notes with a keen eye and an excellent memory to distinguish factual information from plausible but made-up information.
This might be fine in a traditional GP practice, where the GP knows the patient well enough to identify inaccuracies. But in our fragmented healthcare system, where patients are often seen by different healthcare providers, any inaccuracies in patient notes can pose significant risks to their health, including delays, improper treatment and misdiagnosis.
The risks associated with hallucinations are significant. But it is worth noting that researchers and developers are currently working on reducing the likelihood of hallucinations.
Patient safety
Another reason why it’s too early to adopt GenAI in healthcare is that patient safety depends on how the AI interacts with a given context and setting: how the technology works with people, how it fits with rules and pressures, and the culture and priorities within a larger healthcare system. Such a systems perspective would determine whether the use of GenAI is safe.
But because GenAI isn’t designed for a specific use, it’s adaptable and can be used in ways we can’t fully predict. On top of this, developers regularly update their technology, adding new generic capabilities that change the behavior of the GenAI application.
Harm can also occur even if the technology appears to work safely and as intended, again depending on the context of use.
For example, the introduction of GenAI conversational agents for triage could influence different patients’ willingness to engage with the healthcare system. Patients with lower digital literacy, people whose first language is not English, and non-verbal patients may find GenAI difficult to use. So while the technology might ‘work’ in principle, it could still contribute to harm if it didn’t work equally well for all users.
The point here is that with GenAI, such risks are much more difficult to anticipate in advance through traditional safety analysis approaches, which are concerned with understanding how a failure of the technology can cause harm in specific contexts.
Healthcare could benefit immensely from the adoption of GenAI and other AI tools.
But before these technologies can be used more broadly in healthcare, safety assurance and regulation will need to better respond to developments in where and how these technologies are used.
It is also imperative that GenAI developers and regulators work with the communities that will use these technologies to develop tools that can be used routinely and safely in everyday clinical practice.
Mark Sujan, Professor of Safety Sciences, University of York
This article is republished from The Conversation under a Creative Commons license. Read the original article.