There is abundant evidence that doctors, nurses, and other medical personnel are overworked. Two different studies found that 77% of physicians and 48% of nurses reported feeling burned out at work.
One reason is that demand for healthcare services is increasing while the supply of medical personnel is not keeping up. Another is that hospitals and other healthcare organizations are under pressure to cut costs, which can lead to staff shortages. Finally, the COVID-19 pandemic has placed an enormous additional burden on healthcare workers, with many experiencing burnout and mental health issues.
I have written before about how one chatbot, ChatGPT, is able to provide medical expertise despite being trained on general knowledge datasets (although medical topics are of course included). In a recent study, ChatGPT answered roughly 50% of questions correctly on the national medical board exams (USMLE) used to certify doctors; a passing score is roughly 60%. It is plausible that this medical expertise, combined with its conversational ability, could be leveraged for tasks related to communicating medical information to patients, tasks that may not require highly specialized medical knowledge.
A research group from UCSD asked whether ChatGPT could reply to patients' medical questions as adequately as human doctors. The primary data came from the social media site Reddit, specifically its forum (subreddit) r/AskDocs, where people ask medical questions and sometimes receive a response from a physician on the site. The researchers gathered exchanges from October 2022 in which a verified physician responded to a public question, and 195 were randomly chosen. The questions were then posed to ChatGPT, which provided a response that could be compared to the physician's reply.
Both the AI and the human answers were evaluated by a group of medical experts. For each question/response pair, the evaluators chose "which response was better" and rated both "the quality of information provided" and "the empathy or bedside manner provided" on a scale from 1 to 5, thus measuring quality (accuracy of information) and empathy.
Across the 195 question/response pairs, evaluators preferred the ChatGPT response to the physician response 78.6% of the time. Chatbot responses were also rated significantly higher in quality and more empathetic than physician responses (Figure 1). More specifically, "[t]he proportion of responses rated as good or very good quality (≥4), for instance, was higher for chatbot than physicians (chatbot: 78.5%, 95% CI, 72.3%-84.1%; physicians: 22.1%, 95% CI, 16.4%-28.2%)," and "[t]he proportion of responses rated empathetic or very empathetic (≥4) was higher for chatbot than for physicians (chatbot: 45.1%, 95% CI, 38.5%-51.8%; physicians: 4.6%, 95% CI, 2.1%-7.7%)." In other words, ChatGPT generated an empathetic response 45.1% of the time compared to only 4.6% for the human physicians, nearly a 10-fold difference.
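For readers who want to see how such intervals arise, here is a minimal sketch of a 95% confidence interval for a proportion using the normal approximation. The counts below are back-calculated from the reported percentages and the 195 pairs (so they are approximations, not the study's actual tallies), and the paper's exact CI method may differ, so the intervals will not match the published ones precisely.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Approximate 95% CI for a proportion (normal/Wald approximation)."""
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - margin), min(1.0, p + margin)

# Counts back-calculated from the reported percentages (approximate, n = 195 pairs).
results = [
    ("chatbot quality >= 4", 153),    # ~78.5% of 195
    ("physician quality >= 4", 43),   # ~22.1% of 195
    ("chatbot empathy >= 4", 88),     # ~45.1% of 195
    ("physician empathy >= 4", 9),    # ~4.6% of 195
]
for label, count in results:
    p, low, high = proportion_ci(count, 195)
    print(f"{label}: {p:.1%} (95% CI {low:.1%} to {high:.1%})")
```

Note that 45.1% / 4.6% ≈ 9.8, which is where the roughly 10-fold difference in empathy comes from.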
Important caveats are that this is a small, preliminary study that is not necessarily representative of real-world situations. The researchers suggest "[f]urther exploration of this technology is warranted in clinical settings, such as using chatbot to draft responses that physicians could then edit. Randomized trials could assess further if using AI assistants might improve responses, lower clinician burnout, and improve patient outcomes." The idea of using a chatbot as an assistant to draft responses for doctors is not far-fetched, and indeed has recently been implemented.
Last month, The Wall Street Journal reported on artificial intelligence (AI) being used to draft replies to patient messages for doctors in a pilot program in California and Wisconsin. The pilot involves UC San Diego Health, UW Health, and Stanford Health Care, and tests whether the technology can cut the amount of time medical staff spend replying to patients' online queries, ultimately improving the quality of care and easing staff burnout.
When a patient sends a message to their doctor, the message is first routed to an AI chatbot, which reads it and drafts a response. The draft is then sent to the doctor for review, and the doctor can edit it or send it as-is.
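As a rough illustration of that flow, here is a minimal sketch of a draft-then-review loop. All the names here (call_chatbot, DraftReply, physician_send) are hypothetical placeholders, not the pilot's actual software or APIs.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DraftReply:
    patient_message: str
    draft_text: str
    status: str = "pending_review"  # pending_review -> sent

def call_chatbot(prompt: str) -> str:
    # Placeholder for the real model call behind the patient portal.
    return f"[draft reply to: {prompt[:60]}]"

def generate_draft(patient_message: str) -> DraftReply:
    """Route the incoming message to the chatbot and hold the draft for review."""
    return DraftReply(patient_message, call_chatbot(patient_message))

def physician_send(draft: DraftReply, edited_text: Optional[str] = None) -> str:
    """The physician can send the draft as-is or substitute an edited version."""
    final_text = edited_text if edited_text is not None else draft.draft_text
    draft.status = "sent"
    return final_text

# The draft is generated automatically; the physician then approves or edits it.
draft = generate_draft("My blood pressure readings have been higher this week. Should I worry?")
print(physician_send(draft))                                    # send as-is
print(physician_send(draft, edited_text="Edited reply text."))  # or send an edited version
```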
Early anecdotes from the program look promising and are consistent with the findings of the paper described above. The chatbot is able to draft responses that are useful to the doctors. Importantly, when proposing a response the chatbot consults not only the patient's message but also an abbreviated version of their electronic medical record, so the drafts can reference, for example, the patient's existing medications or the last time they were seen by their physician.
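One plausible way to provide that context is to prepend an abbreviated record summary to the patient's message when assembling the prompt. The sketch below is a hypothetical illustration under that assumption; the field names and summary format are invented for clarity, not the pilot's actual integration with the medical record.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class RecordSummary:
    """Abbreviated view of the chart; fields here are illustrative only."""
    medications: List[str] = field(default_factory=list)
    last_visit: str = ""

def build_prompt(patient_message: str, record: RecordSummary) -> str:
    """Combine a short chart summary with the patient's message into one prompt."""
    context = (
        f"Current medications: {', '.join(record.medications) or 'none listed'}\n"
        f"Last visit: {record.last_visit or 'unknown'}\n"
    )
    return (
        "Draft a reply to the patient below for a physician to review.\n"
        f"--- chart summary ---\n{context}"
        f"--- patient message ---\n{patient_message}\n"
    )

# Example: the resulting draft can mention the existing medication or the last visit.
print(build_prompt(
    "Can I take ibuprofen with my current prescription?",
    RecordSummary(medications=["lisinopril 10 mg"], last_visit="2023-03-02"),
))
```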
Of course, numerous challenges remain. Most importantly, the accuracy of the medical information the chatbot provides must be as close to perfect as possible so that the doctor doesn't inadvertently pass on potentially harmful recommendations. Second, the chatbot should be robust enough to respond to all types of questions, even those that are poorly articulated. Finally, for widespread adoption, doctors need to become comfortable with an AI drafting their responses. For now, some chatbot responses need to be heavily edited by the doctor, and the hospitals have restricted the types of questions the chatbot can respond to, for example avoiding those that seek medical advice.
In conclusion, the program is still in its early stages, but it represents the first step in a long process that could change the way doctors interact with patients. Over time, one can expect the chatbot to make fewer mistakes and gain the trust of both doctors and patients, resulting in further time savings for medical personnel.
Figure 1. Comparing responses from human physicians versus ChatGPT to patient questions on a Reddit forum. Quality (left) and empathy (right) of responses were assessed on a 5-point scale. On both criteria, ChatGPT exceeded the physicians (Ayers et al., JAMA Internal Medicine, 2023).