Study Warns ChatGPT Health Tool Misses Emergency Cases

ChatGPT Health, OpenAI’s dedicated health feature, is facing sharp criticism after an independent study found it frequently failed to recognise medical emergencies and often gave advice that could delay critical care.

Researchers reported that in more than half of urgent cases, the system did not advise an immediate hospital visit, prompting experts to warn the tool could “feasibly lead to unnecessary harm and death”.

The evaluation, published in the February edition of the journal Nature Medicine, is described as the first independent safety assessment of ChatGPT Health. OpenAI launched the feature to a limited group of users in January, promoting it as a way to “securely connect medical records and wellness apps” to generate health advice. According to the company, more than 40 million people ask ChatGPT for health-related information every day.

Lead author Dr Ashwin Ramaswamy and colleagues built 60 realistic patient scenarios that ranged from mild conditions to true emergencies. Three independent physicians reviewed each scenario and agreed, using clinical guidelines, on the level of care needed.

The team then queried ChatGPT Health with each scenario under different conditions, such as changing the patient’s gender, adding lab results, or including comments from family members, generating nearly 1,000 AI responses. These were compared with the physicians’ recommended actions.

The system performed well in straightforward, clearly defined emergencies such as strokes or severe allergic reactions. But it struggled in more ambiguous or complex cases. In one asthma scenario, ChatGPT Health correctly identified early warning signs of respiratory failure, yet still advised the patient to wait rather than seek emergency treatment.

Overall, in 51.6% of cases where doctors agreed a person needed to go to hospital immediately, ChatGPT Health instead suggested staying home or booking a routine appointment. Alex Ruani, a doctoral researcher in health misinformation mitigation at University College London who was not involved in the study, called that finding “unbelievably dangerous”.

“If you’re experiencing respiratory failure or diabetic ketoacidosis, you have a 50/50 chance of this AI telling you it’s not a big deal,” Ruani said, warning that the reassurance offered by such systems could be deadly if it delays urgent care. In one simulation, she noted, the system sent a suffocating woman to a future appointment “she would not live to see” in 84% of runs, more than eight times out of 10.

The study also found the model frequently over-reacted in the opposite direction: 64.8% of people described in the scenarios as completely safe were told to seek immediate medical care. Ruani argued that beyond mis-triaging individuals, such behaviour could drive unnecessary pressure on health services.

The researchers observed that ChatGPT Health’s recommendations were highly sensitive to contextual details. The platform was nearly 12 times more likely to minimise symptoms when the simulated patient mentioned that a “friend” believed the issue was not serious. That kind of susceptibility to offhand comments is a concern for those studying AI safety.

“It is why many of us studying these systems are focused on urgently developing clear safety standards and independent auditing mechanisms to reduce preventable harm,” Ruani said.

Ramaswamy, a urology instructor at the Icahn School of Medicine at Mount Sinai in the US, expressed particular concern about how the system handled suicidal ideation. In one test scenario, a 27-year-old patient reported thoughts of taking “a lot of pills”. When that information was presented alone, ChatGPT Health consistently displayed a crisis intervention banner linking to suicide support services.

But when the researchers added normal lab results, with the patient’s words and severity unchanged, the behaviour shifted. In 16 attempts under those conditions, the suicide crisis banner did not appear at all. “A crisis guardrail that depends on whether you mentioned your labs is not ready,” Ramaswamy said, adding that such inconsistent protections could be “arguably more dangerous than having no guardrail at all, because no one can predict when it will fail.”

Beyond immediate safety, experts see broader systemic and legal implications. Paul Henman, a digital sociologist and policy specialist at the University of Queensland, described the study as “a really important paper”.

He warned that if ChatGPT Health were widely used in homes, it could both increase unnecessary visits for low-level issues and fail to send people to urgent care when needed, potentially leading to preventable harm and deaths. Henman also pointed to emerging legal risks, noting that cases are already being brought against technology companies in relation to suicide and self-harm after interactions with AI chatbots.

For Henman, a key problem is opacity. “It is not clear what OpenAI is seeking to achieve by creating this product, how it was trained, what guardrails it has introduced and what warnings it provides to users,” he said. “Because we don’t know how ChatGPT Health was trained and what the context it was using, we don’t really know what is embedded into its models.”

An OpenAI spokesperson said the company welcomes independent research evaluating AI systems in healthcare, but argued that the study does not fully reflect how people use ChatGPT Health in practice. The spokesperson added that the model is continually updated and refined.

Ruani countered that even though the research relied on simulated but realistic scenarios, “a plausible risk of harm is enough to justify stronger safeguards and independent oversight”.

With tens of millions of people reportedly turning to ChatGPT for health questions each day, the findings sharpen a debate over how AI tools should be positioned in medicine: as information aids with strict limits, or as more active participants in triage and decision-making. For the researchers behind this study, the answer is clear for now: any system that can tell a suffocating patient to wait days for an appointment needs far stronger checks before being relied on in real-world care.


Paul Balo
