Beyond the Stethoscope: Operationalizing Interactive Clinical Reasoning in Large Language Models via Proactive Information Seeking
Main Authors:
Format: Conference Proceeding
Language: English
Subjects:
Online Access: Request full text
Summary: Assistant agents powered by large language models (LLMs) are adept at answering questions, but they often struggle to ask questions that gather relevant information when required. In high-stakes domains such as medical reasoning, it is crucial to acquire all necessary information by proactively asking information-seeking questions before making a decision. In this paper, we introduce DYNAMED, a framework that simulates realistic, dynamic medical consultations between a Patient System and an Expert System. DYNAMED evaluates LLMs' ability to gather further details about patients' symptoms by asking information-seeking questions. We show that state-of-the-art LLMs (LLaMA-2-Chat 7B & 13B, GPT-3.5, and GPT-4) struggle at information seeking: when given limited initial information and prompted to ask questions if needed, their accuracy drops by 10.1% on average compared to starting with the same limited information and asking no questions. We then attempt to improve the information-seeking capabilities of the Expert System through self-reflection and medical expertise augmentation, and introduce a synthetic medical conversation dataset, MEDQACHAT. Our best model improves the accuracy of the GPT-3.5 Expert System by 19.0%; however, it still lags by 10.0% behind the setting where full information is given upfront. While much work remains to extend the information-seeking abilities of LLM assistants in critical domains, we aim to provide a flexible framework to facilitate further exploration in this direction.
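The abstract describes a consultation loop in which an Expert System either asks the Patient System an information-seeking question or commits to a final answer. The sketch below is a minimal, hypothetical illustration of such a loop, not the authors' implementation: the `query_expert` and `query_patient` callables, the `FINAL:` stopping convention, and the turn budget are all assumptions for illustration, to be replaced by whatever prompting and LLM backend the framework actually uses.

```python
# Minimal sketch of a DYNAMED-style consultation loop (illustrative only, not the paper's code).
# `query_expert` and `query_patient` are hypothetical wrappers around any chat-completion LLM;
# the "FINAL:" convention and the max_turns budget are assumptions made for this sketch.

from typing import Callable, List, Tuple

def run_consultation(
    query_expert: Callable[[List[Tuple[str, str]]], str],
    query_patient: Callable[[str], str],
    initial_info: str,
    question: str,
    max_turns: int = 5,
) -> str:
    """Let the Expert ask information-seeking questions before committing to an answer."""
    # Conversation history as (role, text) pairs, seeded with the limited initial information.
    history: List[Tuple[str, str]] = [("context", initial_info), ("task", question)]
    for _ in range(max_turns):
        expert_turn = query_expert(history)            # either a follow-up question or "FINAL: <answer>"
        if expert_turn.startswith("FINAL:"):
            return expert_turn[len("FINAL:"):].strip()
        patient_reply = query_patient(expert_turn)     # Patient System answers from the full case record
        history.append(("expert", expert_turn))
        history.append(("patient", patient_reply))
    # Turn budget exhausted: force a final decision from the Expert.
    history.append(("system", "No further questions allowed; give your final answer."))
    return query_expert(history)
```

Passing the two systems in as callables keeps the loop independent of any particular LLM API, which matches the paper's framing of a flexible evaluation framework.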
ISSN: 2575-2634
DOI: 10.1109/ICHI61247.2024.00090