Large language models encode clinical knowledge

Karan Singhal, Shekoofeh Azizi, Tao Tu et al.

2023 · Nature · 3,340 citations

Abstract: Large language models (LLMs) have demonstrated impressive capabilities, but the bar for clinical applications is high. Attempts to assess the clinical knowledge of models typically rely on automated evaluations based on limited benchmarks. Here, to address these limitations, we present MultiMedQA, a benchmark combining six existing medical question answering datasets spanning professional medicine, research and consumer queries and a new dataset of medical questions searched online, HealthSearchQA. We propose a human evaluation framework for model answers along multiple axes includin…
