Large language models encode clinical knowledge
Karan Singhal, Shekoofeh Azizi, Tao Tu et al.
2023 · Nature · 3,070 citations
Abstract
Large language models (LLMs) have demonstrated impressive capabilities, but the bar for clinical applications is high. Attempts to assess the clinical knowledge of models typically rely on automated evaluations based on limited benchmarks. Here, to address these limitations, we present MultiMedQA, a benchmark combining six existing medical question answering datasets spanning professional medicine, research and consumer queries and a new dataset of medical questions searched online, HealthSearchQA. We propose a human evaluation framework for model answers along multiple axes includin…