Large language models (LLMs) demonstrated higher error rates than humans on a clinical oncology question bank

1. A comparative evaluation tested five publicly available LLMs on 2044 oncology questions covering a comprehensive range of topics in the field. The responses were compared to a human benchmark.

2. Only one of the five models tested performed above the 50th percentile, with worse performance observed in clinical oncology subcategories and female-predominant malignancies.

Evidence Rating Level: 2
