
Advances in Neurology and Neuroscience (AN)

ISSN: 2690-909X | DOI: 10.33140/AN

Impact Factor: 1.12

Brandon L. Staple

University of Nebraska Medical Center, Omaha, NE, United States

Publications
  • Review Article   
    Beyond General-Purpose LLMs: Comparative Performance of a RAG-Enhanced Surgical Subspecialty Model on Board Examination
    Author(s): Brandon L. Staple*, Elijah M. Staple and Cynthia Wallace

    This study evaluates the performance of domain-specific Large Language Models (dLLMs) versus standard Large Language Models (sLLMs) in neurosurgical knowledge assessment. It emphasizes that potential healthcare applications call for evaluating not only the factual accuracy of model outputs but also the mechanisms behind model hallucinations and the quality of the underlying reasoning. We compared AtlasGPT, a neurosurgery-focused dLLM that uses Retrieval-Augmented Generation (RAG), against four sLLMs (GPT-3.5, Gemini, Claude 3.5 Sonnet, and Mistral) on 150 text-only, board-style neurosurgical multiple-choice questions. AtlasGPT achieved the highest accuracy (96.7%), followed by Claude (94.7%), Gemini (92.0%), Mistral (88.7%), and GPT-3.5 (74.7%). An analysis of variance (ANOVA) confirmed statistically significant differences between models (F(4,745) = 1127.5, p …
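The reported accuracies and ANOVA degrees of freedom can be cross-checked with simple arithmetic. The sketch below assumes each of the five models answered each of the 150 questions once, giving 5 × 150 = 750 observations; the between-groups and within-groups degrees of freedom are then k − 1 = 4 and N − k = 745, matching F(4,745). The per-model correct-answer counts are back-calculated from the rounded percentages (nearest integer) and are an assumption, not figures reported by the authors. The names `implied_correct` and `anova_df` are hypothetical helpers for illustration only.

```python
# Hypothetical consistency check of the figures quoted in the abstract.
# ASSUMPTION: counts of correct answers are back-calculated from rounded
# percentages; they are not reported values.
N_QUESTIONS = 150

reported_accuracy = {
    "AtlasGPT": 96.7,
    "Claude 3.5 Sonnet": 94.7,
    "Gemini": 92.0,
    "Mistral": 88.7,
    "GPT-3.5": 74.7,
}


def implied_correct(pct: float, n: int = N_QUESTIONS) -> int:
    """Nearest integer count of correct answers consistent with a rounded percentage."""
    return round(pct / 100 * n)


def anova_df(num_groups: int, total_obs: int) -> tuple[int, int]:
    """Degrees of freedom (between groups, within groups) for a one-way ANOVA."""
    return num_groups - 1, total_obs - num_groups


counts = {model: implied_correct(p) for model, p in reported_accuracy.items()}
for model, c in counts.items():
    # Re-derive the percentage to confirm it rounds back to the reported value.
    print(f"{model}: {c}/{N_QUESTIONS} = {100 * c / N_QUESTIONS:.1f}%")

# Five models, each answering the same 150 questions -> 750 observations.
print(anova_df(len(reported_accuracy), len(reported_accuracy) * N_QUESTIONS))
```

Each back-calculated count (145, 142, 138, 133, and 112 out of 150) rounds back to the percentage given in the abstract, and the degrees of freedom come out as (4, 745). This checks only internal consistency; it does not reproduce the reported F statistic, which depends on how the authors structured their observations.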
