LLMs and Stargardt's Disease
Abstract
Gloria Wu, Lochan Bojja, Aadjot Sidhu, Obaid Khan, Hrishi Paliath-Pathiyal, Bojia Liu, Margaret C Wang and Ivan Chim
Background/Objective: Stargardt's disease is the most common form of inherited juvenile macular degeneration, affecting 1 in 8,000-10,000 individuals worldwide, with a slight female predominance. As large language models (LLMs) increasingly serve as sources of health information, understanding how effectively they provide accurate information about rare genetic conditions becomes essential. This study evaluates and compares how four major LLMs (ChatGPT, Gemini, Claude, and Character.ai) deliver information about Stargardt's disease to male and female patients.
Methods: Four LLMs were queried using standardized prompts simulating a 14-year-old patient (male or female) newly diagnosed with Stargardt's disease. Responses were analyzed for word count, readability (Flesch-Kincaid Grade Level), response time, and content similarity (cosine similarity between responses).
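As an illustration of the text metrics named in the Methods, the following is a minimal sketch and not the study's actual pipeline. It assumes a regex tokenizer, a heuristic vowel-group syllable counter, and cosine similarity over raw word-count vectors; the abstract does not state which text representation was used for the cosine analysis. The function names and the two sample responses are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def count_syllables(word):
    """Approximate syllables by counting vowel groups (a heuristic, not a dictionary lookup)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent, except in '-le' endings such as 'table'.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text):
    """Flesch-Kincaid Grade Level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    words = tokenize(text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words count vectors (assumed representation)."""
    vec_a, vec_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    # Hypothetical responses, invented for illustration only.
    response_a = "Please see an eye doctor. Genetic testing can confirm the diagnosis."
    response_b = "You should see a retina specialist. Genetic testing is recommended."
    print("words:", len(tokenize(response_a)))
    print("grade level:", round(flesch_kincaid_grade(response_a), 1))
    print("similarity (%):", round(100 * cosine_similarity(response_a, response_b), 1))
```

Published readability tools count syllables in different ways, and similarity computed on TF-IDF or embedding vectors would give different values, so figures from this sketch should not be expected to match those reported in the Results.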
Results: Responses varied substantially across LLMs. Word counts ranged from 53 to 769, with Gemini producing the longest responses (female: 769 words, male: 708 words) and Character.ai the shortest (female: 74 words, male: 53 words). Flesch-Kincaid Grade Level scores ranged from 5.4 to 10.8, corresponding to roughly a fifth-grade to high school reading level. Response times varied from 5.5 to 13.8 seconds. Cosine similarity scores showed moderate concordance (58.5-78.3%) between model pairs. All LLMs recommended physician consultation and genetic testing, but they differed markedly in the emotional support and breadth of information they provided.
Conclusions: While all LLMs provided appropriate referral recommendations, substantial disparities existed in the depth of content, readability, and information delivery. No LLM consistently addressed the full spectrum of Stargardt's disease management, including specialist referrals, genetic counseling, and available therapies. These findings underscore the importance of physician oversight and standardization of AI-generated healthcare information to ensure that patients receive accurate guidance.

