• AI in primary care: Comparing ChatGPT and family physicians on patient queries

    Muhammed İnan, Özlem Suvak, Cenk Aypak
    Int J Med Inform. 2025 Nov;203:106047. doi: 10.1016/j.ijmedinf.2025.106047. Epub 2025 Jul 12.

    Abstract

    Objective: The integration of artificial intelligence (AI) in medicine has led to growing interest in its applications for primary care. This study evaluates and compares the responses of ChatGPT-4o and family physicians to 200 commonly asked clinical questions in family medicine. 

    Methods: This comparative, observational, cross-sectional study was conducted using a dataset of 200 primary care-related questions generated through literature review and expert validation. Three experienced family physicians and ChatGPT-4o independently provided responses. The responses were anonymized and presented in random order to three independent family medicine experts for assessment. Evaluations were based on Likert scales for appropriateness (1-6), accuracy (1-6), comprehensiveness (1-3), and empathy (1-5). Word counts were also recorded. 

    Results: ChatGPT-4o outperformed family physicians across all evaluation metrics (p < 0.01). ChatGPT-4o received higher scores for appropriateness (mean 5.8 ± 0.5 vs. 4.3 ± 1.0), accuracy (5.8 ± 0.5 vs. 4.5 ± 1.1), comprehensiveness (2.4 ± 0.6 vs. 1.4 ± 0.7), and empathy (4.8 ± 0.4 vs. 4.0 ± 0.8). The average word count of ChatGPT-4o's responses (298.8 ± 82.3 words) was significantly higher than that of physicians (106.1 ± 95.0 words). In topic-specific analysis, ChatGPT-4o outperformed physicians in all categories except General Consultation and Child Infections (p = 0.07 and 0.08, respectively). 

    Conclusion: The findings suggest that ChatGPT-4o has the potential to enhance patient education, medical training, and clinical decision support. Future research should explore AI's real-world clinical impact, its role in improving medical education, and strategies to refine AI-generated responses for conciseness and cultural relevance.