Methods: This comparative, observational, cross-sectional study was conducted using a dataset of 200 primary care-related questions generated through literature review and expert validation. Three experienced family physicians and ChatGPT-4o independently provided responses. The responses were anonymized and assessed in random order by three independent family medicine experts. Evaluations were based on Likert scales for appropriateness (1-6), accuracy (1-6), comprehensiveness (1-3), and empathy (1-5). Word counts were also recorded.
Results: ChatGPT-4o outperformed family physicians across all evaluation metrics (p < 0.01). ChatGPT-4o received higher scores for appropriateness (mean 5.8 ± 0.5 vs. 4.3 ± 1.0), accuracy (5.8 ± 0.5 vs. 4.5 ± 1.1), comprehensiveness (2.4 ± 0.6 vs. 1.4 ± 0.7), and empathy (4.8 ± 0.4 vs. 4.0 ± 0.8). The average word count of ChatGPT-4o's responses (298.8 ± 82.3 words) was significantly higher than that of physicians (106.1 ± 95.0 words). In topic-specific analysis, ChatGPT-4o outperformed physicians in all topics except General Consultation and Child Infections (p = 0.07 and 0.08, respectively).
Conclusion: The findings suggest that ChatGPT-4o has the potential to enhance patient education, medical training, and clinical decision support. Future research should examine AI's real-world clinical impact, its role in improving medical education, and strategies for refining AI-generated responses to improve conciseness and cultural relevance.