Assessment of ChatGPT's adherence to ETA-thyroid nodule management guideline over two different time intervals 14 days apart: in binary and multiple-choice queries

Muzaffer Serdar Deniz, Bagdagul Yuksel Guler
Endocrine. 2024 Mar 15. doi: 10.1007/s12020-024-03750-2. Online ahead of print.
Abstract
Objective: Artificial intelligence (AI) has significant potential in healthcare, particularly for decision support in specialized domains such as thyroid nodule management. This study assesses how closely ChatGPT-v4, an advanced AI model, adheres to the 2023 European Thyroid Association (ETA) guidelines.
Methods: The study used a structured questionnaire of 100 questions, divided into true/false and multiple-choice formats, that reflected real-world clinical scenarios in thyroid nodule management. The questions covered diagnostic criteria, treatment options, follow-up protocols, and patient counseling. ChatGPT's responses were evaluated for accuracy, consistency, and comprehensiveness on a six-point Likert scale. The assessment was performed at baseline and repeated 14 days later.
Results: In the binary queries, the AI model corrected some initially incorrect responses, but others regressed. Of the 11 initially non-compliant responses, 8 remained non-compliant and 3 were corrected. Conversely, 6 initially compliant answers became non-compliant after 14 days. In the multiple-choice queries, the AI's performance was more consistent: 43 responses (86% of the total) were correct initially and remained correct on re-assessment. However, 4 initially incorrect responses remained incorrect, and 3 initially correct responses became non-compliant over time.
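The change between the two assessments can be tabulated as a simple transition count. The sketch below is illustrative only and is not the authors' analysis. For the multiple-choice set it assumes a total of 50 items and zero corrected answers, both inferred (not stated) from 43 being 86% of the total and from 43 + 3 + 4 = 50. For the binary set only the net change is computed, because the number of answers compliant at both time points is not reported. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transitions:
    """Hypothetical helper: answers classified by compliance at day 0 and day 14."""
    kept_correct: int    # compliant at both assessments
    corrected: int       # non-compliant at day 0, compliant at day 14
    regressed: int       # compliant at day 0, non-compliant at day 14
    kept_incorrect: int  # non-compliant at both assessments

    @property
    def total(self) -> int:
        return self.kept_correct + self.corrected + self.regressed + self.kept_incorrect

    @property
    def correct_day0(self) -> int:
        # Compliant at baseline = stayed compliant + later regressed
        return self.kept_correct + self.regressed

    @property
    def correct_day14(self) -> int:
        # Compliant at re-assessment = stayed compliant + corrected
        return self.kept_correct + self.corrected

    @property
    def net_change(self) -> int:
        return self.corrected - self.regressed


def net_change(corrected: int, regressed: int) -> int:
    """Net gain (positive) or loss (negative) in compliant answers between assessments."""
    return corrected - regressed


def pct(n: int, total: int) -> float:
    return 100 * n / total


# Multiple-choice counts from the abstract; corrected=0 is an ASSUMPTION inferred from the totals.
mc = Transitions(kept_correct=43, corrected=0, regressed=3, kept_incorrect=4)

# Binary counts from the abstract: 3 corrected, 6 regressed (totals not reported).
binary_net = net_change(corrected=3, regressed=6)

print(f"MC items: {mc.total}")
print(f"MC compliant at day 0: {mc.correct_day0} ({pct(mc.correct_day0, mc.total):.0f}%)")
print(f"MC compliant at day 14: {mc.correct_day14} ({pct(mc.correct_day14, mc.total):.0f}%)")
print(f"Net change, multiple-choice: {mc.net_change}; binary: {binary_net}")
```

Under these assumptions, both formats lose three compliant answers on net, but the binary set reaches that figure through more churn (3 corrections against 6 regressions), which is consistent with the abstract's description of the multiple-choice results as more stable.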
Conclusion: ChatGPT showed potential as a clinical support tool in thyroid nodule management, although its performance varied between binary and multiple-choice questions.