Ismail Mese, Burak Kocak
Eur Radiol. 2024 Oct 15. doi: 10.1007/s00330-024-11122-7. Online ahead of print.
Objectives: This study aimed to evaluate the effectiveness of ChatGPT-4o, relative to human experts, in assessing the methodological quality of radiomics research using the radiomics quality score (RQS).
Methods: Open-access, peer-reviewed radiomics research articles with a Creative Commons Attribution license (CC-BY), published in European Radiology, European Radiology Experimental, and Insights into Imaging between 2023 and 2024, were included in this study. Preprints from medRxiv were also included to evaluate potential peer-review bias. Using the RQS, each study was independently assessed twice by ChatGPT-4o and once by two radiologists in consensus.
Results: In total, 52 open-access and peer-reviewed articles were included in this study. Both the ChatGPT-4o evaluation (average of two readings) and the human experts had a median RQS of 14.5 (percentage score, 40.3%) (p > 0.05). Pairwise comparisons revealed no statistically significant difference between the readings of ChatGPT-4o and the human experts (corrected p > 0.05). The intraclass correlation coefficient (ICC) for intra-rater reliability of ChatGPT-4o was 0.905 (95% CI: 0.840-0.944), and the ICCs for inter-rater reliability between the human experts and each ChatGPT-4o reading were 0.859 (95% CI: 0.756-0.919) and 0.914 (95% CI: 0.855-0.949), corresponding to good to excellent reliability throughout. Evaluation by ChatGPT-4o took less time (2.9-3.5 min per article) than evaluation by the human experts (13.9 min per article for one reader). Item-wise reliability analysis showed that ChatGPT-4o maintained consistently high reliability across almost all RQS items.
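For readers who wish to reproduce this kind of analysis, the percentage score is the RQS total divided by the maximum achievable score of 36 (14.5 / 36 × 100 ≈ 40.3%), and reliability is summarized with intraclass correlation coefficients. The following is a minimal sketch of that computation, assuming hypothetical ratings and the pingouin package; it is illustrative only, not the authors' actual data or analysis pipeline.

```python
# Illustrative sketch: RQS percentage score and inter-rater ICC.
# The ratings below are hypothetical, not the study's data.
import pandas as pd
import pingouin as pg

RQS_MAX = 36  # maximum achievable radiomics quality score

# Hypothetical long-format ratings: one row per (article, rater) pair
df = pd.DataFrame({
    "article": [1, 1, 2, 2, 3, 3],
    "rater":   ["gpt4o", "experts"] * 3,
    "score":   [14, 15, 20, 19, 9, 10],
})

# Percentage score, e.g. a median RQS of 14.5 -> 14.5 / 36 * 100 = 40.3%
df["pct_score"] = df["score"] / RQS_MAX * 100

# Intraclass correlation across raters; the ICC2 row (two-way random
# effects, absolute agreement) is a common choice for inter-rater reliability
icc = pg.intraclass_corr(data=df, targets="article",
                         raters="rater", ratings="score")
print(icc[["Type", "ICC", "CI95%"]])
```

The same call, with the two ChatGPT-4o readings as the raters, would yield an intra-rater ICC; which ICC variant the authors used is not specified in the abstract.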
Conclusion: ChatGPT-4o provides reliable and efficient assessments of radiomics research quality. Its assessments closely align with those of human experts and take substantially less time.