Methodologic Guide for Evaluating Clinical Performance and Effect of Artificial Intelligence Technology for Medical Diagnosis and Prediction.
Radiology. 2018 Mar;286(3):800-809. doi: 10.1148/radiol.2017171920. Epub 2018 Jan 8. Park SH1, Han K1.
The use of artificial intelligence in medicine is currently an issue of great interest, especially with regard to the diagnostic or predictive analysis of medical images. Adoption of an artificial intelligence tool in clinical practice requires careful confirmation of its clinical utility. Herein, the authors explain key methodology points involved in a clinical evaluation of artificial intelligence technology for use in medicine, especially high-dimensional or overparameterized diagnostic or predictive models in which artificial deep neural networks are used, mainly from the standpoints of clinical epidemiology and biostatistics. First, statistical methods for assessing the discrimination and calibration performances of a diagnostic or predictive model are summarized. Next, the effects of disease manifestation spectrum and disease prevalence on the performance results are explained, followed by a discussion of the difference between evaluating the performance with use of internal and external datasets, the importance of using an adequate external dataset obtained from a well-defined clinical cohort to avoid overestimating the clinical performance as a result of overfitting in high-dimensional or overparameterized classification model and spectrum bias, and the essentials for achieving a more robust clinical evaluation. Finally, the authors review the role of clinical trials and observational outcome studies for ultimate clinical verification of diagnostic or predictive artificial intelligence tools through patient outcomes, beyond performance metrics, and how to design such studies. © RSNA, 2018.