Amey Vrudhula, B.S., J. Weston Hughes, B.S., Neal Yuan, M.D., and David Ouyang, M.D.
Many medical diagnoses can be formulated either as classification tasks that predict whether output variables fall into a particular diagnostic group or as regression tasks that predict the underlying measurements of disease severity. AI models trained on classification tasks and those trained on regression tasks may be equally valid in describing a clinical condition, but their performance can differ significantly. To illustrate such differences, we used electrocardiogram (ECG) data from an academic medical center to train two deep-learning AI ECG models to identify individuals with severe cardiomyopathy. The classification and regression models achieved similar results for area under the curve across three sites. However, the regression model achieved significantly better positive predictive values at each site. Differences in problem formulation can therefore produce significant differences in model performance, and careful thought should be given to the design and training of AI algorithms.
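The distinction between the two formulations, and the way area under the curve and positive predictive value can diverge, can be illustrated with a minimal Python sketch. It is not the models or the data from this study: the continuous severity measurement, the diagnostic cutoff of 35, the decision threshold of 0.5, the function names, and every number below are hypothetical values chosen only to make the arithmetic easy to follow.

```python
from typing import List, Sequence


def binarize(values: Sequence[float], cutoff: float) -> List[int]:
    """Turn a continuous measurement into a diagnostic label.

    Assumption for this sketch: lower values mean more severe disease,
    so a value at or below the cutoff counts as a positive case.
    """
    return [1 if v <= cutoff else 0 for v in values]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive case has the
    higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else (0.5 if p == n else 0.0) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ppv(labels: Sequence[int], predicted: Sequence[int]) -> float:
    """Positive predictive value: true positives / all predicted positives."""
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 1)
    if tp + fp == 0:
        return float("nan")  # no positive predictions were made
    return tp / (tp + fp)


# Hypothetical ground-truth measurements and cutoff (invented for illustration).
CUTOFF = 35.0
true_values = [20, 30, 33, 40, 50, 55, 60, 65]
labels = binarize(true_values, CUTOFF)

# Regression formulation: the model predicts the measurement itself,
# and the diagnosis is obtained by applying the cutoff afterwards.
reg_pred = [22, 31, 36, 38, 52, 54, 61, 63]
reg_scores = [-p for p in reg_pred]  # negate so that higher score = more severe
reg_calls = binarize(reg_pred, CUTOFF)

# Classification formulation: the model predicts a probability of disease,
# and the diagnosis is obtained with a probability threshold (0.5 here).
clf_prob = [0.90, 0.80, 0.60, 0.55, 0.20, 0.10, 0.10, 0.05]
clf_calls = [1 if p >= 0.5 else 0 for p in clf_prob]

print("regression     AUC:", auc(labels, reg_scores), "PPV:", ppv(labels, reg_calls))
print("classification AUC:", auc(labels, clf_prob), "PPV:", ppv(labels, clf_calls))
```

In this toy example both sets of scores rank every positive case above every negative case, so the two areas under the curve are identical, yet the positive predictive values differ because the two formulations place their decision boundaries differently. The sketch shows only that the two metrics can come apart; it does not indicate which formulation performs better on real ECG data.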