EchoNext-Mini: A Dataset and Baseline AI Model for Detecting Structural Heart Disease from Electrocardiograms
John Weston Hughes, Ph.D., Linyuan Jing, Ph.D., Joshua Finer, M.S., Dustin Hartzel, B.S., Christopher Kelsey, B.S., Aaron Long, M.S., Daniel Rocha, M.A., Jeffrey Ruhl, M.S., Timothy Poterucha, M.D., and Pierre Elias, M.D.
Abstract
Background: Structural heart disease (SHD) screening has traditionally relied on echocardiography, which restricts access and affects patient outcomes. We recently published EchoNext, an artificial intelligence (AI) model that detects SHD from electrocardiograms (ECGs), creating new opportunities for screening.
Methods: In this work, we describe the EchoNext-Mini dataset and model. The EchoNext-Mini dataset is a deidentified subset of the EchoNext dataset consisting of 100,000 ECGs from NewYork-Presbyterian/Columbia University Irving Medical Center, along with corresponding echocardiogram-derived SHD diagnoses and demographic information. Using this dataset, we trained the EchoNext-Mini model, an AI model that detects SHD from ECG waveforms and uses the same architecture as the original EchoNext model.
Results: The EchoNext-Mini dataset contains 100,000 ECGs from 36,286 patients along with 11 SHD diagnosis labels. Overall, 52% of ECGs are positive for any SHD, with the most common diagnoses being elevated left ventricular wall thickness (24%), heart failure with reduced ejection fraction (24%), and pulmonary hypertension (19%). The EchoNext-Mini model achieves an area under the receiver operating characteristic curve (AUROC) of 82.0 (95% confidence interval [CI], 80.9–83.1), a sensitivity of 70.1 (95% CI, 68.3–71.9), and a specificity of 77.9 (95% CI, 76.5–79.4) in detecting SHD from ECG waveforms, versus an AUROC of 85.2 achieved by the original EchoNext model. AUROCs for detecting individual SHD subdiagnoses range from 73.4 (95% CI, 71.9–75.1; left ventricular wall thickness ≥1.3 cm) to 86.6 (95% CI, 84.9–88.2; right ventricular systolic dysfunction).
Conclusions: We have made the EchoNext-Mini model and dataset publicly available to enable further research in AI analysis of ECGs. The size of the dataset and the prevalence of varied labels enable a wide range of research. The EchoNext-Mini model achieves performance close to that of the larger EchoNext model (AUROC 82.0 versus 85.2) and offers a strong benchmark on the EchoNext-Mini dataset.
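For readers who want to reproduce metrics of the kind reported in the Results, the following is a minimal sketch of how AUROC, sensitivity, and specificity with 95% CIs might be computed from binary labels and model scores. It rests on assumptions not stated in the abstract: the percentile bootstrap, the 0.5 decision threshold, and the function names `auroc`, `sensitivity_specificity`, and `bootstrap_ci` are all illustrative, and the synthetic data stand in for real predictions. The method actually used for the reported intervals may differ.

```python
import numpy as np


def auroc(y_true, y_score):
    """AUROC via the Mann-Whitney U statistic; tied scores get average ranks."""
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    n = y_true.size
    n_pos = int(y_true.sum())
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")
    order = np.argsort(y_score, kind="mergesort")
    sorted_scores = y_score[order]
    ranks = np.empty(n, dtype=float)
    i = 0
    while i < n:
        j = i
        while j + 1 < n and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        # 1-based average rank shared by the tied block [i, j]
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    u = ranks[y_true].sum() - n_pos * (n_pos + 1) / 2
    return float(u / (n_pos * n_neg))


def sensitivity_specificity(y_true, y_score, threshold=0.5):
    """Sensitivity and specificity at a threshold (0.5 is an assumed default)."""
    y_true = np.asarray(y_true, dtype=bool)
    pred = np.asarray(y_score, dtype=float) >= threshold
    tp = np.sum(pred & y_true)
    fn = np.sum(~pred & y_true)
    tn = np.sum(~pred & ~y_true)
    fp = np.sum(pred & ~y_true)
    return float(tp / (tp + fn)), float(tn / (tn + fp))


def bootstrap_ci(metric, y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for metric(y_true, y_score) -> float."""
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    if y_true.all() or not y_true.any():
        raise ValueError("need both classes to bootstrap")
    rng = np.random.default_rng(seed)
    n = y_true.size
    stats = []
    while len(stats) < n_boot:
        idx = rng.integers(0, n, size=n)  # resample with replacement
        if y_true[idx].all() or not y_true[idx].any():
            continue  # skip resamples that lost one of the classes
        stats.append(metric(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lo), float(hi)


if __name__ == "__main__":
    # Synthetic labels and scores for illustration only, not EchoNext-Mini data.
    rng = np.random.default_rng(42)
    labels = rng.random(2000) < 0.52
    scores = np.clip(0.5 + 0.15 * np.where(labels, 1, -1)
                     + rng.normal(0, 0.2, 2000), 0, 1)
    print("AUROC:", auroc(labels, scores), bootstrap_ci(auroc, labels, scores))
    sens = lambda t, s: sensitivity_specificity(t, s)[0]
    print("Sensitivity:", sens(labels, scores), bootstrap_ci(sens, labels, scores))
```

This sketch resamples individual ECGs. Because the dataset contains multiple ECGs per patient (100,000 ECGs from 36,286 patients), resampling at the patient level may be a more appropriate choice when intervals should account for within-patient correlation.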