Thomas Z Li, Kaiwen Xu, Aravind Krishnan, Riqiang Gao, Michael N Kammer, Sanja Antic, David Xiao, Michael Knight, Yency Martinez, Rafael Paez, Robert J Lentz, Stephen Deppen, Eric L Grogan, Thomas A Lasko, Kim L Sandler, Fabien Maldonado, Bennett A Landman
Radiol Artif Intell . 2025 Feb 5:e230506. doi: 10.1148/ryai.230506. Online ahead of print.
Purpose To evaluate the performance of eight lung cancer prediction models on patient cohorts with screening-detected, incidentally-detected, and bronchoscopically-biopsied pulmonary nodules. Materials and Methods This study retrospectively evaluated promising predictive models for lung cancer prediction in three clinical settings: lung cancer screening with low-dose CT, incidentally, detected pulmonary nodules, and nodules deemed suspicious enough to warrant a biopsy. The area under the receiver operating characteristic curve (AUC) of eight validated models including logistic regressions on clinical variables and radiologist nodule characterizations, artificial intelligence (AI) on chest CTs, longitudinal imaging AI, and multimodal approaches for prediction of lung cancer risk was assessed in 9 cohorts (n = 898, 896, 882, 219, 364, 117, 131, 115, 373) from multiple institutions. Each model was implemented from their published literature, and each cohort was curated from primary data sources collected over periods within 2002 to 2021. Results No single predictive model emerged as the highest-performing model across all cohorts, but certain models performed better in specific clinical contexts. Single timepoint chest CT AI performed well for screening-detected nodules but did not generalize well to other clinical settings. Longitudinal imaging and multimodal models demonstrated comparatively good performance on incidentally-detected nodules. When applied to biopsied nodules, all models showed low performance. Conclusion Eight lung cancer prediction models failed to generalize well across clinical settings and sites outside of their training distributions.