Development and evaluation of machine learning models based on X-ray radiomics for the classification and differentiation of malignant and benign bone tumors
Claudio E von Schacky, Nikolas J Wilhelm, Valerie S Schäfer, Yannik Leonhardt, Matthias Jung, Pia M Jungmann, Maximilian F Russe, Sarah C Foreman, Felix G Gassert, Florian T Gassert, Benedikt J Schwaiger, Carolin Mogler, Carolin Knebel, Ruediger von Eisenhart-Rothe, Marcus R Makowski, Klaus Woertler, Rainer Burgkart, Alexandra S Gersing
Eur Radiol. 2022 Apr 9. doi: 10.1007/s00330-022-08764-w. Online ahead of print.
Objectives: To develop and validate machine learning models to distinguish between benign and malignant bone lesions and to compare their performance with that of radiologists.
Methods: In 880 patients (age 33.1 ± 19.4 years, 395 women) diagnosed with malignant (n = 213, 24.2%) or benign (n = 667, 75.8%) primary bone tumors, preoperative radiographs were obtained, and the diagnosis was established using histopathology. The data were split 70%/15%/15% for training, validation, and internal testing. Additionally, data from 96 patients at another institution were obtained for external testing. Machine learning models were developed and validated using radiomic features and demographic information. The performance of each model was evaluated on the test sets for accuracy, area under the receiver operating characteristic curve (AUC), sensitivity, and specificity. For comparison, the external test set was evaluated by two radiology residents and two radiologists who specialized in musculoskeletal tumor imaging.
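As an illustration of the split and the evaluation metrics named in the Methods, the following is a minimal sketch in plain Python. It is not the authors' code. The function names (stratified_split, classification_metrics, roc_auc) are hypothetical, stratifying the split by class is an assumption the abstract does not state, and the label list only mirrors the reported class counts (213 malignant, 667 benign) rather than any real patient data.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Return (train, val, test) index lists with similar class ratios.

    Assumption: the split is stratified by class; the abstract only
    reports the 70%/15%/15% proportions.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity with malignant (1) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Placeholder labels that only reproduce the reported class counts.
labels = [1] * 213 + [0] * 667
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Sensitivity and specificity depend on the decision threshold applied to the model's scores, whereas the AUC summarizes ranking quality across all thresholds, which is why the abstract reports accuracy "at" a given sensitivity alongside the AUC.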
Results: The best machine learning model was based on an artificial neural network (ANN) combining radiomic and demographic information; it achieved 80% and 75% accuracy at 75% and 90% sensitivity, with AUCs of 0.79 and 0.90, on the internal and external test sets, respectively. In comparison, the radiology residents achieved 71% and 65% accuracy at 61% and 35% sensitivity, while the radiologists specialized in musculoskeletal tumor imaging achieved 84% and 83% accuracy at 90% and 81% sensitivity, respectively.
Conclusions: An ANN combining radiomic features and demographic information showed the best performance in distinguishing between benign and malignant bone lesions. The model showed lower accuracy than the specialized radiologists, while its accuracy was higher than or similar to that of the residents.
Read Full Article Here: https://doi.org/10.1007/s00330-022-08764-w