Sebastiaan Hermans, Zixuan Hu, Robyn L Ball, Hui Ming Lin, Luciano M Prevedello, Ferco H Berger, Ibrahim Yusuf, Jeffrey D Rudie, Maryam Vazirabad, Adam E Flanders, George Shih, John Mongan, Savvas Nicolaou, Brett S Marinelli, Melissa A Davis, Kirti Magudia, Ervin Sejdić, Errol Colak
Radiol Artif Intell. 2024 Nov 6:e240334. doi: 10.1148/ryai.240334. Online ahead of print.
Purpose: To evaluate the performance of the winning machine learning (ML) models from the 2023 RSNA Abdominal Trauma Detection Artificial Intelligence Challenge.

Materials and Methods: The competition was hosted on Kaggle and ran from July 26, 2023, to October 15, 2023. The multicenter competition dataset consisted of 4,274 abdominal trauma CT scans in which solid organs (liver, spleen, and kidneys) were annotated as healthy, low-grade injury, or high-grade injury. Studies were labeled as positive or negative for the presence of bowel/mesenteric injury and active extravasation. In this study, the performance of the 8 award-winning models was retrospectively assessed and compared using various metrics, including the area under the receiver operating characteristic curve (AUC), for each injury category. The reported mean value of each metric was calculated by averaging the performance of all models for the specified injury type.

Results: The models exhibited strong performance in detecting solid organ injuries, particularly high-grade injuries. For binary detection of injuries, the models demonstrated mean AUC values of 0.92 (range: 0.91-0.94) for liver, 0.91 (range: 0.87-0.93) for splenic, and 0.94 (range: 0.93-0.95) for kidney injuries. The models achieved mean AUC values of 0.98 (range: 0.96-0.98) for high-grade liver, 0.98 (range: 0.97-0.99) for high-grade splenic, and 0.98 (range: 0.97-0.98) for high-grade kidney injuries. For the detection of bowel/mesenteric injuries and active extravasation, the models demonstrated mean AUC values of 0.85 (range: 0.74-0.73) and 0.85 (range: 0.79-0.89), respectively.

Conclusion: The award-winning models from the AI challenge demonstrated strong performance in the detection of traumatic abdominal injuries on CT scans, particularly high-grade injuries. These models may serve as a performance baseline for future investigations and algorithms.
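The way the mean and range values are reported in the abstract (one AUC per model for a given injury type, then the mean and the minimum-maximum range across models) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `auc` and `summarize_across_models`, the model names, and the toy labels and scores are all hypothetical, and the AUC is computed here with the pairwise (Mann-Whitney) formulation, which may differ from the implementation used in the study.

```python
from statistics import mean


def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize_across_models(labels, scores_by_model):
    """Compute one AUC per model, then the mean and range across models."""
    per_model = {name: auc(labels, s) for name, s in scores_by_model.items()}
    values = list(per_model.values())
    summary = {"mean": mean(values), "min": min(values), "max": max(values)}
    return summary, per_model


# Hypothetical toy data: 1 = injury present, 0 = no injury (one injury type).
labels = [1, 1, 0, 0, 1, 0]
scores_by_model = {
    "model_a": [0.9, 0.8, 0.3, 0.2, 0.7, 0.1],  # hypothetical model outputs
    "model_b": [0.9, 0.4, 0.5, 0.2, 0.7, 0.1],
}

summary, per_model = summarize_across_models(labels, scores_by_model)
print(f"mean AUC {summary['mean']:.2f} "
      f"(range: {summary['min']:.2f}-{summary['max']:.2f})")
```

In this sketch the same ground-truth labels are scored by every model, so the reported range simply reflects the worst and best individual model for that injury type.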