CTisus Imaging Pearls: AI (Artificial Intelligence) in the Chest


  • Objective: To evaluate the accuracy and quality of AI-generated chest radiograph interpretations in the emergency department setting.
    Design, setting, and participants: This was a retrospective diagnostic study of 500 randomly sampled emergency department encounters at a tertiary care institution including chest radiographs interpreted by both a teleradiology service and on-site attending radiologist from January 2022 to January 2023. An AI interpretation was generated for each radiograph. The 3 radiograph interpretations were each rated in duplicate by 6 emergency department physicians using a 5-point Likert scale.
    Main outcomes and measures: The primary outcome was any difference in Likert scores between radiologist, AI, and teleradiology reports, using a cumulative link mixed model. Secondary analyses compared the probability of each report type containing no clinically significant discrepancy with further stratification by finding presence, using a logistic mixed-effects model. Physician comments on discrepancies were recorded.
    Generative Artificial Intelligence for Chest Radiograph Interpretation in the Emergency Department.  
    Huang J, Neill L, Wittbrodt M, et al.
    JAMA Netw Open. 2023 Oct 2;6(10):e2336100. 
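The rating design above (each report rated in duplicate on a 1–5 Likert scale) can be sketched as follows. This is an illustrative data layout with hypothetical ratings, not the authors' analysis code: the paper fits a cumulative link mixed model to account for repeated ratings per radiograph and rater, while this sketch only shows the structure of the data and the naive per-report-type means.

```python
# Hypothetical ratings: (radiograph_id, rater_id, report_type, likert_score).
# The study used 500 radiographs, 6 raters, duplicate ratings, and a
# cumulative link mixed model; here we just compute naive means per type.
from collections import defaultdict
from statistics import mean

ratings = [
    (1, "r1", "AI", 4), (1, "r1", "radiologist", 4), (1, "r1", "teleradiology", 3),
    (1, "r2", "AI", 3), (1, "r2", "radiologist", 4), (1, "r2", "teleradiology", 2),
    (2, "r1", "AI", 3), (2, "r1", "radiologist", 3), (2, "r1", "teleradiology", 3),
]

by_type = defaultdict(list)
for _, _, report_type, score in ratings:
    by_type[report_type].append(score)

naive_means = {t: mean(scores) for t, scores in by_type.items()}
```

A mixed model is needed in the real analysis because ratings of the same radiograph by the same rater are correlated; simple means ignore that clustering.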
  • Results: A total of 500 ED studies were included from 500 unique patients with a mean (SD) age of 53.3 (21.6) years; 282 patients (56.4%) were female. There was a significant association of report type with ratings, with post hoc tests revealing significantly greater scores for AI (mean [SE] score, 3.22 [0.34]; P < .001) and radiologist (mean [SE] score, 3.34 [0.34]; P < .001) reports compared with teleradiology (mean [SE] score, 2.74 [0.34]) reports. AI and radiologist reports were not significantly different. On secondary analysis, there was no difference in the probability of no clinically significant discrepancy between the 3 report types. Further stratification of reports by presence of cardiomegaly, pulmonary edema, pleural effusion, infiltrate, pneumothorax, and support devices also yielded no difference in the probability of containing no clinically significant discrepancy between the report types.
    Conclusions and relevance: In a representative sample of emergency department chest radiographs, results suggest that the generative AI model produced reports of similar clinical accuracy and textual quality to radiologist reports while providing higher textual quality than teleradiologist reports. Implementation of the model in the clinical workflow could enable timely alerts to life-threatening pathology while aiding imaging interpretation and documentation.
    Generative Artificial Intelligence for Chest Radiograph Interpretation in the Emergency Department.  
    Huang J, Neill L, Wittbrodt M, et al.
    JAMA Netw Open. 2023 Oct 2;6(10):e2336100. 
  • Key Points
    Question How do emergency department physicians rate artificial intelligence (AI)–generated chest radiograph reports for quality and accuracy, compared with in-house radiologist and teleradiology reports?  
    Findings In this diagnostic study of the developed generative AI model on a representative sample of 500 emergency department chest radiographs from 500 unique patients, the AI model produced reports of similar clinical accuracy and textual quality to radiology reports while providing higher textual quality than teleradiology reports.
    Meaning Results suggest that use of the generative AI tool may facilitate timely interpretation of chest radiography by emergency department physicians.  
    Generative Artificial Intelligence for Chest Radiograph Interpretation in the Emergency Department.  
    Huang J, Neill L, Wittbrodt M, et al.
    JAMA Netw Open. 2023 Oct 2;6(10):e2336100. 
  • “In this diagnostic study accounting for both clinical accuracy and textual quality, results suggest that our AI tool produced reports similar in performance to a radiologist and better than a teleradiology service in a representative sample of ED chest radiographs. AI report ratings were comparable with those of on-site radiologists across all evaluated pathology categories. Model integration in clinical workflows could enable timely alerts to life-threatening pathology while aiding physician imaging interpretation and speeding up documentation. Further efforts to prospectively evaluate clinical impact and generalizability are needed.”
    Generative Artificial Intelligence for Chest Radiograph Interpretation in the Emergency Department.  
    Huang J, Neill L, Wittbrodt M, et al.
    JAMA Netw Open. 2023 Oct 2;6(10):e2336100. 
  • Huang et al. developed a multimodal generative AI model and evaluated its ability to produce full radiology reports for chest radiographs in the emergency department (ED) setting. An encoder-decoder model was trained on 900,000 chest radiographs and generated a report when given an input chest radiograph and its most recent prior radiograph. A retrospective analysis was performed on 500 unique ED encounters with chest radiographs interpreted by three reader categories: a teleradiology service (all with U.S. residency and board experience), 12 ED radiologists (mean, 14.6 ± 12.5 [SD] years of postresidency clinical practice experience), and the AI model. Six ED physicians (10.6 ± 6.4 years of postresidency clinical practice experience) rated the AI, radiologist, and teleradiology reports in a blinded fashion using a 5-point Likert scale.
    Beyond the AJR: Early Applications of Generative Artificial Intelligence for Radiology Report Interpretation.  
    Doo FX, Parekh VS.  
    AJR Am J Roentgenol. 2024 Aug;223(2):e2330696. 
  • “The results indicated that the AI-generated reports and radiologist reports were of similar quality and accuracy (on a Likert scale, AI reports: 3.22 ± 0.34 [SD], radiologist reports: 3.34 ± 0.34, both of which scored significantly higher than teleradiology reports: 2.74 ± 0.34) (p < .001). There was no significant difference in the probability of reports containing clinically significant discrepancies among the three report types, even when stratified by specific findings. This study suggests that generative AI models can produce chest radiograph reports with clinical accuracy and textual quality comparable with those produced by radiologists, showing the potential of AI to enhance radiology services in EDs, especially in settings in which access to radiology services is limited.”
    Beyond the AJR: Early Applications of Generative Artificial Intelligence for Radiology Report Interpretation.  
    Doo FX, Parekh VS.  
    AJR Am J Roentgenol. 2024 Aug;223(2):e2330696. 
  • “This study represents a step toward clinical application of generative AI. A key strength is the evaluation of an AI model’s capability to generate full radiology reports, compared with prior narrow AI applications. However, one important factor affecting the generalizability of this Huang et al. study is that clinical quality was scored by referring ED physicians, rather than expert radiologists. A recent study on GPT-4 (OpenAI)-generated radiology report impressions has shown that radiologists overall graded AI impressions to be less coherent, less comprehensive, more factually inconsistent, and more medically harmful, whereas referring providers favored GPT-4 impressions for coherence and diminished harmfulness.”  
    Beyond the AJR: Early Applications of Generative Artificial Intelligence for Radiology Report Interpretation.  
    Doo FX, Parekh VS.  
    AJR Am J Roentgenol. 2024 Aug;223(2):e2330696. 
  • “The debate about AI potentially replacing human radiologists has since shifted toward recognizing AI’s role as a supportive, rather than a substitutive, tool. Even when using AI as an adjunct, radiologists may not necessarily save time, especially if needing to study and correct errors. Although generative AI introduces additional ways that AI can augment radiologist workflow, this study also shows the need to further refine the technology to understand its limitations as an adjunct to human expertise and to apply it to more diverse care settings and patient populations. Also, we must carefully consider factors such as patient privacy, consistency of generated outputs, and the potential impact on clinical workflows.”  
    Beyond the AJR: Early Applications of Generative Artificial Intelligence for Radiology Report Interpretation.  
    Doo FX, Parekh VS.  
    AJR Am J Roentgenol. 2024 Aug;223(2):e2330696. 
  • Takeaway Point
    “A generative AI model for chest radiograph interpretations was comparable to radiologists and superior to teleradiology services in the ED setting, when judged by ED physicians, highlighting its potential as a supportive tool in emergency radiology.”  
    Beyond the AJR: Early Applications of Generative Artificial Intelligence for Radiology Report Interpretation.  
    Doo FX, Parekh VS.
    AJR Am J Roentgenol. 2024 Aug;223(2):e2330696. 
  • Background: Chest radiography remains the most common radiologic examination, and interpretation of its results can be difficult.
    Purpose: To explore the potential benefit of artificial intelligence (AI) assistance in the detection of thoracic abnormalities on chest radiographs by evaluating the performance of radiologists with different levels of expertise, with and without AI assistance.
    Materials and Methods: Patients who underwent both chest radiography and thoracic CT within 72 hours between January 2010 and December 2020 in a French public hospital were screened retrospectively. Radiographs were randomly included until reaching 500 radiographs, with about 50% of radiographs having abnormal findings. A senior thoracic radiologist annotated the radiographs for five abnormalities (pneumothorax, pleural effusion, consolidation, mediastinal and hilar mass, lung nodule) based on the corresponding CT results (ground truth). A total of 12 readers (four thoracic radiologists, four general radiologists, four radiology residents) read half the radiographs without AI and half the radiographs with AI (ChestView; Gleamer). Changes in sensitivity and specificity were measured using paired t tests.
    Using AI to Improve Radiologist Performance in Detection of Abnormalities on Chest Radiographs
    Souhail Bennani et al.
    Radiology 2023; 309(3):e230860
  • Results: The study included 500 patients (mean age, 54 years ± 19 [SD]; 261 female, 239 male), with 522 abnormalities visible on 241 radiographs. On average, for all readers, AI use resulted in an absolute increase in sensitivity of 26% (95% CI: 20, 32), 14% (95% CI: 11, 17), 12% (95% CI: 10, 14), 8.5% (95% CI: 6, 11), and 5.9% (95% CI: 4, 8) for pneumothorax, consolidation, nodule, pleural effusion, and mediastinal and hilar mass, respectively (P < .001). Specificity increased with AI assistance (3.9% [95% CI: 3.2, 4.6], 3.7% [95% CI: 3, 4.4], 2.9% [95% CI: 2.3, 3.5], and 2.1% [95% CI: 1.6, 2.6] for pleural effusion, mediastinal and hilar mass, consolidation, and nodule, respectively), except in the diagnosis of pneumothorax (−0.2%; 95% CI: −0.36, −0.04; P = .01). The mean reading time was 81 seconds without AI versus 56 seconds with AI (31% decrease, P < .001).
    Conclusion: AI-assisted chest radiography interpretation resulted in absolute increases in sensitivity for all radiologists of various levels of expertise and reduced the reading times; specificity increased with AI, except in the diagnosis of pneumothorax.  
    Using AI to Improve Radiologist Performance in Detection of Abnormalities on Chest Radiographs
    Souhail Bennani et al.
    Radiology 2023; 309(3):e230860
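The study's primary metrics, per-reader sensitivity and specificity with versus without AI assistance, can be sketched as below. The counts are hypothetical (the paper reports only the aggregate deltas and tested paired differences with paired t tests); this sketch only shows how the per-reader deltas are formed.

```python
# Standard definitions: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP).
def sensitivity(tp, fn):
    return tp / (tp + fn)

def specificity(tn, fp):
    return tn / (tn + fp)

# Per reader: (tp, fn, tn, fp) without AI, then with AI -- hypothetical counts.
readers = [
    ((70, 30, 230, 20), (88, 12, 235, 15)),
    ((65, 35, 225, 25), (80, 20, 232, 18)),
]

# Paired per-reader changes; the paper applies a paired t test to these.
delta_sens = [sensitivity(w[0], w[1]) - sensitivity(u[0], u[1]) for u, w in readers]
delta_spec = [specificity(w[2], w[3]) - specificity(u[2], u[3]) for u, w in readers]

mean_delta_sens = sum(delta_sens) / len(delta_sens)
mean_delta_spec = sum(delta_spec) / len(delta_spec)
```

Pairing matters here: each reader serves as their own control, so the t test is applied to the per-reader differences rather than to two independent groups.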
  • Summary
    Artificial intelligence assistance can improve the detection accuracy of thoracic abnormalities on chest radiographs across radiologists with varying levels of expertise, leading to marked improvements in sensitivity and a reduction in interpretation time.
    Key Results
    ■ In a retrospective study of 500 patients who underwent chest radiography and thoracic CT, artificial intelligence (AI)-assisted chest radiography interpretation resulted in increased sensitivity of 6%–26% (P < .001) for all abnormality types and all readers, including thoracic radiologists, general radiologists, and radiology residents.
    ■ Mean reading time was 81 seconds without AI versus 56 seconds with AI (a decrease of 31%, P < .001), with a 17% reduction for radiographs with abnormalities versus a 38% reduction for radiographs with no abnormalities.
    Using AI to Improve Radiologist Performance in Detection of Abnormalities on Chest Radiographs
    Souhail Bennani et al.
    Radiology 2023; 309(3):e230860
  • Our results showed that AI assistance resulted in absolute increases in sensitivity for all readers of various levels of experience, including general radiologists and radiology residents, in detecting all five types of abnormalities on chest radiographs: from 5.3% for mediastinal and hilar mass to 25.3% for pneumothorax (P < .001). Specificity increased with AI assistance (from 2.1% [95% CI: 1.6, 2.6] for nodule to 3.9% [95% CI: 3.2, 4.6]), except in the diagnosis of pneumothorax (−0.2%; 95% CI: −0.36, −0.04; P = .01). Although unassisted thoracic radiologists outperformed unassisted general radiologists for the five abnormality types, assisted thoracic radiologists solely outperformed assisted general radiologists in the detection of consolidations (73.9% [95% CI: 67, 80] vs 70.5% [95% CI: 64, 77]; P = .01). Finally, the mean reading time was 81 seconds without AI versus 56 seconds with AI, for a 31% reduction (P < .001), with 17% reduction for radiographs with abnormalities and 38% reduction for radiographs with no abnormalities.
    Using AI to Improve Radiologist Performance in Detection of Abnormalities on Chest Radiographs
    Souhail Bennani et al.
    Radiology 2023; 309(3):e230860
  • “In regard to the impact of AI on reading time, there are conflicting data, with some reports citing a 10% reduction in reading time and others citing an increase of more than 100%. In our study, the 31% decrease in reading time was greater than previously reported. As in the study by Shin et al, we observed that the time saved in reading is greater for radiographs without abnormalities, which represent the majority of chest radiographs in clinical practice.”  
    Using AI to Improve Radiologist Performance in Detection of Abnormalities on Chest Radiographs
    Souhail Bennani et al.
    Radiology 2023; 309(3):e230860
  • PURPOSE: Low-dose computed tomography (LDCT) for lung cancer screening is effective, although most eligible people are not being screened. Tools that provide personalized future cancer risk assessment could focus approaches toward those most likely to benefit. We hypothesized that a deep learning model assessing the entire volumetric LDCT data could be built to predict individual risk without requiring additional demographic or clinical data.
    CONCLUSION: Sybil can accurately predict an individual’s future lung cancer risk from a single LDCT scan to further enable personalized screening. Future study is required to understand Sybil’s clinical applications. Our model and annotations are publicly available.
    Sybil: A Validated Deep Learning Model to Predict Future Lung Cancer Risk From a Single Low-Dose Chest Computed Tomography
    Peter G. Mikhael et al.
    J Clin Oncol 2023 (in press)
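Sybil reports cumulative lung cancer risk at 1 to 6 years from a single LDCT. One standard way such multi-year risks relate to year-by-year hazards is shown below; this is an illustrative assumption about the arithmetic of cumulative risk, not a description of Sybil's published internals, and the hazard values are made up.

```python
# Cumulative risk over k years from hypothetical yearly hazards h_i:
# risk_k = 1 - prod_{i<=k} (1 - h_i), i.e. one minus the probability of
# remaining cancer-free through each successive year.
def cumulative_risk(yearly_hazards):
    risks, survival = [], 1.0
    for h in yearly_hazards:
        survival *= (1.0 - h)
        risks.append(1.0 - survival)
    return risks

six_year = cumulative_risk([0.01, 0.02, 0.03])
```

By construction the cumulative risk is nondecreasing over the horizon, which is why a single scan can support screening-interval decisions at several time points.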
  • “The goal of oncology is to provide the longest possible survival outcomes with the therapeutics that are currently available without sacrificing patients’ quality of life. In lung cancer, several data points over a patient’s diagnostic and treatment course are relevant to optimizing outcomes in the form of precision medicine, and artificial intelligence (AI) provides the opportunity to use available data from molecular information to radiomics, in combination with patient and tumor characteristics, to help clinicians provide individualized care. In doing so, AI can help create models to identify cancer early in diagnosis and deliver tailored therapy on the basis of available information, both at the time of diagnosis and in real time as they are undergoing treatment. The purpose of this review is to summarize the current literature in AI specific to lung cancer and how it applies to the multidisciplinary team taking care of these complex patients.”
    Integration of artificial intelligence in lung cancer: Rise of the machine
    Colton Ladbury et al.
    Cell Reports Medicine (2023), https://doi.org/10.1016/j.xcrm.2023.100933
  • “The use of AI to augment imaging technology has found success in several disciplines, including computer-aided detection and diagnosis (CAD), convolutional neural networks (CNNs), and radiomics. CAD systems are typically standalone with a unified goal of detection or diagnosis of disease. At its core, it is simply trying to aid practitioners with identification of disease, with primary focus on that binary outcome. The field of radiomics seeks to use medical imaging to generate high-dimensional quantitative data, which can in turn be used for analysis that seeks to better understand the underlying characteristics of disease. Radiomics is inherently meant to support the overall diagnosis and management of patients at any point in the imaging workflow and can be combined with other patient characteristics to produce powerful support tools, and therefore can be considered a natural extension of CAD.”
    Integration of artificial intelligence in lung cancer: Rise of the machine
    Colton Ladbury et al.
    Cell Reports Medicine (2023), https://doi.org/10.1016/j.xcrm.2023.100933
  • In addition to assisting with identifying lung cancers, AI can also help predict oncologic outcomes overall and who will respond to therapy. Predicting outcomes including locoregional and distant recurrence, progression-free survival, and overall survival (OS) can be challenging, given that factors that influence these outcomes are multivariate. Imaging features are no doubt highly relevant, but these must be combined with patient and tumor characteristics.  
    Integration of artificial intelligence in lung cancer: Rise of the machine
    Colton Ladbury et al.
    Cell Reports Medicine (2023), https://doi.org/10.1016/j.xcrm.2023.100933
  • “AI has also been applied to treatment decision making. A clinical decision support system (CDSS) is a tool to assist physicians in making clinical decisions on the basis of analyses of multiple data points on a particular patient. Watson for Oncology (WFO) is one example of a CDSS that has been applied to the treatment management of lung cancer. A study comparing decisions made by WFO to a multidisciplinary team found relatively high concordance in recommendations for early stage and metastatic disease (92.4%–100%) but lower rates of concordance in stage II or III (80.8%–84.6%).51 Therefore, although there is room for improvement for decision support, these tools will be critical for standardizing lung cancer treatment across available treatment options and disciplines, thereby enhancing outcomes.”
    Integration of artificial intelligence in lung cancer: Rise of the machine
    Colton Ladbury et al.
    Cell Reports Medicine (2023), https://doi.org/10.1016/j.xcrm.2023.100933
  • “Surgical resection is standard of care for management of localized lung cancer. Extent of surgery depends on several factors, including disease progression and patient eligibility. When possible, lobectomy has been established as standard, with improved disease control and/or survival compared with smaller wedge resections63 and larger pneumonectomies.64 Furthermore, the mortality rate of lobectomies is 2.3% compared with 6.8% with pneumonectomies. However, not every patient will be a candidate for lobectomy, because of factors such as medical history, smoking history, and lung function. AI offers an opportunity to better risk-stratify patients to come up with an optimal treatment plan, which might also include no surgery at all if risk is too high.”
    Integration of artificial intelligence in lung cancer: Rise of the machine
    Colton Ladbury et al.
    Cell Reports Medicine (2023), https://doi.org/10.1016/j.xcrm.2023.100933
  • “Although AI is clearly an invaluable tool to the multidisciplinary lung cancer care team, several barriers remain to its widespread implementation and availability. First, AI relies heavily on data, and data acquisition and organization continue to be a challenge that AI will need to overcome. Efforts will optimally focus on ways of efficiently extracting EMR data to create large databases for AI research. Sample size is important in AI research, as it must be sufficiently large to train, test and validate models. Presently, most outcomes-based research studies include relatively small numbers of patients (between tens and hundreds of patients) that are somewhat heterogeneous as far as patient demographics, genomics, and imaging features are concerned. Though it is sometimes possible to perform AI analyses on datasets of that size, sample sizes in the thousands might be required for many applications. Otherwise, models may be inaccurate, poorly generalizable, and not applicable or reproducible to clinical outcomes. Additionally, although the EMR system has provided the opportunity to extract data into models for AI-based research, a number of variables are recorded as free text, which cannot directly be extracted for data analysis.”
    Integration of artificial intelligence in lung cancer: Rise of the machine
    Colton Ladbury et al.
    Cell Reports Medicine (2023), https://doi.org/10.1016/j.xcrm.2023.100933
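The passage above notes that many EMR variables are recorded as free text that "cannot directly be extracted for data analysis." A minimal illustration of why structured extraction is nontrivial is sketched below: a single regular expression pulling a lesion size from a hypothetical report sentence. The report text and pattern are assumptions for illustration; real pipelines use trained NLP models rather than hand-written patterns, precisely because report phrasing varies.

```python
# Toy free-text extraction: find a "W x H cm" measurement in a hypothetical
# radiology report sentence and keep the long axis as a numeric variable.
import re

report = "Spiculated nodule in the right upper lobe measuring 1.4 x 1.1 cm."

match = re.search(r"(\d+(?:\.\d+)?)\s*x\s*(\d+(?:\.\d+)?)\s*cm", report)
long_axis_cm = float(match.group(1)) if match else None
```

A pattern like this silently fails on "14 mm", "1.4-cm", or prose such as "slightly larger than before", which is the data-quality gap the passage is pointing at.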
  • “The present is an exciting time for lung cancer treatment, as the available treatment options, and the precision with which we can select them, have improved dramatically in recent years. However, these increasingly tailored treatment options are accompanied by a need for data to inform clinical decisions, and therefore a need to be able to make sense of large volumes of data throughout a hypothetical patient’s treatment course. The overarching field of AI, inclusive of ML, NNs, DL, NLP, XAI, and other domains and methodologies, offers a promising avenue for improving all aspects of lung cancer management with data-driven approaches. Advances in radiomics allow us to derive additional value from existing diagnostic imaging, while ML algorithms help with optimizing treatment selection. Although there are limitations to AI and challenges as discussed, with large databases and suitable platforms AI research will continue to grow and become more reproducible, accurate, and applicable. With the rise in AI-based research over the past decade and increasing interest toward AI in the oncology community, including young trainees, AI-based interventions in lung cancer management will play a key role in the future.”
    Integration of artificial intelligence in lung cancer: Rise of the machine
    Colton Ladbury et al.
    Cell Reports Medicine (2023), https://doi.org/10.1016/j.xcrm.2023.100933 
  • “Clinicians often encounter discrepant measurements of the ascending aorta that impede, complicate, and impair appropriate clinical assessment—including key issues of presence or absence of aortic growth, rate of growth, and need for surgical intervention. These discrepancies may arise within a single modality (computed tomography scan, magnetic resonance imaging, or echocardiography) or between modalities. The authors explore the origins and significance of these discrepancies, revealing that some “truth” usually underlies all the discrepant measurements, which individually look at the ascending aorta with different perspectives and dimensional definitions.”
    Discrepancies in Measurement of the Thoracic Aorta
    John A. Elefteriades et al.
    J Am Coll Cardiol 2020;76(2):201–17
  • “Aortic measurements can vary substantially between image sets done with or without gating, reflecting the variation of aortic size in the different phases of the aortic cycle. Often (probably, most commonly), ascending aortic aneurysms are identified incidentally on scans done for other reasons. Such scans will often have been done nongated. It would be helpful to have the notification “nongated” routinely included in the official radiographic report.”
    Discrepancies in Measurement of the Thoracic Aorta
    John A. Elefteriades et al.
    J Am Coll Cardiol 2020;76(2):201–17
  • “USE OF CORONAL IMAGE FOR AORTIC ROOT. The maximal deep sinus to deep sinus dimension can be approximated on the coronal (or sagittal) images by simple hand techniques. One takes the largest diameter that will fit in the aortic root zone on the coronal films. We feel that this dimension has clinical meaning. Furthermore, reading the “fattest” diameter on the coronal images obviates the issue in centerline techniques of identifying the proper caudal to cranial plane along the centerline for measurements to be taken. The maximal diameter is vividly apparent on simple coronal images. Furthermore, as we will see in the next section, the deep sinus to deep sinus dimension resembles very closely the methodology of measurement of aortic root dimension by echocardiography. Echo technicians are  taught to orient the echo beam precisely to capture the maximum transverse dimension.”
    Discrepancies in Measurement of the Thoracic Aorta
    John A. Elefteriades et al.
    J Am Coll Cardiol 2020;76(2):201–17
  • “ASCENDING AORTA PROPER. Let us consider first the ascending aorta proper (above the sinotubular junction). For most aortas, which have undergone little lengthening and, thus, little curvature, measurements by simple diameter on axial images will differ very little from double-oblique computerized assessments. When the ascending aorta has elongated considerably, and, thus, become curved, obliquity of the axial plane in respect to the centerline of the aorta will introduce a moderate discrepancy of several millimeters; the axial measurements will “overestimate” the diameter relative to the double oblique measurements. For the practitioner, a simple supplemental diameter measurement on the coronal images will correct for this obliquity.”
    Discrepancies in Measurement of the Thoracic Aorta
    John A. Elefteriades et al.
    J Am Coll Cardiol 2020;76(2):201–17
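The overestimation described above has a simple geometric form. Idealizing the ascending aorta as a cylinder whose axis is tilted by an angle theta from the slice normal, an axial cut is an ellipse whose major axis is the true diameter divided by cos(theta). This is a geometric sketch under that cylinder assumption, not a clinical measurement rule; the numbers are illustrative.

```python
# Apparent axial diameter of a tilted cylinder: d_apparent = d / cos(theta).
# For small tilt the error is small; a curved, elongated aorta (larger tilt)
# produces the "several millimeters" of overestimation the text describes.
import math

def axial_apparent_diameter(true_diameter_mm, tilt_deg):
    return true_diameter_mm / math.cos(math.radians(tilt_deg))

apparent = axial_apparent_diameter(45.0, 20.0)  # roughly 3 mm over the true 45 mm
```

This also shows why the double-oblique (centerline-perpendicular) measurement removes the bias: it sets the effective tilt to zero.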
  • “In a large sociodemographically diverse cohort of patients with TAA, absolute risk of aortic dissection was low but increased with larger aortic sizes after adjustment for potential confounders and competing risks. Our data support current consensus guidelines recommending prophylactic surgery in nonsyndromic individuals with TAA at a 5.5-cm threshold.”
    Association of Thoracic Aortic Aneurysm Size With Long-term Patient Outcomes The KP-TAA Study
    Matthew D. Solomon et al.
    JAMA Cardiol. 2022;7(11):1160-1169
  • Key Points
    Question: What is the risk of aortic dissection (AD) and all-cause death for nonsyndromic patients with unrepaired ascending thoracic aortic aneurysm (TAA), overall and by TAA size?
    Findings: In this cohort study, the overall absolute risk of AD was low. Although the risk of AD and all-cause death was associated with larger aortic sizes, there was an inflection point at 6.0 cm.
    Meaning: The findings in this study support consensus guidelines recommending surgical intervention at 5.5 cm in nonsyndromic patients with TAA; earlier prophylactic surgery should be done only selectively in the nonsyndromic population, given the nontrivial risks associated with aortic surgery.
    Association of Thoracic Aortic Aneurysm Size With Long-term Patient Outcomes The KP-TAA Study
    Matthew D. Solomon et al.
    JAMA Cardiol. 2022;7(11):1160-1169
  • “We identified a large sociodemographically diverse cohort of more than 6300 nonsyndromic adults with TAA, which included substantial follow-up with patients with all TAA sizes, including large aortic sizes that have been previously understudied. Larger aortic size was associated with higher risk of AD and all-cause death after adjustment for potential confounders and competing risks, but absolute risk of AD was low in the overall cohort with an inflection point at 6.0 cm, supporting current guidelines recommending surgery at 5.5 cm. Earlier prophylactic surgery should be considered selectively in nonsyndromic patients with TAA, given the nontrivial risks associated with aortic surgery.”
    Association of Thoracic Aortic Aneurysm Size With Long-term Patient Outcomes The KP-TAA Study
    Matthew D. Solomon et al.
    JAMA Cardiol. 2022;7(11):1160-1169
  • “Artificial intelligence (AI) is becoming more widespread within radiology. Capabilities that AI algorithms currently provide include detection, segmentation, classification, and quantification of pathological findings. Artificial intelligence software has created challenges for the traditional United States Food and Drug Administration (FDA) approval process for medical devices given its ability to evolve over time with incremental data input. Currently, there are 190 FDA-approved radiology AI-based software devices, 42 of which pertain specifically to thoracic radiology. The majority of these algorithms are approved for the detection and/or analysis of pulmonary nodules, for monitoring placement of endotracheal tubes and indwelling catheters, for detection of emergent findings, and for assessment of pulmonary parenchyma; however, as technology evolves, there are many other potential applications that can be explored. For example, evaluation of non-idiopathic pulmonary fibrosis interstitial lung diseases, synthesis of imaging, clinical and/or laboratory data to yield comprehensive diagnoses, and survival or prognosis prediction of certain pathologies. With increasing physician and developer engagement, transparency and frequent communication between developers and regulatory agencies, such as the FDA, AI medical devices will be able to provide a critical supplement to patient management and ultimately enhance physicians’ ability to improve patient care.”
    The current status and future of FDA-approved artificial intelligence tools in chest radiology in the United States
    M.E. Milam, C.W. Koo
    Clinical Radiology 2022 (in press)
  • “Rib fractures are commonly seen in the setting of thoracic trauma with an estimated 350,000 cases in the United States yearly. Presently, the only FDA-approved AI SaMD for rib fracture detection on CT is the uAI EasyTriage-Rib by Shanghai United Imaging Intelligence Co., Ltd. The tool consists of automatic vertebra localisation, rib segmentation and labelling, and rib fracture detection. There are currently no FDA-approved AI SaMD algorithms to detect rib fractures on chest radiographs.”
    The current status and future of FDA-approved artificial intelligence tools in chest radiology in the United States
    M.E. Milam, C.W. Koo
    Clinical Radiology 2022 (in press)
  • “Although medical AI holds great promise, several barriers must be overcome before widespread clinical implementation. The process through which a model renders a decision remains unclear to human end-users, preventing trust-building. Although explainability methods, such as saliency maps, feature visualisation, and Shapley plots exist, it is increasingly apparent that these methods may be insufficient and more intuitive methods will need to be developed. In order to obtain buy-in from physicians, a change in physician point of view is needed, from perceiving AI as a rival threatening job security to seeing AI as a beneficial assistant. Physician engagement through training sessions that promote user understanding of software capability and limitations and having physicians as drivers of product development instead of just passive users may be key in turning such challenge into an opportunity. It cannot be emphasised enough that physician acceptance is one of the most important determinants in the successful initial institutional rollout of a medical AI product.”
    The current status and future of FDA-approved artificial intelligence tools in chest radiology in the United States
    M.E. Milam, C.W. Koo
    Clinical Radiology 2022 (in press)
  • “AI is becoming ever more integrated within radiology. It has required the FDA to adapt its regulatory and approval process. Currently, there are 42 commercially available FDA-approved AI SaMD that have applications within chest radiology and as technologies continue to progress, more devices with different and enhanced capabilities currently under development will become available. Although several challenges remain for widespread adoption of AI SaMD, with increasing physician buy-in, developer engagement, transparency, and frequent communication between developers and regulatory agencies such as the FDA, it will not be long before AI becomes an integral part of patient management and ultimately enhances patient care.”
    The current status and future of FDA-approved artificial intelligence tools in chest radiology in the United States
    M.E. Milam, C.W. Koo
    Clinical Radiology 2022 (in press)
  • Objectives: To evaluate and compare the diagnostic performances of a commercialized artificial intelligence (AI) algorithm for diagnosing pulmonary embolism (PE) on CT pulmonary angiogram (CTPA) with those of emergency radiologists in routine clinical practice.
    Methods: This was an IRB-approved retrospective multicentric study including patients with suspected PE from September to December 2019 (i.e., during a preliminary evaluation period of an approved AI algorithm). CTPA quality and conclusions by emergency radiologists were retrieved from radiological reports. The gold standard was a retrospective review of CTPA, radiological and clinical reports, AI outputs, and patient outcomes. Diagnostic performance metrics for AI and radiologists were assessed in the entire cohort and depending on CTPA quality.  
    How artificial intelligence improves radiological interpretation in suspected pulmonary embolism  
    Alexandre Ben Cheikh et al.
    European Radiology 2022 https://doi.org/10.1007/s00330-022-08645-2
  • Results: Overall, 1202 patients were included (median age: 66.2 years). PE prevalence was 15.8% (190/1202). The AI algorithm detected 219 suspicious PEs, of which 176 were true PEs, including 19 true PEs missed by radiologists. In the cohort, the highest sensitivity and negative predictive values (NPVs) were obtained with AI (92.6% versus 90% and 98.6% versus 98.1%, respectively), while the highest specificity and positive predictive value (PPV) were found with radiologists (99.1% versus 95.8% and 95% versus 80.4%, respectively). Accuracy, specificity, and PPV were significantly higher for radiologists except in subcohorts with poor-to-average injection quality. Radiologists positively evaluated the AI algorithm to improve their diagnostic comfort (55/79 [69.6%]).  
    Conclusion: Instead of replacing radiologists, AI for PE detection appears to be a safety net in emergency radiology practice due to high sensitivity and NPV, thereby increasing the self-confidence of radiologists.
    How artificial intelligence improves radiological interpretation in suspected pulmonary embolism  
    Alexandre Ben Cheikh et al.
    European Radiology 2022 https://doi.org/10.1007/s00330-022-08645-2 
  • Key Points
    • Both the AI algorithm and emergency radiologists showed excellent performance in diagnosing PE on CTPA (sensitivity and specificity ≥ 90%; accuracy ≥ 95%).
    • The AI algorithm for PE detection can help increase the sensitivity and NPV of emergency radiologists in clinical practice, especially in cases of poor-to-moderate injection quality.
    • Emergency radiologists recommended the use of AI for PE detection in satisfaction surveys to increase their confidence and comfort in their final diagnosis.  
    How artificial intelligence improves radiological interpretation in suspected pulmonary embolism  
    Alexandre Ben Cheikh et al.
    European Radiology 2022 https://doi.org/10.1007/s00330-022-08645-2
  • “Indeed, in the entire cohort-2019, AIDOC captured 19 PEs that were not diagnosed by radiologists in 19 distinct patients. In other words, the AI algorithm could correct a misdiagnosed PE approximately every 63 CTPAs (≈1202/19). This estimation must be considered in parallel with the high number of CTPAs required by emergency physicians (≈18,000 CTPAs in 2020 in our group—so approximately 285 [≈18000/1202 × 19] true PEs detected by AI but initially misdiagnosed by radiologists in 2020) and with human and financial consequences of missed PEs [32]. Indeed, mortality and recurrence rates for untreated or missed PE range between 5 and 30%.”  
    How artificial intelligence improves radiological interpretation in suspected pulmonary embolism  
    Alexandre Ben Cheikh et al.
    European Radiology 2022 https://doi.org/10.1007/s00330-022-08645-2
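As a check on the percentages quoted above, the 2 × 2 confusion matrix for the AI algorithm can be rebuilt from the raw counts in the Results (1202 CTPAs, 190 true PEs, 219 AI-flagged studies of which 176 were true positives, and 19 radiologist misses corrected by AI); a minimal sketch:

```python
# Reconstruct the AI confusion matrix from the counts reported by Ben Cheikh et al.
tp = 176                     # AI-flagged studies with confirmed PE
fp = 219 - tp                # AI-flagged studies without PE -> 43
fn = 190 - tp                # true PEs missed by AI -> 14
tn = 1202 - tp - fp - fn     # correctly negative studies -> 969

sensitivity = tp / (tp + fn)   # 176/190  ~= 0.926 (92.6%)
specificity = tn / (tn + fp)   # 969/1012 ~= 0.958 (95.8%)
ppv         = tp / (tp + fp)   # 176/219  ~= 0.804 (80.4%)
npv         = tn / (tn + fn)   # 969/983  ~= 0.986 (98.6%)

# "One corrected miss roughly every 63 CTPAs": 19 radiologist misses in 1202 studies.
ctpas_per_corrected_miss = 1202 / 19   # ~= 63
```

The reconstructed counts reproduce all four AI metrics quoted in the Results, which is a useful sanity check when reading diagnostic-accuracy abstracts.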
  • “In conclusion, this study confirms the high diagnostic performances of AI algorithms relying on DCNN to diagnose PE on CTPA in a large multicentric retrospective emergency series. It also underscores where and how AI algorithms could better support (or “augment”) radiologists, i.e., for poor-quality examinations and by increasing their diagnostic confidence through the high sensitivity and high NPV of AI. Thus, our work provides more scientific ground for the concept of “AI-augmented” radiologists instead of supporting the theory of radiologists’ replacement by AI.”
    How artificial intelligence improves radiological interpretation in suspected pulmonary embolism  
    Alexandre Ben Cheikh et al.
    European Radiology 2022 https://doi.org/10.1007/s00330-022-08645-2
  • OBJECTIVE To assess if a novel artificial intelligence (AI) algorithm can help detect pulmonary nodules on radiographs at different levels of detection difficulty.
    CONCLUSIONS AND RELEVANCE In this diagnostic study, an AI algorithm was associated with improved detection of pulmonary nodules on chest radiographs compared with unaided interpretation for different levels of detection difficulty and for readers with different experience.
    An Artificial Intelligence–Based Chest X-ray Model on Human Nodule Detection Accuracy From a Multicenter Study
    Fatemeh Homayounieh et al.
    JAMA Network Open. 2021;4(12):e2141096.
  • “This study found that a novel AI algorithm was associated with improved accuracy and AUCs for junior and senior radiologists for detecting pulmonary nodules. Four of 9 radiologists had a lower number of missed and false positive pulmonary nodules with help from AI-aided interpretation of chest radiographs.”  
    An Artificial Intelligence–Based Chest X-ray Model on Human Nodule Detection Accuracy From a Multicenter Study  
    Fatemeh Homayounieh et al.
    JAMA Network Open. 2021;4(12):e2141096.
  • “It is time to move beyond studies showing that AI can detect opacities at CT or chest radiography—this is now well established. Instead, there is a great need for AI systems, based on a combination of imaging, laboratory, and clinical information, that provide actionable predictions otherwise unavailable or less accurate without AI.”
    Artificial Intelligence of COVID-19 Imaging: A Hammer in Search of a Nail  
    Ronald M. Summers
    Radiology 2020; 00:1–3 
  • “More observer performance experiments are necessary to determine whether AI improves clinical interpretation according to reader experience level and reading paradigm (first, concurrent, or second reader). Prospective outcome studies are necessary to determine whether the use of AI leads to changes in patient care, shortened hospitalizations, and reduced morbidity and mortality. Nonradiology clinical information will need to be routinely incorporated into AI models. Assessment of risk and progression of the chronic sequela of COVID-19 infection is necessary. A prospective randomized controlled trial would be exemplary.”
    Artificial Intelligence of COVID-19 Imaging: A Hammer in Search of a Nail  
    Ronald M. Summers
    Radiology 2020; 00:1–3 
  • “How does one put this deluge of articles into context? It seems unlikely that an AI system would detect many patients with COVID-19 who had a negative reverse transcription polymerase chain reaction test. Anecdotes will occur. But from a general perspective, this is unlikely to propel dissemination of the AI technology. What about distinguishing COVID-19 from other viral pneumonias? It seems unlikely that clinical decision making would depend on the recommendations of AI, given more definitive laboratory tests are available. Could AI lead to a fully automated interpretation? This has not been the focus of COVID-19 imaging AI to date. Multitask approaches that identify multiple abnormalities at chest imaging besides opacities will be needed, such as universal lesion detection. What about mortality prediction? Hazard ratios on the order of 2 to 3, as found in the article by Mushtaq et al, are generally insufficient for clinical decision making. While it is possible that prediction of an adverse outcome could lead to more aggressive treatment, it could also lead to unnecessary costs and adverse effects.”
    Artificial Intelligence of COVID-19 Imaging: A Hammer in Search of a Nail  
    Ronald M. Summers
    Radiology 2020; 00:1–3 
  • "What are the current needs of AI systems for COVID-19 and CT and chest radiography? Public challenges or competitions pitting different AI systems against one another would enable “apples-to-apples” comparisons of performance.”
    Artificial Intelligence of COVID-19 Imaging: A Hammer in Search of a Nail  
    Ronald M. Summers
    Radiology 2020; 00:1–3 
  • OBJECTIVE To assess the performance of artificial intelligence (AI) algorithms in realistic radiology workflows by performing an objective comparative evaluation of the preliminary reads of anteroposterior (AP) frontal chest radiographs performed by an AI algorithm and radiology residents.
    CONCLUSIONS AND RELEVANCE These findings suggest that it is possible to build AI algorithms that reach and exceed the mean level of performance of third-year radiology residents for full-fledged preliminary read of AP frontal chest radiographs. This diagnostic study also found that while the more complex findings would still benefit from expert overreads, the performance of AI algorithms was associated with the amount of data available for training rather than the level of difficulty of interpretation of the finding. Integrating such AI systems in radiology workflows for preliminary interpretations has the potential to expedite existing radiology workflows and address resource scarcity while improving overall accuracy and reducing the cost of care.
    Comparison of Chest Radiograph Interpretations by Artificial Intelligence Algorithm vs Radiology Residents
    Joy T. Wu et al.
    JAMA Network Open. 2020;3(10):e2022779. doi:10.1001/jamanetworkopen.2020.22779
  • Question: How does an artificial intelligence (AI) algorithm compare with radiology residents in full-fledged preliminary reads of anteroposterior (AP) frontal chest radiographs?
    Findings: This diagnostic study was conducted among 5 third-year radiology residents and an AI algorithm using a study data set of 1998 AP frontal chest radiographs assembled through a triple consensus with adjudication ground truth process covering more than 72 chest radiograph findings. There was no statistically significant difference in sensitivity between the AI algorithm and the radiology residents, but the specificity and positive predictive value were statistically higher for the AI algorithm.
    Meaning: These findings suggest that well-trained AI algorithms can reach performance levels similar to radiology residents in covering the breadth of findings in AP frontal chest radiographs, which suggests there is the potential for the use of AI algorithms for preliminary interpretations of chest radiographs in radiology workflows to expedite radiology reads, address resource scarcity, improve overall accuracy, and reduce the cost of care.
    Comparison of Chest Radiograph Interpretations by Artificial Intelligence Algorithm vs Radiology Residents
    Joy T. Wu et al.
    JAMA Network Open. 2020;3(10):e2022779. doi:10.1001/jamanetworkopen.2020.22779
  • “Overall, this study points to the potential use of AI systems in future radiology workflows for preliminary interpretations that target the most prevalent findings, leaving the final reads performed by the attending physician to still catch any potential misses from the less-prevalent fine-grained findings. By having attending physicians quickly correct the automatically produced reads, we can expect to significantly expedite current dictation-driven radiology workflows, improve accuracy, and ultimately reduce the overall cost of care.”
    Comparison of Chest Radiograph Interpretations by Artificial Intelligence Algorithm vs Radiology Residents
    Joy T. Wu et al.
    JAMA Network Open. 2020;3(10):e2022779. doi:10.1001/jamanetworkopen.2020.22779
  • Background: IBM Watson for Oncology (WFO) is a cognitive computing system helping physicians quickly identify key information in a patient’s medical record, surface relevant evidence, and explore treatment options. This study assessed the possibility of using WFO for clinical treatment in lung cancer patients.
    Methods: We evaluated the level of agreement between WFO and multidisciplinary team (MDT) for lung cancer. From January to December 2018, newly diagnosed lung cancer cases in Chonnam National University Hwasun Hospital were retrospectively examined using WFO version 18.4 according to four treatment categories (surgery, radiotherapy, chemoradiotherapy, and palliative care). Treatment recommendations were considered concordant if the MDT recommendations were designated ‘recommended’ by WFO. Concordance between MDT and WFO was analyzed by Cohen’s kappa value.
    Artificial intelligence and lung cancer treatment decision: agreement with recommendation of multidisciplinary tumor board
    Min-Seok Kim et al.
    Transl Lung Cancer Res 2020;9(3):507-514
  • Results: In total, 405 (male 340, female 65) cases with different histology (adenocarcinoma 157, squamous cell carcinoma 132, small cell carcinoma 94, others 22 cases) were enrolled. Concordance between MDT and WFO occurred in 92.4% (k=0.881, P<0.001) of all cases, and concordance differed according to clinical stages. The strength of agreement was very good in stage IV non-small cell lung carcinoma (NSCLC) (100%, k=1.000) and extensive disease small cell lung carcinoma (SCLC) (100%, k=1.000). In stage I NSCLC, the agreement strength was good (92.4%, k=0.855). The concordance was moderate in stage III NSCLC (80.8%, k=0.622) and relatively low in stage II NSCLC (83.3%, k=0.556) and limited disease SCLC (84.6%, k=0.435). There were discordant cases in surgery (7/57, 12.3%), radiotherapy (2/12, 16.7%), and chemoradiotherapy (15/129, 11.6%), but no discordance in metastatic disease patients.
    Conclusions: Treatment recommendations made by WFO and MDT were highly concordant for lung cancer cases especially in metastatic stage. However, WFO was just an assisting tool in stage I–III NSCLC and limited disease SCLC; so, patient-doctor relationship and shared decision making may be more important in this stage.
    Artificial intelligence and lung cancer treatment decision: agreement with recommendation of multidisciplinary tumor board
    Min-Seok Kim et al.
    Transl Lung Cancer Res 2020;9(3):507-514
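Concordance in this study was quantified with Cohen’s kappa, which compares observed agreement against the agreement expected by chance. Below is a minimal sketch of the statistic; the 2 × 2 table is illustrative toy data, not counts from the study:

```python
def cohens_kappa(table):
    """Cohen's kappa for an n x n inter-rater agreement table.

    table[i][j] = number of cases rater A placed in category i
    and rater B placed in category j.
    """
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of cases on the diagonal.
    po = sum(table[i][i] for i in range(len(table))) / n
    # Chance agreement: product of the raters' marginal proportions.
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    pe = sum(r * c for r, c in zip(row_tot, col_tot)) / (n * n)
    return (po - pe) / (1 - pe)

# Illustrative 2x2 table: po = 0.70, pe = 0.50, so kappa = 0.40.
kappa = cohens_kappa([[20, 5], [10, 15]])
```

The reported kappa values (e.g., k=1.000 for perfect agreement in stage IV NSCLC, k=0.435 for limited disease SCLC) fall out of exactly this formula applied to the 4-category treatment tables.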
  • “In conclusion, treatment decisions made by WFO exhibited a high degree of agreement with those of the MDT tumor board, and the concordance varied by stage. AI-based CDSS is expected to play an assistive role, particularly in the metastatic lung cancer stage with less complex treatment options. However, patient-doctor relationships and shared decision making may be more important in non-metastatic lung cancer because of the complexity of reaching an appropriate decision. Further study is warranted to overcome this gray area for current machine learning algorithms.”
    Artificial intelligence and lung cancer treatment decision: agreement with recommendation of multidisciplinary tumor board
    Min-Seok Kim et al.
    Transl Lung Cancer Res 2020;9(3):507-514
  • “In this study, we used artificial intelligence (AI) algorithms to integrate chest CT findings with clinical symptoms, exposure history and laboratory testing to rapidly diagnose patients who are positive for COVID-19. Among a total of 905 patients tested by real-time RT–PCR assay and next-generation sequencing RT–PCR, 419 (46.3%) tested positive for SARS-CoV-2. In a test set of 279 patients, the AI system achieved an area under the curve of 0.92 and had equal sensitivity as compared to a senior thoracic radiologist. The AI system also improved the detection of patients who were positive for COVID-19 via RT–PCR who presented with normal CT scans, correctly identifying 17 of 25 (68%) patients, whereas radiologists classified all of these patients as COVID-19 negative. When CT scans and associated clinical history are available, the proposed AI system can help to rapidly diagnose COVID-19 patients.”
    Artificial intelligence–enabled rapid diagnosis of patients with COVID-19
    Xueyan Mei et al.
    Nat Med (2020). https://doi.org/10.1038/s41591-020-0931-3
  • "We believe implementation of the joint algorithm discussed above could aid in both issues. First, the AI algorithm could evaluate the CT immediately after completion. Second, the algorithm outperformed radiologists in identifying patients positive for COVID-19, demonstrating normal CT results in the early stage. Third, the algorithm performed equally well in sensitivity (P = 0.05) in the diagnosis of COVID-19 as compared to a senior thoracic radiologist. Specifically, the joint algorithm achieved a statistically significant 6% (P = 0.00146) and 12% (P < 1 × 10−4) improvement in AUC as compared to the CNN model using only CT images and the MLP model using only clinical information respectively.”
    Artificial intelligence–enabled rapid diagnosis of patients with COVID-19
    Xueyan Mei et al.
    Nat Med (2020). https://doi.org/10.1038/s41591-020-0931-3
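The joint algorithm described above fuses a CNN operating on CT images with an MLP operating on clinical information. The paper defines its own fusion architecture; purely as a hedged illustration, the late-fusion idea can be sketched as a weighted combination of the two modality outputs (the function names, weights, and example logits below are hypothetical, not from the paper):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def joint_covid_score(cnn_logit, mlp_logit, w_img=0.5, w_clin=0.5):
    """Illustrative late fusion: combine an imaging logit and a clinical
    logit into one probability. In practice the weights (or a richer
    fusion head) would be learned from training data."""
    return sigmoid(w_img * cnn_logit + w_clin * mlp_logit)

# A normal-appearing CT (negative imaging logit) can still produce a high
# joint score when the clinical evidence is strong -- the behavior that let
# the joint model flag RT-PCR-positive patients with normal CT scans.
score = joint_covid_score(cnn_logit=-1.0, mlp_logit=4.0)
```

This is only a schematic of why combining modalities can beat either alone; the reported 6% and 12% AUC gains came from the paper’s actual joint model, not this toy fusion.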
  • "In conclusion, these results illustrate the potential role for a highly accurate AI algorithm for the rapid identification of COVID-19 patients, which could be helpful in combating the current disease outbreak. We believe the AI model proposed, which combines CT imaging and clinical information and shows equivalent accuracy to a senior chest radiologist, could be a useful screening tool to quickly diagnose infectious diseases such as COVID-19 that does not require radiologist input or physical tests.”
    Artificial intelligence–enabled rapid diagnosis of patients with COVID-19
    Xueyan Mei et al.
    Nat Med (2020). https://doi.org/10.1038/s41591-020-0931-3 
  • “Screening mammography aims to identify breast cancer at earlier stages of the disease, when treatment can be more successful. Despite the existence of screening programmes worldwide, the interpretation of mammograms is affected by high rates of false positives and false negatives. Here we present an artificial intelligence (AI) system that is capable of surpassing human experts in breast cancer prediction. To assess its performance in the clinical setting, we curated a large representative dataset from the UK and a large enriched dataset from the USA. We show an absolute reduction of 5.7% and 1.2% (USA and UK) in false positives and 9.4% and 2.7% in false negatives.”
    International evaluation of an AI system for breast cancer screening
    Scott Mayer McKinney et al
    Nature | Vol 577 | 2 January 2020
  • “In an independent study of six radiologists, the AI system outperformed all of the human readers: the area under the receiver operating characteristic curve (AUC-ROC) for the AI system was greater than the AUC-ROC for the average radiologist by an absolute margin of 11.5%. We ran a simulation in which the AI system participated in the double-reading process that is used in the UK, and found that the AI system maintained non-inferior performance and reduced the workload of the second reader by 88%. This robust assessment of the AI system paves the way for clinical trials to improve the accuracy and efficiency of breast cancer screening.”
    International evaluation of an AI system for breast cancer screening
    Scott Mayer McKinney et al
    Nature | Vol 577 | 2 January 2020
  • "The optimal use of the AI system within clinical workflows remains to be determined. The specificity advantage exhibited by the system suggests that it could help to reduce recall rates and unnecessary biopsies. The improvement in sensitivity exhibited in the US data shows that the AI system may be capable of detecting cancers earlier than the standard of care. An analysis of the localization performance of the AI system suggests it holds early promise for flagging suspicious regions for review by experts. Notably, the additional cancers identified by the AI system tended to be invasive rather than in situ disease.”
    International evaluation of an AI system for breast cancer screening
    Scott Mayer McKinney et al
    Nature | Vol 577 | 2 January 2020
  • "Beyond improving reader performance, the technology described here may have a number of other clinical applications. Through simulation, we suggest how the system could obviate the need for double reading in 88% of UK screening cases, while maintaining a similar level of accuracy to the standard protocol. We also explore how high-confidence operating points can be used to triage high-risk cases and dismiss low-risk cases. These analyses highlight the potential of this technology to deliver screening results in a sustainable manner despite workforce shortages in countries such as the UK. Prospective clinical studies will be required to understand the full extent to which this technology can benefit patient care.”
    International evaluation of an AI system for breast cancer screening
    Scott Mayer McKinney et al
    Nature | Vol 577 | 2 January 2020
  • Background: Deep learning has the potential to augment the use of chest radiography in clinical radiology, but challenges include poor generalizability, spectrum bias, and difficulty comparing across studies.
    Purpose: To develop and evaluate deep learning models for chest radiograph interpretation by using radiologist-adjudicated reference standards.
    Conclusion: Expert-level models for detecting clinically relevant chest radiograph findings were developed for this study by using adjudicated reference standards and with population-level performance estimation. Radiologist-adjudicated labels for 2412 ChestX-ray14 validation set images and 1962 test set images are provided.
    Chest Radiograph Interpretation with Deep Learning Models: Assessment with Radiologist-adjudicated Reference Standards and Population-adjusted Evaluation
    Anna Majkowska et al.
    Radiology 2019; 00:1–11 • https://doi.org/10.1148/radiol.2019191293
    Materials and Methods: Deep learning models were developed to detect four findings (pneumothorax, opacity, nodule or mass, and fracture) on frontal chest radiographs. This retrospective study used two data sets. Data set 1 (DS1) consisted of 759 611 images from a multicity hospital network; ChestX-ray14 is a publicly available data set with 112 120 images. Natural language processing and expert review of a subset of images provided labels for 657 954 training images. Test sets consisted of 1818 and 1962 images from DS1 and ChestX-ray14, respectively. Reference standards were defined by radiologist-adjudicated image review. Performance was evaluated by area under the receiver operating characteristic curve analysis, sensitivity, specificity, and positive predictive value. Four radiologists reviewed test set images for performance comparison. Inverse probability weighting was applied to DS1 to account for positive radiograph enrichment and estimate population-level performance.
    Results: In DS1, population-adjusted areas under the receiver operating characteristic curve for pneumothorax, nodule or mass, airspace opacity, and fracture were, respectively, 0.95 (95% confidence interval [CI]: 0.91, 0.99), 0.72 (95% CI: 0.66, 0.77), 0.91 (95% CI: 0.88, 0.93), and 0.86 (95% CI: 0.79, 0.92). With ChestX-ray14, areas under the receiver operating characteristic curve were 0.94 (95% CI: 0.93, 0.96), 0.91 (95% CI: 0.89, 0.93), 0.94 (95% CI: 0.93, 0.95), and 0.81 (95% CI: 0.75, 0.86), respectively.
    Chest Radiograph Interpretation with Deep Learning Models: Assessment with Radiologist-adjudicated Reference Standards and Population-adjusted Evaluation
    Anna Majkowska et al.
    Radiology 2019; 00:1–11 • https://doi.org/10.1148/radiol.2019191293
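The AUC values reported above have a concrete interpretation: the empirical AUC equals the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative one (the Mann–Whitney statistic). A minimal sketch, with illustrative scores rather than the study’s data:

```python
def auc(pos_scores, neg_scores):
    """Empirical AUC via the Mann-Whitney statistic: the fraction of
    (positive, negative) score pairs ranked correctly, ties counting half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Illustrative scores for 2 positive and 3 negative cases:
# 5 of the 6 pairs are ranked correctly, so AUC = 5/6 ~= 0.833.
example_auc = auc([0.9, 0.4], [0.1, 0.2, 0.7])
```

Read this way, an AUC of 0.95 for pneumothorax means the model ranks a random pneumothorax case above a random normal case 95% of the time, while 0.72 for nodule or mass reflects far weaker separation.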
  • “Deep learning models achieved parity to chest radiography interpretations from board-certified radiologists for the detection of pneumothorax, nodule or mass, airspace opacity, and fracture on a diverse multicenter chest radiography data set (areas under the receiver operating characteristic curve, 0.95, 0.72, 0.91, and 0.86, respectively)."
    Chest Radiograph Interpretation with Deep Learning Models: Assessment with Radiologist-adjudicated Reference Standards and Population-adjusted Evaluation
    Anna Majkowska et al.
    Radiology 2019; 00:1–11 • https://doi.org/10.1148/radiol.2019191293
  • “In conclusion, we developed and evaluated clinically relevant artificial intelligence models for chest radiograph interpretation that performed similar to radiologists by using a diverse set of images. The population-adjusted performance analyses reported here along with the release of adjudicated labels for the publicly available ChestX-ray14 images can provide a useful resource to facilitate the continued development of clinically useful artificial intelligence models for chest radiographs.”
    Chest Radiograph Interpretation with Deep Learning Models: Assessment with Radiologist-adjudicated Reference Standards and Population-adjusted Evaluation
    Anna Majkowska et al.
    Radiology 2019; 00:1–11 • https://doi.org/10.1148/radiol.2019191293 
  • Background: Deep learning has the potential to augment the use of chest radiography in clinical radiology, but challenges include poor generalizability, spectrum bias, and difficulty comparing across studies.
    Purpose: To develop and evaluate deep learning models for chest radiograph interpretation by using radiologist-adjudicated reference standards.
    Conclusion: Expert-level models for detecting clinically relevant chest radiograph findings were developed for this study by using adjudicated reference standards and with population-level performance estimation. Radiologist-adjudicated labels for 2412 ChestX- ray14 validation set images and 1962 test set images are provided.
    Chest Radiograph Interpretation with Deep Learning Models: Assessment with Radiologist-adjudicated Reference Standards and Population-adjusted Evaluation
    Anna Majkowska et al.
    Radiology 2019; 00:1–11 • https://doi.org/10.1148/radiol.2019191293
  • Materials and Methods: Deep learning models were developed to detect four findings (pneumothorax, opacity, nodule or mass, and fracture) on frontal chest radiographs. This retrospective study used two data sets. Data set 1 (DS1) consisted of 759 611 images from a multicity hospital network and ChestX-ray14 is a publicly available data set with 112 120 images. Natural language process- ing and expert review of a subset of images provided labels for 657 954 training images. Test sets consisted of 1818 and 1962 images from DS1 and ChestX-ray14, respectively. Reference standards were defined by radiologist-adjudicated image review. Performance was evaluated by area under the receiver operating characteristic curve analysis, sensitivity, specificity, and positive predictive value. Four radiologists reviewed test set images for performance comparison. Inverse probability weighting was applied to DS1 to ac- count for positive radiograph enrichment and estimate population-level performance.
    Results: In DS1, population-adjusted areas under the receiver operating characteristic curve for pneumothorax, nodule or mass, airspace opacity, and fracture were, respectively, 0.95 (95% confidence interval [CI]: 0.91, 0.99), 0.72 (95% CI: 0.66, 0.77), 0.91 (95% CI: 0.88, 0.93), and 0.86 (95% CI: 0.79, 0.92). With ChestX-ray14, areas under the receiver operating characteristic curve were 0.94 (95% CI: 0.93, 0.96), 0.91 (95% CI: 0.89, 0.93), 0.94 (95% CI: 0.93, 0.95), and 0.81 (95% CI: 0.75, 0.86), respectively.
    Chest Radiograph Interpretation with Deep Learning Models: Assessment with Radiologist-adjudicated Reference Standards and Population-adjusted Evaluation
    Anna Majkowska et al.
    Radiology 2019; 00:1–11 • https://doi.org/10.1148/radiol.2019191293
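The inverse probability weighting described above can be illustrated with a short sketch: each case in the enriched test set is weighted by the inverse of its sampling probability, so metrics computed on the sample estimate population-level performance. The labels, predictions, and 10x enrichment factor below are hypothetical, chosen only to show the mechanics.

```python
import numpy as np

def weighted_sens_spec(y_true, y_pred, weights):
    """Sensitivity and specificity with per-case inverse-probability weights."""
    y_true, y_pred, w = map(np.asarray, (y_true, y_pred, weights))
    tp = np.sum(w * (y_true == 1) * (y_pred == 1))
    fn = np.sum(w * (y_true == 1) * (y_pred == 0))
    tn = np.sum(w * (y_true == 0) * (y_pred == 0))
    fp = np.sum(w * (y_true == 0) * (y_pred == 1))
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical enriched test set: positives oversampled at 10x their
# population rate, so each positive carries weight 1/10 relative to negatives.
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 0, 0, 0, 0, 1, 0])
weights = np.where(y_true == 1, 0.1, 1.0)  # inverse of sampling probability
sens, spec = weighted_sens_spec(y_true, y_pred, weights)
```

Because sensitivity and specificity condition on the true label, the weights cancel within each class here; the weighting matters for prevalence-dependent quantities such as positive predictive value.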
  • “Deep learning models achieved parity to chest radiography interpretations from board-certified radiologists for the detection of pneumothorax, nodule or mass, airspace opacity, and fracture on a diverse multicenter chest radiography data set (areas under the receiver operating characteristic curve, 0.95, 0.72, 0.91, and 0.86, respectively).”
    Chest Radiograph Interpretation with Deep Learning Models: Assessment with Radiologist-adjudicated Reference Standards and Population-adjusted Evaluation
    Anna Majkowska et al.
    Radiology 2019; 00:1–11 • https://doi.org/10.1148/radiol.2019191293

  • “In conclusion, we developed and evaluated clinically relevant artificial intelligence models for chest radiograph interpretation that performed similar to radiologists by using a diverse set of images. The population-adjusted performance analyses reported here along with the release of adjudicated labels for the publicly available ChestX-ray14 images can provide a useful resource to facilitate the continued development of clinically useful artificial intelligence models for chest radiographs.”
    Chest Radiograph Interpretation with Deep Learning Models: Assessment with Radiologist-adjudicated Reference Standards and Population-adjusted Evaluation
    Anna Majkowska et al.
    Radiology 2019; 00:1–11 • https://doi.org/10.1148/radiol.2019191293 
  • OBJECTIVES To develop a deep learning–based algorithm that can classify normal and abnormal results from chest radiographs with major thoracic diseases including pulmonary malignant neoplasm, active tuberculosis, pneumonia, and pneumothorax and to validate the algorithm’s performance using independent data sets.
    CONCLUSIONS AND RELEVANCE The algorithm consistently outperformed physicians, including thoracic radiologists, in the discrimination of chest radiographs with major thoracic diseases, demonstrating its potential to improve the quality and efficiency of clinical practice.
    Development and Validation of a Deep Learning–Based Automated Detection Algorithm for Major Thoracic Diseases on Chest Radiographs
    Eui Jin Hwang et al.
    JAMA Network Open. 2019;2(3):e191095. doi:10.1001/jamanetworkopen.2019.1095
  • RESULTS The algorithm demonstrated a median (range) area under the curve of 0.979 (0.973-1.000) for image-wise classification and 0.972 (0.923-0.985) for lesion-wise localization; the algorithm demonstrated significantly higher performance than all 3 physician groups in both image-wise classification (0.983 vs 0.814-0.932; all P < .005) and lesion-wise localization (0.985 vs 0.781-0.907; all P < .001). Significant improvements in both image-wise classification (0.814-0.932 to 0.904-0.958; all P < .005) and lesion-wise localization (0.781-0.907 to 0.873-0.938; all P < .001) were observed in all 3 physician groups with assistance of the algorithm.
    Development and Validation of a Deep Learning–Based Automated Detection Algorithm for Major Thoracic Diseases on Chest Radiographs
    Eui Jin Hwang et al.
    JAMA Network Open. 2019;2(3):e191095. doi:10.1001/jamanetworkopen.2019.1095
  • Key Points
    Question Can a deep learning–based algorithm accurately discriminate abnormal chest radiograph results showing major thoracic diseases from normal chest radiograph results?
    Findings In this diagnostic study of 54 221 chest radiographs with normal findings and 35 613 with abnormal findings, the deep learning–based algorithm for discrimination of chest radiographs with pulmonary malignant neoplasms, active tuberculosis, pneumonia, or pneumothorax demonstrated excellent and consistent performance throughout 5 independent data sets. The algorithm outperformed physicians, including radiologists, and enhanced physician performance when used as a second reader.
    Meaning A deep learning–based algorithm may help improve diagnostic accuracy in reading chest radiographs and assist in prioritizing chest radiographs, thereby increasing workflow efficacy.
    Development and Validation of a Deep Learning–Based Automated Detection Algorithm for Major Thoracic Diseases on Chest Radiographs
    Eui Jin Hwang et al.
    JAMA Network Open. 2019;2(3):e191095. doi:10.1001/jamanetworkopen.2019.1095
  • The strengths of our study can be summarized as follows. First, the development data set underwent extensive data curation by radiologists. It has been shown that the performance of deep learning–based algorithms depends not only on the quantity of the training data set, but also on the quality of the data labels. As for CRs, several open-source data sets are currently available; however, those data sets remain suboptimal for the development of deep learning–based algorithms because they are weakly labeled by radiologic reports or lack localization information. In contrast, in the present study, we initially collected data from the radiology reports and clinical diagnosis; then experienced board-certified radiologists meticulously reviewed all of the collected CRs. Furthermore, annotation of the exact location of each abnormal finding was done in 35.6% of CRs with abnormal results, which we believe led to the excellent performance of our DLAD.
    Development and Validation of a Deep Learning–Based Automated Detection Algorithm for Major Thoracic Diseases on Chest Radiographs
    Eui Jin Hwang et al.
    JAMA Network Open. 2019;2(3):e191095. doi:10.1001/jamanetworkopen.2019.1095
  • Third, we compared the performance of our DLAD with the performance of physicians with various levels of experience. The stand-alone performance of a CAD system can be influenced by the difficulty of the test data sets and can be exaggerated in easy test data sets. However, observer performance tests may provide a more objective measure of performance by comparing the performance between the CAD system and physicians. Impressively, the DLAD demonstrated significantly higher performance both in image-wise classification and lesion-wise localization than all physician groups, even the thoracic radiologist group.
    Development and Validation of a Deep Learning–Based Automated Detection Algorithm for Major Thoracic Diseases on Chest Radiographs
    Eui Jin Hwang et al.
    JAMA Network Open. 2019;2(3):e191095. doi:10.1001/jamanetworkopen.2019.1095
  • “The high performance of the DLAD in classification of CRs with normal and abnormal findings indicative of major thoracic diseases, outperforming even thoracic radiologists, suggests its potential for stand-alone use in select clinical situations. It may also help improve the clinical workflow by prioritizing CRs with suspicious abnormal findings requiring prompt diagnosis and management. It can also improve radiologists’ work efficiency, which would partially alleviate the heavy workload burden that radiologists face today and improve patients’ turnaround time. Furthermore, the improved performance of physicians with the assistance of the DLAD indicates the potential of our DLAD as a second reader. The DLAD can contribute to reducing perceptual error of interpreting physicians by alerting them to the possibility of major thoracic diseases and visualizing the location of the abnormality. In particular, the more obvious increment of performance in less-experienced physicians suggests that our DLAD can help improve the quality of CR interpretations in situations in which expert thoracic radiologists may not be available.”
    Development and Validation of a Deep Learning–Based Automated Detection Algorithm for Major Thoracic Diseases on Chest Radiographs
    Eui Jin Hwang et al.
    JAMA Network Open. 2019;2(3):e191095. doi:10.1001/jamanetworkopen.2019.1095
  • “We developed a DLAD algorithm that can classify CRs with normal and abnormal findings indicating major thoracic diseases with consistently high performance, outperforming even radiologists, which may improve the quality and efficiency of the current clinical workflow.”
    Development and Validation of a Deep Learning–Based Automated Detection Algorithm for Major Thoracic Diseases on Chest Radiographs
    Eui Jin Hwang et al.
    JAMA Network Open. 2019;2(3):e191095. doi:10.1001/jamanetworkopen.2019.1095
  • OBJECTIVE. Diagnostic imaging has traditionally relied on a limited set of qualitative imaging characteristics for the diagnosis and management of lung cancer. Radiomics—the extraction and analysis of quantitative features from imaging—can identify additional imaging characteristics that cannot be seen by the eye. These features can potentially be used to diagnose cancer, identify mutations, and predict prognosis in an accurate and noninvasive fashion. This article provides insights about trends in radiomics of lung cancer and challenges to widespread adoption.
    CONCLUSION. Radiomic studies are currently limited to a small number of cancer types. Its application across various centers is nonstandardized, leading to difficulties in comparing and generalizing results. The tools available to apply radiomics are specialized and limited in scope, blunting widespread use and clinical integration in the general population. Increasing the number of multicenter studies and consortiums and inclusion of radiomics in resident training will bring more attention and clarity to the growing field of radiomics.
    Radiomics in Pulmonary Lesion Imaging
    Hassani C et al.
    AJR 2019; 212:497–504
  • Radiomics is defined as the quantification of the phenotypic features of a lesion from medical imaging (i.e., CT, PET, MRI, ultrasound). These features include lesion shape, volume, texture, attenuation, and many more that are not readily apparent or are too numerous for an individual radiologist to assess visually or qualitatively. In other words, radiomics is the process of creating a set of organized data based on the physical properties of an object of interest.
    Radiomics in Pulmonary Lesion Imaging
    Hassani C et al.
    AJR 2019; 212:497–504
  • Regardless of lesion histology and location, the workflow in radiomics remains similar. Images of the lesion, typically CT images, are acquired. The images are segmented to define the outer limits of a given lesion. Specific phenotypic features are then selected, extracted from the images, and recorded. Finally, data analysis is performed on the recorded data. Image features can be extracted and analyzed in either 2D or 3D: 2D refers to segmentation and analysis of radiomic metrics on a single-slice image, whereas 3D refers to the same process across the entire volume of a tumor (many slices).
    Radiomics in Pulmonary Lesion Imaging
    Hassani C et al.
    AJR 2019; 212:497–504
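The acquire → segment → extract → analyze workflow described above can be sketched on a synthetic 2D image; real pipelines work on DICOM data with curated feature libraries, and the threshold and feature names below are purely illustrative.

```python
import numpy as np

# 1. "Acquire": a synthetic 2D image with a bright circular lesion on dark lung.
h = w = 64
yy, xx = np.mgrid[:h, :w]
image = np.full((h, w), -800.0)                   # lung background (HU-like)
lesion = (yy - 32) ** 2 + (xx - 32) ** 2 <= 8 ** 2
image[lesion] = 40.0                              # soft-tissue-density nodule

# 2. "Segment": a simple threshold isolates the lesion from surrounding lung.
mask = image > -400

# 3. "Extract": record a few phenotypic features from the segmented pixels.
pixels = image[mask]
features = {
    "area_px": int(mask.sum()),
    "mean_intensity": float(pixels.mean()),
    "std_intensity": float(pixels.std()),
}

# 4. "Analyze": downstream statistical models or classifiers would consume
#    this feature vector alongside clinical variables.
```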
  • Image features can be extracted and analyzed in either 2D or 3D: 2D refers to segmentation and analysis of radiomic metrics on a single-slice image, whereas 3D refers to the same process across the entire volume of a tumor (many slices). Therefore, 3D radiomics, by definition, requires analysis of the entire volume of tumor. In general, feature extraction and analysis are easier and faster in 2D than in 3D, but 3D may theoretically carry more information. Two-dimensional radiomics is used more commonly, but 3D radiomics is appealing with regard to analyzing intratumoral heterogeneity in cases in which different parts of a tumor may exhibit differing histologic subtypes.
    Radiomics in Pulmonary Lesion Imaging
    Hassani C et al.
    AJR 2019; 212:497–504
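The 2D-versus-3D distinction above can be made concrete with a toy example: the same feature (mean intensity, as a stand-in for any radiomic metric) computed per slice versus over the whole volume. The array below is a hypothetical tumor volume.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical tumor volume: 20 slices of 64 x 64 pixels.
volume = rng.normal(loc=50, scale=10, size=(20, 64, 64))

# 2D radiomics: one feature value per slice -- values can vary slice to slice.
per_slice_means = volume.mean(axis=(1, 2))

# 3D radiomics: one feature value for the entire tumor volume.
volume_mean = volume.mean()
```

With equal-sized slices the 3D mean equals the average of the 2D per-slice means, but features sensitive to spatial arrangement (texture, shape) generally differ between the two approaches.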
  • “Segmentation of a lesion is the act of extracting or isolating a lesion of interest (e.g., lung nodule) from the surrounding normal lung. Features are then extracted and are further analyzed directly from the segmented lesion. This can be thought of in distinction to deep learning, where an algorithm must learn to automatically extract features from an unsegmented image.”
    Radiomics in Pulmonary Lesion Imaging
    Hassani C et al.
    AJR 2019; 212:497–504
  • Lesion segmentation can be done either manually or in an automated fashion. Manual segmentation—that is, segmentation performed by a trained observer who manually outlines the lesion of interest—is time-consuming and is more prone to interobserver variability and subjectivity than semiautomated and fully automated segmentation. Manual segmentation is important when accuracy of the tumor outline (i.e., lesion shape and size) is needed.
    Radiomics in Pulmonary Lesion Imaging
    Hassani C et al.
    AJR 2019; 212:497–504
  • Shape is one of the feature categories understood as both semantic and agnostic. It is a category of features that includes diameter measurements (e.g., minimum, maximum) and their derivatives including volume, ratio of diameters, surface-to-volume ratio, and compactness. Diameter measurements and their derivatives are among the most commonly assessed features. Semantic descriptions such as round, oval, and spiculated are understood agnostically by a varied lexicon that attempts to determine how irregular the object is. In the shape category, tumor volume has shown the most promise in predicting treatment response.
    Radiomics in Pulmonary Lesion Imaging
    Hassani C et al.
    AJR 2019; 212:497–504
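The shape features named above (volume, surface-to-volume ratio, compactness) can be computed directly from a binary segmentation mask. This is a minimal sketch using face counting for surface area; production radiomics tools use mesh-based estimates, and the voxelized sphere is a hypothetical lesion.

```python
import numpy as np

def shape_features(mask, voxel_size=1.0):
    """Volume, surface area (voxel-face counting), surface-to-volume ratio,
    and sphericity of a binary 3D mask."""
    mask = mask.astype(bool)
    volume = mask.sum() * voxel_size ** 3
    # Surface area: count exposed voxel faces along each axis.
    padded = np.pad(mask, 1).astype(np.int8)
    faces = sum(int(np.sum(np.diff(padded, axis=axis) != 0)) for axis in range(3))
    surface = faces * voxel_size ** 2
    # Sphericity: 1.0 for a perfect (continuous) sphere, lower when irregular;
    # face counting overestimates surface, so voxelized shapes score below 1.
    sphericity = (np.pi ** (1 / 3)) * (6 * volume) ** (2 / 3) / surface
    return volume, surface, surface / volume, sphericity

# Voxelized sphere of radius 10 as a stand-in lesion mask.
zz, yy, xx = np.mgrid[:32, :32, :32]
sphere = (zz - 16) ** 2 + (yy - 16) ** 2 + (xx - 16) ** 2 <= 10 ** 2
vol, surf, sv_ratio, sph = shape_features(sphere)
```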
  • “Texture in radiomics broadly refers to the variation in the gray-scale intensities of adjacent pixels or voxels in an image. Depending on the technique involved, texture features are categorized into first, second, and higher-order statistical measures. The first-order statistical measures are composed of features that account for variations in gray-scale intensities without accounting for their spatial location or orientation on the image. For example, a histogram of pixel or voxel intensities, which is a visual representation of the distribution of gray-scale intensity values on an image, is the most common technique to derive the first-order texture measures.”
    Radiomics in Pulmonary Lesion Imaging
    Hassani C et al.
    AJR 2019; 212:497–504
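The histogram-based first-order measures quoted above can be sketched in a few lines; the statistics and dictionary keys here are illustrative rather than drawn from any specific radiomics library.

```python
import numpy as np

def first_order_features(pixels, bins=32):
    """Histogram-based first-order texture statistics of an intensity sample."""
    pixels = np.asarray(pixels, dtype=float).ravel()
    hist, _ = np.histogram(pixels, bins=bins)
    p = hist / hist.sum()                     # normalized intensity histogram
    p = p[p > 0]
    mean, std = pixels.mean(), pixels.std()
    return {
        "mean": mean,
        "variance": pixels.var(),
        "skewness": float(np.mean(((pixels - mean) / std) ** 3)),
        "kurtosis": float(np.mean(((pixels - mean) / std) ** 4) - 3),
        "entropy": float(-np.sum(p * np.log2(p))),  # spread of the histogram
    }

rng = np.random.default_rng(1)
roi = rng.normal(loc=-650, scale=120, size=(40, 40))  # hypothetical ROI intensities
feats = first_order_features(roi)
```

Note that two ROIs with identical histograms but different spatial arrangements receive identical first-order features, which is exactly why second- and higher-order (spatial) measures are also needed.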
  • Second-order texture metrics encompass hundreds of features derived by evaluating the relationship of adjacent pixels in an ROI or across the entire lesion. These metrics account for both the intensity of a gray-scale value and its location or orientation in the image. CT images are formed from a 3D matrix of data that is used to determine the amount of gray-level color to display for a given image pixel. Texture or heterogeneity refers to analysis of adjacent pixels of gray color to determine the relationship between them; if there are wide variances in the amount of gray color in a given area, then a lesion is considered more heterogeneous or to have a coarse texture.
    Radiomics in Pulmonary Lesion Imaging
    Hassani C et al.
    AJR 2019; 212:497–504
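The second-order metrics described above can be illustrated with a hand-rolled gray-level co-occurrence matrix for a single pixel offset; production code would use a radiomics or image-processing library, and the contrast feature shown is one standard Haralick-style measure.

```python
import numpy as np

def glcm(image, levels=8, offset=(0, 1)):
    """Joint probability of gray-level pairs at a fixed pixel offset."""
    # Quantize intensities into discrete gray levels.
    edges = np.linspace(image.min(), image.max(), levels + 1)[1:-1]
    q = np.digitize(image, edges)
    dr, dc = offset
    rows, cols = q.shape
    a = q[max(0, -dr):rows - max(0, dr), max(0, -dc):cols - max(0, dc)]
    b = q[max(0, dr):rows - max(0, -dr), max(0, dc):cols - max(0, -dc)]
    m = np.zeros((levels, levels))
    np.add.at(m, (a.ravel(), b.ravel()), 1)   # accumulate co-occurrence counts
    return m / m.sum()

def contrast(p):
    """High when neighboring pixels differ strongly (coarse texture)."""
    i, j = np.indices(p.shape)
    return float(np.sum(p * (i - j) ** 2))

# A smooth gradient yields lower GLCM contrast than random noise.
gradient = np.tile(np.linspace(0, 1, 32), (32, 1))
noise = np.random.default_rng(2).random((32, 32))
smooth_c, noisy_c = contrast(glcm(gradient)), contrast(glcm(noise))
```

Because the GLCM records *where* gray levels sit relative to each other, the two test images above, which can have similar histograms, are cleanly separated by this second-order feature.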
  • “Texture has shown the most promise in predicting the presence of malignancy and prognosis. Local binary patterns (LBPs) and gray-level co-occurrence matrices (GLCMs) are most often used in this. However, evaluations of nodule heterogeneity or texture are not limited to LBPs or GLCMs. Numerous alternative methods that attempt to extract patterns from an image via a series of mathematic transformations or filters applied to the image, including Laws’ energy descriptors, fractal analysis, and wavelet analysis, are being increasingly applied. This latter group of texture metrics includes higher-order statistical measures. Texture analysis has practical applications; for example, Parmar and colleagues showed that texture features in lung cancer were significantly associated with tumor stage and patient survival.”
    Radiomics in Pulmonary Lesion Imaging
    Hassani C et al.
    AJR 2019; 212:497–504

  • “Segmentation and feature recognition currently rely on the initial identification of a nodule by a radiologist. Thus, the near-term and medium-term role of radiomics is likely to be as a support tool in which radiomics is integrated with traditional radiologic and invasive histologic information. We should note that many prior studies achieved highest accuracy when radiomic data were viewed in light of genetic and clinical information.”
    Radiomics in Pulmonary Lesion Imaging
    Hassani C et al.
    AJR 2019; 212:497–504
  • “Most importantly, the study of radiomics must be drastically expanded to account for the numerous clinical and radiologic presentations of lung cancer. Radiomics is predicated on creating tools to more accurately diagnose lung cancer and determine prognosis of patients with lung cancer in a noninvasive fashion. However, the tools available to practice radiomics are specialized and limited in scope, blunting wide-spread use and clinical integration in the general population. Looking forward, we believe that increasing the number of multicenter studies and consortiums and inclusion of radiomics in resident training will bring more attention to the growing field of radiomics.”
    Radiomics in Pulmonary Lesion Imaging
    Hassani C et al.
    AJR 2019; 212:497–504
  • “Other challenges for radiomics include advancing interinstitutional standards for image acquisition and reconstruction parameters and the development of a unified lexicon. Radiomic data are affected by different image acquisition and reconstruction parameters (e.g., contrast timing, slice thickness, reconstruction algorithm, tube voltage, tube current, and so on) that can affect the reproducibility of radiomic features. Many radiomic studies have relied on a heterogeneous dataset of imaging using a mixture of these parameters. Standardized imaging parameters, including consistent contrast dose, timing, and radiation dose levels, will likely need to be implemented for radiomic studies.”
    Radiomics in Pulmonary Lesion Imaging
    Hassani C et al.
    AJR 2019; 212:497–504
  • “Furthermore, radiomics can be performed in 2D or 3D. Two-dimensional radiomics is applied to a single image slice, and the resulting radiomic features can vary from slice to slice. Three-dimensional radiomics is applied to the entire volume of a tumor. The potential differences between these two fundamentally different approaches require further evaluation. In addition, radiomics is a multidisciplinary field with experts from different backgrounds who approach radiomics in different ways. These experts often collaborate and have to understand and incorporate the methods and rationale of sometimes unfamiliar disciplines. For example, computer science researchers may have limited knowledge and experience with medical image acquisition and reconstruction. A unified lexicon will be necessary to maintain consistency, especially for researchers who have limited experience with medical imaging.”
    Radiomics in Pulmonary Lesion Imaging
    Hassani C et al.
    AJR 2019; 212:497–504
  • Accurate identification and localization of abnormalities from radiology images play an integral part in clinical diagnosis and treatment planning. Building a highly accurate prediction model for these tasks usually requires a large number of images manually annotated with labels and finding sites of abnormalities. In reality, however, such annotated data are expensive to acquire, especially the ones with location annotations. We need methods that can work well with only a small amount of location annotations. To address this challenge, we present a unified approach that simultaneously performs disease identification and localization through the same underlying model for all images. We demonstrate that our approach can effectively leverage both class information as well as limited location annotation, and significantly outperforms the comparative reference baseline in both classification and localization tasks.
    Thoracic Disease Identification and Localization with Limited Supervision
    Zhe Li et al. arXiv, March 2018 (in press)
  • “We propose a unified model that jointly models disease identification and localization with limited localization annotation data. This is achieved through the same underlying prediction model for both tasks. Quantitative and qualitative results demonstrate that our method significantly outperforms the state-of-the-art algorithm”
    Thoracic Disease Identification and Localization with Limited Supervision
    Zhe Li et al. arXiv, March 2018 (in press)


  • Purpose: To develop and validate a deep learning–based automatic detection algorithm (DLAD) for malignant pulmonary nodules on chest radiographs and to compare its performance with physicians including thoracic radiologists.
    Conclusion: This deep learning–based automatic detection algorithm outperformed physicians in radiograph classification and nodule detection performance for malignant pulmonary nodules on chest radiographs, and it enhanced physicians’ performances when used as a second reader.
    Development and Validation of Deep Learning–based Automatic Detection Algorithm for Malignant Pulmonary Nodules on Chest Radiographs
    Ju Gang Nam et al.
    Radiology 2018; 00:1–11 • https://doi.org/10.1148/radiol.2018180237
  • Materials and Methods: For this retrospective study, DLAD was developed by using 43 292 chest radiographs (normal radiograph–to–nodule radiograph ratio, 34 067:9225) in 34 676 patients (healthy-to-nodule ratio, 30 784:3892; 19 230 men [mean age, 52.8 years; age range, 18–99 years]; 15 446 women [mean age, 52.3 years; age range, 18–98 years]) obtained between 2010 and 2015, which were labeled and partially annotated by 13 board-certified radiologists, in a convolutional neural network. Radiograph classification and nodule detection performances of DLAD were validated by using one internal and four external data sets from three South Korean hospitals and one U.S. hospital. For internal and external validation, radiograph classification and nodule detection performances of DLAD were evaluated by using the area under the receiver operating characteristic curve (AUROC) and jackknife alternative free-response receiver-operating characteristic (JAFROC) figure of merit (FOM), respectively. An observer performance test involving 18 physicians, including nine board-certified radiologists, was conducted by using one of the four external validation data sets. Performances of DLAD, physicians, and physicians assisted with DLAD were evaluated and compared.
    Development and Validation of Deep Learning–based Automatic Detection Algorithm for Malignant Pulmonary Nodules on Chest Radiographs
    Ju Gang Nam et al.
    Radiology 2018; 00:1–11 • https://doi.org/10.1148/radiol.2018180237
  • Results: According to one internal and four external validation data sets, radiograph classification and nodule detection performances of DLAD were a range of 0.92–0.99 (AUROC) and 0.831–0.924 (JAFROC FOM), respectively. DLAD showed a higher AUROC and JAFROC FOM at the observer performance test than 17 of 18 and 15 of 18 physicians, respectively (P < .05), and all physicians showed improved nodule detection performances with DLAD (mean JAFROC FOM improvement, 0.043; range, 0.006–0.190; P < .05).
    Development and Validation of Deep Learning–based Automatic Detection Algorithm for Malignant Pulmonary Nodules on Chest Radiographs
    Ju Gang Nam et al.
    Radiology 2018; 00:1–11 • https://doi.org/10.1148/radiol.2018180237
  • Summary: Our deep learning–based automatic detection algorithm outperformed physicians in radiograph classification and nodule detection performance for malignant pulmonary nodules on chest radiographs, and when used as a second reader, it enhanced physicians’ performances.
    Development and Validation of Deep Learning–based Automatic Detection Algorithm for Malignant Pulmonary Nodules on Chest Radiographs
    Ju Gang Nam et al.
    Radiology 2018; 00:1–11 • https://doi.org/10.1148/radiol.2018180237
  • Implications for Patient Care
    - Our deep learning–based automatic detection algorithm showed excellent detection performances on both a per-radiograph and per-nodule basis in one internal and four external validation data sets.
    - Our deep learning–based automatic detection algorithm demonstrated higher performance than the thoracic radiologist group.
    - When accompanied by our deep learning–based automatic detection algorithm, all physicians improved their nodule detection performances.
  • AI and Pathology in Lung Cancer
  • Visual inspection of histopathology slides is one of the main methods used by pathologists to assess the stage, type and subtype of lung tumors. Adenocarcinoma (LUAD) and squamous cell carcinoma (LUSC) are the most prevalent subtypes of lung cancer, and their distinction requires visual inspection by an experienced pathologist. In this study, we trained a deep convolutional neural network (inception v3) on whole-slide images obtained from The Cancer Genome Atlas to accurately and automatically classify them into LUAD, LUSC or normal lung tissue. The performance of our method is comparable to that of pathologists, with an average area under the curve (AUC) of 0.97. Our model was validated on independent datasets of frozen tissues, formalin-fixed paraffin-embedded tissues and biopsies. Furthermore, we trained the network to predict the ten most commonly mutated genes in LUAD. We found that six of them—STK11, EGFR, FAT1, SETBP1, KRAS and TP53—can be predicted from pathology images, with AUCs from 0.733 to 0.856 as measured on a held-out population. These findings suggest that deep-learning models can assist pathologists in the detection of cancer subtype or gene mutations.
    Classification and Mutation Prediction from Non–Small Cell Lung Cancer Histopathology Images Using Deep Learning
    Coudray N et al.
    Nat Med 2018;24:1559–1567
