Deep Learning: Pathology and Deep Learning Imaging Pearls - Educational Tools | CT Scanning | CT Imaging | CT Scan Protocols - CTisus

BACKGROUND The diagnosis of celiac disease (CD), an autoimmune disorder with an estimated global prevalence of around 1%, generally relies on the histologic examination of duodenal biopsies. However, interpathologist agreement for CD diagnosis is estimated at no more than 80%. We aim to improve CD diagnosis by developing an accurate, machine-learning-based diagnostic classifier.
METHODS We present a machine learning model that diagnoses the presence or absence of CD from a set of duodenal biopsies representative of real-world clinical data. Our model was trained on a diverse dataset of 3383 whole-slide images of hematoxylin- and eosin-stained duodenal biopsies from four hospitals featuring five different WSI scanners along with their clinical diagnoses. We trained our model using the multiple-instance-learning paradigm in a weakly supervised manner with cross-validation. We evaluated it on an independent test set featuring 644 unseen scans from a different regional NHS trust. In addition, we compared the model’s predictions with independent diagnoses from four specialist pathologists on a subset of the test data.
Machine Learning Achieves Pathologist-Level Celiac Disease Diagnosis
Florian Jaeckle, James Denholm, Benjamin Schreiber et al.
NEJM AI 2025;2(4)
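The multiple-instance-learning setup mentioned in the methods can be illustrated with a minimal sketch. It assumes a slide is treated as a "bag" of tiles, that only the slide-level diagnosis is known, and that some upstream model has already produced a per-tile probability. The abstract does not say which aggregation the authors used, so both pooling functions below (`max_pool_bag`, `attention_pool_bag`) and all numbers are hypothetical illustrations, not the published method.

```python
import math


def max_pool_bag(tile_probs):
    """Max pooling: the slide score is the score of its most suspicious tile."""
    return max(tile_probs)


def attention_pool_bag(tile_probs, attention_logits):
    """Attention pooling: softmax-weighted average of tile scores."""
    if len(tile_probs) != len(attention_logits):
        raise ValueError("one attention logit is required per tile")
    peak = max(attention_logits)  # subtract the max for numerical stability
    exps = [math.exp(a - peak) for a in attention_logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    return sum(w * p for w, p in zip(weights, tile_probs))


# Hypothetical bag: four tiles from one slide, only one of which looks abnormal.
tile_probs = [0.05, 0.10, 0.92, 0.08]
attention_logits = [0.0, 0.0, 3.0, 0.0]  # made-up values favouring the third tile

slide_score_max = max_pool_bag(tile_probs)
slide_score_attention = attention_pool_bag(tile_probs, attention_logits)
print(slide_score_max, round(slide_score_attention, 3))
```

In weakly supervised training, only the pooled slide score is compared with the clinical diagnosis, so no tile-level annotation is needed.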
RESULTS Our model diagnosed CD in an independent test set from a previously unseen source with accuracy, sensitivity, and specificity exceeding 95% and an area under the receiver operating characteristic curve exceeding 99%. These results indicate that the model has the potential to outperform pathologists. In comparing the model’s predictions with diagnoses on unseen test data from four independent pathologists, we found statistically indistinguishable results between pathologist–pathologist and pathologist–model interobserver agreement (P>96%).
CONCLUSIONS Our model achieved pathologist-level performance in diagnosing the presence or absence of CD from a representative set of duodenal biopsies, including biopsies from a previously unseen hospital. We concluded that our model has the potential to accurately identify or rule out CD, thereby significantly reducing the time required for pathologists to make a diagnosis.
Machine Learning Achieves Pathologist-Level Celiac Disease Diagnosis
Florian Jaeckle, James Denholm, Benjamin Schreiber et al.
NEJM AI 2025;2(4)
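Interobserver agreement of the kind compared above can be quantified in several ways. The excerpt does not name the statistic the authors used, so the sketch below assumes Cohen's kappa, a common chance-corrected agreement measure for two raters. The function name `cohen_kappa` and the example diagnoses are hypothetical.

```python
def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Fraction of cases on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    categories = set(rater_a) | set(rater_b)
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    if expected == 1.0:  # both raters used a single identical label throughout
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Made-up diagnoses for ten biopsies (1 = celiac disease, 0 = no celiac disease).
pathologist = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
model = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]

kappa_pathologist_model = cohen_kappa(pathologist, model)
print(round(kappa_pathologist_model, 3))
```

Computing the same statistic for every pathologist-pathologist pair and every pathologist-model pair gives two sets of values that can then be compared statistically.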
Our machine learning model, which achieved human-level performance, demonstrates significant potential in addressing these critical issues. It achieved an accuracy, sensitivity, and specificity exceeding 95% on an independent test set composed of cases from sources not included in the training and validation data. This study shows AI achieving human-level performance in CD diagnosis on a genuine, multicenter, clinically representative cohort of patient samples. This level of generalizability is crucial for deploying AI models in real-world clinical environments, where variability in staining protocols and scanner technology can significantly impact diagnostic accuracy. We also show that our model worked equally well for patients of all sexes and ages above 19 years of age.
Machine Learning Achieves Pathologist-Level Celiac Disease Diagnosis
Florian Jaeckle, James Denholm, Benjamin Schreiber et al.
NEJM AI 2025;2(4)
In conclusion, we present a machine learning diagnostic tool for CD that achieves human-level accuracy on images of duodenal biopsies from a hospital that was not used for training. We demonstrate that the concordance between the model and pathologists is similar to the agreement among pathologists. Furthermore, we demonstrate that our model is unbiased and performs equally well for male and female patients for all ages over 19 years.
Machine Learning Achieves Pathologist-Level Celiac Disease Diagnosis
Florian Jaeckle, James Denholm, Benjamin Schreiber et al.
NEJM AI 2025;2(4)

Radiology until the late 1990s
- Film (8x10, 10x14, 14x17)
- Film jackets (one copy of a study)
- Film Processors and Film Alternators
- Kodak was the king of imaging
Radiology and Pathology Change is Similar
- Film to digital display (Radiology)
- Glass slides to digital display (Pathology)
Both were held back in part by the attitude that the changes were not acceptable
“Histopathology image evaluation is indispensable for cancer diagnoses and subtype classification. Standard artificial intelligence methods for histopathology image analyses have focused on optimizing specialized models for each diagnostic task. Although such methods have achieved some success, they often have limited generalizability to images generated by different digitization protocols or samples collected from different populations. Here, to address this challenge, we devised the Clinical Histopathology Imaging Evaluation Foundation (CHIEF) model, a general purpose weakly supervised machine learning framework to extract pathology imaging features for systematic cancer evaluation. CHIEF leverages two complementary pretraining methods to extract diverse pathology representations: unsupervised pretraining for tile-level feature identification and weakly supervised pretraining for whole-slide pattern recognition.”
A pathology foundation model for cancer diagnosis and prognosis prediction.
Wang X, Zhao J, Marostica E, et al.
Nature. 2024 Sep 4. doi: 10.1038/s41586-024-07894-z. Epub ahead of print. PMID: 39232164.
“We developed CHIEF using 60,530 whole-slide images spanning 19 anatomical sites. Through pretraining on 44 terabytes of high resolution pathology imaging datasets, CHIEF extracted microscopic representations useful for cancer cell detection, tumour origin identification, molecular profile characterization and prognostic prediction. We successfully validated CHIEF using whole-slide images from 32 independent slide sets collected from 24 hospitals and cohorts internationally. Overall, CHIEF outperformed the state-of-the-art deep learning methods by up to 36.1%, showing its ability to address domain shifts observed in samples from diverse populations and processed by different slide preparation methods. CHIEF provides a generalizable foundation for efficient digital pathology evaluation for patients with cancer.”
A pathology foundation model for cancer diagnosis and prognosis prediction.
Wang X, Zhao J, Marostica E, et al.
Nature. 2024 Sep 4. doi: 10.1038/s41586-024-07894-z. Epub ahead of print. PMID: 39232164.
“We established the CHIEF model, a general-purpose machine learning framework for weakly supervised histopathological image analyses. Unlike commonly used self-supervised feature extractors, CHIEF leveraged two types of pretraining procedure: unsupervised pretraining on 15 million unlabelled tile images and weakly supervised pretraining on more than 60,000 WSIs. Tile-level unsupervised pretraining established a general feature extractor for haematoxylin–eosin-stained histopathological images collected from heterogeneous publicly available databases, which captured diverse manifestations of microscopic cellular morphologies. Subsequent WSI-level weakly supervised pretraining constructed a general-purpose model by characterizing the similarities and differences between cancer types.”
A pathology foundation model for cancer diagnosis and prognosis prediction.
Wang X, Zhao J, Marostica E, et al.
Nature. 2024 Sep 4. doi: 10.1038/s41586-024-07894-z. Epub ahead of print. PMID: 39232164.
CHIEF consistently attained superior performance in a variety of cancer identification tasks using either biopsy or surgical resection slides. CHIEF achieved a macro-average area under the receiver operating characteristic curve (AUROC) of 0.9397 across 15 datasets representing 11 cancer types, which is approximately 10% higher than that attained by DSMIL (a macro-average AUROC of 0.8409), 12% higher than that of ABMIL (a macro-average AUROC of 0.8233) and 14% higher than that of CLAM (a macro-average AUROC of 0.8016). In all five biopsy datasets collected from independent cohorts, CHIEF possessed AUROCs of greater than 0.96 across several cancer types, including oesophagus (CUCH-Eso), stomach (CUCH-Sto), colon (CUCH-Colon) and prostate (Diagset-B and CUCH-Pros). On independent validation with seven surgical resection slide sets spanning five cancer types (that is, colon (Dataset-PT), breast (DROID-Breast), endometrium (SMCH-Endo and CPTAC-uterine corpus endometrial carcinoma (UCEC)), lung (CPTAC-lung squamous cell carcinoma (LUSC)) and cervix (SMCH-Cervix and TissueNet)), CHIEF attained AUROCs greater than 0.90.
A pathology foundation model for cancer diagnosis and prognosis prediction.
Wang X, Zhao J, Marostica E, et al.
Nature. 2024 Sep 4. doi: 10.1038/s41586-024-07894-z. Epub ahead of print. PMID: 39232164.
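A macro-average AUROC, as reported above, is the unweighted mean of the AUROCs computed separately on each dataset, so a small dataset counts as much as a large one. The sketch below shows the arithmetic on toy data. The function names `auroc` and `macro_average_auroc`, the dataset names and all labels and scores are hypothetical, and the pairwise (Mann-Whitney) formulation is one standard way to compute AUROC, not necessarily the implementation used by the authors.

```python
def auroc(labels, scores):
    """AUROC as the probability that a positive case outscores a negative one."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


def macro_average_auroc(datasets):
    """Unweighted mean of per-dataset AUROCs; also returns the per-dataset values."""
    per_dataset = {
        name: auroc(labels, scores) for name, (labels, scores) in datasets.items()
    }
    return sum(per_dataset.values()) / len(per_dataset), per_dataset


# Two made-up datasets of different sizes: (labels, model scores).
toy_datasets = {
    "dataset_a": ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]),
    "dataset_b": ([1, 0, 1, 0, 0, 0], [0.7, 0.6, 0.4, 0.3, 0.2, 0.1]),
}

macro_auroc, per_dataset_auroc = macro_average_auroc(toy_datasets)
print(per_dataset_auroc, macro_auroc)
```

The differences quoted in the text (for example 0.9397 versus 0.8409) correspond to absolute gaps of roughly 10 percentage points between macro-averages.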
“The CHIEF framework successfully characterized tumour origins, predicted clinically important genomic profiles, and stratified patients into longer-term survival and shorter-term survival groups. Furthermore, our approach established a general pathology feature extractor capable of a wide range of prediction tasks even with small sample sizes. Our results showed that CHIEF is highly adaptable to diverse pathology samples obtained from several centres, digitized by various scanners, and obtained from different clinical procedures (that is, biopsy and surgical resection). This new framework substantially enhanced model generalizability, a critical barrier to the clinical penetrance of conventional computational pathology models.”
A pathology foundation model for cancer diagnosis and prognosis prediction.
Wang X, Zhao J, Marostica E, et al.
Nature. 2024 Sep 4. doi: 10.1038/s41586-024-07894-z. Epub ahead of print. PMID: 39232164.
“In conclusion, CHIEF is a foundation model useful for a wide range of pathology evaluation tasks across several cancer types. We have demonstrated the generalizability of this foundation model across several clinical applications using samples collected from 24 hospitals and patient cohorts worldwide. CHIEF required minimal image annotations and extracted detailed quantitative features from WSIs, which enabled systematic analyses of the relationships among morphological patterns, molecular aberrations and important clinical outcomes. Accurate, robust and rapid pathology sample assessment provided by CHIEF will contribute to the development of personalized cancer management.”
A pathology foundation model for cancer diagnosis and prognosis prediction.
Wang X, Zhao J, Marostica E, et al.
Nature. 2024 Sep 4. doi: 10.1038/s41586-024-07894-z. Epub ahead of print. PMID: 39232164.
Application of AI to an array of diagnostic tasks using WSIs has rapidly expanded in recent years. Successes in AI for digital pathology can be found for many disease types, but particularly in examples applied to cancer. An important early study in 2017 by Bejnordi et al. described 32 AI models developed for breast cancer detection in lymph nodes through the CAMELYON16 grand challenge. The best model achieved an area under the curve (AUC) of 0.994 (95% CI 0.983–0.999), demonstrating similar performance to the human in this controlled environment.
Artificial intelligence in digital pathology: a systematic review and meta-analysis of diagnostic test accuracy.
McGenity C, Clarke EL, Jennings C, et al.
NPJ Digit Med. 2024 May 4;7(1):114. doi: 10.1038/s41746-024-01106-8. PMID: 38704465.
Following recent prominent discoveries in deep learning techniques, wider artificial intelligence (AI) applications have emerged for many sectors, including in healthcare. Pathology AI is of broad importance in areas across medicine, with implications not only in diagnostics, but in cancer research, clinical trials and AI-enabled therapeutic targeting. Access to digital pathology through scanning of whole slide images (WSIs) has facilitated greater interest in AI that can be applied to these images. WSIs are created by scanning glass microscope slides to produce a high resolution digital image, which is later reviewed by a pathologist to determine the diagnosis. Opportunities for pathologists have arisen from this technology, including remote and flexible working, obtaining second opinions, easier collaboration and training, and applications in research, such as AI.
Artificial intelligence in digital pathology: a systematic review and meta-analysis of diagnostic test accuracy.
McGenity C, Clarke EL, Jennings C, et al.
NPJ Digit Med. 2024 May 4;7(1):114. doi: 10.1038/s41746-024-01106-8. PMID: 38704465.
“Despite the many developments in pathology AI, examples of routine clinical use of these technologies remain rare and there are concerns around the performance, evidence quality and risk of bias for medical AI studies in general. Although, in the face of an increasing pathology workforce crisis, the prospect of tools that can assist and automate tasks is appealing. Challenging workflows and long waiting lists mean that substantial patient benefit could be realised if AI was successfully harnessed to assist in the pathology laboratory.”
Artificial intelligence in digital pathology: a systematic review and meta-analysis of diagnostic test accuracy.
McGenity C, Clarke EL, Jennings C, et al.
NPJ Digit Med. 2024 May 4;7(1):114. doi: 10.1038/s41746-024-01106-8. PMID: 38704465.
“AI has been extensively promoted as a useful tool that will transform medicine, with examples of innovation in clinical imaging, electronic health records (EHR), clinical decision making, genomics, wearables, drug development and robotics. The potential of AI in digital pathology has been identified by many groups, with discoveries frequently emerging and attracting considerable interest. Tools have not only been developed for diagnosis and prognostication, but also for predicting treatment response and genetic mutations from the H&E image alone. Various models have now received regulatory approval for applications in pathology, with some examples being trialled in clinical settings.”
Artificial intelligence in digital pathology: a systematic review and meta-analysis of diagnostic test accuracy.
McGenity C, Clarke EL, Jennings C, et al.
NPJ Digit Med. 2024 May 4;7(1):114. doi: 10.1038/s41746-024-01106-8. PMID: 38704465.
Of studies of other disease types included in the meta-analysis, AI models in liver cancer, lymphoma, melanoma, pancreatic cancer, brain cancer, lung cancer and rhabdomyosarcoma all demonstrated a high sensitivity and specificity. This emphasises the breadth of potential diagnostic tools for clinical applications with a high diagnostic accuracy in digital pathology.
Artificial intelligence in digital pathology: a systematic review and meta-analysis of diagnostic test accuracy.
McGenity C, Clarke EL, Jennings C, et al.
NPJ Digit Med. 2024 May 4;7(1):114. doi: 10.1038/s41746-024-01106-8. PMID: 38704465.
“Performing an external validation on data from an alternative source to that on which an AI model was trained, providing details on the process for case selection and using large, diverse datasets would help to reduce the risk of bias of these studies. Overall, better quality study design, transparency, reporting quality and addressing substantial areas of bias is needed to improve the evidence quality in pathology AI and to therefore harness the benefits of AI for patients and clinicians.”
Artificial intelligence in digital pathology: a systematic review and meta-analysis of diagnostic test accuracy.
McGenity C, Clarke EL, Jennings C, et al.
NPJ Digit Med. 2024 May 4;7(1):114. doi: 10.1038/s41746-024-01106-8. PMID: 38704465.
“There are many promising applications for AI models in WSIs to assist the pathologist. This systematic review has outlined a high diagnostic accuracy for AI across multiple disease types. A larger body of evidence is available for gastrointestinal pathology, urological pathology and breast pathology. Many other disease areas are underrepresented and should be explored further in future. To improve the quality of future studies, reporting of sensitivity, specificity and raw data (true positives, false positives, false negatives, true negatives) for pathology AI models would help with transparency in comparing diagnostic performance between studies.”
Artificial intelligence in digital pathology: a systematic review and meta-analysis of diagnostic test accuracy.
McGenity C, Clarke EL, Jennings C, et al.
NPJ Digit Med. 2024 May 4;7(1):114. doi: 10.1038/s41746-024-01106-8. PMID: 38704465.
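The reporting recommendation above rests on a simple relationship: once the four raw counts are published, readers can recompute sensitivity, specificity and related measures themselves. A minimal sketch follows; the class `ConfusionCounts`, the function `diagnostic_metrics` and the example counts are hypothetical and not taken from any study in the review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # diseased cases the model called positive
    fp: int  # non-diseased cases the model called positive
    fn: int  # diseased cases the model called negative
    tn: int  # non-diseased cases the model called negative


def _ratio(numerator, denominator):
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(c):
    """Derive standard diagnostic accuracy measures from raw counts."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "accuracy": _ratio(c.tp + c.tn, total),
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
    }


# Made-up counts for a test set of 200 slides.
example_metrics = diagnostic_metrics(ConfusionCounts(tp=90, fp=5, fn=10, tn=95))
for name, value in example_metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike sensitivity and specificity, the predictive values (`ppv`, `npv`) depend on disease prevalence in the test set, which is one reason raw counts are more useful to report than summary figures alone.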
Digital pathology, also referred to as whole slide imaging, is a sub-field of pathology, in which tissue specimens are digitized with a scanner before being examined. Biopsy or sample collection techniques, laboratory workflow and final reporting with treatment decisions remain largely unchanged, but the slide review phase of the pathology process happens in a digital way using a display and viewing software, in addition to or in combination with a microscope.
Digital pathology enables the creation of digital images of glass slides that can be securely shared electronically with other pathologists to view, reducing transportation needs and speeding testing and results reporting. It also has the advantage of extending access to expert consults to geographic areas where pathologists are in short supply, such as in parts of rural America and internationally. It may also help alleviate workforce pressures due to a shortage of pathologists and histotechnologists, the skilled laboratory professionals who prepare tissue slides.
The analysis of histopathology images with artificial intelligence aims to enable clinical decision support systems and precision medicine. The success of such applications depends on the ability to model the diverse patterns observed in pathology images. To this end, we present Virchow, the largest foundation model for computational pathology to date. In addition to the evaluation of biomarker prediction and cell identification, we demonstrate that a large foundation model enables pan-cancer detection, achieving 0.95 specimen-level area under the (receiver operating characteristic) curve across nine common and seven rare cancers. Furthermore, we show that with less training data, the pan-cancer detector built on Virchow can achieve similar performance to tissue-specific clinical-grade models in production and outperform them on some rare variants of cancer. Virchow’s performance gains highlight the value of a foundation model and open possibilities for many high-impact applications with limited amounts of labeled training data.
A foundation model for clinical-grade computational pathology and rare cancers detection
Eugene Vorontsov, Alican Bozkurt, Adam Casson et al.
Nat Med. 2024 Oct;30(10):2924-2935.
Computational pathology applies artificial intelligence (AI) to digitized WSIs to support the diagnosis, characterization and understanding of disease. Initial work has focused on clinical decision support tools to enhance current workflows, and in 2021 the first Food and Drug Administration-approved AI pathology system was launched. However, given the incredible gains in performance of computer vision, a subfield of AI focused on images, more recent studies attempt to unlock new insights from routine WSIs and reveal undiscovered outcomes such as prognosis and therapeutic response. If successful, such efforts would enhance the utility of hematoxylin and eosin (H&E)-stained WSIs and reduce reliance on specialized and often expensive immunohistochemistry (IHC) or genomic testing.
A foundation model for clinical-grade computational pathology and rare cancers detection
Eugene Vorontsov, Alican Bozkurt, Adam Casson et al.
Nat Med. 2024 Oct;30(10):2924-2935.
A major factor in the performance gains of computer vision models has been the creation of large-scale deep neural networks, termed foundation models. Foundation models are trained on enormous datasets—orders of magnitude greater than any used historically for computational pathology—using a family of algorithms, referred to as self-supervised learning, which do not require curated labels. Foundation models generate data representations, called embeddings, that can generalize well to diverse predictive tasks. This offers a distinct advantage over current diagnostic-specific methods in computational pathology, which, limited to a subset of pathology images, are less likely to reflect the full spectrum of variations in tissue morphology and laboratory preparations necessary for adequate generalization in practice. The value of generalization from large datasets is even greater for applications with inadequate quantities of data to develop bespoke models, as is the case for the detection of uncommon or rare tumor types, as well as for less common diagnostic tasks such as the prediction of specific genomic alterations, clinical outcomes and therapeutic response.
A foundation model for clinical-grade computational pathology and rare cancers detection
Eugene Vorontsov, Alican Bozkurt, Adam Casson et al.
Nat Med. 2024 Oct;30(10):2924-2935.
“Here, we present a million-image-scale pathology foundation model, Virchow, named in honor of Rudolf Virchow, who is regarded as the father of modern pathology and proposed the first theory of cellular pathology. Virchow is trained on data from approximately 100,000 patients corresponding to approximately 1.5 million H&E stained WSIs acquired from Memorial Sloan Kettering Cancer Center (MSKCC), which is 4–10× more WSIs than in prior training datasets in pathology. The training data are composed of cancerous and benign tissues, collected via biopsy (63%) and resection (37%), from 17 high-level tissues.”
A foundation model for clinical-grade computational pathology and rare cancers detection
Eugene Vorontsov, Alican Bozkurt, Adam Casson et al.
Nat Med. 2024 Oct;30(10):2924-2935.
A key aim of our work was to develop a single model to detect cancer, including rare cancers (defined by the National Cancer Institute (NCI) as cancers with an annual incidence in the United States of fewer than 15 people per 100,000), across various tissues. The pan-cancer detection model infers the presence of cancer using Virchow embeddings as input. For evaluation, slides from MSKCC and slides submitted for consultation to MSKCC from numerous external sites globally are used. Stratified performance across nine common and seven rare cancer types is reported. Embeddings generated by Virchow, UNI, Phikon and CTransPath are evaluated. Pan-cancer aggregators are trained using specimen-level labels, maintaining the same training protocol for all embeddings.
A foundation model for clinical-grade computational pathology and rare cancers detection
Eugene Vorontsov, Alican Bozkurt, Adam Casson et al.
Nat Med. 2024 Oct;30(10):2924-2935.
“Recent advances in computational pathology have been supported by increased dataset scale and reduced reliance on labels. Using multiple-instance learning with labels at the level of groups of slides has enabled clinically relevant diagnostics by scaling to training datasets on the order of 10,000 WSIs. These earlier works typically initialized the model’s embedding parameters using pretrained model weights, often those trained on ImageNet in a supervised setting. This process, called transfer learning, was motivated by the observation that model performance critically depends on the model’s ability to capture image features. In-domain transfer learning was not possible given the limited availability of labeled pathology datasets. Now self-supervised learning is enabling in-domain transfer by removing the label requirement, driving a second wave of scaling to tens of thousands of WSIs to inform image representation. Virchow marks a major increase in training data scale to 1.5 million WSIs—a volume of data that is over 3,000 times the size of ImageNet as measured by the total number of pixels. This large scale of data in turn motivates large models that can capture the diversity of image features in WSIs. In this work, we have demonstrated that this approach can form the foundation for clinical-grade models in cancer pathology.”
A foundation model for clinical-grade computational pathology and rare cancers detection
Eugene Vorontsov, Alican Bozkurt, Adam Casson et al.
Nat Med. 2024 Oct;30(10):2924-2935.
There has been a sharp rise in the development and application of AI tools, including image-based algorithms for the use in pathology service, and it is expected to dominate the field of pathology in the coming years. The deployment of computational pathology and application of pathology-related AI tools can be considered a paradigm shift that will change the way pathology services are managed and make them not only more efficient but also capable of meeting the needs of this era of precision medicine. Development of pathology-based AI tools needs input from a multidisciplinary team in which pathologists and users should have great input to improve the adoption of these technology-driven applications.
Current and future applications of artificial intelligence in pathology: a clinical perspective
Rakha EA, Toss M, Shiino S, et al.
J Clin Pathol 2021;74:409–414.
A combination of AI and pathologists can yield results that are more accurate, consistent, timely and useful beyond a human’s ability. AI can provide analytical tools to streamline the complex, multistep pathology case life cycle in pathology laboratories, from accession to archiving. This can provide not only workflow automation but also analytics dashboard and data repository that can improve efficiency by self-learning from previous experience and help understand laboratory productivity, quality and efficiency, in addition to helping to allocate future resources to areas in more need. AI can also improve streamlining the whole process by aligning the laboratory technical components of the case pathway with the pathologists reporting components.
Current and future applications of artificial intelligence in pathology: a clinical perspective
Rakha EA, Toss M, Shiino S, et al.
J Clin Pathol 2021;74:409–414.
“AI can also improve streamlining the whole process by aligning the laboratory technical components of the case pathway with the pathologists reporting components. Improving the efficiency of pathology service workflow, trainee and junior pathologists reporting, timely reporting by pathologists, cost-efficient diagnostic, prognostic/predictive algorithms and production of multidimensional output of pathology reports, and combining with image and genomic/genetic data are some of the expected benefits of AI technology application in routine practice.”
Current and future applications of artificial intelligence in pathology: a clinical perspective
Rakha EA, Toss M, Shiino S, et al.
J Clin Pathol 2021;74:409–414.
AI applications will also lead to an advanced diagnostic, enabling researchers and clinical teams to share knowledge and use computational algorithms to assess and contribute valuable insights that can ultimately lead to a more informed and detailed pathology diagnosis. This integration will help advance the future of precision oncology and can result in personalised care plans.
Current and future applications of artificial intelligence in pathology: a clinical perspective
Rakha EA, Toss M, Shiino S, et al.
J Clin Pathol 2021;74:409–414.
► Using Computational Pathology and Artificial Intelligence in clinical histopathology service is expected to extend in the near future.
► Understanding the current limitation and challenges of AI applications will help to improve its performance and applicability.
► These new technologies are aiming to complement the human resources rather than replacing them.
Current and future applications of artificial intelligence in pathology: a clinical perspective
Rakha EA, Toss M, Shiino S, et al.
J Clin Pathol 2021;74:409–414.
“But what AI is, and how exactly it might prove useful in pathology, is still not clear to many of us. It feels new and mysterious, triggering a lot of anxiety about what it means for our profession, our practices, and our patients. I’ve heard a lot of fear that AI may eventually even replace us.”
Donald S. Karcher, MD
President, The College of American Pathologists
Sept 2024
“Like many other “disruptive” technologies, AI is just another tool that will allow us to make better and more actionable diagnoses—nothing more and nothing less. It will not take our place and it will not eliminate the need for our expertise. But we should be prepared. There are loads of pathology AI tools that are on the verge of being FDA approved, and I expect they’ll start coming out by the dozens in the next few years. Image analysis tools will be quite common (there are already FDA-approved AI programs for image analysis in pathology) and so too will be products to analyze “big data” across many clinical sources, including laboratory data. We will have to know how to assess them, select the best ones for our practices, and implement them.”
Donald S. Karcher, MD
President, The College of American Pathologists
Sept 2024
Objective.—To review use cases of GAI in CP, with a particular focus on large language models. Specific examples are provided for the applications of GAI in the subspecialties of clinical chemistry, microbiology, hematopathology, and molecular diagnostics. Additionally, the review addresses potential pitfalls of GAI paradigms.
Conclusions.—GAI is a powerful tool with the potential to revolutionize health care for patients and practitioners alike. However, GAI must be implemented with much caution considering various shortcomings of the technology such as biases, hallucinations, practical challenges of implementing GAI in existing CP workflows, and end-user acceptance. Human-in-the-loop models of GAI implementation have the potential to revolutionize CP by delivering deeper, meaningful insights into patient outcomes both at an individual and population level.
Evaluating Use of Generative Artificial Intelligence in Clinical Pathology Practice
Peter McCaffrey, Ronald Jackups, Jansen Seheult et al.
Arch Pathol Lab Med. doi: 10.5858/arpa.2024-0208-RA
“Within the field of pathology, the use of AI/machine learning has been confined to nongenerative settings and much attention has rightfully been paid to whole slide imaging classification and image segmentation tasks. This has led to more widespread digitization, which, in turn, has enabled image-based AI to access pathology workflows much in the same way that AI has advanced through the specialty of radiology.”
Evaluating Use of Generative Artificial Intelligence in Clinical Pathology Practice
Peter McCaffrey, Ronald Jackups, Jansen Seheult et al.
Arch Pathol Lab Med. doi: 10.5858/arpa.2024-0208-RA
However, ignorance and nonparticipation in AI technologies is not an option for laboratories (and pathologists) to remain competitive in the marketplace. If we collectively maintain an open dialogue about challenges and successes while adapting this transformational technology in a collaborative manner, there will be ample room for everyone to be successful. In the famous words of Peter Drucker, “The best way to predict the future is to create it.”
Evaluating Use of Generative Artificial Intelligence in Clinical Pathology Practice
Peter McCaffrey, Ronald Jackups, Jansen Seheult et al.
Arch Pathol Lab Med. doi: 10.5858/arpa.2024-0208-RA
BACKGROUND While previous studies of artificial intelligence (AI) have shown its potential for diagnosing diseases using imaging data, clinical implementation lags behind. AI models require training with large numbers of examples, which are only available for common diseases. In clinical reality, however, the majority of diseases are less frequent, and current AI models overlook or misclassify them. An effective, comprehensive technique is needed for the full spectrum of real-world diagnoses.
AI-Based Anomaly Detection for Clinical-Grade Histopathological Diagnostics
Jonas Dippel, Niklas Prenißl, Julius Hense, et al
NEJM AI 2024;1(11)
Methods We collected two large real-world datasets of gastrointestinal (GI) biopsies, which are prototypical of the problem. Herein, the 10 most common findings accounted for approximately 90% of cases, whereas the remaining 10% contained 56 disease entities, including many cancers. Seventeen million histological images from 5423 cases were used for training and evaluation. We propose a deep anomaly detection (AD) approach that only requires training data from common diseases to also detect all less frequent diseases.
Results Without specific training for the diseases, our best-performing model reliably detected a broad spectrum of infrequent (“anomalous”) pathologies with 95.0% (stomach) and 91.0% (colon) area under the receiver operating characteristic curve (AUROC) and was able to generalize between scanners and hospitals. Cancers were detected with 97.7% (stomach) and 96.9% (colon) AUROC. Heatmaps reliably highlighted anomalous areas and can guide pathologists during the diagnostic process.
AI-Based Anomaly Detection for Clinical-Grade Histopathological Diagnostics
Jonas Dippel, Niklas Prenißl, Julius Hense, et al
NEJM AI 2024;1(11)
“In this study, we establish the first effective clinical application of AI-based AD in histopathology and demonstrate high performance on a unique real-world collection of GI biopsies. The proposed novel AD can flag anomalous cases, facilitate case prioritization, and reduce missed diagnoses, providing critical support for pathologists. By design, it can be expected to detect any pathological alteration including rare primary or metastatic cancers in GI biopsies. To our knowledge, no other published AI tool is capable of zero-shot pan-cancer detection. AD may enhance the safety of AI models in histopathology, thereby driving AI adoption and automation in routine diagnostics and beyond.”
AI-Based Anomaly Detection for Clinical-Grade Histopathological Diagnostics
Jonas Dippel, Niklas Prenißl, Julius Hense, et al
NEJM AI 2024;1(11)
“Diagnostic pathology faces serious challenges due to a shortage of pathologists in many parts of the world and too few doctors entering the profession. Meanwhile, both the diagnostic workload and cancer burden are rising. Diagnostic procedures are increasing in complexity due to the demands of precision medicine. Studies have shown significant diagnostic errors in a range of 0.1% to 10% of cases.”
AI-Based Anomaly Detection for Clinical-Grade Histopathological Diagnostics
Jonas Dippel, Niklas Prenißl, Julius Hense, et al
NEJM AI 2024;1(11)
In this study, we hypothesize that infrequent findings in histopathological images can be detected using AI-based anomaly detection (AD). In contrast to supervised learning, AD assumes that certain data inputs are too infrequent to be sufficiently represented during model training. Instead of trying to learn insufficiently represented patterns, AD methods aim to very precisely characterize the frequent findings, which in our setting includes normal cases and common pathologies that can be learned by supervised methods. Samples deviating from the learned common characteristics are consequently deemed “anomalies.” Since only frequent findings are used for AD model training, there is no need for extensive data collection or annotation gathering of rarer instances from the tail of the disease distribution.
AI-Based Anomaly Detection for Clinical-Grade Histopathological Diagnostics
Jonas Dippel, Niklas Prenißl, Julius Hense, et al
NEJM AI 2024;1(11)
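The idea in the excerpt above (characterize the frequent findings, then flag whatever deviates) can be illustrated with a minimal sketch. This is not the authors' model: it assumes each image has already been reduced to a feature vector, fits a simple per-dimension Gaussian to synthetic "common" samples, and uses a mean squared z-score as the anomaly score. The names fit_common and anomaly_score, the 8-dimensional features, and the 95th-percentile threshold are all hypothetical choices made for illustration.

```python
import math
import random

def fit_common(features):
    """Estimate per-dimension mean and standard deviation from common-finding samples."""
    n, dim = len(features), len(features[0])
    mean = [sum(f[d] for f in features) / n for d in range(dim)]
    # Fall back to 1.0 if a dimension has zero spread, to avoid division by zero.
    std = [math.sqrt(sum((f[d] - mean[d]) ** 2 for f in features) / n) or 1.0
           for d in range(dim)]
    return mean, std

def anomaly_score(x, mean, std):
    """Mean squared z-score: grows as x moves away from the learned common pattern."""
    return sum(((xi - m) / s) ** 2 for xi, m, s in zip(x, mean, std)) / len(x)

# Synthetic stand-in for feature vectors of frequent findings (assumption).
random.seed(0)
common = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(500)]
mean, std = fit_common(common)

# Threshold taken from the training scores only; no rare cases are needed to set it.
train_scores = sorted(anomaly_score(f, mean, std) for f in common)
threshold = train_scores[int(0.95 * len(train_scores))]

typical = [0.1] * 8   # resembles the training data
rare = [4.0] * 8      # far from anything seen during training
typical_score = anomaly_score(typical, mean, std)
rare_score = anomaly_score(rare, mean, std)
```

In this toy setting the rare sample scores well above the threshold while the typical one stays below it, even though no rare example was used during fitting.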
We were largely successful in this task, as almost all diseases from various diagnostic groups resulted in considerably elevated anomaly scores. Importantly, malignant tumors of very different morphology and histogenesis, such as carcinomas, neuroendocrine tumors, lymphomas, metastatic melanomas, or sarcomas, were reliably assigned high anomaly scores. In fact, of all the diagnostic groups, slide-AUROCs were highest for malignancies, with 97.72% for stomach and 96.97% for colon, respectively (with the overall best-performing OE model). This is crucial, as detecting malignancy is the most consequential task in histopathological diagnostics. Infrequent benign and precancerous neoplastic changes were also reliably detected (slide-AUROC of 88.45% for stomach, 95.72% for colon). Additionally, the AD model effectively recognized inflammation of the colon (slide-AUROC of 94.42%). For stomach, most types of inflammation are frequent and therefore nonanomalous.
AI-Based Anomaly Detection for Clinical-Grade Histopathological Diagnostics
Jonas Dippel, Niklas Prenißl, Julius Hense, et al
NEJM AI 2024;1(11)
“Our AI AD can be implemented in the clinical workflow in two main ways. First, it can be used as a standalone clinical AI assistant that prescreens every stomach and colon sample, identifies and prioritizes “suspicious” cases, and creates corresponding warning labels. The heatmaps can then further guide the pathologists during their assessment. This has the potential to substantially improve diagnostic efficiency and quality while reducing missed diagnoses. Critically, the AI AD’s design ensures the reliable detection of any kind of primary or metastatic cancer in stomach and colon samples, even beyond those evaluated here. To our knowledge, no other published AI tool is capable of this in a zero-shot manner, even across other tissues. Second, the AI AD can be integrated with the supervised detection of common findings (e.g., for GI samples).”
AI-Based Anomaly Detection for Clinical-Grade Histopathological Diagnostics
Jonas Dippel, Niklas Prenißl, Julius Hense, et al
NEJM AI 2024;1(11)
“In this proposed workflow, common pathologies are classified automatically by a supervised module, while AI-AD flags anomalous cases for manual review by a pathologist. Our results indicate that with the current performance, already up to a third of biopsies with frequent findings could be automatically diagnosed in this way without the risk of missing any less frequent and potentially severe diseases. This fraction can be expected to grow with future model improvements. Ultimately, only a subset of cases may require manual review, drastically reducing pathologists’ workloads and enabling largely automated and safe AI-based histopathological diagnostics.”
AI-Based Anomaly Detection for Clinical-Grade Histopathological Diagnostics
Jonas Dippel, Niklas Prenißl, Julius Hense, et al
NEJM AI 2024;1(11)
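The combined workflow described above can be sketched as a simple routing rule: a case is reported automatically only if the anomaly detector does not flag it and the supervised module is confident; everything else goes to a pathologist. The function name triage, the field names, and both thresholds are hypothetical, and the paper does not specify its routing logic at this level of detail.

```python
def triage(cases, anomaly_threshold, confidence_threshold):
    """Split cases into automatically reported ones and ones needing manual review."""
    auto, manual = [], []
    for case in cases:
        is_anomalous = case["anomaly_score"] >= anomaly_threshold
        is_confident = case["classifier_confidence"] >= confidence_threshold
        if not is_anomalous and is_confident:
            auto.append(case["id"])      # common finding, confidently classified
        else:
            manual.append(case["id"])    # flagged as anomalous or uncertain
    return auto, manual

# Made-up example cases (assumption).
cases = [
    {"id": "A", "anomaly_score": 0.10, "classifier_confidence": 0.99},
    {"id": "B", "anomaly_score": 0.80, "classifier_confidence": 0.97},
    {"id": "C", "anomaly_score": 0.05, "classifier_confidence": 0.60},
]
auto, manual = triage(cases, anomaly_threshold=0.5, confidence_threshold=0.95)
```

Here only case A is reported automatically; B is held back by the anomaly flag and C by low classifier confidence.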

“Histopathology image evaluation is indispensable for cancer diagnoses and subtype classification. Standard artificial intelligence methods for histopathology image analyses have focused on optimizing specialized models for each diagnostic task. Although such methods have achieved some success, they often have limited generalizability to images generated by different digitization protocols or samples collected from different populations. Here, to address this challenge, we devised the Clinical Histopathology Imaging Evaluation Foundation (CHIEF) model, a general purpose weakly supervised machine learning framework to extract pathology imaging features for systematic cancer evaluation. CHIEF leverages two complementary pretraining methods to extract diverse pathology representations: unsupervised pretraining for tile-level feature identification and weakly supervised pretraining for whole-slide pattern recognition.”
A pathology foundation model for cancer diagnosis and prognosis prediction.
Wang X, Zhao J, Marostica E, et al.
Nature. 2024 Sep 4. doi: 10.1038/s41586-024-07894-z. Epub ahead of print. PMID: 39232164.
“We developed CHIEF using 60,530 whole-slide images spanning 19 anatomical sites. Through pretraining on 44 terabytes of high resolution pathology imaging datasets, CHIEF extracted microscopic representations useful for cancer cell detection, tumour origin identification, molecular profile characterization and prognostic prediction. We successfully validated CHIEF using whole-slide images from 32 independent slide sets collected from 24 hospitals and cohorts internationally. Overall, CHIEF outperformed the state-of-the-art deep learning methods by up to 36.1%, showing its ability to address domain shifts observed in samples from diverse populations and processed by different slide preparation methods. CHIEF provides a generalizable foundation for efficient digital pathology evaluation for patients with cancer.”
A pathology foundation model for cancer diagnosis and prognosis prediction.
Wang X, Zhao J, Marostica E, et al.
Nature. 2024 Sep 4. doi: 10.1038/s41586-024-07894-z. Epub ahead of print. PMID: 39232164.
“We established the CHIEF model, a general-purpose machine learning framework for weakly supervised histopathological image analyses. Unlike commonly used self-supervised feature extractors, CHIEF leveraged two types of pretraining procedure: unsupervised pretraining on 15 million unlabelled tile images and weakly supervised pretraining on more than 60,000 WSIs. Tile-level unsupervised pretraining established a general feature extractor for haematoxylin–eosin-stained histopathological images collected from heterogeneous publicly available databases, which captured diverse manifestations of microscopic cellular morphologies. Subsequent WSI-level weakly supervised pretraining constructed a general-purpose model by characterizing the similarities and differences between cancer types.”
A pathology foundation model for cancer diagnosis and prognosis prediction.
Wang X, Zhao J, Marostica E, et al.
Nature. 2024 Sep 4. doi: 10.1038/s41586-024-07894-z. Epub ahead of print. PMID: 39232164.
CHIEF consistently attained superior performance in a variety of cancer identification tasks using either biopsy or surgical resection slides. CHIEF achieved a macro-average area under the receiver operating characteristic curve (AUROC) of 0.9397 across 15 datasets representing 11 cancer types, which is approximately 10% higher than that attained by DSMIL (a macro-average AUROC of 0.8409), 12% higher than that of ABMIL (a macro-average AUROC of 0.8233) and 14% higher than that of CLAM (a macro-average AUROC of 0.8016). In all five biopsy datasets collected from independent cohorts, CHIEF possessed AUROCs of greater than 0.96 across several cancer types, including oesophagus (CUCH-Eso), stomach (CUCH-Sto), colon (CUCH-Colon) and prostate (Diagset-B and CUCH-Pros). On independent validation with seven surgical resection slide sets spanning five cancer types (that is, colon (Dataset-PT), breast (DROID-Breast), endometrium (SMCH-Endo and CPTAC-uterine corpus endometrial carcinoma (UCEC)), lung (CPTAC-lung squamous cell carcinoma (LUSC)) and cervix (SMCH-Cervix and TissueNet)), CHIEF attained AUROCs greater than 0.90.
A pathology foundation model for cancer diagnosis and prognosis prediction.
Wang X, Zhao J, Marostica E, et al.
Nature. 2024 Sep 4. doi: 10.1038/s41586-024-07894-z. Epub ahead of print. PMID: 39232164.
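For readers unfamiliar with the metric in the excerpt above, a macro-average AUROC is the unweighted mean of the AUROCs computed separately on each dataset. The sketch below uses toy labels and scores, not data from the paper, and the function names are illustrative. Note that the quoted differences are absolute gaps between macro-averages; for example, 0.9397 − 0.8409 = 0.0988, or roughly 10 percentage points.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def macro_average_auroc(datasets):
    """Unweighted mean of per-dataset AUROCs, so small datasets count as much as large ones."""
    values = [auroc(labels, scores) for labels, scores in datasets]
    return sum(values) / len(values)

# Two toy datasets of (labels, scores); purely illustrative.
datasets = [
    ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]),  # positives always rank first
    ([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]),  # one of four positive/negative pairs is inverted
]
per_dataset = [auroc(labels, scores) for labels, scores in datasets]
macro = macro_average_auroc(datasets)
```

The pairwise definition used here is quadratic in the number of samples, which is fine for a sketch; library implementations typically use a rank-based computation instead.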
“The CHIEF framework successfully characterized tumour origins, predicted clinically important genomic profiles, and stratified patients into longer-term survival and shorter-term survival groups. Furthermore, our approach established a general pathology feature extractor capable of a wide range of prediction tasks even with small sample sizes. Our results showed that CHIEF is highly adaptable to diverse pathology samples obtained from several centres, digitized by various scanners, and obtained from different clinical procedures (that is, biopsy and surgical resection). This new framework substantially enhanced model generalizability, a critical barrier to the clinical penetrance of conventional computational pathology models.”
A pathology foundation model for cancer diagnosis and prognosis prediction.
Wang X, Zhao J, Marostica E, et al.
Nature. 2024 Sep 4. doi: 10.1038/s41586-024-07894-z. Epub ahead of print. PMID: 39232164.
“In conclusion, CHIEF is a foundation model useful for a wide range of pathology evaluation tasks across several cancer types. We have demonstrated the generalizability of this foundation model across several clinical applications using samples collected from 24 hospitals and patient cohorts worldwide. CHIEF required minimal image annotations and extracted detailed quantitative features from WSIs, which enabled systematic analyses of the relationships among morphological patterns, molecular aberrations and important clinical outcomes. Accurate, robust and rapid pathology sample assessment provided by CHIEF will contribute to the development of personalized cancer management.”
A pathology foundation model for cancer diagnosis and prognosis prediction.
Wang X, Zhao J, Marostica E, et al.
Nature. 2024 Sep 4. doi: 10.1038/s41586-024-07894-z. Epub ahead of print. PMID: 39232164.

“Artificial intelligence (AI) is an area of enormous interest that is transforming health care and biomedical research. AI systems have shown the potential to support patients, clinicians, and health-care infrastructure. AI systems could provide rapid and accurate image interpretation, disease diagnosis and prognosis, improved workflow, reduced medical errors, and lead to more efficient and accessible care. Incorporation of patient-reported outcome measures (PROMs), could advance AI systems by helping to incorporate the patient voice alongside clinical data.”
Embedding patient-reported outcomes at the heart of artificial intelligence health-care technologies
Samantha Cruz Rivera et al.
Lancet Digit Health 2023; 5: e168–73
“Pancreatic ductal adenocarcinoma (PDAC) has been left behind in the evolution of personalized medicine. Predictive markers of response to therapy are lacking in PDAC despite various histological and transcriptional classification schemes. We report an artificial intelligence (AI) approach to histologic feature examination that extracts a signature predictive of disease-specific survival (DSS) in patients with PDAC receiving adjuvant gemcitabine. We demonstrate that this AI-generated histologic signature is associated with outcomes following adjuvant gemcitabine, while three previously developed transcriptomic classification systems are not (n = 47). We externally validate this signature in an independent cohort of patients treated with adjuvant gemcitabine (n = 46). Finally, we demonstrate that the signature does not stratify survival outcomes in a third cohort of untreated patients (n = 161), suggesting that the signature is specifically predictive of treatment-related outcomes but is not generally prognostic. This imaging analysis pipeline has promise in the development of actionable markers in other clinical settings where few biomarkers currently exist.”
Development of an artificial intelligence-derived histologic signature associated with adjuvant gemcitabine treatment outcomes in pancreatic cancer
Vivek Nimgaonkar et al.
Cell Reports Medicine 4, 101013, April 18, 2023
“In summary, this study identifies an AI-based histologic signature that stratifies disease-related outcomes among patients who have received adjuvant gemcitabine after resection of PDAC, where transcriptional profiling-based sub-typing fails to do so. This signature, if validated in prospective cohorts, has the potential to become one of the first clinically applicable predictive biomarkers in PDAC. Finally, if validated in PDAC, the imaging analysis platform underlying this signature may be generalized to other clinical settings, thereby facilitating the emergence of biomarkers to predict treatment response in diseases for which few actionable biomarkers currently exist.”
Development of an artificial intelligence-derived histologic signature associated with adjuvant gemcitabine treatment outcomes in pancreatic cancer
Vivek Nimgaonkar et al.
Cell Reports Medicine 4, 101013, April 18, 2023

Objective: In this study we evaluate the accuracy of the newest version of a smartphone application (SA) for risk assessment of skin lesions.
Methods: This SA uses a machine learning algorithm to compute a risk rating. The algorithm is trained on 131,873 images taken by 31,449 users in multiple countries between January 2016 and August 2018 and rated for risk by dermatologists. To evaluate the sensitivity of the algorithm we use 285 histopathologically validated skin cancer cases (including 138 malignant melanomas), from two previously published clinical studies (195 cases) and from the SA user database (90 cases). We calculate the specificity on a separate set from the SA user database containing 6000 clinically validated benign cases.
Results: The algorithm scored a 95.1% (95% CI, 91.9% - 97.3%) sensitivity in detecting (pre)malignant conditions (93% for malignant melanoma and 97% for keratinocyte carcinomas and precursors). This level of sensitivity was achieved with a 78.3% (95% CI, 77.2%-79.3%) specificity.
Conclusions: This SA provides a high sensitivity to detect skin cancer, however there is still room for improvement in terms of specificity. Future studies are needed to assess the impact of this SA on the health systems and its users.
Accuracy of a smartphone application for triage of skin lesions based on machine learning algorithms
Udrea A et al.
J European Academy of Dermatology (in press 2019)
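The sensitivity and specificity figures above are proportions with binomial confidence intervals, which can be sketched as follows. The counts are assumptions back-calculated from the reported percentages and denominators (285 cancer cases, 6000 benign cases), not figures taken from the paper. The Wilson score interval is used here as one common choice; the authors' interval method is not stated in the excerpt, so the bounds may differ slightly from the published ones.

```python
import math

def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half

# Hypothetical counts chosen to match the reported 95.1% and 78.3% (assumption).
true_positives, cancer_cases = 271, 285
true_negatives, benign_cases = 4698, 6000

sensitivity = true_positives / cancer_cases   # detected cancers / all cancers
specificity = true_negatives / benign_cases   # cleared benign / all benign
sens_low, sens_high = wilson_interval(true_positives, cancer_cases)
spec_low, spec_high = wilson_interval(true_negatives, benign_cases)
```

The specificity interval is much narrower than the sensitivity interval simply because it rests on about twenty times as many cases.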

The traditional solution is for doctors to ask colleagues, or to laboriously browse reference textbooks or online resources, hoping to find an image with similar visual characteristics. The general computer vision solution to problems like this is termed content-based image retrieval (CBIR), one example of which is the “reverse image search” feature in Google Images, in which users can search for similar images by using another image as input.
“The tool allows a user to select a region of interest, and obtain visually-similar matches. We tested SMILY’s ability to retrieve images along a pre-specified axis of similarity (e.g. histologic feature or tumor grade), using images of tissue from the breast, colon, and prostate (3 of the most common cancer sites). We found that SMILY demonstrated promising results despite not being trained specifically on pathology images or using any labeled examples of histologic features or tumor grades.”
Building SMILY, a Human-Centric, Similar-Image Search Tool for Pathology
July 19, 2019
Narayan Hegde, Software Engineer, Google Health and Carrie J. Cai, Research Scientist, Google Research
However, a problem emerged when we observed how pathologists interacted with SMILY. Specifically, users were trying to answer the nebulous question of “What looks similar to this image?” so that they could learn from past cases containing similar images. Yet, there was no way for the tool to understand the intent of the search: Was the user trying to find images that have a similar histologic feature, glandular morphology, overall architecture, or something else? In other words, users needed the ability to guide and refine the search results on a case-by-case basis in order to actually find what they were looking for. Furthermore, we observed that this need for iterative search refinement was rooted in how doctors often perform “iterative diagnosis”—by generating hypotheses, collecting data to test these hypotheses, exploring alternative hypotheses, and revisiting or retesting previous hypotheses in an iterative fashion. It became clear that, for SMILY to meet real user needs, it would need to support a different approach to user interaction.
Building SMILY, a Human-Centric, Similar-Image Search Tool for Pathology
July 19, 2019
Narayan Hegde, Software Engineer, Google Health and Carrie J. Cai, Research Scientist, Google Research
Through careful human-centered research described in our second paper, we designed and augmented SMILY with a suite of interactive refinement tools that enable end-users to express what similarity means on-the-fly: 1) refine-by-region allows pathologists to crop a region of interest within the image, limiting the search to just that region; 2) refine-by-example gives users the ability to pick a subset of the search results and retrieve more results like those; and 3) refine-by-concept sliders can be used to specify that more or less of a clinical concept be present in the search results (e.g., fused glands). Rather than requiring that these concepts be built into the machine learning model, we instead developed a method that enables end-users to create new concepts post-hoc, customizing the search algorithm towards concepts they find important for each specific use case. This enables new explorations via post-hoc tools after a machine learning model has already been trained, without needing to re-train the original model for each concept or application of interest.
Building SMILY, a Human-Centric, Similar-Image Search Tool for Pathology
July 19, 2019
Narayan Hegde, Software Engineer, Google Health and Carrie J. Cai, Research Scientist, Google Research
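The refinement tools described above can be pictured as operations on image embeddings. The sketch below is not SMILY's implementation: it assumes each image patch is already represented by an embedding vector, ranks patches by cosine similarity, and models refine-by-example as averaging the embeddings of selected results and refine-by-concept as shifting the query along a concept direction. All names, vectors, and the slider value are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def search(query, database, k=2):
    """Return the k patch ids whose embeddings are most similar to the query."""
    ranked = sorted(database, key=lambda pid: cosine(query, database[pid]), reverse=True)
    return ranked[:k]

def refine_by_example(selected_ids, database):
    """New query = mean embedding of the results the user marked as relevant."""
    vectors = [database[pid] for pid in selected_ids]
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def refine_by_concept(query, concept_direction, slider):
    """Shift the query along a concept direction; slider > 0 asks for more of the concept."""
    return [q + slider * c for q, c in zip(query, concept_direction)]

# Toy 3-dimensional embeddings; the second axis stands in for some clinical concept.
database = {
    "patch_1": [1.0, 0.0, 0.0],
    "patch_2": [0.9, 0.1, 0.0],
    "patch_3": [0.0, 1.0, 0.0],
    "patch_4": [0.0, 0.9, 0.4],
}
query = [1.0, 0.05, 0.0]
first = search(query, database)

concept = [0.0, 1.0, 0.0]
second = search(refine_by_concept(query, concept, slider=3.0), database)
third = search(refine_by_example(["patch_3", "patch_4"], database), database)
```

Because both refinements only modify the query vector, they work on top of a fixed embedding model, which is consistent with the excerpt's point that new concepts can be added without retraining.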
“Interestingly, these refinement tools appeared to have supported pathologists’ decision-making process in ways beyond simply performing better on similarity searches. For example, pathologists used the observed changes to their results from iterative searches as a means of progressively tracking the likelihood of a hypothesis. When search results were surprising, many re-purposed the tools to test and understand the underlying algorithm, for example, by cropping out regions they thought were interfering with the search or by adjusting the concept sliders to increase the presence of concepts they suspected were being ignored. Beyond being passive recipients of ML results, doctors were empowered with the agency to actively test hypotheses and apply their expert domain knowledge, while simultaneously leveraging the benefits of automation.”
Building SMILY, a Human-Centric, Similar-Image Search Tool for Pathology
July 19, 2019
Narayan Hegde, Software Engineer, Google Health and Carrie J. Cai, Research Scientist, Google Research
With these interactive tools enabling users to tailor each search experience to their desired intent, we are excited for SMILY’s potential to assist with searching large databases of digitized pathology images. One potential application of this technology is to index textbooks of pathology images with descriptive captions, and enable medical students or pathologists in training to search these textbooks using visual search, speeding up the educational process. Another application is for cancer researchers interested in studying the correlation of tumor morphologies with patient outcomes, to accelerate the search for similar cases. Finally, pathologists may be able to leverage tools like SMILY to locate all occurrences of a feature (e.g. signs of active cell division, or mitosis) in the same patient’s tissue sample to better understand the severity of the disease to inform cancer therapy decisions. Importantly, our findings add to the body of evidence that sophisticated machine learning algorithms need to be paired with human-centered design and interactive tooling in order to be most useful.
Building SMILY, a Human-Centric, Similar-Image Search Tool for Pathology
July 19, 2019
Narayan Hegde, Software Engineer, Google Health and Carrie J. Cai, Research Scientist, Google Research
“However, no algorithm can perfectly capture an expert's ideal notion of similarity for every case: an image that is algorithmically determined to be similar may not be medically relevant to a doctor's specific diagnostic needs. In this paper, we identified the needs of pathologists when searching for similar images retrieved using a deep learning algorithm, and developed tools that empower users to cope with the search algorithm on-the-fly, communicating what types of similarity are most important at different moments in time. In two evaluations with pathologists, we found that these refinement tools increased the diagnostic utility of images found and increased user trust in the algorithm.”
Human-Centered Tools for Coping with Imperfect Algorithms During Medical Decision Making
Cai CJ et al.
ACM ISBN 978-1-4503-5970-2/19/05. https://doi.org/10.1145/3290605.3300234
In this paper, we found that refinement tools not only increased trust and utility, but were also used for critical decision-making purposes beyond guiding an algorithm. Our work brings to light the dual challenges and opportunities of ML: although black-box ML algorithms can be difficult to understand, off-the-shelf image embeddings from DNNs could enable new, lightweight ways of creating interactive refinement and exploration mechanisms. Ultimately, refinement tools gave doctors the agency to hypothesis-test and apply their domain knowledge, while simultaneously leveraging the benefits of automation. Taken together, this work provides implications for how ML-based systems can augment, rather than replace, expert intelligence during critical decision-making, an area that will likely continue to rise in importance in the coming years.
Human-Centered Tools for Coping with Imperfect Algorithms During Medical Decision Making
Cai CJ et al.
ACM ISBN 978-1-4503-5970-2/19/05. https://doi.org/10.1145/3290605.3300234
Ultimately, refinement too ls gave doctors the agency to hypothesis-test and apply their domain knowledge, while simultaneously leveraging the benefits of automation. Taken together, this work provides implications for how ML-based systems can augment, rather than replace, expert intelligence during critical decision-making, an area that will likely continue to rise in importance in the coming years.
Human-Centered Tools for Coping with Imperfect Algorithms During Medical Decision Making
Cai CJ et al.
ACM ISBN 978-1-4503-5970-2/19/05. https://doi.org/10.1145/3290605.3300234
July 2019
“The diagnosis of most cancers is made by a board-certified pathologist based on a tissue biopsy under the microscope. Recent research reveals a high discordance between individual pathologists. For melanoma, the literature reports on 25-26% of discordance for classifying a benign nevus versus malignant melanoma. A recent study indicated the potential of deep learning to lower these discordances. However, the performance of deep learning in classifying histopathologic melanoma images was never compared directly to human experts. The aim of this study is to perform such a first direct comparison.”
Deep learning outperformed 11 pathologists in the classification of histopathological melanoma images
Achim Hekler et al
European Journal of Cancer 118 (2019) 91–96
Findings: The CNN achieved a mean sensitivity/specificity/accuracy of 76%/60%/68% over 11 test runs. In comparison, the 11 pathologists achieved a mean sensitivity/specificity/accuracy of 51.8%/66.5%/59.2%. Thus, the CNN was significantly (p = 0.016) superior in classifying the cropped images.
Interpretation: With limited image information available, a CNN was able to outperform 11 histopathologists in the classification of histopathological melanoma images and thus shows promise to assist human melanoma diagnoses.
Deep learning outperformed 11 pathologists in the classification of histopathological melanoma images
Achim Hekler et al
European Journal of Cancer 118 (2019) 91–96
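As a quick arithmetic check on the figures quoted above: when a test set contains equal numbers of melanomas and nevi, overall accuracy is simply the mean of sensitivity and specificity. The sketch below assumes such a balanced test set, which the excerpt does not state, and the function name is illustrative only.

```python
def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Accuracy when positives and negatives are equally frequent (assumed)."""
    return (sensitivity + specificity) / 2


# Sensitivity and specificity as quoted in the Findings excerpt.
cnn = balanced_accuracy(0.76, 0.60)
pathologists = balanced_accuracy(0.518, 0.665)

print(f"CNN: {cnn:.1%}, pathologists: {pathologists:.2%}")
```

Under that assumption the means come out at about 68% and 59.15%, in line with the reported 68% and 59.2%; the small difference for the pathologists would be expected if the published sensitivity and specificity were themselves rounded.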
“With limited image information available, a CNN was able to systematically outperform 11 histopathologists in the classification of histopathological melanoma images and thus shows great potential to assist human melanoma diagnoses. Prospective studies that use whole slides for testing are necessary to confirm this preliminary finding.”
Deep learning outperformed 11 pathologists in the classification of histopathological melanoma images
Achim Hekler et al
European Journal of Cancer 118 (2019) 91–96

“It is well known that there is fundamental prognostic data embedded in pathology images. The ability to mine "sub-visual" image features from digital pathology slide images, features that may not be visually discernible by a pathologist, offers the opportunity for better quantitative modeling of disease appearance and hence possibly improved prediction of disease aggressiveness and patient outcome.”

 Image analysis and machine learning in digital pathology: Challenges and opportunities. Madabhushi A, Lee G Med Image Anal. 2016 Oct;33:170-5.
“Image analysis and computer assisted detection and diagnosis tools previously developed in the context of radiographic images are woefully inadequate to deal with the data density in high resolution digitized whole slide images. Additionally there has been recent substantial interest in combining and fusing radiologic imaging and proteomics and genomics based measurements with features extracted from digital pathology images for better prognostic prediction of disease aggressiveness and patient outcome. Again there is a paucity of powerful tools for combining disease specific features that manifest across multiple different length scales.”

 Image analysis and machine learning in digital pathology: Challenges and opportunities. Madabhushi A, Lee G Med Image Anal. 2016 Oct;33:170-5.
“It is clear that molecular changes in gene expression solicit a structural and vascular change in phenotype that is in turn observable on the imaging modality under consideration. For instance, tumor morphology in standard H&E tissue specimens reflects the sum of all molecular pathways in tumor cells. By the same token radiographic imaging modalities such as MRI and CT are ultimately capturing structural and functional attributes reflective of the biological pathways and cellular morphology characterizing the disease. Historically the concept and importance of radiology-pathology fusion has been around and recognized.”

 Image analysis and machine learning in digital pathology: Challenges and opportunities. Madabhushi A, Lee G Med Image Anal. 2016 Oct;33:170-5.

“In the setting of a challenge competition, some deep learning algorithms achieved better diagnostic performance than a panel of 11 pathologists participating in a simulation exercise designed to mimic routine pathology workflow; algorithm performance was comparable with an expert pathologist interpreting whole-slide images without time constraints. Whether this approach has clinical utility will require evaluation in a clinical setting.” 

Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer Bejnordi BE et al.  JAMA. 2017;318(22):2199–2210
Question
What is the discriminative accuracy of deep learning algorithms compared with the diagnoses of pathologists in detecting lymph node metastases in tissue sections of women with breast cancer?

 Finding
In cross-sectional analyses that evaluated 32 algorithms submitted as part of a challenge competition, 7 deep learning algorithms showed greater discrimination than a panel of 11 pathologists in a simulated time-constrained diagnostic setting, with an area under the curve of 0.994 (best algorithm) vs 0.884 (best pathologist). 

Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer Bejnordi BE et al.  JAMA. 2017;318(22):2199–2210
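For readers less familiar with the metric quoted above, the area under the ROC curve equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. A minimal sketch of that definition follows; the scores are made up for illustration and are not data from the study.

```python
def auc(positive_scores, negative_scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical model scores: slides with and without metastases.
toy_positive = [0.9, 0.8, 0.4]
toy_negative = [0.7, 0.3, 0.2]
toy_auc = auc(toy_positive, toy_negative)
print(round(toy_auc, 3))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the 0.994 and 0.884 figures should be read.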
“Radiology, having converted to digital images more than 25 years ago, is well-positioned to deploy AI for diagnostics. Several studies have shown considerable opportunity to support radiologists in evaluating a variety of scan types including mammography for breast lesions, computed tomographic scans for pulmonary nodules and infections, and magnetic resonance images for brain tumors including the molecular classification of brain tumors.”

Deep Learning Algorithms for Detection of Lymph Node Metastases From Breast Cancer: Helping Artificial Intelligence Be Seen.  Golden JA JAMA. 2017;318(22):2184–2186
“Another challenge to deploying digital pathology was recently addressed. In April 2017, Philips received US Food and Drug Administration clearance for its Philips IntelliSite Pathology Solution to be used for primary pathology diagnostics. This device is used for scanning glass pathology slides and for reviewing these slides on computer monitors. Furthermore, the Philips IntelliSite Pathology Solution has been established as a predicate device that could pave the way for a host of other whole-slide scanners available today to use a 510(k) process for approval rather than a premarket analysis. Many new Food and Drug Administration–approved scanners for primary diagnosis are expected to become available in the coming years.”

Deep Learning Algorithms for Detection of Lymph Node Metastases From Breast Cancer: Helping Artificial Intelligence Be Seen.  Golden JA JAMA. 2017;318(22):2184–2186
“Even though some reimbursement codes exist for computational analyses, they are not widely used and often are rejected. With national health care reimbursement trends moving to quality and safety metrics for value-based care rather than fee for service, the recognition of AI as part of reimbursement strategies that reward value-based care would provide important incentives to develop and implement validated algorithms.”

 Deep Learning Algorithms for Detection of Lymph Node Metastases From Breast Cancer: Helping Artificial Intelligence Be Seen.  Golden JA JAMA. 2017;318(22):2184–2186
“AI may be just what pathology has been waiting for. While still requiring evaluation within a normal surgical pathology workflow, deep learning has the opportunity to assist pathologists by improving the efficiency of their work, standardizing quality, and providing better prognostication. Like electron microscopy, immunohistochemistry, and molecular diagnostics ahead of AI, there is little risk of pathologists being replaced. Although their workflow is likely to change, the contributions of pathologists to patient care will continue to be critically important.” 

Deep Learning Algorithms for Detection of Lymph Node Metastases From Breast Cancer: Helping Artificial Intelligence Be Seen.  Golden JA JAMA. 2017;318(22):2184–2186

Purpose: To develop a machine learning model that allows high-risk breast lesions (HRLs) diagnosed with image-guided needle biopsy that require surgical excision to be distinguished from HRLs that are at low risk for upgrade to cancer at surgery and thus could be surveilled.
Conclusion: This study provides proof of concept that a machine learning model can be applied to predict the risk of upgrade of HRLs to cancer. Use of this model could decrease unnecessary surgery by nearly one-third and could help guide clinical decision making with regard to surveillance versus surgical excision of HRLs.

 High-risk Breast lesions: A Machine Learning Model to Predict Pathologic Upgrade and Reduce Unnecessary Surgical Excision  Manisha Bahl et al. Radiology (in press)
“Instead of surgical excision of all HRLs, if HRLs categorized with our model to be at low risk for upgrade to cancer were surveilled and the remainder were excised, then 97.4% (37 of 38) of malignancies would be diagnosed at surgery, and 30.6% (91 of 297) of surgeries of benign lesions could be avoided.”

High-risk Breast lesions: A Machine Learning Model to Predict Pathologic Upgrade and Reduce Unnecessary Surgical Excision  Manisha Bahl et al. Radiology (in press)
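The two percentages in the excerpt above follow directly from the quoted counts. A short sketch of the arithmetic, with illustrative variable names, shows how "nearly one-third" is reached:

```python
def percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to one decimal."""
    return round(100 * numerator / denominator, 1)


# Counts as quoted: 37 of 38 malignancies, 91 of 297 benign surgeries.
malignancies_caught = percent(37, 38)
benign_surgeries_avoided = percent(91, 297)

print(malignancies_caught, benign_surgeries_avoided)
```

The same counts also imply that 1 of the 38 malignancies would have fallen in the surveilled group, which is the trade-off behind the avoided surgeries.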
“Machine learning could inform shared decision making by the patient and the provider regarding surveillance versus surgical excision of HRLs and thus could support more targeted, personalized approaches to patient care.”

 High-risk Breast lesions: A Machine Learning Model to Predict Pathologic Upgrade and Reduce Unnecessary Surgical Excision  Manisha Bahl et al. Radiology (in press)
“In conclusion, machine learning can be applied as a risk prediction method to identify patients with biopsy-proven HRLs that have the potential for follow-up rather than surgical excision. Future work includes incorporation of mammographic images and histopathologic slides into the machine learning model. Use of our model based on traditional structural features with an additional feature of biopsy pathologic report text has the potential to decrease unnecessary surgery by nearly one-third in women with HRLs and supports shared decision making regarding surveillance versus surgical excision of HRLs.”

High-risk Breast lesions: A Machine Learning Model to Predict Pathologic Upgrade and Reduce Unnecessary Surgical Excision  Manisha Bahl et al. Radiology (in press)