Inhyuk Park, Sungeun Kim, and Jongbin Ryu
This paper introduces a generative self-supervised learning method for medical image recognition. We use generative models in two ways: 1) creating diversified training data and 2) learning domain-aligned pretext knowledge for self-supervised learning. Because gathering real-world medical data is often difficult, we generate synthetic training data with a diffusion model guided by elaborated prompts. We also propose a domain-aligned generative approach for our self-supervised learning algorithm: a masked autoencoder with adaptive instance normalization learns robust visual representations while minimizing the domain gap between our synthetic training data and real-world data. Because this self-supervised learning process relies solely on generated data, our approach achieves state-of-the-art performance without using any real-world medical data. We demonstrate that our approach surpasses the previous best results by significant margins on the CheXpert, COVIDx, and ChestX-ray14 datasets. These results highlight the potential of generated data in medical image recognition, a field that has historically faced data scarcity. We open-source our implementation of the generative self-supervised learning method at: https://github.com/inhyukpark2/gen-ssl.
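The abstract names adaptive instance normalization (AdaIN) inside the masked autoencoder as the domain-alignment mechanism. The snippet below is a minimal sketch of how AdaIN re-normalizes one set of features to another's per-instance statistics; it assumes ViT-style token features of shape [batch, tokens, channels], and the function and variable names are illustrative, not the paper's actual implementation.

```python
import torch

def adain(content: torch.Tensor, style: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Adaptive instance normalization (sketch).

    Re-normalizes `content` features to the per-instance channel statistics
    of `style`. Both tensors are assumed to be [batch, tokens, channels];
    statistics are computed over the token dimension for each instance.
    """
    c_mean = content.mean(dim=1, keepdim=True)
    c_std = content.std(dim=1, keepdim=True) + eps
    s_mean = style.mean(dim=1, keepdim=True)
    s_std = style.std(dim=1, keepdim=True) + eps
    # Whiten the content statistics, then re-color with the style statistics.
    return s_std * (content - c_mean) / c_std + s_mean

# Hypothetical usage: align visible-patch features of a synthetic image to the
# statistics of a reference feature set before the MAE decoder reconstructs.
feats_synth = torch.randn(4, 196, 768)   # encoder features from synthetic images
feats_ref = torch.randn(4, 196, 768)     # reference features (style source)
aligned = adain(feats_synth, feats_ref)  # same shape, style-matched statistics
```

One plausible design rationale, consistent with the abstract's claim: by swapping in target-domain feature statistics during masked-autoencoder pretraining, the encoder is discouraged from relying on low-level statistics specific to the diffusion-generated images, which would otherwise widen the gap to real clinical data.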