MEDS - An Emerging Data Standard and Ecosystem for Health AI Research

     Matthew B. A. McDermott, Ph.D., Ethan Steinberg, Ph.D., Jason A. Fries, Ph.D., Robin P. van de Water, M.S., Chao Pang, Ph.D., Patrick Rockenschaub, Ph.D., Pawel Renc, Ph.D., Jungwoo Oh, Ph.D., Kamilė Stankevičiūtė, M.Sc., M.A., Justin Xu, Tom J. Pollard, Ph.D., Nassim Oufattole, S.M., Michael Wornow, Ph.D., Teya S. Bergamaschi, M.S.E., Hyewon Jeong, M.D., M.S., Simon A. Lee, M.S., Vincent Jeanselme, Ph.D., Kiril V. Klein, Ph.D., Mikkel Odgaard, Ph.D., Maria E. Montgomery, M.S., Arkadiusz Sitek, Ph.D., Mads Nielsen, Ph.D., Jeffrey N. Chiang, Ph.D., Noa Dagan, M.D., Ph.D., M.P.H., Isaac Kohane, M.D., Ph.D., Shalmali Joshi, Ph.D., Edward Choi, Ph.D., and Nigam H. Shah, Ph.D., M.B.B.S. 

    Abstract

    While data standards have been well adopted and highly impactful for observational health informatics, the emerging application of artificial intelligence (AI) to electronic health record (EHR) data, commonly known as health AI, still lacks widely adopted data standards. This gap limits reproducibility, the development of open-source ecosystems for health AI, and the ability to collaboratively develop and characterize emerging capabilities such as foundation models. The Medical Event Data Standard (MEDS) is an open-source data standard and ecosystem designed and promulgated by us (and others) to address this gap. MEDS was first presented at a workshop in May 2024 and has since been described in online materials and preprints. Here, we describe MEDS in detail and review its use in the community, highlighting its strengths, its weaknesses, and the extent of its adoption to date.

    We designed MEDS to emphasize simplicity, algorithm transportability, and support for workflows used in training foundation models, and to offer strengths complementary to those of existing health data standards. As of March 2026, we find usage of MEDS across 21 institutions, in at least 27 academic papers and preprints, and in support of work with 17 datasets and 12 AI algorithms. Tools in the MEDS ecosystem have reported computational speedups ranging from 1.9-fold to around 40,000-fold over prior tools or common individual workflows. In novel case study comparisons, we find that codebases leveraging MEDS require up to 70% fewer lines of functional Python code. Further, the MEDS standard has supported the development of key frontier models for EHR data. (Supported by Canadian Institutes of Health Research and others.)