Speakers: Sam Windham, MD, MS
Assistant Professor of Medicine
Infectious Diseases and Pulmonary/Critical Care Medicine
University of Kansas Medical Center
Scaling Clinical Phenotype Discovery from Single-Site to Federated EHR Networks
Critical illness syndromes contain biologically distinct subgroups, but discovery efforts have excluded immunocompromised patients. We applied Gaussian mixture modeling to hemodynamic and laboratory data from neutropenic septic shock patients in MIMIC-IV (N=373) and identified two phenotypes whose mortality differed by 30 percentage points. External validation at Washington University (N=365) confirmed this structure with consistent mortality differences. Three lines of evidence suggest distinct pathophysiology rather than a simple severity gradient: differential catecholamine responsiveness, similar neutrophil counts between phenotypes, and severity-independent prognostic value. Moving from single-site discovery to population-scale analysis presents data engineering and methodological challenges. We describe our experience scaling this phenotyping pipeline to Epic COSMOS, a federated EHR research network spanning over 1,800 hospitals and 300 million patients. We discuss practical considerations, including data structure, variable harmonization, missingness patterns, and computational approaches for applying unsupervised learning methods at this scale. This work illustrates how routinely collected clinical data can support phenotype discovery and validation across diverse care settings, and highlights both the opportunities and the challenges of federated EHR networks for clinical research.
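For readers unfamiliar with the method named in the abstract, the following is a minimal sketch of fitting a two-component Gaussian mixture by expectation-maximization. It is not the speaker's pipeline: the abstract does not state the input variables, preprocessing, covariance structure, or how the number of phenotypes was chosen. The data here are synthetic, the diagonal-covariance choice and quantile-based initialization are simplifying assumptions, and the function names (`make_synthetic_cohort`, `fit_gmm_diag`) are hypothetical. In practice a library implementation such as scikit-learn's `GaussianMixture` would normally be used.

```python
import math
import random


def make_synthetic_cohort(n_per_group=100, seed=0):
    """Two synthetic groups of 2-feature 'patients' (NOT real clinical data)."""
    rng = random.Random(seed)
    group_a = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_per_group)]
    group_b = [[rng.gauss(5, 1), rng.gauss(5, 1)] for _ in range(n_per_group)]
    return group_a + group_b


def log_gauss_diag(x, mean, var):
    """Log density of x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def fit_gmm_diag(X, k=2, n_iter=50):
    """EM for a k-component, diagonal-covariance Gaussian mixture.

    Returns (weights, means, variances, resp), where resp[i][c] is the
    posterior probability of row i belonging to component c, as computed
    in the final E-step.
    """
    n, d = len(X), len(X[0])
    # Assumption: deterministic start at evenly spaced quantiles of feature 0.
    ordered = sorted(X)
    means = [list(ordered[(2 * c + 1) * n // (2 * k)]) for c in range(k)]
    gmean = [sum(x[j] for x in X) / n for j in range(d)]
    gvar = [sum((x[j] - gmean[j]) ** 2 for x in X) / n + 1e-6 for j in range(d)]
    variances = [list(gvar) for _ in range(k)]
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior membership probabilities (computed in log space).
        resp = []
        for x in X:
            logp = [
                math.log(weights[c]) + log_gauss_diag(x, means[c], variances[c])
                for c in range(k)
            ]
            top = max(logp)
            p = [math.exp(v - top) for v in logp]
            total = sum(p)
            resp.append([v / total for v in p])
        # M-step: re-estimate weights, means, and per-feature variances.
        for c in range(k):
            nk = max(sum(r[c] for r in resp), 1e-12)  # guard empty component
            weights[c] = nk / n
            means[c] = [
                sum(r[c] * x[j] for r, x in zip(resp, X)) / nk for j in range(d)
            ]
            variances[c] = [
                sum(r[c] * (x[j] - means[c][j]) ** 2 for r, x in zip(resp, X)) / nk
                + 1e-6  # small floor keeps variances strictly positive
                for j in range(d)
            ]
    return weights, means, variances, resp


X = make_synthetic_cohort()
weights, means, variances, resp = fit_gmm_diag(X, k=2)
# Hard phenotype label = component with the highest posterior probability.
labels = [max(range(len(r)), key=r.__getitem__) for r in resp]
print("component weights:", [round(w, 2) for w in weights])
print("component sizes:", [labels.count(c) for c in range(2)])
```

The posterior probabilities in `resp` are what distinguish mixture modeling from hard clustering: each patient receives a membership probability for every phenotype, and a hard label is derived only afterwards.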
