From Data to Diagnosis: Systematic Detection of Rare Disease Patients in EHRs for Outcome Evaluation

Rare diseases (RDs) collectively impose a substantial global health burden, affecting roughly 25-30 million people in the United States and over 300 million worldwide (3.5% to 5.9% of the global population). Although more than 10,000 rare diseases have been identified, only about 5% have approved treatments, in large part because of the high financial and technical barriers to developing drugs for small patient populations. This gap highlights the need for approaches that address RDs collectively rather than individually, opening the door to shared interventions such as basket trials and drug repurposing.

Electronic health records (EHRs) offer a valuable resource for investigating RDs at scale; however, identifying RD patients remains challenging because of heterogeneous coding standards and the lack of comprehensive RD-specific code lists. To address this gap, the study developed a semi-automated phenotyping algorithm that detects patients with rare diseases in EHRs through standardized codes (ICD-10-CM and SNOMED-CT). The strategy combines resources including MONDO, ORPHANET, and GARD and applies several filtering steps to derive a cleaned set of rare disease-specific codes from an initial list of 12,003 conditions.
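
The combine-then-filter idea can be illustrated with a minimal Python sketch. It is not the study's pipeline: the records, the `mapping_type` field, and the rule of keeping only exact mappings in the two target vocabularies are assumptions made for this example, since the actual filtering steps are not described here.

```python
from collections import defaultdict

# Hypothetical records: (source, disease_id, vocabulary, code, mapping_type).
# Every identifier and code below is made up for illustration.
RECORDS = [
    ("MONDO", "D1", "ICD-10-CM", "X00.1", "exact"),
    ("ORPHANET", "D1", "ICD-10-CM", "X00.1", "exact"),
    ("GARD", "D2", "SNOMED-CT", "1000001", "exact"),
    ("ORPHANET", "D3", "ICD-10-CM", "X99", "broad"),  # dropped: not an exact mapping
    ("MONDO", "D4", "LOCAL", "abc", "exact"),         # dropped: vocabulary not targeted
]

ALLOWED_VOCABULARIES = {"ICD-10-CM", "SNOMED-CT"}


def build_code_list(records):
    """Merge records from all sources, keeping exact mappings in allowed vocabularies."""
    merged = defaultdict(lambda: {"diseases": set(), "sources": set()})
    for source, disease_id, vocabulary, code, mapping_type in records:
        if vocabulary not in ALLOWED_VOCABULARIES:
            continue  # assumed filter 1: restrict to the target vocabularies
        if mapping_type != "exact":
            continue  # assumed filter 2: discard broad, non-specific mappings
        entry = merged[(vocabulary, code)]
        entry["diseases"].add(disease_id)
        entry["sources"].add(source)
    return dict(merged)


code_list = build_code_list(RECORDS)
for (vocabulary, code), info in sorted(code_list.items()):
    print(vocabulary, code, sorted(info["diseases"]), sorted(info["sources"]))
```

Keying the merged list on the (vocabulary, code) pair means a code contributed by several sources appears once, with its supporting sources recorded alongside it.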

The resulting dataset included SNOMED-CT codes and 357 ICD-10 codes, representing 6,342 unique rare diseases. Validation against a manually curated subset showed good performance: 88.4% of the identified codes were true-positive rare diseases. The method improved coverage and specificity while reducing the need for extensive manual curation. The diseases were also labeled using ORPHANET linearization to support grouped analysis by shared etiology or affected body system.
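
The 88.4% figure is the share of flagged codes that turned out to be genuine rare diseases, which corresponds to precision (positive predictive value). A minimal sketch of that calculation follows; the counts are toy values chosen for illustration and are not the study's validation counts.

```python
def precision(true_positives, false_positives):
    """Fraction of flagged codes confirmed as true rare diseases."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no codes were flagged, precision is undefined")
    return true_positives / flagged


# Toy counts (assumption): 8 flagged codes, 7 confirmed by manual curation.
toy_precision = precision(true_positives=7, false_positives=1)
print(f"{toy_precision:.1%}")  # 87.5%
```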

To demonstrate its utility, the algorithm was applied to the National COVID Cohort Collaborative (N3C) database (>21 million patients) to identify patients whose rare disease diagnoses preceded COVID-19. Patients with preexisting RDs had higher hospitalization rates (10.72% vs. 4.90%) and mortality rates (5.45% vs. 1.96%) than non-RD patients, along with significant demographic differences.
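
To put the reported percentages in relative terms, the short sketch below computes crude (unadjusted) risk ratios from them. These ratios are derived here from the rates quoted above and are not figures reported by the study; the function name is illustrative.

```python
def risk_ratio(rate_rd, rate_non_rd):
    """Crude risk ratio: outcome rate in RD patients over the rate in non-RD patients."""
    return rate_rd / rate_non_rd


hospitalization_rr = risk_ratio(10.72, 4.90)
mortality_rr = risk_ratio(5.45, 1.96)

print(f"hospitalization: {hospitalization_rr:.2f}x")  # 2.19x
print(f"mortality: {mortality_rr:.2f}x")              # 2.78x
```

Because these ratios ignore age and other covariates, they only summarize the raw gap between the two groups.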

Further stratification by RD categories revealed that certain groups, such as rare neoplastic and respiratory diseases, were associated with the highest mortality rates, while endocrine diseases showed the highest hospitalization rates. Logistic regression analyses, adjusted for age and body mass index, confirmed that most RD categories were associated with significantly increased risks of severe COVID-19 outcomes. Notably, rare cardiac, respiratory, and otorhinolaryngologic diseases demonstrated the highest odds of mortality and hospitalization.
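
An adjusted odds ratio of the kind described above can be sketched with a small logistic regression fitted by Newton-Raphson. Everything in this example is an assumption made for illustration: the data are simulated, the covariates are standardized age and body mass index, and the true effect size is set by hand. It does not reproduce the study's model or results.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_logistic(rows, y, iterations=25, tol=1e-8):
    """Maximum-likelihood logistic regression via Newton-Raphson."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, yi in zip(rows, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, x)))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += x[i] * (yi - p)
                for j in range(k):
                    hess[i][j] += w * x[i] * x[j]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Simulated cohort (assumption): columns are intercept, RD status, age z-score, BMI z-score.
random.seed(0)
TRUE_LOG_OR = math.log(2.0)  # hand-picked effect size, used only for the simulation
rows, y = [], []
for _ in range(5000):
    rd = 1.0 if random.random() < 0.3 else 0.0
    age_z = random.gauss(0.0, 1.0)
    bmi_z = random.gauss(0.0, 1.0)
    logit = -2.2 + TRUE_LOG_OR * rd + 0.5 * age_z + 0.2 * bmi_z
    rows.append([1.0, rd, age_z, bmi_z])
    y.append(1.0 if random.random() < sigmoid(logit) else 0.0)

beta = fit_logistic(rows, y)
adjusted_or = math.exp(beta[1])  # odds ratio for RD status, holding age and BMI fixed
print(f"adjusted odds ratio for RD status: {adjusted_or:.2f}")
```

Exponentiating the coefficient on the RD indicator gives the odds ratio for RD status with age and BMI held constant, which is the sense in which such an estimate is "adjusted".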

Despite its strengths, the study has limitations: possible misclassification due to reliance on standardized codes, potential exclusion of undiagnosed cases, limited representativeness of the validation dataset, possible bias from data heterogeneity and OMOP model constraints, and the lack of key variables such as vaccination status.

In conclusion, the study introduces a scalable method for identifying rare disease patients in EHRs using standardized codes, enabling improved clinical insights and large-scale analysis. The results from the COVID-19 analysis indicate that patients with rare diseases are more susceptible to severe outcomes and require targeted clinical attention and interventions.

Reference: Yadaw AS, Sid E, Sidky H, et al. Systematic identification of rare disease patients in electronic health records enables evaluation of clinical outcomes. Sci Rep. 2026. doi:10.1038/s41598-026-43020-x
