
Electronic Health Records


“The Electronic Health Record (EHR) is a longitudinal electronic record of patient health information generated by one or more encounters in any care delivery setting. Included in this information are patient demographics, progress notes, problems, medications, vital signs, past medical history, immunizations, laboratory data and radiology reports. The EHR automates and streamlines the clinician's workflow. The EHR has the ability to generate a complete record of a clinical patient encounter - as well as supporting other care-related activities directly or indirectly via interface - including evidence-based decision support, quality management, and outcomes reporting.”
Yamada Y. The electronic health record as a primary source of clinical phenotype for genetic epidemiological studies. Genomic Med 2008;2(1-2):5.


The following section is adapted from Wu et al. [1]. Research with EHR data presents several challenges that should be considered:
  • Obtaining data
    • Access to Personally Identifiable Information (PII) is tightly regulated by federal law, industry standards, and institutional policy.
    • De-identified datasets such as MIMIC-IV have fewer restrictions on access.
  • Bias in data
    • Data may have been gathered for a purpose other than research or scientific discovery:
      • Billing
      • Documentation
      • Patient management
      • Preparation of legal documents
      • Use by health care personnel
    • Research-relevant data might not be collected.
    • Patients might be excluded based on difficulty of measurement.
    • Missing rates may be high, and missingness may be correlated with the patient's condition.
  • Considerations for assessing data quality
    • Attribute domain constraints (Are any negative pulse oximetry values present?)
    • Relational integrity rules (Could there be multiple primary keys for a patient?)
    • Historical data rules (Is the same format used over time?)
    • State-dependent rules (Do records exist after the time of death?)
    • Attribute dependency rules (Is pregnancy recorded for male patients?)
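As a rough illustration, several of the data quality rules above can be written as simple checks. The sketch below is not taken from Wu et al.; the `Observation` layout, the field names, the "M"/"F" sex coding, and the 0 to 100 range for pulse oximetry are all assumptions made for the example, and real EHR schemas will differ. Historical data rules are left out because they depend on how a given source system changed its formats over time.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


# Hypothetical record layout for illustration only; not a real EHR schema.
@dataclass
class Observation:
    patient_id: str
    sex: str                     # assumed coding: "M" or "F"
    obs_date: date               # date the observation was recorded
    spo2: Optional[float]        # pulse oximetry in percent; None if missing
    pregnant: bool
    death_date: Optional[date]   # None if the patient is not recorded as deceased


def quality_flags(obs: Observation) -> List[str]:
    """Return the names of the rules that this observation violates."""
    flags = []
    # Attribute domain constraint: a saturation percentage must lie in [0, 100].
    if obs.spo2 is not None and not (0 <= obs.spo2 <= 100):
        flags.append("spo2_out_of_range")
    # State-dependent rule: no observations dated after the time of death.
    if obs.death_date is not None and obs.obs_date > obs.death_date:
        flags.append("record_after_death")
    # Attribute dependency rule: pregnancy should not be recorded for males.
    if obs.sex == "M" and obs.pregnant:
        flags.append("pregnancy_in_male")
    return flags


def duplicate_ids(patient_table_ids: List[str]) -> List[str]:
    """Relational integrity: IDs that appear more than once in a patient table."""
    counts = Counter(patient_table_ids)
    return sorted(pid for pid, n in counts.items() if n > 1)


rows = [
    Observation("p1", "F", date(2020, 1, 5), 97.0, False, None),
    Observation("p2", "M", date(2020, 3, 1), -4.0, True, date(2020, 2, 1)),
]
for row in rows:
    print(row.patient_id, quality_flags(row))
print(duplicate_ids(["p1", "p2", "p2"]))
```

Here the first row passes every check, while the second is flagged by all three row-level rules. Flagging rather than silently dropping records keeps the decision about exclusion visible, which matters when missing or implausible values may be related to the patient's condition.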
Much of the effort spent in an EHR study involves preparing the data for use.
[1] Wu H, et al., editors. Statistics and machine learning methods for EHR data: from data extraction to data analytics. CRC Press; 2021. ISBN 978-0-367-44239-2

Key Readings

  • Secondary Analysis of Electronic Health Records [Internet]. Cham (CH): Springer; 2016. Available from: doi: 10.1007/978-3-319-43742-2