Health Data
"The capability of handling big data is becoming an enabler to carry out unprecedented research studies and to implement new models of healthcare delivery." []
The term "big data" has been used since the early 1990s []. Big data are characterized by the "3 Vs": Volume (size), Velocity (speed of generation), and Variety (different types) []. This set has since been expanded with additional Vs (giving 5 Vs, 10 Vs, 14 Vs, etc.), such as Veracity, Value, Validity, Variability, and Vocabulary.
There are many sources of big data in biomedicine and health care []. These include Electronic Health Records (EHRs) [], Health Information Exchanges (HIEs) [], All-Payer Claims Databases (APCDs) [], biological and biomedical databases [], and public health surveys [].
Health data can be broadly categorized as "structured" (e.g., demographics, diagnoses, procedures, and medications) or "unstructured" (e.g., clinical reports and notes) []. Use of established is critical for sharing and exchange of health data within and across organizations to support and .
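The distinction between structured and unstructured data can be made concrete with a minimal Python sketch. Everything in it is an assumption made for illustration: the `StructuredRecord` class, its field names, the helper functions, and the sample values are hypothetical and do not come from any particular EHR system or standard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StructuredRecord:
    """Hypothetical structured record: discrete, typed, directly queryable fields."""
    patient_id: str
    birth_year: int                                            # demographics
    diagnosis_codes: List[str] = field(default_factory=list)   # diagnoses
    procedure_codes: List[str] = field(default_factory=list)   # procedures
    medications: List[str] = field(default_factory=list)       # medications


def has_diagnosis(record: StructuredRecord, code: str) -> bool:
    """Structured data supports exact lookups on coded fields."""
    return code in record.diagnosis_codes


def note_mentions(note: str, term: str) -> bool:
    """Unstructured text needs text processing; this is only a naive,
    case-insensitive substring match, not real clinical NLP."""
    return term.lower() in note.lower()


# Illustrative values only (the diagnosis code is used as a sample string).
record = StructuredRecord(
    patient_id="example-001",
    birth_year=1960,
    diagnosis_codes=["E11.9"],
    medications=["metformin"],
)

# Unstructured data: a free-text clinical note with no fixed schema.
clinical_note = "Patient reports good adherence to Metformin; no new complaints."

print(has_diagnosis(record, "E11.9"))
print(note_mentions(clinical_note, "metformin"))
```

The structured record can be filtered by exact field values, while the note has to be searched or parsed as text, which is one reason the two categories are usually handled with different tools.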
Bellazzi R. Big data and biomedical informatics: a challenging opportunity. Yearb Med Inform. 2014 May 22;9(1):8-13. doi: 10.15265/IY-2014-0024. PMID: 24853034; PMCID: .
Lohr S. The Origins of ‘Big Data’: An Etymological Detective Story. The New York Times. 2013 Feb 1.
Healthcare Big Data and the Promise of Value-Based Care. Catalyst Carryover. 2018 Jan 1.
Ehrenstein V, Kharrazi H, Lehmann H, et al. Obtaining Data From Electronic Health Records. In: Gliklich RE, Leavy MB, Dreyer NA, editors. Tools and Technologies for Registry Interoperability, Registries for Evaluating Patient Outcomes: A User’s Guide, 3rd Edition, Addendum 2 [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2019 Oct. Available from:
Sarkar IN. Health Information Exchange as a Global Utility. Chest. 2023 May;163(5):1023-1025. doi: 10.1016/j.chest.2022.12.001. PMID: .
Love D, Custer W, Miller P. All-payer claims databases: state initiatives to improve health care transparency. Issue Brief (Commonw Fund). 2010 Sep;99:1-14. PMID: .
Sayers EW, Beck J, Bolton EE, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2024 Jan 5;52(D1):D33-D43. doi: 10.1093/nar/gkad1044. PMID: ; PMCID: PMC10767890.
Blewett LA, Call KT, Turner J, Hest R. Data Resources for Conducting Health Services and Policy Research. Annu Rev Public Health. 2018 Apr 1;39:437-452. doi: 10.1146/annurev-publhealth-040617-013544. Epub 2017 Dec 22. PMID: ; PMCID: PMC5880724.
Weber GM, Mandl KD, Kohane IS. Finding the missing link for big biomedical data. JAMA. 2014 Jun 25;311(24):2479-80. doi: 10.1001/jama.2014.4228. PMID: .
NIH Pragmatic Trials Collaboratory Rethinking Clinical Trials
Secondary Analysis of Electronic Health Records [Internet]. Cham (CH): Springer; 2016. Available from: doi: 10.1007/978-3-319-43742-2.
Sarkar IN. Transforming Health Data to Actionable Information: Recent Progress and Future Opportunities in Health Information Exchange. Yearb Med Inform. 2022 Aug;31(1):203-214. doi: 10.1055/s-0042-1742519. Epub 2022 Dec 4. PMID: ; PMCID: PMC9719753.