Health Data

"The capability of handling big data is becoming an enabler to carry out unprecedented research studies and to implement new models of healthcare delivery." [1]

The term "big data" has been used since the early 1990s [2]. Big data are characterized by the "3 Vs": Volume (size), Velocity (speed of generation), and Variety (different types) [1]. This framework has since been extended with additional Vs (e.g., 5 Vs, 10 Vs, 14 Vs), such as Veracity, Value, Validity, Variability, and Vocabulary.

There are many sources of big data in biomedicine and health care [3]. These include Electronic Health Records (EHR) [4], Health Information Exchanges (HIE) [5], All-Payer Claims Databases (APCD) [6], biological and biomedical databases [7], and public health surveys [8].

Health data can be broadly categorized as "structured" (e.g., demographics, diagnoses, procedures, and medications) or "unstructured" (e.g., clinical reports and notes) [9]. The use of established Health Data Standards is critical for sharing and exchanging health data within and across organizations, and thereby for supporting Artificial Intelligence in Health and Observational Health Research.
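To make the structured/unstructured distinction concrete, here is a minimal Python sketch. The `PatientRecord` class, its fields, and the functions `has_diagnosis` and `note_mentions` are hypothetical illustrations, not part of any EHR system or data standard. The diagnosis codes are ICD-10-CM codes (E11.9, type 2 diabetes mellitus without complications; I10, essential hypertension). A coded field can be queried by exact match, whereas a naive keyword search over free text misses the abbreviation "T2DM" and wrongly matches the negated phrase "No history of diabetes".

```python
from dataclasses import dataclass, field


# Hypothetical, simplified patient record for illustration only.
# Real EHR data models are far richer than this.
@dataclass
class PatientRecord:
    patient_id: str
    # Structured data: discrete fields with controlled values
    birth_year: int
    sex: str
    diagnosis_codes: list[str] = field(default_factory=list)  # ICD-10-CM codes
    medications: list[str] = field(default_factory=list)
    # Unstructured data: free-text clinical note
    clinical_note: str = ""


def has_diagnosis(record: PatientRecord, code: str) -> bool:
    """Structured query: exact match against a coded field."""
    return code in record.diagnosis_codes


def note_mentions(record: PatientRecord, term: str) -> bool:
    """Naive unstructured query: case-insensitive substring search.
    It ignores synonyms, abbreviations, and negation."""
    return term.lower() in record.clinical_note.lower()


records = [
    PatientRecord("p1", 1958, "F", ["E11.9", "I10"], ["metformin"],
                  "Pt with T2DM, well controlled. BP elevated."),
    PatientRecord("p2", 1990, "M", [], [],
                  "No history of diabetes. Presents with ankle sprain."),
]

coded = [r.patient_id for r in records if has_diagnosis(r, "E11.9")]
text = [r.patient_id for r in records if note_mentions(r, "diabetes")]
print(coded)  # ['p1']
print(text)   # ['p2']
```

The structured query finds the patient who actually has the diagnosis, while the keyword search returns the wrong patient. In practice, unstructured notes usually need more sophisticated processing (e.g., natural language processing) before they can be analyzed alongside structured fields.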

See the CODIAC for Health chapter on Health Data and Data Standards (forthcoming) for more information.

References

  1. Bellazzi R. Big data and biomedical informatics: a challenging opportunity. Yearb Med Inform. 2014 May 22;9(1):8-13. doi: 10.15265/IY-2014-0024. PMID: 24853034; PMCID: PMC4287065.

  2. Lohr S. The Origins of ‘Big Data’: An Etymological Detective Story. The New York Times. 2013 Feb 1. [ Link ]

  3. Healthcare Big Data and the Promise of Value-Based Care. Catalyst Carryover. 2018 Jan 1. [ Link ]

  4. Ehrenstein V, Kharrazi H, Lehmann H, et al. Obtaining Data From Electronic Health Records. In: Gliklich RE, Leavy MB, Dreyer NA, editors. Tools and Technologies for Registry Interoperability, Registries for Evaluating Patient Outcomes: A User’s Guide, 3rd Edition, Addendum 2 [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2019 Oct. Available from: https://www.ncbi.nlm.nih.gov/books/NBK551878/

  5. Sarkar IN. Health Information Exchange as a Global Utility. Chest. 2023 May;163(5):1023-1025. doi: 10.1016/j.chest.2022.12.001. PMID: 37164575.

  6. Love D, Custer W, Miller P. All-payer claims databases: state initiatives to improve health care transparency. Issue Brief (Commonw Fund). 2010 Sep;99:1-14. PMID: 20830868.

  7. Sayers EW, Beck J, Bolton EE, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2024 Jan 5;52(D1):D33-D43. doi: 10.1093/nar/gkad1044. PMID: 37994677; PMCID: PMC10767890.

  8. Blewett LA, Call KT, Turner J, Hest R. Data Resources for Conducting Health Services and Policy Research. Annu Rev Public Health. 2018 Apr 1;39:437-452. doi: 10.1146/annurev-publhealth-040617-013544. Epub 2017 Dec 22. PMID: 29272166; PMCID: PMC5880724.

  9. Weber GM, Mandl KD, Kohane IS. Finding the missing link for big biomedical data. JAMA. 2014 Jun 25;311(24):2479-80. doi: 10.1001/jama.2014.4228. PMID: 24854141.

Resources

Articles

  • Sarkar IN. Transforming Health Data to Actionable Information: Recent Progress and Future Opportunities in Health Information Exchange. Yearb Med Inform. 2022 Aug;31(1):203-214. doi: 10.1055/s-0042-1742519. Epub 2022 Dec 4. PMID: 36463879; PMCID: PMC9719753.

  • Sarkar IN. Health Information Exchange as a Global Utility. Chest. 2023 May;163(5):1023-1025. doi: 10.1016/j.chest.2022.12.001. PMID: 37164575.
