Health Data

"The capability of handling big data is becoming an enabler to carry out unprecedented research studies and to implement new models of healthcare delivery." [1]

The term "big data" has been in use since the early 1990s [2]. Big data are characterized by the "3 Vs": Volume (size), Velocity (speed of generation), and Variety (different types) [1]. This list has since been extended to 5, 10, 14, or more Vs, adding properties such as Veracity, Value, Validity, Variability, and Vocabulary.

There are many sources of big data in biomedicine and health care [3]. These include Electronic Health Records (EHR) [4], Health Information Exchanges (HIE) [5], All-Payer Claims Databases (APCD) [6], biological and biomedical databases [7], and public health surveys [8].

Health data can be broadly categorized as "structured" (e.g., demographics, diagnoses, procedures, and medications) or "unstructured" (e.g., clinical reports and notes) [9]. The use of established Health Data Standards is critical for sharing and exchanging health data within and across organizations, which in turn supports Artificial Intelligence in Health and Observational Health Research.
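To make the structured/unstructured distinction concrete, here is a minimal Python sketch. The `StructuredRecord` class, its fields, the helper functions, and the sample note are all hypothetical and for illustration only; they do not follow any particular health data standard.

```python
from dataclasses import dataclass, field


@dataclass
class StructuredRecord:
    """Hypothetical structured record: every value sits in a named, typed field."""
    patient_id: str
    age: int
    sex: str
    diagnoses: list[str] = field(default_factory=list)    # coded diagnoses
    medications: list[str] = field(default_factory=list)  # medication names


# Hypothetical structured data for one fictional patient.
record = StructuredRecord(
    patient_id="P001",
    age=58,
    sex="M",
    diagnoses=["E11.9"],  # ICD-10-CM: type 2 diabetes mellitus without complications
    medications=["metformin"],
)

# Hypothetical unstructured data: a free-text note with no predefined fields.
clinical_note = (
    "58 y/o male seen for follow-up of type 2 diabetes. "
    "Continues metformin; reports good adherence. Denies chest pain."
)


def has_diagnosis(rec: StructuredRecord, code: str) -> bool:
    """Structured data: an exact lookup on a known field is enough."""
    return code in rec.diagnoses


def note_mentions(note: str, term: str) -> bool:
    """Unstructured data: a naive case-insensitive substring search.

    For illustration only; it cannot tell that "Denies chest pain" is a negation.
    """
    return term.lower() in note.lower()


if __name__ == "__main__":
    print(has_diagnosis(record, "E11.9"))              # True
    print(note_mentions(clinical_note, "chest pain"))  # True, despite the negation
```

The coded diagnosis can be checked with an exact comparison, whereas the free-text note can only be searched approximately. A naive substring match such as this one misses abbreviations and negations, which is why unstructured data typically needs natural language processing before it can be analyzed reliably.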


See the CODIAC for Health chapter on Health Data and Data Standards (forthcoming) for more information.

References

  1. Bellazzi R. Big data and biomedical informatics: a challenging opportunity. Yearb Med Inform. 2014 May 22;9(1):8-13. doi: 10.15265/IY-2014-0024. PMID: 24853034; PMCID: PMC4287065.

  2. Lohr S. The Origins of ‘Big Data’: An Etymological Detective Story. The New York Times. 2013 Feb 1. [Link]

  3. Healthcare Big Data and the Promise of Value-Based Care. Catalyst Carryover. 2018 Jan 1. [Link]

  4. Ehrenstein V, Kharrazi H, Lehmann H, et al. Obtaining Data From Electronic Health Records. In: Gliklich RE, Leavy MB, Dreyer NA, editors. Tools and Technologies for Registry Interoperability, Registries for Evaluating Patient Outcomes: A User’s Guide, 3rd Edition, Addendum 2 [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2019 Oct. Available from: https://www.ncbi.nlm.nih.gov/books/NBK551878/

  5. Sarkar IN. Health Information Exchange as a Global Utility. Chest. 2023 May;163(5):1023-1025. doi: 10.1016/j.chest.2022.12.001. PMID: 37164575.

  6. Love D, Custer W, Miller P. All-payer claims databases: state initiatives to improve health care transparency. Issue Brief (Commonw Fund). 2010 Sep;99:1-14. PMID: 20830868.

  7. Sayers EW, Beck J, Bolton EE, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2024 Jan 5;52(D1):D33-D43. doi: 10.1093/nar/gkad1044. PMID: 37994677; PMCID: PMC10767890.

  8. Blewett LA, Call KT, Turner J, Hest R. Data Resources for Conducting Health Services and Policy Research. Annu Rev Public Health. 2018 Apr 1;39:437-452. doi: 10.1146/annurev-publhealth-040617-013544. Epub 2017 Dec 22. PMID: 29272166; PMCID: PMC5880724.

  9. Weber GM, Mandl KD, Kohane IS. Finding the missing link for big biomedical data. JAMA. 2014 Jun 25;311(24):2479-80. doi: 10.1001/jama.2014.4228. PMID: 24854141.

Resources

Books/Chapters

Articles

  • Sarkar IN. Transforming Health Data to Actionable Information: Recent Progress and Future Opportunities in Health Information Exchange. Yearb Med Inform. 2022 Aug;31(1):203-214. doi: 10.1055/s-0042-1742519. Epub 2022 Dec 4. PMID: 36463879; PMCID: PMC9719753.

  • Sarkar IN. Health Information Exchange as a Global Utility. Chest. 2023 May;163(5):1023-1025. doi: 10.1016/j.chest.2022.12.001. PMID: 37164575.
