Research Pipeline
Below is an example pipeline for conducting research with health data such as electronic health record (EHR) data. The list is by no means exhaustive, but it is a good place to start. A page could be written on each step in the pipeline, and most likely will be in future releases of CODIAC for Health.
Conduct a literature review.
Explicitly describe the research question.
Form an interdisciplinary team that can guide and perform each step of the study.
Fully specify the research protocol in advance of executing the study.
Apply for IRB approval of the study.
Apply for an Institutional Reliance Agreement, if necessary.
Execute a Data (Transfer and) Use Agreement (DUA, DTUA), as required.
Comply with any application and approval procedures set forth by the data provider.
Request access to, or set up, computing infrastructure as necessary.
Assess the suitability (strengths and weaknesses) of the dataset(s) to be used in the study.
Assess the quality of the dataset(s).
Define the study cohort (and matching cases, if applicable).
Create standard code sets for each clinical concept in the cohort definition and every independent and dependent variable.
Compose a computable data request / data extraction specification.
Clean and stage extracted data for analysis; handle missing values according to protocol.
Characterize the study cohort (and matching cases, if applicable).
Adjust for any bias or confounders in the data.
Analyze the data according to protocol.
Produce research products.
Comply with any review procedures required by the data provider.
Publish your work!
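As a rough illustration of the middle steps above (creating a code set, defining the cohort, handling missing values, and characterizing the cohort), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Patient` record layout, the `T2DM_CODES` code set, the adult-age inclusion rule, the index date, and the sample records are all hypothetical and do not come from CODIAC or any particular dataset. A real study would take each of these from its pre-specified protocol.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import List, Optional, Set

# Illustrative code set for one clinical concept (type 2 diabetes, ICD-10-CM).
# Assumption: a real study would use a complete, reviewed, versioned code set.
T2DM_CODES = {"E11.9", "E11.65", "E11.22"}


@dataclass
class Patient:
    patient_id: str
    birth_date: date
    diagnosis_codes: Set[str]
    hba1c: Optional[float]  # most recent HbA1c (%); None when no value was recorded


def age_on(birth_date: date, index_date: date) -> int:
    """Age in whole years on the index date."""
    had_birthday = (index_date.month, index_date.day) >= (birth_date.month, birth_date.day)
    return index_date.year - birth_date.year - (0 if had_birthday else 1)


def in_cohort(patient: Patient, index_date: date, min_age: int = 18) -> bool:
    """Hypothetical inclusion rule: adult on the index date with any code from the set."""
    is_adult = age_on(patient.birth_date, index_date) >= min_age
    has_diagnosis = bool(patient.diagnosis_codes & T2DM_CODES)
    return is_adult and has_diagnosis


def characterize(cohort: List[Patient], index_date: date) -> dict:
    """Summarize the cohort and report missing values instead of silently dropping them."""
    ages = [age_on(p.birth_date, index_date) for p in cohort]
    observed = [p.hba1c for p in cohort if p.hba1c is not None]
    return {
        "n": len(cohort),
        "median_age": median(ages) if ages else None,
        "hba1c_missing": len(cohort) - len(observed),
        "median_hba1c": median(observed) if observed else None,
    }


# Made-up records for illustration only.
patients = [
    Patient("p1", date(1960, 5, 1), {"E11.9", "I10"}, 7.2),
    Patient("p2", date(2010, 1, 1), {"E11.9"}, 8.0),   # excluded: under 18
    Patient("p3", date(1975, 12, 31), {"I10"}, None),  # excluded: no matching code
    Patient("p4", date(1980, 7, 1), {"E11.65"}, None), # included; HbA1c missing
]

index_date = date(2023, 6, 30)
cohort = [p for p in patients if in_cohort(p, index_date)]
summary = characterize(cohort, index_date)
print(summary)
```

Keeping the code set, the inclusion rule, and the summary as separate, named pieces makes each one easy to review against the protocol and to reuse in a computable data request. Counting missing values explicitly, rather than imputing or dropping them here, leaves the handling decision to the protocol.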
Resources
Books
Secondary Analysis of Electronic Health Records [Internet]. Cham (CH): Springer; 2016. Available from: https://www.ncbi.nlm.nih.gov/books/NBK543630/ doi: 10.1007/978-3-319-43742-2
Wu H, et al. (Eds.). Statistics and Machine Learning Methods for EHR Data: From Data Extraction to Data Analytics. CRC Press; 2021. ISBN 978-0-367-44239-2
O’Neil ST, Beasley W, Loomba J, Patrick S, Wilkins KJ, Crowley KM, Anzalone AJ (Eds.) (2023). The Researcher’s Guide to N3C: A National Resource for Analyzing Real-World Health Data. DOI: 10.5281/zenodo.7749367
Articles
Blewett LA, Call KT, Turner J, Hest R. Data Resources for Conducting Health Services and Policy Research. Annu Rev Public Health. 2018 Apr 1;39:437-452. doi: 10.1146/annurev-publhealth-040617-013544. Epub 2017 Dec 22. PMID: 29272166; PMCID: PMC5880724.
Sayers EW, Beck J, Bolton EE, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2021 Jan 8;49(D1):D10-D17. doi: 10.1093/nar/gkaa892. PMID: 33095870; PMCID: PMC7778943.
Shang N, Weng C, Hripcsak G. A conceptual framework for evaluating data suitability for observational studies. J Am Med Inform Assoc. 2018 Mar 1;25(3):248-258. doi: 10.1093/jamia/ocx095. PMID: 29024976; PMCID: PMC7378879.
Weber GM, Mandl KD, Kohane IS. Finding the Missing Link for Big Biomedical Data. JAMA. 2014;311(24):2479–2480. doi:10.1001/jama.2014.4228