# Best Practices

"*Clinicians and data scientists must apply the same level of academic rigor when analyzing research from clinical databases as they do with more traditional methods of clinical research*." \[[1](#references)]

Conducting research with data from the Electronic Health Record (EHR) requires a structured process and a team science approach. The process should include protocols and standards for requesting or extracting the data; assessing its quality; cleaning, standardizing, and analyzing it; and maintaining security to protect its confidentiality. A multi-disciplinary team capable of overseeing and performing each specialized step of the process is essential.

* Research questions, cohorts, and methodologies must be clearly defined using standard clinical definitions and terminologies. Attention must be paid to the timing of exposure and outcome events.
* Understand the limitations of EHR data. Data may be incomplete, and the quality of the data can vary across sources. Use robust data management strategies to ensure "clean" data and properly handle missing values.
* Observational studies are susceptible to confounding due to non-random assignment of treatments or exposures. Adjust for confounding factors and be aware of bias that may be inherent in the data.
* Practice Open Science! Ensure transparency in the study design, data processing, and analysis to promote reproducibility. For NIH-funded studies, comply with the [2023 NIH Data Management and Sharing Policy](https://oir.nih.gov/sourcebook/intramural-program-oversight/intramural-data-sharing/2023-nih-data-management-sharing-policy).
* Practice Team Science! Collaborate with clinicians, biomedical informaticians, biostatisticians, and data scientists to ensure the study is both methodologically sound and clinically meaningful.
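Two of the points above, handling missing values and checking the timing of exposure and outcome events, lend themselves to simple automated checks. The following is a minimal sketch in Python, not a prescribed method: the records are invented for illustration, and the field names (`patient_id`, `exposure_date`, `outcome_date`, `creatinine`) and helper functions are hypothetical rather than part of any particular EHR schema.

```python
from datetime import date

# Hypothetical, hand-made records; field names are illustrative only.
records = [
    {"patient_id": 1, "exposure_date": date(2021, 1, 10),
     "outcome_date": date(2021, 3, 2), "creatinine": 1.1},
    {"patient_id": 2, "exposure_date": date(2021, 2, 5),
     "outcome_date": date(2021, 1, 20), "creatinine": None},
    {"patient_id": 3, "exposure_date": date(2021, 4, 1),
     "outcome_date": None, "creatinine": 0.9},
    {"patient_id": 4, "exposure_date": None,
     "outcome_date": date(2021, 5, 15), "creatinine": None},
]


def missing_fraction(rows, field):
    """Return the fraction of rows in which `field` is absent or None."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(field) is None) / len(rows)


def outcome_before_exposure(rows):
    """Return IDs of patients whose recorded outcome predates their exposure.

    Rows with a missing exposure or outcome date are skipped here; they
    should be reviewed separately rather than silently treated as valid.
    """
    return [
        r["patient_id"]
        for r in rows
        if r.get("exposure_date") is not None
        and r.get("outcome_date") is not None
        and r["outcome_date"] < r["exposure_date"]
    ]


fields = ("exposure_date", "outcome_date", "creatinine")
missing_report = {f: missing_fraction(records, f) for f in fields}
timing_violations = outcome_before_exposure(records)

print(missing_report)
print(timing_violations)
```

A report like this only surfaces problems. Deciding whether to exclude, impute, or further investigate the flagged records remains a judgment for the study team, and it should be documented as part of the analysis plan.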

"*There may come a time when data can be aggregated automatically from multiple EHR environments to answer a particular question without relying on a human to understand the particular idiosyncrasies of each institution’s data and EHR system. Until that day, effective EHR data set analysis requires collaboration with clinicians and scientists who have knowledge of the diseases being studied and the practices of their particular health care systems; informaticians with experience in the underlying structures of biomedical record repositories at their own institutions and the characteristics of their data; data harmonization experts to help with data transformation, standardization, integration, and computability; statisticians and epidemiologists well versed in the limitations and opportunities of EHR data sets and related sources of potential bias; machine learning experts; and at least one expert in regulatory and ethical standards*." \[[2](#references)]

## References

1. Lokhandwala S, Rush B. [**Objectives of the Secondary Analysis of Electronic Health Record Data**](https://www.ncbi.nlm.nih.gov/books/NBK543655/). 2016 Sep 10. In: Secondary Analysis of Electronic Health Records \[Internet]. Cham (CH): Springer; 2016. Chapter 1.
2. Kohane IS, Aronow BJ, Avillach P, Beaulieu-Jones BK, Bellazzi R, Bradford RL, Brat GA, Cannataro M, Cimino JJ, García-Barrio N, Gehlenborg N, Ghassemi M, Gutiérrez-Sacristán A, Hanauer DA, Holmes JH, Hong C, Klann JG, Loh NHW, Luo Y, Mandl KD, Daniar M, Moore JH, Murphy SN, Neuraz A, Ngiam KY, Omenn GS, Palmer N, Patel LP, Pedrera-Jiménez M, Sliz P, South AM, Tan ALM, Taylor DM, Taylor BW, Torti C, Vallejos AK, Wagholikar KB; Consortium For Clinical Characterization Of COVID-19 By EHR (4CE); Weber GM, Cai T. [**What Every Reader Should Know About Studies Using Electronic Health Record Data but May Be Afraid to Ask**](https://pubmed.ncbi.nlm.nih.gov/33600347/). J Med Internet Res. 2021 Mar 2;23(3):e22219. doi: 10.2196/22219. PMID: [33600347](https://pubmed.ncbi.nlm.nih.gov/33600347/); PMCID: PMC7927948.

## Resources

### Books

* [**Secondary Analysis of Electronic Health Records**](https://www.ncbi.nlm.nih.gov/books/NBK543630/) \[Internet]. Cham (CH): Springer; 2016. doi: 10.1007/978-3-319-43742-2

### Articles

* Agniel D, Kohane IS, Weber GM. [**Biases in electronic health record data due to processes within the healthcare system: retrospective observational study**](https://pubmed.ncbi.nlm.nih.gov/29712648/). BMJ. 2018 Apr 30;361:k1479. doi: 10.1136/bmj.k1479. Erratum in: BMJ. 2018 Oct 18;363:k4416. doi: 10.1136/bmj.k4416. PMID: 29712648; PMCID: PMC5925441.
* Callahan A, Shah NH, Chen JH. [**Research and Reporting Considerations for Observational Studies Using Electronic Health Record Data**](https://pubmed.ncbi.nlm.nih.gov/32479175/). Ann Intern Med. 2020 Jun 2;172(11 Suppl):S79-S84. doi: 10.7326/M19-0873. PMID: 32479175; PMCID: PMC7413106.
* Shang N, Weng C, Hripcsak G. [**A conceptual framework for evaluating data suitability for observational studies**](https://pubmed.ncbi.nlm.nih.gov/29024976/). J Am Med Inform Assoc. 2018 Mar 1;25(3):248-258. doi: 10.1093/jamia/ocx095. PMID: 29024976; PMCID: PMC7378879.

### Links

* [**NIH Data Management and Sharing Policy**](https://sharing.nih.gov/data-management-and-sharing-policy)
* [**The Researcher's Guide to N3C: Best Practices for the Research Life Cycle**](https://national-covid-cohort-collaborative.github.io/guide-to-n3c-v1/chapters/practices.html)
* [**The Book of OHDSI, Chapter 19: Study Steps**](https://ohdsi.github.io/TheBookOfOhdsi/StudySteps.html)

