The Unified Research data Sharing and Access (URSA) Initiative was launched in 2015 with the overall goal of making electronic health record (EHR) and other health data accessible and usable for research purposes across Rhode Island. This initiative is supported by Advance RI-CTR and coordinated by the Advance RI-CTR Biomedical Informatics, Bioinformatics, and Cyberinfrastructure Enhancement (BIBCE) Core with leadership and expertise provided by the Brown Center for Biomedical Informatics (BCBI).
The Advance RI-CTR BIBCE Core collaborates with Brown University's Office of Information Technology and the Division of Research, including Research Integrity and Research Agreements & Contracting, and health data partners across Rhode Island to:
Coordinate processes for health data sharing and secure access within and across institutions;
Develop the requisite legal, ethical, and technical infrastructure between Brown and health data sharing partners;
Establish cross-institutional governance, including standard policies, procedures, and protocols for appropriate sharing and use of health data; and,
Provide documentation and training for health data requests, access, and use.
Through the URSA Initiative, the BIBCE Core provides expertise and infrastructure for conducting research using large-scale health datasets. The BIBCE Core is available to help researchers navigate the process of identifying options and solutions for storing, managing, and analyzing data from health data partners or other data sources.
Complete Advance RI-CTR's Service Request Form to schedule a consultation with the BIBCE Core.
Visit other chapters in CODIAC for Health using the Table of Contents or menu in the upper left corner.
The Brown Center for Biomedical Informatics (BCBI) and the Advance RI-CTR Biomedical Informatics, Bioinformatics, and Cyberinfrastructure Enhancement (BIBCE) Core serve as liaisons between researchers and health data partners. They help researchers create and submit accurate and complete data requests, and they oversee the governance of health data transferred to and accessed at Brown University. Each health data partner has its own administrative processes for requesting data as well as guidelines for working with the data and sharing results.
Please submit an Advance RI-CTR Service Request Form to start the process.
Aggregate Statistics – basic counts and descriptive statistics to characterize a population of interest (e.g., number of patients or encounters meeting specified criteria).
De-Identified Dataset – as defined by the HIPAA Privacy Rule, excludes the 18 protected health information (PHI) elements that could be used to identify individuals or the individual's relatives, employers, or household members.
Limited Dataset – as defined by the HIPAA Privacy Rule, includes a limited set of identifiable information that excludes 16 direct identifiers but may include city, state, zip code, elements of date, and other numbers, characteristics, or codes not listed as direct identifiers.
Identified Dataset – in some cases, direct identifiers may be needed (e.g., street addresses for geocoding or demographics for linking datasets) that may be subsequently removed to create a limited or de-identified dataset.
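The difference between these request types can be illustrated with a small sketch. The records, field names, and the set of fields treated as direct identifiers below are all hypothetical and chosen only for illustration. The code shows the idea of deriving an aggregate count and a reduced dataset from identified records. It is not a HIPAA-compliant de-identification procedure.

```python
# Hypothetical records; the field names are illustrative and do not
# reflect any health data partner's schema.
RECORDS = [
    {"name": "Pat A", "street_address": "1 Main St", "zip_code": "02903",
     "birth_year": 1950, "diagnosis": "E11"},
    {"name": "Pat B", "street_address": "2 Elm St", "zip_code": "02906",
     "birth_year": 1984, "diagnosis": "I10"},
    {"name": "Pat C", "street_address": "3 Oak St", "zip_code": "02903",
     "birth_year": 1972, "diagnosis": "E11"},
]

# Assumption: the fields treated as direct identifiers in this toy example.
DIRECT_IDENTIFIER_FIELDS = {"name", "street_address"}


def count_matching(records, diagnosis):
    """Aggregate statistic: number of records with the given diagnosis code."""
    return sum(1 for record in records if record["diagnosis"] == diagnosis)


def without_fields(records, fields):
    """Return copies of the records with the listed fields removed."""
    return [
        {key: value for key, value in record.items() if key not in fields}
        for record in records
    ]


matching = count_matching(RECORDS, "E11")
reduced = without_fields(RECORDS, DIRECT_IDENTIFIER_FIELDS)
print(matching)
print(sorted(reduced[0]))
```

Which fields may remain in a limited or de-identified dataset is determined by the HIPAA Privacy Rule and the data partner's policies, not by a script like this one.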
The URSA Data Request Form (UDRF) was designed by the Advance RI-CTR BIBCE Core and BCBI for requests made to Rhode Island health data partners. The UDRF is used to structure the cohort definition and dataset extraction fields in a computable manner. It can also be used to document a research study extraction specification or communicate extraction details to research team programmers. Each section should be reviewed and approved by the study PI.
Contact the Advance RI-CTR BIBCE Core via the Advance RI-CTR Service Request Form to initiate a new data request. The BIBCE Core will provide a copy of the URSA Data Request Form as well as additional data request process details.
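To give a sense of what a "computable" cohort definition and extraction specification might look like, here is a purely hypothetical sketch. The class names, fields, and values are assumptions made for illustration and do not reproduce the actual UDRF, which the BIBCE Core provides on request.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class CohortDefinition:
    """Hypothetical inclusion criteria; not the actual UDRF fields."""
    min_age: int
    max_age: int
    diagnosis_codes: List[str]
    encounter_start: str  # ISO 8601 date, e.g. "2020-01-01"
    encounter_end: str


@dataclass
class ExtractionSpec:
    """Hypothetical pairing of a cohort with the data elements requested."""
    cohort: CohortDefinition
    requested_fields: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() converts nested dataclasses to plain dictionaries.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


spec = ExtractionSpec(
    cohort=CohortDefinition(18, 64, ["E11"], "2020-01-01", "2022-12-31"),
    requested_fields=["birth_year", "sex", "encounter_date"],
)
print(spec.to_json())
```

Writing criteria as structured fields rather than free text lets the same specification be reviewed by the study PI and read by the programmers who perform the extraction.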
This page describes various synthetic and de-identified health datasets available to researchers.
The SyntheticRI datasets were generated by the Brown Center for Biomedical Informatics (BCBI) for use in research and education. These datasets contain realistic but fictional residents of the state of Rhode Island. The synthetic population aims to statistically mirror the real population in terms of demographics, disease burden, vaccinations, medical visits, and social determinants.
SyntheticRI Demo: synthetic data representing 1,188 Rhode Island individuals of all ages
SyntheticRI Adult: synthetic data representing 145,010 Rhode Island adults, ages 19-99
SyntheticRI Peds: synthetic data representing 145,010 Rhode Island children, ages 0-18
The SyntheticRI datasets were generated using Synthea, an open-source synthetic patient generator. The Synthea-generated datasets are in .csv file format.
Each dataset was also transformed to the OMOP Common Data Model (CDM) using a conversion program from the Observational Health Data Sciences and Informatics (OHDSI) Consortium. These datasets can be accessed through direct database queries or with OHDSI tools such as ATLAS and HADES.
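As a rough illustration of a direct database query against OMOP-formatted data, the sketch below counts persons by gender concept. It uses an in-memory SQLite database as a stand-in, with a cut-down `person` table holding three made-up rows. The database system, connection details, and full schema of the actual SyntheticRI OMOP instance are not described here, so treat them as unknown. The concept IDs 8507 (male) and 8532 (female) are standard OMOP gender concepts.

```python
import sqlite3

# Stand-in database: a reduced version of the OMOP CDM "person" table.
# The real table has additional columns (race, ethnicity, and so on).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person ("
    "person_id INTEGER PRIMARY KEY, "
    "gender_concept_id INTEGER NOT NULL, "
    "year_of_birth INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, 8507, 1980), (2, 8532, 1975), (3, 8532, 2010)],  # made-up rows
)


def count_by_gender(connection):
    """Return a {gender_concept_id: person count} mapping."""
    rows = connection.execute(
        "SELECT gender_concept_id, COUNT(*) "
        "FROM person GROUP BY gender_concept_id "
        "ORDER BY gender_concept_id"
    ).fetchall()
    return dict(rows)


counts = count_by_gender(conn)
print(counts)
```

The same SQL would typically be run through whatever client the hosting environment provides; only the connection step should need to change.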
For more information, please email ursa-help@brown.edu.
MIMIC-IV (Medical Information Mart for Intensive Care) is a large, freely available relational database comprising de-identified health-related data from real patients who were admitted to the critical care units of the Beth Israel Deaconess Medical Center in Boston, Massachusetts, USA.
MIMIC-IV contains comprehensive information from 2008-2019 for over 60,000 hospitalized patients. The database is intended to support a wide variety of research in healthcare. MIMIC-IV builds upon the success of MIMIC-III and incorporates numerous improvements over it.
Refer to the MIMIC documentation for more information about MIMIC-IV. Researchers interested in accessing the complete MIMIC-IV dataset should follow the access instructions published by the dataset's maintainers.
The Healthcare Cost and Utilization Project (HCUP) is a family of databases, software tools, and related products developed through a Federal-State-Industry partnership and sponsored by the Agency for Healthcare Research and Quality (AHRQ). HCUP databases are derived from administrative data and contain encounter-level, clinical, and nonclinical information including all-listed diagnoses and procedures, discharge status, patient demographics, and charges for all patients, regardless of payer, beginning in 1988.
HCUP offers the following databases:
National (Nationwide) Inpatient Sample (NIS): the largest publicly available all-payer hospital inpatient care database in the United States
Kids' Inpatient Database (KID): hospital inpatient stays for children; specifically designed to allow researchers to study a broad range of conditions and procedures related to children's health
Nationwide Emergency Department Sample (NEDS): emergency department (ED) visits that do not result in an admission as well as ED visits that result in an admission to the same hospital
Nationwide Readmissions Database (NRD): designed to support various types of analyses of national readmission rates for all payers and uninsured individuals
State Inpatient Databases (SID): inpatient discharge abstracts from participating States, translated into a uniform format to facilitate multi-State comparisons and analyses
State Ambulatory Surgery and Services Databases (SASD): encounter-level data for ambulatory surgery and other outpatient services from hospital-owned facilities
State Emergency Department Databases (SEDD): discharge information on all emergency department visits that do not result in an admission
Please note: Access to HCUP databases is not free. Database releases must be purchased through HCUP.
More information about the databases and about HCUP is available on the HCUP website.
The Surveillance, Epidemiology, and End Results (SEER) Program provides information on cancer statistics in an effort to reduce the cancer burden among the U.S. population. SEER collects cancer incidence data from population-based cancer registries covering approximately 47.9 percent of the U.S. population. The SEER datasets include data on patient demographics, primary tumor site, tumor morphology, stage at diagnosis, and first course of treatment.
SEER offers the following datasets:
SEER Research Data: requires registration with any valid email; excludes geography, month and year of diagnosis, and other demographic fields
SEER Research Plus and NCCR Data: requires user authentication through eRA Commons or an HHS account; includes geography, month and year of diagnosis, and other demographic fields
A more detailed comparison of the datasets, along with more information about SEER and the SEER datasets, is available on the SEER website.
SyntheticMass is a Synthea-generated data set that contains realistic but fictional residents of the state of Massachusetts. The synthetic population aims to statistically mirror the state population in terms of demographics, disease burden, vaccinations, medical visits, and social determinants. Refer to the Synthea project documentation for more information.
Several SyntheticMass data sets are available for download:
Complete SyntheticMass data sets: "SyntheticMass Data Version 2 (24 May, 2017)". This ZIP file is quite large (21 GB), so move it to a location with enough storage before attempting to unzip it.
Sample data sets (<100 MB) containing 100 or 1,000 patient records
Specialized data sets that have been generated using Synthea by other study teams. These include COVID-19 data sets, a Childhood Obesity data set, and more.
Each data set is available in the following formats:
C-CDA (XML files)
CSV (with documentation describing all CSV tables)
FHIR (JSON files)
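For the FHIR (JSON) format, a first step is often to see which resource types a file contains. The sketch below assumes each file is a FHIR Bundle whose `entry` items each wrap a `resource`. The sample bundle is invented for illustration and is far smaller than a real patient record.

```python
import json
from collections import Counter

# Invented, minimal FHIR Bundle used only to exercise the function below.
SAMPLE_BUNDLE_JSON = """
{
  "resourceType": "Bundle",
  "type": "transaction",
  "entry": [
    {"resource": {"resourceType": "Patient", "id": "example-patient",
                  "gender": "female", "birthDate": "1980-04-12"}},
    {"resource": {"resourceType": "Condition", "id": "example-condition-1"}},
    {"resource": {"resourceType": "Condition", "id": "example-condition-2"}}
  ]
}
"""


def count_resource_types(bundle):
    """Count the resources in a FHIR Bundle by their resourceType."""
    return Counter(
        entry["resource"]["resourceType"]
        for entry in bundle.get("entry", [])
        if "resource" in entry
    )


bundle = json.loads(SAMPLE_BUNDLE_JSON)
resource_counts = count_resource_types(bundle)
print(dict(resource_counts))
```

To apply this to a downloaded file, replace the sample string with the contents of the file, for example by opening it and calling `json.load` on the file object.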
MIT Laboratory for Computational Physiology. (n.d.). About MIMIC. Retrieved May 1, 2024.
Agency for Healthcare Research and Quality. (n.d.). Healthcare Cost and Utilization Project (HCUP). Retrieved May 1, 2024.
National Cancer Institute. (n.d.). SEER Data. Retrieved May 1, 2024.
MITRE Corporation. (n.d.). About Synthea. Retrieved May 1, 2024.
Johnson, A.E.W., Bulgarelli, L., Shen, L. et al. MIMIC-IV, a freely accessible electronic health record dataset. Sci Data 10, 1 (2023).
Paris N, Lamer A, Parrot A. Transformation and Evaluation of the MIMIC Database in the OMOP Common Data Model: Development and Usability Study. JMIR Med Inform. 2021 Dec 14;9(12):e30970. doi: 10.2196/30970. PMID: 34904958.
Johnson AE, Stone DJ, Celi LA, Pollard TJ. The MIMIC Code Repository: enabling reproducibility in critical care research. J Am Med Inform Assoc. 2018 Jan 1;25(1):32-39. doi: 10.1093/jamia/ocx084. PMID: 29036464.
Walonoski J, Klaus S, Granger E, Hall D, Gregorowicz A, Neyarapally G, Watson A, Eastman J. Synthea™ Novel coronavirus (COVID-19) model and synthetic data set. Intell Based Med. 2020 Nov;1:100007. doi: 10.1016/j.ibmed.2020.100007. Epub 2020 Oct 2. PMID: 33043312.
Walonoski J, Kramer M, Nichols J, Quina A, Moesel C, Hall D, Duffett C, Dube K, Gallagher T, McLachlan S. Synthea: An approach, method, and software mechanism for generating synthetic patients and the synthetic electronic health care record. J Am Med Inform Assoc. 2018 Mar 1;25(3):230-238. doi: 10.1093/jamia/ocx079. Erratum in: J Am Med Inform Assoc. 2018 Jul 1;25(7):921. PMID: 29025144.
Below is a list of current Health Data Partners (HDPs) engaged with the URSA Initiative.
To learn more about requesting data from any of these HDPs, please submit the Advance RI-CTR Service Request Form.
Care New England uses the Epic and Cerner electronic health record (EHR) systems for outpatient and inpatient care, respectively. Both systems have a suite of reporting and analytic tools for use within the health system. Reports and data extracts can also be requested.
Since March 2015, Brown University Health has used LifeChart built on the Epic EHR platform. There are a variety of reporting and analytic tools, such as SlicerDicer and Reporting Workbench, which can be used within Brown University Health. Reports or data extracts can also be requested.
The Rhode Island Department of Health supports and maintains a range of health datasets and systems that can be used for research.
Rhode Island’s All-Payer Claims Database (RI APCD or HealthFacts RI) is a large-scale database that systematically collects healthcare claims data from a variety of payer sources, including Medicare, Medicaid, and RI’s nine largest commercial payers. Through the URSA Initiative, licensed researchers may access and analyze RI APCD data in the secure URSA Stronghold computing environment.
Founded in 2001, the Rhode Island Quality Institute (RIQI) is the state-designated Regional Health Information Organization (RHIO). RIQI manages CurrentCare, Rhode Island’s state-designated Health Information Exchange (HIE). CurrentCare contains EHR and health data from all acute care hospital systems in Rhode Island and from many ambulatory and laboratory facilities across the state. RIQI Data Analytics and Reporting has provided public health and research data to external partners such as the Rhode Island Department of Health and Brown University. More information about the data in CurrentCare can be found in the CurrentCare Data Guide.
A core component of the URSA Initiative is Brown's Stronghold. Stronghold is a secure computing and storage environment that enables Brown researchers and associates to analyze sensitive data while complying with regulatory or contractual requirements.
Stronghold is maintained by Brown University's Center for Computation and Visualization (CCV) in the Office of Information Technology (OIT). To learn more about Stronghold, refer to CCV's Stronghold Documentation.
URSA Stronghold is one of the research "tenants" within Stronghold. It is collaboratively managed by CCV, the Brown Center for Biomedical Informatics (BCBI), and the Biomedical Informatics, Bioinformatics, and Cyberinfrastructure Enhancement (BIBCE) Core of Advance RI-CTR. URSA Stronghold offers both Linux and Windows computing platforms; database management systems such as Microsoft SQL Server, PostgreSQL, and MySQL; and a broad range of data analysis tools including Julia, Python, R, SAS, and Stata.
Researchers may work within the URSA tenant or request their own dedicated Stronghold tenant. Refer to Brown CCV's documentation for more information about available features and how to request a Stronghold tenant.
Oscar (Ocean State Center for Advanced Resources) is Brown University's high-performance computing cluster. Oscar is maintained and supported by Brown's Center for Computation and Visualization (CCV).
Brown’s Data Risk Classifications are used to determine data access and storage options. Stronghold is the institutionally designated environment for Risk Level 3 data, that is, identified datasets that include protected health information (PHI) or personally identifiable information (PII). With approval from the data provider, researchers may use Brown's Oscar high-performance computing environment or other Brown-managed computers to store and analyze Brown Risk Level 2 datasets. In addition, Brown’s File Service for Researchers can be used for storage of de-identified data and other research files. Data analysis is initiated "locally" on a researcher's computer while the data remain secure.