
URSA Initiative in Rhode Island

Introduction

The Unified Research data Sharing and Access (URSA) Initiative was launched in 2015 with the overall goal of making electronic health record (EHR) and other health data accessible and usable for research purposes across Rhode Island. This initiative is supported by Advance RI-CTR and coordinated by the Advance RI-CTR Biomedical Informatics, Bioinformatics, and Cyberinfrastructure Enhancement (BIBCE) Core with leadership and expertise provided by the Brown Center for Biomedical Informatics (BCBI).

The Advance RI-CTR BIBCE Core collaborates with Brown University's Office of Information Technology and the Division of Research, including Research Integrity and Research Agreements & Contracting, and health data partners across Rhode Island to:

  • Coordinate processes for health data sharing and secure access within and across institutions;

  • Develop the requisite legal, ethical, and technical infrastructure between Brown and health data sharing partners;

  • Establish cross-institutional governance, including standard policies, procedures, and protocols for appropriate sharing and use of health data; and,

  • Provide documentation and training for health data requests, access, and use.

Through the URSA Initiative, the BIBCE Core provides expertise and infrastructure for conducting research using large-scale health datasets. The BIBCE Core is available to help researchers navigate the process of identifying options and solutions for storing, managing, and analyzing data from health data partners or other data sources.

Complete Advance RI-CTR's Service Request Form to schedule a consultation with the BIBCE Core.

Visit other chapters in CODIAC for Health using the Table of Contents or menu in the upper left corner.

Datasets

This page describes various synthetic and de-identified health datasets available to researchers.

SyntheticRI

The SyntheticRI datasets were generated by the Brown Center for Biomedical Informatics (BCBI) for use in research and education. These datasets contain realistic but fictional residents of the state of Rhode Island. The synthetic population aims to statistically mirror the real population in terms of demographics, disease burden, vaccinations, medical visits, and social determinants.

  • SyntheticRI Demo: synthetic data representing 1,188 Rhode Island individuals of all ages

  • SyntheticRI Adult: synthetic data representing 145,010 Rhode Island adults, ages 19-99

  • SyntheticRI Peds: synthetic data representing 145,010 Rhode Island children, ages 0-18

The SyntheticRI datasets were generated using Synthea, an open-source synthetic patient generator. The Synthea-generated datasets are in .csv file format.
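As a minimal sketch of working with Synthea-style .csv output, the Python snippet below summarizes a patient table using only the standard library. The file name `patients.csv`, the column names (`Id`, `BIRTHDATE`, `DEATHDATE`, `GENDER`, `COUNTY`), and the sample rows are assumptions for illustration, loosely modeled on Synthea's CSV exporter rather than taken from the SyntheticRI release; check the data dictionary that accompanies the files you receive.

```python
import csv
import io
from collections import Counter
from typing import Iterable

# Hypothetical excerpt of a Synthea-style patients.csv (invented rows).
# Column names are an assumption; confirm them against the release's data dictionary.
SAMPLE_PATIENTS_CSV = """Id,BIRTHDATE,DEATHDATE,GENDER,COUNTY
p-001,1984-03-12,,F,Providence County
p-002,1950-11-02,2019-06-30,M,Kent County
p-003,2012-07-21,,F,Washington County
"""


def summarize_patients(lines: Iterable[str]) -> dict:
    """Count patients by GENDER and count rows with no DEATHDATE recorded."""
    reader = csv.DictReader(lines)
    by_gender = Counter()
    living = 0
    for row in reader:
        by_gender[row["GENDER"]] += 1
        if not row["DEATHDATE"]:  # an empty field means no death was recorded
            living += 1
    return {"by_gender": dict(by_gender), "living": living}


if __name__ == "__main__":
    # prints {'by_gender': {'F': 2, 'M': 1}, 'living': 2}
    print(summarize_patients(io.StringIO(SAMPLE_PATIENTS_CSV)))
```

Because `summarize_patients` accepts any iterable of lines, a real file can be passed directly, e.g. `with open("patients.csv", newline="") as f: summarize_patients(f)`.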

Each dataset was also transformed to the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) using ETL-Synthea, a program from the Observational Health Data Sciences and Informatics (OHDSI) Consortium. These datasets can be accessed through direct database queries or with OHDSI software tools such as ATLAS and HADES.
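For the OMOP CDM versions, a direct database query is ordinary SQL against the CDM tables. The sketch below uses Python's built-in `sqlite3` module and a three-row, made-up `person` table as a stand-in for whichever database actually hosts the CDM. The table and column names (`person`, `person_id`, `gender_concept_id`, `year_of_birth`) and the standard gender concept IDs 8507 (male) and 8532 (female) come from the OMOP CDM; the rows and the helper names `build_toy_cdm` and `count_persons_by_gender` are hypothetical.

```python
import sqlite3

# Standard OMOP gender concept IDs.
GENDER_LABELS = {8507: "MALE", 8532: "FEMALE"}


def build_toy_cdm() -> sqlite3.Connection:
    """Hypothetical helper: an in-memory stand-in for an OMOP CDM person table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person ("
        " person_id INTEGER PRIMARY KEY,"
        " gender_concept_id INTEGER NOT NULL,"
        " year_of_birth INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO person VALUES (?, ?, ?)",
        [(1, 8532, 1984), (2, 8507, 1950), (3, 8532, 2012)],  # invented rows
    )
    return conn


def count_persons_by_gender(conn: sqlite3.Connection) -> dict:
    """Count persons per gender concept, labeling the two standard concepts."""
    rows = conn.execute(
        """
        SELECT gender_concept_id, COUNT(*) AS n
        FROM person
        GROUP BY gender_concept_id
        ORDER BY gender_concept_id
        """
    ).fetchall()
    return {GENDER_LABELS.get(concept_id, str(concept_id)): n for concept_id, n in rows}


if __name__ == "__main__":
    print(count_persons_by_gender(build_toy_cdm()))
```

The same `SELECT` should run largely unchanged on PostgreSQL or Microsoft SQL Server once it points at the correct CDM schema; tools such as ATLAS generate comparable SQL behind the scenes.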

For more information, please email [email protected].

MIMIC-IV

MIMIC-IV (Medical Information Mart for Intensive Care) is a large, freely available relational database of de-identified health data from real patients who were admitted to the critical care units of Beth Israel Deaconess Medical Center in Boston, Massachusetts, USA.[1]

MIMIC-IV contains comprehensive information from 2008-2019 for over 60,000 hospitalized patients. The database is intended to support a wide variety of research in healthcare. MIMIC-IV builds upon the success of MIMIC-III and incorporates numerous improvements over its predecessor.

Refer to MIT's MIMIC documentation for more information about MIMIC-IV. Researchers interested in accessing the complete MIMIC-IV dataset should follow MIT's "Getting Started" instructions.
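Because MIMIC-IV is a relational database, most analyses start with joins across its tables. The sketch below assumes a small column subset of the MIMIC-IV hosp module's `patients` and `admissions` tables (linked by `subject_id`, with `admittime` and `dischtime` on each admission) and computes mean length of stay by gender. It builds a tiny in-memory SQLite database with invented rows so it runs without credentialed access; the helper names are hypothetical, and on the real data the query would need the schema prefix and SQL dialect of your installation. The sample years look odd because MIMIC shifts dates into the future as part of de-identification.

```python
import sqlite3


def build_toy_mimic() -> sqlite3.Connection:
    """Hypothetical helper: made-up rows shaped like two MIMIC-IV hosp tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE patients (
            subject_id INTEGER PRIMARY KEY,
            gender TEXT NOT NULL,
            anchor_age INTEGER
        );
        CREATE TABLE admissions (
            hadm_id INTEGER PRIMARY KEY,
            subject_id INTEGER NOT NULL REFERENCES patients(subject_id),
            admittime TEXT NOT NULL,
            dischtime TEXT NOT NULL
        );
        INSERT INTO patients VALUES (10, 'F', 65), (11, 'M', 72);
        INSERT INTO admissions VALUES
            (100, 10, '2150-01-01 00:00:00', '2150-01-03 12:00:00'),
            (101, 10, '2151-05-10 00:00:00', '2151-05-11 00:00:00'),
            (102, 11, '2160-02-01 00:00:00', '2160-02-05 00:00:00');
        """
    )
    return conn


def mean_los_days_by_gender(conn: sqlite3.Connection) -> dict:
    """Average (dischtime - admittime) in days, grouped by patient gender."""
    rows = conn.execute(
        """
        SELECT p.gender,
               AVG(julianday(a.dischtime) - julianday(a.admittime)) AS mean_los_days
        FROM admissions AS a
        JOIN patients AS p ON p.subject_id = a.subject_id
        GROUP BY p.gender
        ORDER BY p.gender
        """
    ).fetchall()
    return {gender: round(mean_los, 2) for gender, mean_los in rows}


if __name__ == "__main__":
    print(mean_los_days_by_gender(build_toy_mimic()))
```

`julianday` is SQLite-specific; on PostgreSQL the equivalent is `EXTRACT(EPOCH FROM dischtime - admittime) / 86400`.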

HCUP

The Healthcare Cost and Utilization Project (HCUP) is a family of databases, software tools, and related products developed through a Federal-State-Industry partnership and sponsored by the Agency for Healthcare Research and Quality (AHRQ). HCUP databases are derived from administrative data and contain encounter-level clinical and nonclinical information, including all-listed diagnoses and procedures, discharge status, patient demographics, and charges, for all patients regardless of payer, beginning in 1988.[2]

HCUP offers the following databases:

  • National (Nationwide) Inpatient Sample (NIS): the largest publicly available all-payer hospital inpatient care database in the United States

  • Kids' Inpatient Database (KID): hospital inpatient stays for children, specifically designed to allow researchers to study a broad range of conditions and procedures related to children's health

  • Nationwide Emergency Department Sample (NEDS): emergency department (ED) visits that do not result in an admission, as well as ED visits that result in an admission to the same hospital

  • Nationwide Readmissions Database (NRD): designed to support various types of analyses of national readmission rates for all payers and uninsured individuals

  • State Inpatient Databases (SID): inpatient discharge abstracts from participating States, translated into a uniform format to facilitate multi-State comparisons and analyses

  • State Ambulatory Surgery and Services Databases (SASD): encounter-level data for ambulatory surgery and other outpatient services from hospital-owned facilities

  • State Emergency Department Databases (SEDD): discharge information on all emergency department visits that do not result in an admission

Please note: Access to HCUP databases is not free. Database releases must be purchased through the Online HCUP Central Distributor.

Visit AHRQ HCUP for more information about the databases. Learn more about HCUP on the HCUP website.

SEER

The Surveillance, Epidemiology, and End Results (SEER) Program provides information on cancer statistics in an effort to reduce the cancer burden among the U.S. population. SEER collects cancer incidence data from population-based cancer registries covering approximately 47.9 percent of the U.S. population. The SEER datasets include data on patient demographics, primary tumor site, tumor morphology, stage at diagnosis, and first course of treatment.[3]

SEER offers the following datasets:

  • SEER Research Data

    • Register with any valid email

    • Excludes geography, month and year of diagnosis, and other demographic fields

  • SEER Research Plus and NCCR Data

    • Requires user authentication through eRA Commons or an HHS account

    • Includes geography, month and year of diagnosis, and other demographic fields

    • Includes National Childhood Cancer Registry (NCCR) data, excluding geography

Visit SEER for a deeper comparison of the datasets. Learn more about SEER and SEER datasets on the SEER website.

SyntheticMass

SyntheticMass is a Synthea-generated dataset that contains realistic but fictional residents of the state of Massachusetts. The synthetic population aims to statistically mirror the state population in terms of demographics, disease burden, vaccinations, medical visits, and social determinants.[4] Refer to the SyntheticMass website for more information.

Several datasets are available on the SyntheticMass downloads page:

  • Complete SyntheticMass datasets: "SyntheticMass Data Version 2 (24 May, 2017)". This ZIP file is large (21 GB), so move it to a location with enough storage before attempting to unzip it.

  • Sample datasets (<100 MB) containing 100 or 1,000 patient records

  • Specialized datasets generated with Synthea by other study teams, including COVID-19 datasets, a Childhood Obesity dataset, and more

Each dataset is available in the following formats:

  • CSV (with a data dictionary describing all CSV tables)

  • C-CDA (XML files)

  • FHIR (JSON files)

References

Resources

  1. MIT Laboratory for Computational Physiology. (n.d.). About MIMIC. Retrieved May 1, 2024.

  2. Agency for Healthcare Research and Quality. (n.d.). Healthcare Cost and Utilization Project (HCUP). Retrieved May 1, 2024.

  3. National Cancer Institute. (n.d.). SEER Data. Retrieved May 1, 2024.

  4. MITRE Corporation. (n.d.). About Synthea. Retrieved May 1, 2024.

Articles

  • Johnson AEW, Bulgarelli L, Shen L, et al. MIMIC-IV, a freely accessible electronic health record dataset. Sci Data. 2023;10:1.

  • Paris N, Lamer A, Parrot A. Transformation and Evaluation of the MIMIC Database in the OMOP Common Data Model: Development and Usability Study. JMIR Med Inform. 2021 Dec 14;9(12):e30970. doi: 10.2196/30970. PMID: 34904958; PMCID: PMC8715361.

  • Johnson AE, Stone DJ, Celi LA, Pollard TJ. The MIMIC Code Repository: enabling reproducibility in critical care research. J Am Med Inform Assoc. 2018 Jan 1;25(1):32-39. doi: 10.1093/jamia/ocx084. PMID: 29036464; PMCID: PMC6381763.

  • Walonoski J, Klaus S, Granger E, Hall D, Gregorowicz A, Neyarapally G, Watson A, Eastman J. Synthea™ Novel coronavirus (COVID-19) model and synthetic data set. Intell Based Med. 2020 Nov;1:100007. doi: 10.1016/j.ibmed.2020.100007. Epub 2020 Oct 2. PMID: 33043312; PMCID: PMC7531559.

  • Walonoski J, Kramer M, Nichols J, Quina A, Moesel C, Hall D, Duffett C, Dube K, Gallagher T, McLachlan S. Synthea: An approach, method, and software mechanism for generating synthetic patients and the synthetic electronic health care record. J Am Med Inform Assoc. 2018 Mar 1;25(3):230-238. doi: 10.1093/jamia/ocx079. Erratum in: J Am Med Inform Assoc. 2018 Jul 1;25(7):921. PMID: 29025144; PMCID: PMC7651916.

Links

  • HCUP Website

  • HCUP Databases

  • SEER Website

  • SEER Data Products

  • SEER Data Access Request Process

  • SyntheticMass Website

Health Data Partners

Below is a list of current Health Data Partners (HDPs) engaged with the URSA Initiative.

To learn more about requesting data from any of these HDPs, please submit the Advance RI-CTR Service Request Form.

Care New England

Care New England uses the Epic electronic health record (EHR) system for outpatient care and the Cerner EHR system for inpatient care. Both systems have a suite of reporting and analytic tools for use within the health system. Reports and data extracts can also be requested.

Brown University Health

Since March 2015, Brown University Health has used LifeChart, which is built on the Epic EHR platform. A variety of reporting and analytic tools, such as SlicerDicer and Reporting Workbench, can be used within Brown University Health. Reports and data extracts can also be requested.

Rhode Island Department of Health

The Rhode Island Department of Health supports and maintains a range of health datasets and systems that can be used for research.

Rhode Island’s All-Payer Claims Database (RI APCD, or HealthFacts RI) is a large-scale database that systematically collects healthcare claims data from a variety of payer sources, including Medicare, Medicaid, and RI’s nine largest commercial payers.

The Rhode Island Quality Institute

Founded in 2001, the Rhode Island Quality Institute (RIQI) is the state-designated Regional Health Information Organization (RHIO). RIQI manages CurrentCare, Rhode Island’s state-designated Health Information Exchange (HIE). CurrentCare contains EHR and health data from all acute care hospital systems in Rhode Island and from many ambulatory and laboratory facilities across the state. RIQI has provided public health and research data to external partners such as the Rhode Island Department of Health and Brown University. More information about the data in CurrentCare can be found in the CurrentCare Data Guide.

Computing Environments

Stronghold

A core component of the URSA Initiative is Brown's Stronghold, a secure computing and storage environment that enables Brown researchers and associates to analyze sensitive data while complying with regulatory or contractual requirements.

Stronghold is maintained by Brown University's Center for Computation and Visualization (CCV) in the Office of Information Technology (OIT). To learn more about Stronghold, refer to CCV's Stronghold Documentation.

URSA Stronghold

URSA Stronghold is one of the research "tenants" within Stronghold. It is collaboratively managed by CCV, the Brown Center for Biomedical Informatics (BCBI), and the Biomedical Informatics, Bioinformatics, and Cyberinfrastructure Enhancement (BIBCE) Core of Advance RI-CTR. URSA Stronghold offers both Linux and Windows computing platforms; database management systems such as Microsoft SQL Server, PostgreSQL, and MySQL; and a broad range of data analysis tools, including Julia, Python, R, SAS, and Stata.

Researchers may work within the URSA tenant or request their own dedicated Stronghold tenant. Refer to Brown CCV's documentation for more information about available features and how to request a Stronghold tenant.

Oscar

Oscar (Ocean State Center for Advanced Resources) is Brown University's high-performance computing cluster. Oscar is maintained and supported by Brown's Center for Computation and Visualization (CCV).

Data Storage at Brown

Brown’s Data Risk Classifications are used to determine data access and storage options. Stronghold is the institutionally designated environment for Risk Level 3: identified datasets that include protected health information (PHI) or personally identifiable information (PII). With approval from the data provider, researchers may use Brown's Oscar high-performance computing environment or other Brown-managed computers to store and analyze de-identified (Brown Risk Level 2) datasets. In addition, Brown’s File Service for Researchers can be used to store de-identified data and other research files. Data analysis is initiated "locally" on a researcher's computer while the data remain secure.
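The storage rules in the paragraph above amount to a small decision table. The sketch below encodes them as a hypothetical Python helper (`storage_options` is an illustrative name, not a Brown or OIT tool) to make the logic explicit. It reads the File Service sentence as applying to de-identified data regardless of provider approval, which is an interpretation of the text; levels other than 2 and 3 are deliberately left out. Brown's official Data Risk Classifications, the data provider, and the data use agreement always take precedence over a sketch like this.

```python
from typing import Tuple


def storage_options(risk_level: int, provider_approved: bool = False) -> Tuple[str, ...]:
    """Hypothetical encoding of the storage options described in the text.

    Not an official policy tool; confirm every decision with the data provider
    and Brown OIT before storing or analyzing data.
    """
    if risk_level == 3:
        # Identified data with PHI/PII: Stronghold is the designated environment.
        return ("Stronghold",)
    if risk_level == 2:
        # De-identified data: the File Service for Researchers can store the files
        # (assumption: this does not depend on provider approval).
        options = ["File Service for Researchers (storage)"]
        if provider_approved:
            # Oscar and other Brown-managed computers require provider approval.
            options = ["Oscar", "other Brown-managed computers"] + options
        return tuple(options)
    raise ValueError(f"Risk Level {risk_level} is not covered by this sketch")


if __name__ == "__main__":
    print(storage_options(2, provider_approved=True))
```

Keeping the rules in one lookup function makes the approval dependency visible: the only difference between the two Level 2 results is the provider's sign-off.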

Resources

Links

  • Brown OIT Data Risk Classifications

  • Brown CCV Computing

  • Stronghold Documentation

  • Oscar Documentation