URSA Initiative in Rhode Island

Computing Environments

Stronghold

A core component of the URSA Initiative is Brown's Stronghold. Stronghold is a secure computing and storage environment that enables Brown researchers and associates to analyze sensitive data while complying with regulatory or contractual requirements.

Stronghold is maintained by Brown University's Center for Computation and Visualization (CCV) in the Office of Information Technology (OIT). To learn more about Stronghold, refer to CCV's Stronghold Documentation.

URSA Stronghold

URSA Stronghold is one of the research "tenants" within Stronghold. It is collaboratively managed by CCV, the Brown Center for Biomedical Informatics (BCBI), and the Biomedical Informatics, Bioinformatics, and Cyberinfrastructure Enhancement (BIBCE) Core of Advance RI-CTR. URSA Stronghold offers both Linux and Windows computing platforms; database management systems such as Microsoft SQL Server, PostgreSQL, and MySQL; and a broad range of data analysis tools, including Julia, Python, R, SAS, and Stata.

Researchers may work within the URSA tenant or request their own dedicated Stronghold tenant. Refer to Brown CCV's documentation for more information about available features and how to request a Stronghold tenant.

Oscar

Oscar (Ocean State Center for Advanced Resources) is Brown University's high-performance computing cluster. Oscar is maintained and supported by Brown's Center for Computation and Visualization (CCV).

Data Storage at Brown

Brown’s Data Risk Classifications are used to determine data access and storage options. Stronghold is the institutionally designated environment for Risk Level 3 identified datasets that include protected health information (PHI) or personally identifiable information (PII). With approval from the data provider, researchers may leverage Brown's high-performance computing environment (Oscar) or other Brown-managed computers for the storage and analysis of de-identified (Brown Risk Level 2) datasets. In addition, Brown’s File Service for Researchers can be used for storage of de-identified data and other research files. Data analysis is initiated "locally" on a researcher's computer while the data remain secure.

Resources

Links

  • Brown OIT Data Risk Classifications

  • Brown CCV Computing

  • Stronghold Documentation

  • Oscar Documentation

Introduction

The Unified Research data Sharing and Access (URSA) Initiative was launched in 2015 with the overall goal of making electronic health record (EHR) and other health data accessible and usable for research purposes across Rhode Island. This initiative is supported by Advance RI-CTR and coordinated by the Advance RI-CTR Biomedical Informatics, Bioinformatics, and Cyberinfrastructure Enhancement (BIBCE) Core, with leadership and expertise provided by the Brown Center for Biomedical Informatics (BCBI).

The Advance RI-CTR BIBCE Core collaborates with Brown University's Office of Information Technology and the Division of Research, including Research Integrity and Research Agreements & Contracting, and health data partners across Rhode Island to:

  • Coordinate processes for health data sharing and secure access within and across institutions;

  • Develop the requisite legal, ethical, and technical infrastructure between Brown and health data sharing partners;

  • Establish cross-institutional governance, including standard policies, procedures, and protocols for appropriate sharing and use of health data; and,

  • Provide documentation and training for health data requests, access, and use.

Through the URSA Initiative, the BIBCE Core provides expertise and infrastructure for conducting research using large-scale health datasets. The BIBCE Core is available to help researchers navigate the process of identifying options and solutions for storing, managing, and analyzing data from health data partners or other data sources.

Complete Advance RI-CTR's Service Request Form to schedule a consultation with the BIBCE Core.

Visit other chapters in CODIAC for Health using the Table of Contents or menu in the upper left corner.

Health Data Partners

Below is a list of current Health Data Partners (HDPs) engaged with the URSA Initiative.

To learn more about requesting data from any of these HDPs, please submit the Advance RI-CTR Service Request Form.

Care New England

Care New England uses the Epic and Cerner electronic health record (EHR) systems for outpatient and inpatient care, respectively. Both systems have a suite of reporting and analytic tools for use within the health system. Reports and data extracts can also be requested.

Brown University Health

Since March 2015, Brown University Health has used LifeChart, built on the Epic EHR platform. A variety of reporting and analytic tools, such as SlicerDicer and Reporting Workbench, can be used within Brown University Health. Reports or data extracts can also be requested.

Rhode Island Department of Health

The Rhode Island Department of Health supports and maintains a range of health datasets and systems that can be used for research.

Rhode Island’s All-Payer Claims Database (RI APCD or HealthFacts RI) is a large-scale database that systematically collects healthcare claims data from a variety of payer sources, including Medicare, Medicaid, and RI’s nine largest commercial payers. Through the URSA Initiative, licensed researchers may access and analyze RI APCD data in the secure URSA Stronghold computing environment.

The Rhode Island Quality Institute

Founded in 2001, the Rhode Island Quality Institute (RIQI) is the state-designated Regional Health Information Organization (RHIO). RIQI manages CurrentCare, Rhode Island’s state-designated Health Information Exchange (HIE). CurrentCare contains EHR and health data from all acute care hospital systems in Rhode Island and from many ambulatory and laboratory facilities across the state. RIQI Data Analytics and Reporting has provided public health and research data to external partners such as the Rhode Island Department of Health and Brown University. More information about the data in CurrentCare can be found in the CurrentCare Data Guide.

Data Requests

The Brown Center for Biomedical Informatics (BCBI) and the Advance RI-CTR Biomedical Informatics, Bioinformatics and Cyberinfrastructure Enhancement (BIBCE) Core serve as liaisons between researchers and health data partners to create and submit accurate and complete data requests as well as oversee the governance of health data transferred to and accessed at Brown University. Each health data partner has unique administrative processes for requesting their data as well as guidelines for working with the data and sharing results.

Please submit an Advance RI-CTR Service Request Form to start the process.

Types of Data Requests

Aggregate Statistics – basic counts and descriptive statistics to characterize a population of interest (e.g., number of patients or encounters meeting specified criteria).

De-Identified Dataset – as defined by the HIPAA Privacy Rule, excludes the 18 protected health information (PHI) elements that could be used to identify individuals or the individual's relatives, employers, or household members.

Limited Dataset – as defined by the HIPAA Privacy Rule, includes a limited set of identifiable information that excludes 16 direct identifiers but may include city, state, zip code, elements of date, and other numbers, characteristics, or codes not listed as direct identifiers.

Identified Dataset – in some cases, direct identifiers may be needed (e.g., street addresses for geocoding or demographics for linking datasets) that may be subsequently removed to create a limited or de-identified dataset.

URSA Data Request Form

The URSA Data Request Form (UDRF) was designed by the Advance RI-CTR BIBCE Core and BCBI for requests made to Rhode Island health data partners. The UDRF is used to structure the cohort definition and dataset extraction fields in a computable manner. It can also be used to document a research study extraction specification or communicate extraction details to research team programmers. Each section should be reviewed and approved by the study PI.
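
To illustrate what structuring a cohort definition "in a computable manner" can mean, here is a minimal Python sketch. It is not the UDRF itself: the class name, field names, allowed values, and checks are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical labels mirroring the request types described on this page.
ALLOWED_DATASET_TYPES = {"aggregate", "de-identified", "limited", "identified"}


@dataclass
class CohortDefinition:
    """Hypothetical, simplified cohort specification (not the actual UDRF)."""

    dataset_type: str
    min_age: int
    max_age: int
    encounter_start: str  # ISO 8601 date, e.g. "2019-01-01"
    encounter_end: str    # ISO 8601 date
    diagnosis_codes: list = field(default_factory=list)

    def problems(self) -> list:
        """Return human-readable issues; an empty list means no issue was found."""
        issues = []
        if self.dataset_type not in ALLOWED_DATASET_TYPES:
            issues.append(f"unknown dataset type: {self.dataset_type}")
        if self.min_age > self.max_age:
            issues.append("min_age is greater than max_age")
        # ISO 8601 date strings sort chronologically, so plain comparison works.
        if self.encounter_start > self.encounter_end:
            issues.append("encounter_start is after encounter_end")
        if not self.diagnosis_codes:
            issues.append("no diagnosis codes listed")
        return issues


# Illustrative request: adults aged 18-64 with one example ICD-10-CM code.
request = CohortDefinition(
    dataset_type="limited",
    min_age=18,
    max_age=64,
    encounter_start="2019-01-01",
    encounter_end="2021-12-31",
    diagnosis_codes=["E11.9"],
)
print(request.problems())
```

Writing the criteria as explicit fields, rather than free text, is what lets a study PI review each element and lets a programmer check the specification before any extraction is run.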

Contact the Advance RI-CTR BIBCE Core via the Advance RI-CTR Service Request Form to initiate a new data request. The BIBCE Core will provide a copy of the URSA Data Request Form as well as additional data request process details.

Datasets

This page describes various synthetic and de-identified health datasets available to researchers.

SyntheticRI

The SyntheticRI datasets were generated by the Brown Center for Biomedical Informatics (BCBI) for use in research and education. These datasets contain realistic but fictional residents of the state of Rhode Island. The synthetic population aims to statistically mirror the real population in terms of demographics, disease burden, vaccinations, medical visits, and social determinants.

  • SyntheticRI Demo: synthetic data representing 1,188 Rhode Island individuals of all ages

  • SyntheticRI Adult: synthetic data representing 145,010 Rhode Island adults, ages 19-99

  • SyntheticRI Peds: synthetic data representing 145,010 Rhode Island children, ages 0-18

The SyntheticRI datasets were generated using Synthea, an open-source synthetic patient generator. The Synthea-generated datasets are in .csv file format.
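
As a rough illustration of working with CSV exports like these, the sketch below counts rows by a column using only Python's standard library. The inline sample stands in for a patients file; the column names (Id, BIRTHDATE, GENDER, CITY) and all values are assumptions for illustration and should be checked against the actual files.

```python
import csv
import io
from collections import Counter

# Tiny made-up stand-in for a patients CSV file (only a few columns shown).
SAMPLE_PATIENTS_CSV = """Id,BIRTHDATE,GENDER,CITY
p1,1980-04-12,F,Providence
p2,2015-09-30,M,Warwick
p3,1957-01-05,F,Providence
"""


def count_by_column(csv_text: str, column: str) -> Counter:
    """Count CSV rows grouped by the values found in one column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


gender_counts = count_by_column(SAMPLE_PATIENTS_CSV, "GENDER")
city_counts = count_by_column(SAMPLE_PATIENTS_CSV, "CITY")
print(gender_counts)
print(city_counts)
```

For a file on disk, the same function could be fed the contents of `open(path, newline="").read()`, though for the larger datasets a streaming read or a dataframe library may be more practical.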

Each dataset was also transformed to the OMOP Common Data Model (CDM) using the Observational Health Data Sciences and Informatics (OHDSI) Consortium's program ETL-Synthea. These datasets can be accessed through direct database queries or with OHDSI software tools such as ATLAS and HADES.
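
A direct database query against an OMOP CDM instance might look like the following sketch. It builds a throwaway in-memory SQLite table so that it can run anywhere; the actual SyntheticRI databases, their connection details, and their SQL dialect are not described here and are not assumed. The `person` table and its `person_id`, `gender_concept_id`, and `year_of_birth` columns follow the OMOP CDM; the rows, the helper name `count_adults`, and the age cutoff are illustrative.

```python
import sqlite3

# In-memory database holding a minimal subset of the OMOP CDM person table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person ("
    "person_id INTEGER PRIMARY KEY, "
    "gender_concept_id INTEGER, "
    "year_of_birth INTEGER)"
)
# Made-up rows; 8532 and 8507 are the OMOP standard concepts for female and male.
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, 8532, 1980), (2, 8507, 2015), (3, 8532, 1957)],
)


def count_adults(connection: sqlite3.Connection, as_of_year: int, min_age: int = 19) -> int:
    """Count persons whose approximate age in as_of_year is at least min_age."""
    row = connection.execute(
        "SELECT COUNT(*) FROM person WHERE ? - year_of_birth >= ?",
        (as_of_year, min_age),
    ).fetchone()
    return row[0]


adult_count = count_adults(conn, as_of_year=2024)
print(adult_count)
```

Age here is approximated from `year_of_birth` alone, which is a common simplification when full birth dates are unavailable or withheld.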

For more information, please email [email protected].

MIMIC-IV

MIMIC-IV (Medical Information Mart for Intensive Care) is a large, freely-available relational database comprising deidentified health-related data from real patients who were admitted to the critical care units of the Beth Israel Deaconess Medical Center in Boston, Massachusetts, USA.

MIMIC-IV contains comprehensive information from 2008-2019 for over 60,000 hospitalized patients. The database is intended to support a wide variety of research in healthcare. MIMIC-IV builds upon the success of MIMIC-III and incorporates numerous improvements. [1]

Refer to MIT's MIMIC documentation for more information about MIMIC-IV. Researchers interested in accessing the complete MIMIC-IV dataset should follow MIT's "Getting Started" instructions.

HCUP

The Healthcare Cost and Utilization Project (HCUP) is a family of databases, software tools, and related products developed through a Federal-State-Industry partnership and sponsored by the Agency for Healthcare Research and Quality (AHRQ). HCUP databases are derived from administrative data and contain encounter-level, clinical, and nonclinical information including all-listed diagnoses and procedures, discharge status, patient demographics, and charges for all patients, regardless of payer, beginning in 1988. [2]

HCUP offers the following databases:

  • National (Nationwide) Inpatient Sample (NIS): the largest publicly available all-payer hospital inpatient care database in the United States

  • Kids' Inpatient Database (KID): hospital inpatient stays for children, specifically designed to allow researchers to study a broad range of conditions and procedures related to children's health

  • Nationwide Emergency Department Sample (NEDS): emergency department (ED) visits that do not result in an admission as well as ED visits that result in an admission to the same hospital

  • Nationwide Readmissions Database (NRD): designed to support various types of analyses of national readmission rates for all payers and uninsured individuals

  • State Inpatient Databases (SID): inpatient discharge abstracts from participating States, translated into a uniform format to facilitate multi-State comparisons and analyses

  • State Ambulatory Surgery and Services Databases (SASD): encounter-level data for ambulatory surgery and other outpatient services from hospital-owned facilities

  • State Emergency Department Databases (SEDD): discharge information on all emergency department visits that do not result in an admission

Please note: Access to HCUP databases is not free. Database releases must be purchased through the Online HCUP Central Distributor.

Visit ahrq.gov/data/hcup for more information about the databases. Learn more about HCUP on the HCUP website.

SEER

The Surveillance, Epidemiology, and End Results (SEER) Program provides information on cancer statistics in an effort to reduce the cancer burden among the U.S. population. SEER collects cancer incidence data from population-based cancer registries covering approximately 47.9 percent of the U.S. population. The SEER datasets include data on patient demographics, primary tumor site, tumor morphology, stage at diagnosis, and first course of treatment. [3]

SEER offers the following datasets:

  • SEER Research Data

    • Register with any valid email

    • Excludes geography, month and year of diagnosis, and other demographic fields

  • SEER Research Plus and NCCR Data

    • Requires user authentication through eRA Commons or an HHS account

    • Includes geography, month and year of diagnosis, and other demographic fields

    • Includes National Childhood Cancer Registry (NCCR) data, excluding geography

Visit https://seer.cancer.gov/data/ for a deeper comparison of the datasets. Learn more about SEER and SEER datasets on the SEER website.

SyntheticMass

SyntheticMass is a Synthea-generated data set that contains realistic but fictional residents of the state of Massachusetts. The synthetic population aims to statistically mirror the state population in terms of demographics, disease burden, vaccinations, medical visits, and social determinants. [4] Refer to the SyntheticMass website for more information.

There are several data sets available on the SyntheticMass downloads page.

  • Complete SyntheticMass data sets: "SyntheticMass Data Version 2 (24 May, 2017)". This ZIP file is quite large (21GB), so make sure you move the file to a location with enough storage before attempting to unzip it.

  • Sample data sets (<100MB) containing 100 or 1,000 patient records

  • Specialized data sets that have been generated using Synthea by other study teams. These include COVID-19 data sets, a Childhood Obesity data set, and more.

The following versions are available for each data set.

  • CSV (data dictionary describing all CSV tables)

  • C-CDA (xml files)

  • FHIR (json files)

References

  1. MIT Laboratory for Computational Physiology. (n.d.). About MIMIC. Retrieved May 1, 2024, from https://mimic.mit.edu/docs/about/

  2. Agency for Healthcare Research and Quality. (n.d.). Healthcare Cost and Utilization Project (HCUP). Retrieved May 1, 2024, from https://www.ahrq.gov/data/hcup/index.html

  3. National Cancer Institute. (n.d.). SEER Data. Retrieved May 1, 2024, from https://seer.cancer.gov/data/

  4. MITRE Corporation. (n.d.). About Synthea. Retrieved May 1, 2024, from https://synthea.mitre.org/about

Resources

Articles

  • Johnson, A.E.W., Bulgarelli, L., Shen, L. et al. MIMIC-IV, a freely accessible electronic health record dataset. Sci Data 10, 1 (2023). https://doi.org/10.1038/s41597-022-01899-x

  • Paris N, Lamer A, Parrot A. Transformation and Evaluation of the MIMIC Database in the OMOP Common Data Model: Development and Usability Study. JMIR Med Inform. 2021 Dec 14;9(12):e30970. doi: 10.2196/30970. PMID: 34904958; PMCID: PMC8715361.

  • Johnson AE, Stone DJ, Celi LA, Pollard TJ. The MIMIC Code Repository: enabling reproducibility in critical care research. J Am Med Inform Assoc. 2018 Jan 1;25(1):32-39. doi: 10.1093/jamia/ocx084. PMID: 29036464; PMCID: PMC6381763.

  • Walonoski J, Klaus S, Granger E, Hall D, Gregorowicz A, Neyarapally G, Watson A, Eastman J. Synthea™ Novel coronavirus (COVID-19) model and synthetic data set. Intell Based Med. 2020 Nov;1:100007. doi: 10.1016/j.ibmed.2020.100007. Epub 2020 Oct 2. PMID: 33043312; PMCID: PMC7531559.

  • Walonoski J, Kramer M, Nichols J, Quina A, Moesel C, Hall D, Duffett C, Dube K, Gallagher T, McLachlan S. Synthea: An approach, method, and software mechanism for generating synthetic patients and the synthetic electronic health care record. J Am Med Inform Assoc. 2018 Mar 1;25(3):230-238. doi: 10.1093/jamia/ocx079. Erratum in: J Am Med Inform Assoc. 2018 Jul 1;25(7):921. PMID: 29025144; PMCID: PMC7651916.

Links

  • HCUP Website

  • HCUP Databases

  • SEER Website

  • SEER Data Products

  • SEER Data Access Request Process

  • SyntheticMass Website