Databases
Data sources are largely broken down into two categories: administrative billing databases and specialty databases.
Within each category there are multiple databases, each with its own strengths and weaknesses. Administrative billing databases are generally significantly more complex, with a steep learning curve to proficiency. Specialty databases offer a wide variety of options of varying complexity, ranging from approachable enough for someone with access to SPSS to highly complex, demanding strong SAS or R programming abilities.
If you need assistance deciding on a data source, or would like help performing analyses or writing a grant that will use secondary data sources, please visit https://u.osu.edu/secondarydatacore/ or email secondarydatacore@osumc.edu.
NOTE: The list of databases below is not exhaustive and does not represent all databases at Ohio State. Rather, it represents the databases most used by the statisticians in the Secondary Data Core. If you are an Ohio State researcher who works with a database not listed below and you are interested in assisting other researchers who want to use that data, please contact secondarydatacore@osumc.edu.
Administrative Billing Databases
Administrative billing databases are very large, complex databases that support a nearly limitless range of study questions. While they have shortcomings, their flexibility, size, scope, and relatively limited availability make them strong candidates for almost any study question.
- CMS Standard Analytic Files
  - The Centers for Medicare & Medicaid Services (CMS) Standard Analytic Files (SAFs) contain inpatient, outpatient, skilled nursing facility, and hospice data for millions of publicly insured individuals across the United States.
  - For data years 2012-2020, Ohio State Wexner Medical Center faculty can gain access (cost may be associated).
  - SDC member with expertise: Madison Hyer
- IBM MarketScan
  - Similar to the CMS SAFs, MarketScan contains inpatient and outpatient encounters for privately insured beneficiaries and Medicare beneficiaries with supplemental insurance.
  - For data years 2016-2022, Ohio State Wexner Medical Center faculty can gain access (cost may be associated).
  - SDC member with expertise: Jack Chiang
Specialty Databases
Specialty databases offer immense variety in study populations and answerable questions, and they range in complexity from quite approachable to very complex. We break them down by study population and provide details on each to help you decide which is most appropriate for your research question.
Electronic Health Record (EHR) Databases
Although distinct from administrative billing databases, EHR-based databases are similarly large and complex and support a nearly limitless range of study questions. They can be institution-specific (LifeScale) or multi-institutional (Cosmos and PCORnet), and they vary in administrative accessibility. While they have shortcomings, their flexibility, size, scope, and relatively limited availability can make them strong candidates for almost any study question.
- Epic Cosmos
  - A continuously updated, coded limited data set provided to Epic by participating organizations (including Ohio State Wexner Medical Center), containing electronic health records of hundreds of millions of patients.
  - SDC member with expertise: Mahmoud Abdel-Rasoul
- LifeScale
  - LifeScale is an Ohio State-specific, honest-broker-mediated, coded limited database developed in partnership with Microsoft, the College of Optometry, the College of Dentistry, and The Ohio State University Comprehensive Cancer Center – Arthur G. James Cancer Hospital and Richard J. Solove Research Institute (including Tumor and Registry data). It includes data not often found in electronic medical record data.
  - Data from 2012 to present are available (24-hour delay in 2024).
- Ohio State Patient-Centered Clinical Research Network (PCORnet) Common Data Model (CDM)
  - PCORnet, funded by the Patient-Centered Outcomes Research Institute (PCORI), comprises ~50 academic medical centers and health systems, grouped into eight Clinical Research Networks (CRNs), spread across the United States.
  - All Epic data from November 2011 onward are available; legacy data are available for earlier periods. Data are refreshed every month.
Cancer Databases
- National Cancer Database (NCDB)
  - Specialty/Cancer database
  - Data must be requested by the PI (project-specific)
Healthcare Cost and Utilization Project (HCUP) Databases
- National (Nationwide) Inpatient Sample (NIS)
  - The National (Nationwide) Inpatient Sample (NIS) is the largest publicly available all-payer inpatient healthcare database in the United States, providing national estimates of hospital inpatient stays.
  - For data years 1988-2021, Ohio State Wexner Medical Center faculty can gain access (cost may be associated).
- Nationwide Readmissions Database (NRD)
  - The Nationwide Readmissions Database (NRD) is designed to support analyses of national readmission rates for all payers and the uninsured. It addresses a large gap in healthcare data: the lack of nationally representative information on hospital readmissions for all ages and all payers.
  - For data years 2010-2021, Ohio State Wexner Medical Center faculty can gain access (cost may be associated).
- Ohio State Federal Statistics Research Data
  - The Federal Statistics Research Data Centers (FSRDC) are a resource managed by the United States Census Bureau to provide access to highly restricted data from the Census and 16 other federal agencies. Both survey data and administrative data (tax data, Social Security records, birth and death certificates) are available through the FSRDC, and data from different sources can be linked at the individual level.
- All of Us
  - The National Institutes of Health’s All of Us Research Program is building one of the largest biomedical data resources of its kind. The All of Us Research Hub stores health data from a diverse group of participants from across the United States. All of Us participants contribute to the program in many ways, such as by responding to surveys, sharing electronic health records, and providing biosamples.