Find Datasets


Perform a Literature Search

Consider beginning your search for published data with a search of the published literature. Look for academic articles related to the population, problem, or methodology you are targeting. These articles often cite the data repositories, databases, or specific datasets used in the research, and may state where any data created during the research have been deposited.

Data presented alongside published articles can take any of three forms: supplemental material, data availability statements, and data citations.

Finding Data in PubMed

Within PubMed records, links to data can be found in two main places:

  1. Data supplied by publishers or by NCBI sequence databases appear in the Secondary Source ID field.
  2. Links provided directly by external data repositories appear in the LinkOut section of a PubMed record.

Access PubMed through the library

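The Secondary Source ID field can also be searched programmatically through NCBI's E-utilities. The minimal sketch below only builds an `esearch` request URL for PubMed records on a topic whose Secondary Source ID mentions a given databank. It assumes the PubMed `[si]` field tag behaves as described in PubMed's search-field help; the helper name `build_pubmed_data_query` is hypothetical. Check the official E-utilities documentation before relying on it.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_data_query(topic: str, source: str, retmax: int = 20) -> str:
    """Hypothetical helper: build an ESearch URL for PubMed records about
    `topic` whose Secondary Source ID ([si]) field mentions `source`,
    e.g. a databank name such as "GEO" (assumed search syntax)."""
    term = f"({topic}) AND {source}[si]"
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    # urlencode percent-encodes brackets and parentheses, and turns spaces into "+"
    return f"{ESEARCH_URL}?{urlencode(params)}"


print(build_pubmed_data_query("asthma", "GEO"))
```

Opening the resulting URL in a browser or HTTP client returns an XML list of matching PubMed IDs (PMIDs), which you can then look up in PubMed as usual.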
Finding Data in PubMed Central

After searching for your topic of interest, select the Associated Data filter (listed under Article Attributes) to show only results that have data associated with the research.

Associated data may take any or all of the following forms:

  • Supplemental Material: Files stored and made available with the full-text article
  • Data Availability Statements: A text description in the full text (may include a citation for the data)
  • Data Citations: Machine-readable metadata in the references or article text

These data are listed in the Data Box within the PMC article. The Data Box appears only on articles that have one or more instances of associated data.

Access PubMed Central

Search through Data Repositories

Data repositories store datasets, together with descriptive information about those datasets, in a form that users can search. Many data repositories are online databases, and most focus on a particular type or subject of data.

Learn more about data repositories and how to choose one: visit this page on the library's data services site.
Browse and search for data repositories: search and browse a comprehensive database of data repositories.

Discover Data-Oriented Research

Data journals publish articles that describe significant datasets produced through scientific research, or that address topics in data science. Selected data journals are described below.

View a list of data journals

  • Scientific Data: a peer-reviewed, open-access journal for descriptions of scientifically valuable datasets and for research that advances the sharing and reuse of scientific data.

  • MDPI Data Journal: a peer-reviewed, open-access journal in data science that aims to enhance data transparency and reusability. The journal publishes in two sections: one on methods for collecting, treating, and analyzing data in science, and one publishing descriptions of scientific and scholarly datasets.

  • Data in Brief: lets researchers share and reuse each other's datasets by publishing data articles.

  • GigaScience: publishes all research objects from big data studies across the life and biomedical sciences.

  • Journal of Open Psychology Data: features peer-reviewed data papers describing psychology datasets with high reuse potential.

  • Open Health Data: features peer-reviewed data papers describing health datasets with high reuse potential.


Data Source Quick Search

If you would like an existing published dataset added to this list, please email

Each entry below gives the data source, a short description, and the topics it covers.

CDC Data Catalog

Datasets and data visualizations from the Centers for Disease Control and Prevention.

  • Administrative data
  • Biomonitoring
  • Disability & health & toxicology
  • Injury
  • Vaccination
  • Violence
  • Pregnancy
  • Disability & health

Epic Data Requests (JDAT)

The Joint Data Analytics Team (JDAT) at Yale provides customized reporting and data analysis from the Epic data system.

  • Electronic health record data

Global Health Data Exchange (GHDx)

A comprehensive catalog of surveys, censuses, vital statistics, and other health-related data. Search or browse by data type, keyword, organization, survey family, or country.

  • Administrative record
  • Census
  • Demographic surveillance
  • Disease registry
  • Environmental monitoring
  • Epi surveillance
  • Financial records
  • Modeled data

Global Health Observatory (GHO) Data Repository

The GHO is the WHO's gateway to health-related statistics for more than 1,000 indicators across its 194 member states.

  • Maternal mortality
  • Newborn & child mortality
  • Communicable diseases
  • Noncommunicable diseases & mental health
  • Substance abuse
  • Road traffic injuries
  • Sexual & reproductive health
  • Universal health coverage
  • Mortality from environmental pollution
  • Tobacco control
  • Essential medicines and vaccines
  • Health financing & health workforce
  • National & global health risks
  • Child malnutrition
  • Drinking water
  • Sanitation and hygiene
  • Clean household energy
  • Violence

Datasets in this collection are collected and supplied by agencies of the U.S. Department of Health and Human Services and by state partners.

  • Environmental health
  • Medical devices
  • Medicare & Medicaid
  • Social services
  • Community health
  • Mental health
  • Substance abuse

HRSA Data Portal

Programs of the Health Resources and Services Administration (HRSA) provide health care to people who are geographically isolated or economically or medically vulnerable. Data sources for the HRSA Data Portal include the American Community Survey, Census Tracts, the CMS Healthcare Cost Report, Health Professional Shortage Areas, and Veterans Health Administration (VHA) facilities.

  • Demographics
  • Health care facilities
  • Health professional shortage areas
  • Medically underserved areas/populations
  • Organ donation and transplantation
  • Scholarships & loans - NURSE Corps

Inter-university Consortium for Political and Social Research (ICPSR)

ICPSR is an international consortium of more than 750 academic institutions and research organizations. It maintains a data archive of more than 250,000 files of research in the social and behavioral sciences.

  • Census enumerations
  • Community & urban studies
  • Geography & environment
  • Health care & facilities
  • Social indicators
  • Conflict, aggression, violence & wars

National Death Index (NDI)

The NDI is a centralized database of death record information on file in state vital statistics offices. It helps epidemiologists and other health and medical investigators with their mortality ascertainment activities.

  • Death location (state level)
  • Death date
  • Death certificate numbers
  • Specific statistical information
  • Cause of death

National Environmental Public Health Tracking Network

The National Environmental Public Health Tracking Network brings together health data and environmental data from national, state, and city sources, and provides supporting information that makes the data easier to understand.

  • Environments & hazards
  • Health effects
  • Population health

National Health and Nutrition Examination Survey (NHANES)

The NHANES is a program of studies designed to assess the health and nutritional status of adults and children in the United States. The survey combines interviews and physical examinations.

National Outbreak Reporting System (NORS)

The National Outbreak Reporting System (NORS) supports outbreak reporting by partners in local, state, and territorial public health agencies.

  • Foodborne disease
  • Waterborne disease
  • Disease from animal contact
  • Person-to-person disease
  • Disease from environmental contamination
  • Disease of indeterminate/unknown causes

Surveillance Resource Center Interactive Database Systems

The CDC's Surveillance Resource Center offers interactive database systems containing continuously updated data, and provides data reports.

  • Birth defects & developmental disabilities
  • Child and adolescent health
  • Chronic disease
  • Cross-cutting disease
  • Diabetes
  • Disabilities
  • Environmental health
  • Global health
  • Health risk behavior
  • HIV, STDs, & viral hepatitis
  • Infectious disease
  • Influenza
  • Injury
  • Maternal & child health
  • Occupational safety & health
  • Oral health
  • Population
  • Vaccination coverage

The Demographic and Health Surveys (DHS) Program

Surveys are browsable by country, survey type, year, and survey characteristics, and are indexed for advanced searching.

  • Fertility
  • Family planning
  • Maternal and child health
  • Gender
  • HIV/AIDS
  • Malaria
  • Nutrition

The YODA Project

The YODA Project makes available clinical trial data generated by Johnson & Johnson, Medtronic, and SI-BONE that might not otherwise be published or easily accessible.

  • Clinical trials for pharmaceutical drugs and medical devices, browsable by generic name, product class, therapeutic area, and condition studied

Vital Statistics Online Data Portal

This portal provides access to the online data dissemination activities of the Division of Vital Statistics, including interactive online data access tools and downloadable public-use data files.

  • Birth
  • Infant death
  • Mortality
  • Fetal death