Where to Start Your Search
Literature Search
Review the academic literature, since many published studies generate and analyze data. Search for articles related to the population, problem, or methodology you are targeting. These articles often cite the data repositories, databases, or specific datasets used in the research, and may state where the data has been deposited.
Tips for finding data in the literature:
- Select data-specific search filters. For example, in PubMed, you can limit results to articles with Associated Data by selecting this filter under Article Attributes in the sidebar. You can also select Dataset as an Article Type. If Dataset doesn't appear under Article Type in your sidebar, click the Additional Filters button and add it.
- Browse articles for sections such as Supplementary Materials, Data Availability Statement, or Data Citations to find attached data files and links to external data.
- Use data as a search term. For example, “nonalcoholic fatty liver disease AND data”.
- Enter your search terms in Yale Library's Books+ and then filter by format Data Sets.
- Search across all National Center for Biotechnology Information (NCBI) resources to see many National Library of Medicine resources at once, including literature and databases.
Data Repositories
Data repositories are storage locations for datasets and information about those datasets (such as metadata and documentation), and they are often themed by subject.
Tips for finding data repositories:
- Use a data repository registry, such as re3data.org and fairsharing.org.
- Find data repositories by funder, such as the National Institutes of Health (NIH).
- Find data repositories by discipline, such as those recommended by Nature’s Scientific Data and PLoS ONE.
- Search generalist data repositories, such as Dataverse, Dryad, Figshare, Inter-University Consortium for Political and Social Research (ICPSR), Mendeley Data, Open Science Framework, Qualitative Data Repository (QDR), Synapse, Vivli, and Zenodo.
- Use a government data portal, such as data.gov and healthdata.gov.
Data at Yale
Data can be found right here at Yale, whether it has been purchased by the library or by departments, generated by your colleagues, or made available through some other means. Data at Yale that may interest health sciences researchers includes:
- American Hospital Association (AHA), available from Yale Library on WRDS
- Merative MarketScan Database (must be on Yale VPN to access the link), available from Yale Biomedical Informatics and Computing (YBIC) and Biomedical Informatics & Data Science (BIDS)
- Yale University Open Data Access (YODA) Project, available from Center for Outcomes Research and Evaluation
Data Creators
Consider who might be creating the data you need for research. This includes governing bodies, academic and research institutions, biomedical companies, and not-for-profit organizations, as well as news organizations and other invested groups.
Examples of these include:
- World Health Organization’s data resources
- United Nations Refugee Agency data catalog
- Gapminder data (read more about Gapminder)
- New York Times’ COVID-19 data on Github (read more about this data)
Data-Oriented Research
Some academic journals focus on publishing data and information about data science. Try searching one of the following:
- Scientific Data: a peer-reviewed, open-access journal for descriptions of scientifically valuable datasets and research that advances the sharing and reuse of scientific data.
- Data in Brief: a multidisciplinary, open-access, peer-reviewed journal publishing short articles that describe and provide access to research data.
- Journal of Open Psychology Data: features peer-reviewed data papers describing psychology datasets with high reuse potential.
- Open Health Data: features peer-reviewed data papers describing health datasets with high reuse potential.
- GigaScience: publishes all research objects from big data studies across the entire spectrum of life and biomedical sciences.
What to Ask As You Search
Does the data help me answer my research question?
It’s important to formulate your research question before starting your data search, because your question can direct and inform that search.
Does the data contain the variables I need?
Once you have a research question, you may start to understand what type of analysis you want to perform. Does the data you’ve found contain the variables you need to perform that analysis? For example, if you’re studying brain disorders in young adults, you may be interested in variables such as age at diagnosis, disorder type, and progression state. You may also be seeking brain scan images. Thinking about your analysis before you search can help you narrow down relevant datasets.
Is the data within the scope of my project?
The data you find may require additional work on your part before it becomes useful; for instance, you may need to clean, curate, or analyze it to answer your research question. You may also encounter obstacles to working with the data, such as licensing restrictions, wait times, and technical challenges. Acquiring the data you need can take time and resources, so it’s important to recognize whether the data you’ve found fits your constraints, such as an upcoming deadline or a budget limitation. For more about this, see our “Re(Use) Data” page.
Can I reuse the data?
Visit our “Re(Use) Data” page to learn more about when data is usable, and why reusable data is important for science.
Other Resources
- Definition of Scientific Data | Final NIH Policy for Data Management and Sharing
- Discovering Associated Data in PMC | NCBI Insights
- Data filters in PMC and PubMed | NLM Technical Bulletin
- Related Data | PubMed Help
- Find Datasets | Yale University Library
- National Center for Data Services | National Library of Medicine