Where to Start Your Search
Review academic literature, where many published studies generate and analyze data. Search for articles related to the population, problem, or methodology you are targeting. These articles should reference the data repositories, databases, or specific datasets used to conduct the research, and may detail where data has been deposited.
Tips for finding data in literature:
- Select data-specific search filters. For example, in PubMed and PubMed Central, you can search articles with Associated Data by selecting this filter in the Article Attributes sidebar.
- Browse articles for sections such as Supplementary Materials, Data Availability Statement, or Data Citations to find attached data files and links to external data.
- Use “data” as a search term. For example, “nonalcoholic fatty liver disease AND data”.
- Enter your search terms in Yale Library's Books+ and then filter by format Data Sets.
- Search across all National Center for Biotechnology Information (NCBI) resources to see many National Library of Medicine resources at once, including literature and databases.
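If you want to run a literature search like the “AND data” example programmatically, the following is a minimal sketch. It assumes you are using NCBI's public E-utilities ESearch endpoint for PubMed; the function name `build_pubmed_search_url` and the `retmax` default are illustrative choices, not part of any library. The sketch only builds the request URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (public; NCBI rate-limits requests).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(topic: str, retmax: int = 20) -> str:
    """Return an ESearch URL that pairs a topic with the search term 'data'.

    Hypothetical helper for illustration only.
    """
    params = {
        "db": "pubmed",              # search the PubMed database
        "term": f"{topic} AND data", # same pattern as the tip above
        "retmode": "json",           # ask for a JSON response
        "retmax": retmax,            # maximum number of IDs to return
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


url = build_pubmed_search_url("nonalcoholic fatty liver disease")
print(url)
```

Opening the printed URL in a browser, or fetching it with a tool of your choice, should return a list of matching PubMed IDs that you can then review for data availability statements.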
Data repositories are storage locations for datasets and information about those datasets (such as metadata and documentation), and they are often themed by subject.
Tips for finding data repositories:
- Use a data repository registry, such as re3data.org and fairsharing.org.
- Use a government data portal, such as data.gov and healthdata.gov.
- Find data repositories by funder, such as the National Institutes of Health (NIH).
- Find data repositories by discipline, such as those recommended by Nature’s Scientific Data and PLOS ONE.
- Search generalist data repositories, such as Dataverse, Dryad, Figshare, Inter-University Consortium for Political and Social Research (ICPSR), Mendeley Data, Open Science Framework, Qualitative Data Repository (QDR), Synapse, Vivli, and Zenodo.
Consider who might be creating the data you need for research. This includes governing bodies, academic and research institutions, biomedical companies, and not-for-profit organizations, as well as news organizations and other invested groups.
Some academic journals focus on publishing data and information about data science. Try searching one of the following:
- Scientific Data: a peer-reviewed, open-access journal for descriptions of scientifically valuable datasets and research that advances the sharing and reuse of scientific data.
- Data in Brief: a multidisciplinary, open-access, peer-reviewed journal that publishes short articles describing and providing access to research data.
- Journal of Open Psychology Data: features peer-reviewed data papers describing psychology datasets with high reuse potential.
- Open Health Data: features peer-reviewed data papers describing health datasets with high reuse potential.
- MDPI Data Journal: a peer-reviewed, open-access journal in data science that aims to enhance data transparency and reusability. The journal publishes in two sections: one on methods for collecting, treating, and analyzing scientific data, and one that publishes descriptions of scientific and scholarly datasets.
- GigaScience: publishes all research objects from big data studies across the entire spectrum of life and biomedical sciences.
What to Ask As You Search
Does the data help me answer my research question?
It’s important to formulate your research question before starting your data search, as your question can direct and inform the search.
Does the data contain the variables I need?
Once you have a research question, you may start to understand what type of analysis you want to perform. Does the data you’ve found contain the variables you need to perform your analysis? For example, if you’re studying brain disorders in young adults, you may be interested in data variables such as age at diagnosis, disorder type, and progression state. You may also be seeking brain scan images. Thinking about analysis in advance of your data search can help narrow down relevant datasets.
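One quick way to check a candidate dataset for the variables you need is to compare its column names against your list. The sketch below assumes the dataset is a CSV file with a header row. The column names are hypothetical, modeled on the brain disorder example, and the sample rows are made up to stand in for a downloaded file.

```python
import csv
import io


def missing_variables(csv_file, required):
    """Return the required column names that are absent from the CSV header row."""
    header = next(csv.reader(csv_file), [])  # empty file -> empty header
    present = {name.strip().lower() for name in header}
    return [name for name in required if name.lower() not in present]


# Hypothetical variable names for the brain disorder example.
required = ["age_at_diagnosis", "disorder_type", "progression_state"]

# Made-up stand-in for a real file; in practice, pass
# open("your_file.csv", newline="") instead.
sample = io.StringIO(
    "participant_id,age_at_diagnosis,disorder_type\n"
    "001,24,epilepsy\n"
)

print(missing_variables(sample, required))  # ['progression_state']
```

An empty result means every variable on your list appears in the header. A matching column name does not guarantee the values are complete or coded the way you expect, so review the dataset's documentation as well.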
Is the data within the scope of my project?
The data you find may involve additional work on your part before it becomes useful. For instance, you may need to conduct data cleaning, curation, or analysis to answer your research question. You may also encounter obstacles to working with the data, such as licensing, wait times, and technical challenges. It may take time and resources to acquire the data you need, so it’s important to recognize whether the data you’ve found aligns with your constraints, such as an upcoming deadline or budget limitation. For more about this, see our “Re(Use) Data” page.
Can I reuse the data?
Visit our “Re(Use) Data” page to learn more about when data is usable and why reusable data is important for science.