What is data reuse, and what’s open data?
Data reuse is the R in FAIR, a widely adopted set of principles for data sharing. FAIR stands for findable, accessible, interoperable, and reusable. Clear descriptions are at the heart of data reuse. When reusing data, look for well-described data whose context is apparent. When publishing data, document it thoroughly enough that others can understand and reuse it. If possible and appropriate, consider publishing your data openly.
Open data is data that anyone can freely use, reuse, and redistribute as a publicly available resource (definition adapted from the Open Knowledge Foundation).
Why should you reuse data, and make your data reusable?
Reuse data to:
- Verify your own research.
- Mine the data for new insights.
- Work with data that match a population or problem you’re interested in, without the overhead of collecting or generating the data yourself.
- Increase collaboration by analyzing someone else’s data and connecting with like-minded data producers.
- Make open science a standard practice in medicine.
Make your data reusable to:
- Allow others to verify and validate your findings, and potentially collaborate with you.
- Propagate the research cycle and fuel new discoveries, by allowing someone to derive new findings from your data.
- Contribute to the process of tracking scientific inquiry over time.
- Allow citizen scientists to view and interact with health sciences data about their own conditions.
- Align with open science aims set forth by many professional and cultural organizations, including the UN, UNESCO, and the National Academies of Sciences, Engineering, and Medicine.
- Comply with funder mandates, such as those of NIH and NSF, and with scientific transparency standards.
What potential challenges should you know about when reusing data?
- Licensing. Some data are licensed under specific terms (a common one is that you will not attempt to re-identify research subjects), and some require you to sign a data use agreement. Read licenses and other agreements or terms carefully, and make sure that you and your research team can comply.
- Access. Sometimes, you have to fill out a data request form, or contact the creator(s) directly and ask for data access. This process can take time.
- Lack of context. As noted above under “What is data reuse?”, documentation is central to whether data can be reused. If you have difficulty understanding what the variables mean or how the data were produced, you may not be able to reuse the data.
- Technical difficulties. Technical issues, such as insufficient data storage or unfamiliar file formats, can sometimes prevent you from accessing or using a dataset. Before deciding that this is a barrier, though, reach out to the Medical Library for help.
- Fees. Not all data are free. If the dataset is not yet in Yale’s collections, consider requesting that we purchase it through this form. Similar datasets are also sometimes available for free, so consider consulting our “Find Data” page.
Where can you find data to reuse?
Visit our “Find Data” page to learn more about how and where to find data. The following resources offer further guidance on finding, sharing, and reusing data:
- Ten Simple Rules for using public biological data for your research | PLOS Computational Biology
- A FAIR guide for data providers to maximise sharing of human genomic data | PLOS Computational Biology
- Best practices for creating reusable data publications | Dryad
- Your data can live forever: how to plan for data reuse | Mozilla
- A dataset describing data discovery and reuse practices in research | Scientific Data