
Learn to Work with Data

Consultations


Got data questions? Book a free consultation with Cushing/Whitney Medical Library’s data librarian to discuss data-related research questions and needs. Consultations are available for individuals, groups, and teams.

Most consultations take place over Zoom, but you can also send your question by email or schedule an in-person meeting.

Examples of consultation topics
  • How to find and select health sciences dataset(s), including how to reuse open public data for research or course assignments
  • Best practices for research data management
  • How to make your research more open, accessible, and reproducible, and how to make your data FAIR (findable, accessible, interoperable, and reusable)
  • Data compliance and governance, including data use agreements, data licensing, and proper data storage methods
  • Data processing, analysis, and visualization
  • Data tools and software to support your research, such as Python and R

Self-Guided Resources


Looking for asynchronous ways of learning? Here are some of our favorite free resources for learning to work with data.

Python
  • Learn Python the Right Way [online book] — A free, comprehensive guide to Python and programming in general. If you've taken "Getting Started with Python" at the library, you will already be familiar with Replit, which this book also uses for exercises.
  • Python for Non-Programmers [LinkedIn Learning] — Access this free course, aimed at those new to programming, through Yale's subscription to LinkedIn Learning. Like the previous resource, this course uses Replit.
  • RealPython.com [online resource and tutorials] — An expansive collection of free Python tutorials, as well as other resources like forums, podcasts, and helpful articles.
  • An Introduction to Programming for Bioscientists: A Python-Based Primer [open-access journal article] — Read this PLOS Computational Biology article for a step-by-step guide to getting started with Python for biological and biomedical use. Full citation: Ekmekci B, McAnany CE, Mura C (2016) An Introduction to Programming for Bioscientists: A Python-Based Primer. PLOS Computational Biology 12(6): e1004867. https://doi.org/10.1371/journal.pcbi.1004867
  • Automate the Boring Stuff [online book] — A free, excellent introduction to automating everyday tasks with Python, including web scraping, reminder applications, data formatting, automatically filling out forms, and more.
  • CS Dojo’s Python Tutorial for Absolute Beginners [YouTube videos] — If you prefer to learn through video, this is a great series.
  • Python Documentation — Official Python docs are available at python.org, where you can also find a beginner's guide and many additional resources. We also recommend the W3Schools Python Tutorial as a supplementary quick reference and learning resource.
R
  • SwirlStats [interactive tutorial and R package] — SwirlStats allows you to "Learn R, in R!" This interactive tutorial provides an immersive experience for learning R and data science concepts.
  • R Programming [online course] — A comprehensive Coursera course covering how to get up and running with R, programming and troubleshooting in R, and simulation and profiling.
  • R for Data Science [online book with exercises] — From Hadley Wickham, RStudio's (now Posit's) Chief Scientist and the statistician who formalized the concept of "tidy data", comes this book: a widely used guide to R, the tidyverse, and how to use R for data science.
  • Ten simple rules for teaching yourself R [open-access journal article] — Read this PLOS Computational Biology article for a step-by-step guide to getting started with R on your own. Full citation: Lawlor J, Banville F, Forero-Muñoz NR, Hébert K, Martínez-Lanfranco JA, et al. (2022) Ten simple rules for teaching yourself R. PLOS Computational Biology 18(9): e1010372. https://doi.org/10.1371/journal.pcbi.1010372
  • R-Bloggers [online resource] — A blog aggregator for content about R, R programming, data science, and statistics. A great place to learn what's new in R and find tutorials and guides on a variety of topics.
  • R Documentation — Official R docs are available at r-project.org. We also recommend the W3Schools R Tutorial as a supplementary quick reference and learning resource.
From spreadsheets to programming languages
Data cleaning

What is data cleaning?

Data cleaning typically involves changing a dataset to adjust for information that is:

  • Malformed (for example, incorrect, incomplete, inconsistent, corrupted, poorly formatted, etc.)
  • Duplicated
  • Missing
  • Otherwise problematic (for example, outliers or irrelevant rows and columns)
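As a rough illustration of these categories, here is a minimal Python sketch using only the standard library. It assumes the data have been loaded as a list of dictionaries (for example with csv.DictReader) and that one column, here a hypothetical `age` field, should hold numbers. The function name `flag_issues` is hypothetical, and the outlier rule (values more than 1.5 times the interquartile range beyond the quartiles, often called Tukey's rule) is just one common heuristic, not a recommended standard.

```python
import math
import statistics


def flag_issues(rows, numeric_field):
    """Return row indices grouped by the kind of problem they show.

    rows: a list of dicts (for example, from csv.DictReader).
    numeric_field: the column expected to hold numbers (an assumption).
    """
    issues = {"malformed": [], "duplicated": [], "missing": [], "outlier": []}
    seen = set()
    parsed = []  # (row index, value) pairs that passed the checks below

    for i, row in enumerate(rows):
        # Duplicated: an identical record has already been seen.
        key = tuple(sorted(row.items()))
        if key in seen:
            issues["duplicated"].append(i)
            continue
        seen.add(key)

        raw = (row.get(numeric_field) or "").strip()
        # Missing: the cell is empty.
        if not raw:
            issues["missing"].append(i)
            continue

        # Malformed: present but not a finite number (e.g. "thirty", "nan").
        try:
            value = float(raw)
        except ValueError:
            issues["malformed"].append(i)
            continue
        if not math.isfinite(value):
            issues["malformed"].append(i)
            continue
        parsed.append((i, value))

    # Otherwise problematic: values outside the 1.5 * IQR fences.
    # Flagged rows still need human review before anything is removed.
    if len(parsed) >= 4:
        q1, _, q3 = statistics.quantiles(
            [v for _, v in parsed], n=4, method="inclusive"
        )
        spread = 1.5 * (q3 - q1)
        issues["outlier"] = [
            i for i, v in parsed if v < q1 - spread or v > q3 + spread
        ]

    return issues


# Hypothetical sample records.
rows = [
    {"id": "1", "age": "34"},
    {"id": "2", "age": "29"},
    {"id": "2", "age": "29"},      # exact duplicate of the previous row
    {"id": "3", "age": ""},        # missing
    {"id": "4", "age": "thirty"},  # malformed
    {"id": "5", "age": "31"},
    {"id": "6", "age": "33"},
    {"id": "7", "age": "30"},
    {"id": "8", "age": "340"},     # probably a data-entry error
]
issues = flag_issues(rows, "age")
print(issues)
# {'malformed': [4], 'duplicated': [2], 'missing': [3], 'outlier': [8]}
```

The sketch only flags rows; it does not change them. Whether a duplicate should be dropped or an outlier corrected depends on your documentation and planned analysis.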

See resources below for more information on what data cleaning is and how to do it.

Questions to ask before data cleaning:

  1. Is anything actually wrong with the data? If so, deal with it first; see the list above for possible issues.
  2. What’s missing in the data, and why? This may require you to gather more documentation, or more data. Once you have the information you need, make a plan for how you want to deal with missing data.
  3. What do you have planned for analysis? Do you need to make data more consistent to enable clean visualizations, for instance? Are you most interested in a subset of the data? Data cleaning can be endless; prioritize tasks that affect analysis.
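For question 2, a quick per-column count of missing values is often a useful starting point, and limiting that count to the columns your analysis actually needs reflects question 3. The Python sketch below uses only the standard library. The function name `missing_report`, the set of tokens treated as missing (including the `-999` placeholder), and the sample data are all hypothetical; check your dataset's codebook or data dictionary for the codes it actually uses.

```python
import csv
import io

# Values treated as "missing". This set is an assumption for illustration;
# replace it with the codes documented for your dataset.
MISSING_TOKENS = {"", "na", "n/a", "null", "none", "-999"}


def missing_report(rows, columns):
    """Count missing values per column, only for the columns you plan to analyze."""
    total = len(rows)
    report = {}
    for col in columns:
        missing = sum(
            1 for row in rows
            if (row.get(col) or "").strip().lower() in MISSING_TOKENS
        )
        # (number missing, share of all rows missing)
        report[col] = (missing, missing / total if total else 0.0)
    return report


# Hypothetical sample data; in practice, open your own file with csv.DictReader.
sample = io.StringIO(
    "id,age,smoker,notes\n"
    "1,34,yes,\n"
    "2,NA,no,\n"
    "3,29,,follow up\n"
    "4,-999,yes,\n"
)
rows = list(csv.DictReader(sample))

# "notes" is skipped because it is not needed for the planned analysis.
report = missing_report(rows, ["age", "smoker"])
for col, (count, share) in report.items():
    print(f"{col}: {count} of {len(rows)} missing ({share:.0%})")
# age: 2 of 4 missing (50%)
# smoker: 1 of 4 missing (25%)
```

Seeing how much is missing, and in which columns, makes it easier to decide whether to gather more data, exclude records, or note the gaps as a limitation.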

Data cleaning resources:

Data visualization

Data visualization guidance, types, and recommended tools are compiled in this research guide on data visualization.

Working with data in spreadsheets
Free datasets for practicing your data skills

 

More Support

Across Yale

Beyond Yale