OBJECTIVE

Research programmes traditionally assume that students will find their own way to analyse and visualise their data. This assumption creates problems for students and their supervisors and wastes a significant amount of time. Many students are intimidated by their data rather than curious about it; they often skip exploratory data analysis and go straight to advanced statistical models that they cannot later explain, because they do not understand their data in depth. This course provides the basic knowledge, skills and tools to perform an exploratory data analysis, with a major focus on publication-ready data visualisation, in order to detect patterns and trends, extract meaningful information and prepare for further inferential analysis.

LEARNING OUTCOMES

On successful completion of the course, the students will be able to understand/perform:

Knowledge

  • Exploratory Data Analysis
  • Grammar of graphics
  • Data Management
  • Reproducibility

Skills and Tools

  • Master the RStudio interface
  • Install additional packages
  • Identify the formats of the tidy data structure
  • Deal with coding errors and search for answers
  • Identify an optimal strategy for collecting and exporting data: Forms, Spreadsheets, CSV
  • Import their data into R
  • Perform an initial exploratory data analysis: head, dim, NA values, cross-tabulation
  • Identify basic data visualisation options: columns, points, histograms, bar- and boxplots
  • Generate summary tables
  • Use repositories to maintain, manage and share their data: Git, OSF
  • Export the results of the exploratory analysis: R Markdown
  • Apply good practices in data organisation, naming and coding
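
As a rough illustration of the initial exploratory steps named above (head, dim, NA values, cross-tabulation), the following sketch uses base R only. The built-in `airquality` dataset is an assumption chosen for illustration, standing in for a student's own data; it is not part of the course materials.

```r
# Initial exploratory look at a data frame, using base R only.
# `airquality` ships with R and contains missing values; here it
# stands in for a student's own dataset (illustrative assumption).
df <- datasets::airquality

head(df)   # first six rows
dim(df)    # number of rows and columns

# Count of NA values in each column
na_per_column <- colSums(is.na(df))
na_per_column

# Cross-tabulation: month against whether Ozone is missing
xtab <- table(Month = df$Month, OzoneMissing = is.na(df$Ozone))
xtab
```

The same steps apply to any data frame once it has been imported into R; only the column names change.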

Competence

On successful completion of the course, students will have the knowledge and practical skills to use R and its essential functions and packages to wrangle and transform their research data, carry out informative exploratory analyses and produce publication-ready visualisations, enabling effective interpretation and communication of research results and findings to the scientific community.

PASSING THE COURSE

To pass the course, each student must present a final project and a code script containing the exploratory analysis of an original dataset of their choice.

PRE-REQUISITES

SOFTWARE AND PACKAGES REQUIRED

Please come to class with a laptop that has the following installed:

  • A recent version of R (>= 4.0.0), which is available for free at https://cran.r-project.org/

  • A recent version of RStudio Desktop (>=1.3.0), available for free at https://www.rstudio.com/download (RStudio Desktop Open Source License)

  • The R packages we will use, which you can install by connecting to the internet, opening RStudio, and running the following in the R console:

    install.packages(c("tidyverse", "gtsummary",
                       "janitor", "ggpubr",
                       "ggthemes", "naniar", "NHANES"))
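
One possible way to confirm that the installation worked is sketched below. It only checks the package names listed above; the variable names `required`, `installed` and `missing_pkgs` are hypothetical and chosen for illustration.

```r
# Packages listed in the install.packages() call above
required <- c("tidyverse", "gtsummary", "janitor", "ggpubr",
              "ggthemes", "naniar", "NHANES")

# requireNamespace() returns TRUE when a package can be loaded,
# without attaching it to the search path
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)

# Report any packages that still need to be installed
missing_pkgs <- required[!installed]
if (length(missing_pkgs) == 0) {
  message("All course packages are installed.")
} else {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
}
```

If any names are reported as missing, rerunning `install.packages()` for just those names is usually enough.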

SESSIONS

Session | Theme | Contents
1 | Data Visualization: Why | Introduction to R - The grammar of graphics: data, geoms, aes - Basic visualisations - How to deal with errors
2 | Data Visualization II: How | geom_point - geom_histogram - geom_col - geom_bar - facets - Locating and dealing with NA
3 | Data Wrangling: Why | Filter - Select - Mutate - Summarise I - Arrange - lubridate
4 | Data Wrangling II: How | Pivoting - group_by - Summarise II - gtsummary
5 | Data Management | Git, OSF - Data workflow: data validation, data form entry, when and when not to use spreadsheets - File naming - Basics of code management: tidylog - Basics of data cleaning: janitor - Good code practices and data sharing - Codebook (dataMaid)
6 | Final project | Student presentations: 15 min per project, maximum 6 projects
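
To give a flavour of what sessions 1 to 4 cover, here is a minimal sketch that combines a few of the wrangling verbs with a basic plot. It assumes the dplyr and ggplot2 packages (both installed as part of tidyverse) and uses the built-in `airquality` dataset purely for illustration; the object names `monthly_ozone` and `p` are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Data wrangling: filter, mutate, group_by, summarise, arrange
monthly_ozone <- airquality %>%
  filter(!is.na(Ozone)) %>%                # drop rows with missing Ozone
  mutate(TempC = (Temp - 32) * 5 / 9) %>%  # Fahrenheit to Celsius
  group_by(Month) %>%
  summarise(
    n = n(),                               # observations per month
    mean_ozone = mean(Ozone),
    mean_temp_c = mean(TempC)
  ) %>%
  arrange(Month)

monthly_ozone

# Grammar of graphics: data + aesthetic mapping + geom + facets
p <- ggplot(airquality, aes(x = Temp, y = Ozone)) +
  geom_point(na.rm = TRUE) +               # silently skip rows with NA
  facet_wrap(~ Month)

p
```

Swapping `geom_point()` for `geom_histogram()` or `geom_col()` (with a suitable aesthetic mapping) changes the plot type while the rest of the structure stays the same, which is the main idea behind the grammar of graphics.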

READINGS

Compulsory

Data Science Books (some chapters required)