Course Info

Course description

Ten years ago, who would have thought that R, the “environment for statistical computing and graphics”, would become one of the most popular programming languages for data scientists?

The impressive growth of R is not a coincidence. R is a free and open-source alternative to expensive, proprietary software like SPSS, MATLAB and Excel. Its strengths have always been its capabilities for statistical data analysis and its functionality for creating powerful, aesthetically appealing graphics and charts.

While R attracted an almost exclusively academic audience in the 1990s and 2000s, the R community has since grown not only in sheer numbers but also in diversity, as people from different industries and backgrounds discover R’s usefulness for a wide range of applications. As of February 2020, more than 15,000 (!) packages have been published on CRAN, about half of them since 2015.

Especially in the last decade, the functionality and versatility of R have grown rapidly. Among the most popular R packages are:

In Data Science with R (DataSciR), you will learn the fundamentals of R and how to use the following packages for data science:

  • the “tidyverse”, which includes packages like dplyr and tidyr for data manipulation and ggplot2 for data visualization,
  • rmarkdown and knitr for reproducible & automated reporting,
  • shiny for creating interactive web applications, and
  • tidymodels for inferential and predictive modeling.

You will demonstrate your proficiency with these packages in a semester-long, graded data science project.

Prerequisites

There are no mandatory prerequisites for DataSciR. However, you are expected to have a solid knowledge of fundamental data mining techniques, such as classification, regression and clustering. Hence, it is recommended that you have attended at least one of the following lectures (or a comparable one):

Also, you should have basic knowledge of programming and statistics. For example, you will learn the most important vector types and classes in R, but you will not learn what a vector or a class is in general. Likewise, you should know what terms such as mean, standard deviation and probability mean.

Software

By the end of the first week, you should have installed the following software on your own laptop:

  1. R (>=4.0.0)
  2. RStudio (>=1.4)
  3. on Windows: Rtools

Also, please check whether you can successfully install packages. To do so, click on the Packages tab in the bottom-right pane in RStudio. Then, click on the Install button and specify an arbitrary package, e.g. dplyr. Finally, click on Install. Alternatively, you can install a package from the console with install.packages("dplyr"). If everything is set up correctly, no error messages should be displayed when you load the installed package with library(dplyr).
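
The R version requirement and the package check can also be verified directly from the console. The following is a minimal sketch that uses only base R functions. The helper name check_setup and the choice of dplyr as the test package are assumptions made for illustration and are not part of the course materials. The RStudio version is not covered here and still has to be checked in RStudio itself.

```r
# Minimal setup check using base R only.
# `check_setup` is a hypothetical helper name chosen for this sketch.
check_setup <- function(pkg = "dplyr", min_r = "4.0.0") {
  # getRversion() returns the running R version as a comparable object
  r_ok <- getRversion() >= min_r

  # requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise
  pkg_ok <- requireNamespace(pkg, quietly = TRUE)

  if (!r_ok) {
    message("R ", as.character(getRversion()), " is older than the required ", min_r)
  }
  if (!pkg_ok) {
    message("Package '", pkg, "' is not available; try install.packages(\"", pkg, "\")")
  }

  # Named logical vector summarizing both checks
  c(r_version_ok = r_ok, package_ok = pkg_ok)
}

check_setup()
```

If both entries of the returned vector are TRUE, R is recent enough and the package can be loaded. A FALSE entry comes with a message that names the failing check.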

FAQ

Q: Do I have to show up to regular course meetings?
A: Attendance at the regular course meetings is not compulsory, but you are encouraged to attend and participate actively. However, several project deliverables are due during the semester, and some of them require attendance.


Q: Where can I find interesting datasets for my project?
A: Here is a list of websites with various real-world datasets:

Don’t use a small or built-in dataset like iris, mtcars or Titanic in your project! Also, very popular data sources like Kaggle or the UCI ML repository are discouraged because most of their datasets have already been studied extensively.


Q: I have no prior programming experience with R. Is that a problem?
A: Programming experience with R is not a mandatory prerequisite. An introduction to the fundamentals of the R language will be given in the second week of the course. However, please keep in mind that this introduction cannot cover every detail. You are expected to work through additional resources and materials on your own or together with your team. Most of the referenced materials are freely available.


Q: Where can I get some inspiration for our project?
A: Please have a look at the hall of fame for an overview of student projects from 2019, 2020, and 2021.