R for data science

Overview

Course introduction

Course structure

The backbone of this training is a workshop focused on the effective use of the R language in data science projects. The program assumes that participants have solid hands-on experience with general R usage and are interested in using R for data analysis at an advanced level. Participants receive background information and hands-on practice to see how R applies to their own situation. Participants work on their own laptops (Windows, Mac, or Linux) so that they can take the workshop material into their own use. The trainers assist all participants so that they get the most out of the material.

Day 1

The first day of the workshop focuses on topics essential to using R in data science projects. Real-life issues related to software installation, data collection, and data validation are covered to support the successful implementation of projects.

General introduction and setting up environment

  1. Course structure, instructors, mode of study
  2. Install R and RStudio
  3. Install packages used in the projects
  4. GitHub integration

Data IO

  1. Import data from websites
  2. Import data from public APIs
  3. Import data from proprietary databases

Working with Tidy data

  1. Introduction to Tidy data
  2. Packages of the Tidyverse
     1. Tidy up data
     2. Data conversion, selection, join tables
     3. Visualize tidy data with ggplot

Large datasets

  1. Sampling at data or IO level
  2. Replace for loops with apply functions
  3. Parallelization: multicore CPUs, computer clusters

Reproducible analysis and reporting

  1. R Markdown
  2. Dynamic documents with knitr

Network based collaboration

  1. RStudio server version
  2. Online data visualization: Shiny

R in cloud environments

  1. Virtualization with Docker
  2. IBM Cloud: Jupyter Notebook with R and Spark

 

Day 2

Project 1: Mapping bus service density

Demonstrated topics:

  1. Import static data from public API
  2. Transform data with tidyr
  3. Join data frames from different sources
  4. Visualize geospatial data using Google Maps API

Project 2:

Demonstrated topics:

  1. Use eurostat wrapper package to obtain data from Eurostat
  2. Search for topic-specific Eurostat data
  3. Downsampling large data set
  4. Reformatting aggregated data
  5. Visualize tidy data with ggplot
  6. Customize ggplots with esquisse
  7. Visualize tables with knitr and grid table

 

OPTIONAL: Online practice

The workshop can be supplemented with six weeks of online practice. Each week is dedicated to a group of problems that workshop participants commonly find difficult. Tutors assist participants through our online Learning Management System, where trained personnel answer their questions. This support helps participants apply the covered material in real-world work and gain more from the course.

Week 1 – Set up R and RStudio, package installation; GitHub setup
Week 2 – Data import from Wikipedia, FMI API, PostgreSQL database, MongoDB
Week 3 – Tidy data from different sources
Week 4 – Analyze large datasets on a small computer: downsampling
Week 5 – Analyze large datasets in the cloud: use Jupyter notebooks in IBM Cloud
Week 6 – Reporting with knitr and RStudio


Trainer:

Python, R

Csaba Ortutay

Chief Executive Officer of HiDucator Ltd, Adjunct Professor at the University of Tampere

Dr Ortutay is an accomplished data science professional who has been developing data analysis courses since 2008. He has a special focus on the analysis of molecular biology data and, more generally, on the application of the R language in different projects. He was the head of the Master’s Degree Program in Bioinformatics at the University of Tampere in 2012 and holds an adjunct professor title there.


Theme: Agile development
Trainer: Csaba Ortutay
Language: English
Duration: 2 days
Location: Remote training
Start dates: Contact us

This training program / course has no active start dates. If you are interested in the course, please contact us.

