Python for advanced data analysis and machine learning

Course introduction

Course structure

The aim of the course is to provide insight into advanced data analysis modules in Python and to give a feel for how machine learning can be used to address a real-life problem.

Workshop

This course is organized as a two-day workshop and assumes that participants have basic knowledge of Python. It is practice-oriented, with a project assignment announced at the beginning of the course. The project uses real data sets and focuses on a real-life classification problem. The course is intended for work in small groups (ideally two students per group) and uses pair programming as its teaching method. Participants use their own devices with Python and an IDE they are familiar with; demonstrators will use Spyder. The project walks through the full extent of an idealized data science project:

  1. Data understanding
  2. Data preparation
  3. Data modeling and Evaluation
  4. Implementation
  5. Deployment

Day 1

The first day of the workshop is focused on data acquisition, understanding, and modeling.

General introduction

  1. Course structure, instructors, mode of study
  2. The place of Python in data science and machine learning
  3. Background information on the project problem: data source, project goal

Setting up the environment

  1. Install the required Python libraries
    1. Install PyTorch
    2. Optionally: GitHub integration
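Before the first session, a short script can confirm that the libraries are installed in the active environment. This is a minimal sketch using the standard-library importlib.metadata module; the package list is an assumption, not the course's official requirements (PyTorch is published on PyPI under the name torch).

```python
from importlib import metadata

# Hypothetical list of libraries used in the course; adjust to the actual setup.
REQUIRED = ["numpy", "pandas", "matplotlib", "torch"]


def installed_versions(packages):
    """Return a dict mapping each package name to its installed version, or None."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name:12s} {version or 'MISSING'}")
```

Any package reported as MISSING can then be installed with pip or conda before the workshop starts.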

Data understanding

  1. Data acquisition
  2. Import data to Pandas
  3. Descriptive statistics of categorical and numeric data
  4. Data visualization: bar plots, histograms, box plots, scatter plots
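The import and descriptive-statistics steps can be sketched in pandas as below. The small CSV table is hypothetical sample data standing in for the project's real data set, which is not described here.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the project's real data set.
CSV_TEXT = """age,sex,income,label
34,F,42000,yes
51,M,58000,no
29,F,31000,yes
45,M,,no
38,F,47000,yes
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Numeric columns: count, mean, std, min, quartiles, max (missing values are skipped).
numeric_summary = df.describe()

# Categorical columns: frequency of each category.
sex_counts = df["sex"].value_counts()
label_counts = df["label"].value_counts()

# Missing values per column, worth checking before any modeling step.
missing = df.isna().sum()

print(numeric_summary)
print(sex_counts)
print(missing)
```

With matplotlib installed, the plot types listed above are available directly from pandas, for example df["sex"].value_counts().plot.bar(), df["age"].plot.hist(), df.boxplot(column="income", by="label") and df.plot.scatter(x="age", y="income").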

Data transformation

  1. Normalization and standardization
  2. Data transformation for machine learning purposes
  3. Pre-processing: select data, merge data
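A minimal pandas sketch of these transformations follows, assuming two hypothetical tables that share an id column. It shows min-max normalization, z-score standardization, row and column selection, and a merge.

```python
import pandas as pd

# Hypothetical tables: measurements, and a separate table of labels keyed by id.
measurements = pd.DataFrame({"id": [1, 2, 3, 4], "height": [150.0, 160.0, 170.0, 180.0]})
labels = pd.DataFrame({"id": [1, 2, 3, 5], "label": ["a", "b", "a", "b"]})

col = measurements["height"]

# Min-max normalization rescales values to the [0, 1] range.
measurements["height_norm"] = (col - col.min()) / (col.max() - col.min())

# Standardization (z-score): mean 0, standard deviation 1.
# ddof=0 uses the population standard deviation, as scikit-learn's StandardScaler does.
measurements["height_std"] = (col - col.mean()) / col.std(ddof=0)

# Select rows and columns, then merge; an inner join keeps only ids present in both tables.
selected = measurements.loc[measurements["height"] >= 160, ["id", "height_std"]]
merged = selected.merge(labels, on="id", how="inner")
print(merged)
```

The choice of join type matters: how="left" would keep id 4 with a missing label, which then has to be handled before training.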

Data modeling

  1. Brief introduction to simple neural networks
  2. Steps of training a machine learning model
    1. Loss calculation
    2. Model optimization
    3. How this looks in PyTorch
  3. Split data: training, validation, test subsets
  4. Saving and loading trained models
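The course demonstrates these steps in PyTorch. To keep the sketch dependency-light, it spells out the same steps in NumPy for a logistic-regression model (a single neuron), so that the loss calculation and the optimization step are visible. The synthetic data, the 70/15/15 split and the file name are assumptions for illustration.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)

# Synthetic, hypothetical data: two features, binary label from a linear rule.
X = rng.normal(size=(300, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

# Split: shuffle once, then 70 % training, 15 % validation, 15 % test.
idx = rng.permutation(len(X))
n_train, n_val = len(X) * 70 // 100, len(X) * 15 // 100
train_idx, val_idx, test_idx = np.split(idx, [n_train, n_train + n_val])


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce_loss(p, t, eps=1e-12):
    """Binary cross-entropy between predicted probabilities p and targets t."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(t * np.log(p) + (1 - t) * np.log(1 - p)))


# A single "neuron": weight vector w and bias b.
w, b = np.zeros(2), 0.0
lr = 0.5
for epoch in range(200):
    # 1. Forward pass and loss calculation on the training subset.
    p = sigmoid(X[train_idx] @ w + b)
    loss = bce_loss(p, y[train_idx])
    # 2. Model optimization: gradient of the loss w.r.t. the logits, then a gradient-descent step.
    grad_z = (p - y[train_idx]) / len(train_idx)
    w -= lr * (X[train_idx].T @ grad_z)
    b -= lr * grad_z.sum()

# The validation subset is used to monitor the model; the test subset is kept for the final evaluation.
val_loss = bce_loss(sigmoid(X[val_idx] @ w + b), y[val_idx])

# Saving and loading the trained parameters.
path = os.path.join(tempfile.gettempdir(), "model_params.npz")
np.savez(path, w=w, b=b)
with np.load(path) as params:
    w_loaded, b_loaded = params["w"], float(params["b"])
```

In PyTorch the same loop is usually written with nn.Linear and nn.BCEWithLogitsLoss, calling optimizer.zero_grad(), loss.backward() and optimizer.step() in each iteration; the parameters are saved with torch.save(model.state_dict(), path) and restored with model.load_state_dict(torch.load(path)).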

Model evaluation

  1. Simple model performance metrics: confusion matrix
  2. Metrics derived from it: accuracy, precision, recall (sensitivity), specificity
  3. Comparing models: ROC curve and area under the curve (AUC)
  4. Cross-validation
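The evaluation metrics above can be computed by hand, which makes their definitions explicit. This sketch assumes binary labels encoded as 0/1; in practice, scikit-learn provides confusion_matrix, roc_auc_score and KFold for the same purposes. The AUC here uses the rank formulation (the probability that a random positive example scores higher than a random negative one), which equals the area under the ROC curve.

```python
import numpy as np


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels encoded as 0/1."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return tp, fp, tn, fn


def metrics(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Recall and sensitivity are the same quantity: the true-positive rate.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 1/2."""
    y_true, scores = np.asarray(y_true), np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def k_fold_indices(n, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]


# Hypothetical example: true labels and model scores, thresholded at 0.5.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.6, 0.7, 0.2]
y_pred = [int(s >= 0.5) for s in scores]
print(metrics(y_true, y_pred), roc_auc(y_true, scores))
```

Because AUC works on the raw scores rather than thresholded predictions, it compares models independently of the chosen threshold, while the confusion-matrix metrics describe one operating point.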


Day 2

The second day of the workshop takes the model trained on the first day and deploys it as an API service in the Amazon Web Services (AWS) environment.

Predicting with trained model

  1. Predictions with ML model
  2. On-the-fly data transformation and pre-processing
  3. Prototype predictor

API service in AWS environment

  1. The idea behind cloud computing and its models (IaaS, PaaS, SaaS)
  2. The idea of scalability
  3. AWS Lambda for running Python code
  4. AWS API services, using AWS Lambda as the back end
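A Python Lambda function is a handler that receives an event and a context object. The sketch below assumes an API Gateway proxy integration, in which the HTTP request body arrives as a JSON string in event["body"] and the returned dict (statusCode, headers, body) becomes the HTTP response. The handler name lambda_handler, the field names and the placeholder predict function are assumptions, not the course's actual setup.

```python
import json


def predict(features):
    """Placeholder model with hypothetical weights; replace with the trained predictor."""
    score = 0.8 * features["age"] - 0.5 * features["income"]
    return int(score > 0)


def lambda_handler(event, context):
    """Entry point for AWS Lambda behind an API Gateway proxy integration (assumed)."""
    try:
        payload = json.loads(event.get("body") or "{}")
        features = {"age": float(payload["age"]), "income": float(payload["income"])}
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON, missing fields or non-numeric values: answer with HTTP 400.
        return {
            "statusCode": 400,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"error": f"invalid input: {exc}"}),
        }
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"prediction": predict(features)}),
    }
```

Loading the model once at module level, outside the handler, lets warm invocations reuse it; scaling is handled by AWS running more concurrent instances of the same function.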

Transforming the predictor into an API

  1. Data sanity checks in AWS
  2. I/O in the AWS environment
  3. Test API with AWS tool
  4. Test API with Postman
  5. Optionally: integrate the new API with other services
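A data sanity check rejects malformed input before it reaches the model, and a test request can also be built from Python as an alternative to Postman. This is a sketch under assumptions: the schema, its ranges and the endpoint URL are hypothetical.

```python
import json
import urllib.request

# Hypothetical schema: allowed range for each expected input field.
SCHEMA = {"age": (0, 120), "income": (0, 1_000_000)}


def sanity_check(payload, schema=SCHEMA):
    """Return a list of problems found in one input record (an empty list means it passed)."""
    if not isinstance(payload, dict):
        return ["payload must be a JSON object"]
    problems = []
    for field, (low, high) in schema.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
            continue
        value = payload[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{field} must be a number")
        elif not low <= value <= high:  # NaN also fails this comparison
            problems.append(f"{field} out of range [{low}, {high}]: {value}")
    unexpected = set(payload) - set(schema)
    if unexpected:
        problems.append(f"unexpected fields: {sorted(unexpected)}")
    return problems


def build_test_request(url, payload):
    """Build a JSON POST request; urllib.request.urlopen(req) sends it to a deployed endpoint."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

For example, build_test_request("https://<api-id>.execute-api.<region>.amazonaws.com/<stage>/predict", {"age": 30, "income": 50000}) targets an API Gateway invoke URL whose placeholders must be filled in from the deployed stage; the same JSON body can be pasted into Postman.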

Educator:

Python, R

Csaba Ortutay

Chief Executive Officer of HiDucator Ltd, Adjunct Professor at the University of Tampere

Dr Ortutay is an accomplished data science professional who has been developing data analysis courses since 2008. He has a special focus on the analysis of molecular biology data and, more generally, on the application of the R language in different projects. He was head of the Master’s Degree Program in Bioinformatics at the University of Tampere in 2012 and holds an adjunct professor title there.


Theme:
Agile Development
Educator:
Csaba Ortutay
Dates:
  • 27.04.2020
Language:
English
Duration:
2 days
Price: 1900 € + VAT

More than one participant from the same company?

We also organize company-specific courses.
