Course Structure
The aim of the course is to provide insights into advanced data analysis modules in Python and to give a sense of how machine learning can be used to address a real-life problem.
Workshop
This course is organized as a two-day workshop and assumes that attendees have basic knowledge of Python. The course is practice-oriented, with a project assignment announced at the beginning. The project uses real data sets and focuses on a real-life classification problem. The course is intended for work in small groups (ideally 2 students per group) and uses pair programming as its pedagogical method. Participants will use their own device with Python and an IDE they are familiar with; demonstrators will use Spyder. During the project, the full extent of an idealized data science project will be demonstrated:
- Data understanding
- Data preparation
- Data modeling and Evaluation
- Implementation
- Deployment
Day 1
The first day of the workshop is focused on data acquisition, understanding, and modeling.
General introduction
- Course structure, instructors, mode of study
- The place of Python in data science and machine learning
- Background information on the project problem: data source, project goal
Setting up the environment
1. Install the required Python libraries
- Install PyTorch
- Optionally: GitHub integration
Data understanding
- Data acquisition
- Import data to Pandas
- Descriptive statistics of categorical and numeric data
- Data visualization: bar plots, histograms, box plots, scatter plots
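The steps above can be sketched in a few lines of Pandas; the data set and column names here are invented stand-ins for the project's real data:

```python
import pandas as pd

# Hypothetical sample standing in for the course's real data set
df = pd.DataFrame({
    "species":   ["cat", "dog", "dog", "cat", "dog"],   # categorical column
    "weight_kg": [4.2, 12.5, 9.8, 3.9, 15.1],           # numeric column
})

# Descriptive statistics of numeric data
print(df["weight_kg"].describe())    # count, mean, std, min, quartiles, max

# Descriptive statistics of categorical data
print(df["species"].value_counts())  # frequency of each category

# Visualization (matplotlib under the hood); commented out for headless runs:
# df["weight_kg"].plot.hist()
# df.boxplot(column="weight_kg", by="species")
```

Reading from a file instead of an inline dict would use `pd.read_csv(...)` in the same way.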
Data transformation
- Normalization and standardization
- Data transformation for machine learning purposes
- Pre-processing: select data, merge data
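Normalization and standardization, as covered above, can be shown without any library; a minimal sketch in plain Python:

```python
# Min-max normalization rescales values to [0, 1];
# standardization (z-score) rescales to zero mean and unit variance.

def normalize(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

data = [2.0, 4.0, 6.0, 8.0]
print(normalize(data))    # smallest value maps to 0.0, largest to 1.0
print(standardize(data))  # result has mean 0 and unit variance
```

In practice the same transformations are applied via scikit-learn's scalers or Pandas operations, but the arithmetic is exactly this.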
Data modeling
- Brief introduction to simple neural networks
- Steps of training a machine learning model
- Loss calculation
- Model optimization
- How this looks in PyTorch
- Split data: training, validation, test subsets
- Saving and loading trained models
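The training steps listed above (data splitting, loss calculation, optimization) can be sketched without PyTorch; here is a library-free gradient-descent fit of a one-parameter toy model, with all data and names invented for illustration:

```python
import random

# Toy data: y is roughly 3 * x plus noise; the "model" is y_hat = w * x
random.seed(0)
data = [(x, 3.0 * x + random.uniform(-0.5, 0.5)) for x in range(20)]

# Split data: training and validation subsets (a test subset works the same way)
random.shuffle(data)
train, val = data[:15], data[15:]

def mse(w, samples):
    # Loss calculation: mean squared error of the predictions
    return sum((w * x - y) ** 2 for x, y in samples) / len(samples)

# Model optimization: plain gradient descent on the single weight w
w, lr = 0.0, 0.001
for epoch in range(100):
    grad = sum(2 * x * (w * x - y) for x, y in train) / len(train)
    w -= lr * grad

print(f"learned w = {w:.3f}, validation loss = {mse(w, val):.4f}")

# PyTorch automates exactly these steps: loss.backward() computes the
# gradient, optimizer.step() applies the update, and torch.save /
# torch.load handle saving and loading trained models.
```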
Model evaluation
- Basic model evaluation: the confusion matrix
- Metrics derived from it: accuracy, precision, recall (sensitivity), specificity
- Comparing models: ROC curve and area under curve
- Cross-validation
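The confusion matrix and the metrics derived from it reduce to simple counting; a self-contained sketch with invented labels:

```python
# True and predicted labels for a hypothetical binary classifier
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 0, 1, 0, 0, 1]

# Confusion matrix counts
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

accuracy    = (tp + tn) / len(y_true)
precision   = tp / (tp + fp)
recall      = tp / (tp + fn)   # also called sensitivity
specificity = tn / (tn + fp)

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} specificity={specificity:.2f}")
```

ROC curves and cross-validation build on the same counts, computed over varying decision thresholds and data splits respectively (e.g. via scikit-learn).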
Day 2
The second day of the workshop takes the model trained on the first day and implements it as an API service in the Amazon Web Services environment.
Predicting with trained model
- Predictions with ML model
- On-the-fly data transformation and pre-processing
- Prototype predictor
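A prototype predictor must apply the same pre-processing at prediction time as was used during training; a minimal sketch, where the feature names, scaling constants, and weights are all invented placeholders for the trained model's actual values:

```python
# Scaling constants and weights saved from the training phase (invented values)
FEATURE_MEAN = {"weight_kg": 9.1, "age_years": 4.2}
FEATURE_STD  = {"weight_kg": 4.5, "age_years": 2.0}
WEIGHTS      = {"weight_kg": 1.3, "age_years": -0.4}
THRESHOLD    = 0.0  # decision boundary of a hypothetical linear model

def preprocess(raw):
    # On-the-fly transformation: re-apply the *training-time* standardization
    return {k: (raw[k] - FEATURE_MEAN[k]) / FEATURE_STD[k] for k in WEIGHTS}

def predict(raw):
    x = preprocess(raw)
    score = sum(WEIGHTS[k] * x[k] for k in WEIGHTS)
    return 1 if score > THRESHOLD else 0

print(predict({"weight_kg": 14.0, "age_years": 3.0}))
```

The key design point is that the scaling constants are artifacts of training and must ship together with the model.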
API service in AWS environment
- The idea behind cloud computing and its models (IaaS, PaaS, SaaS)
- The idea of scalability
- AWS Lambda for running Python code
- AWS API services, using AWS Lambda as the back end
Transform the predictor into an API
- Data sanity checks in AWS
- I/O from the AWS environment
- Test the API with AWS tools
- Test API with Postman
- Optionally: integrate the new API with other services
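A sketch of what wrapping the predictor in an AWS Lambda handler could look like, including an input sanity check; the event shape follows the API Gateway proxy integration format, and `predict` is a placeholder for the trained model's prediction function:

```python
import json

def predict(features):
    # Placeholder standing in for the trained model's prediction logic
    return 1 if features["weight_kg"] > 10 else 0

def lambda_handler(event, context):
    """Entry point invoked by AWS Lambda; `event` follows the
    API Gateway proxy integration format (JSON body as a string)."""
    try:
        body = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return {"statusCode": 400,
                "body": json.dumps({"error": "invalid JSON"})}

    # Data sanity check before touching the model
    if not isinstance(body.get("weight_kg"), (int, float)):
        return {"statusCode": 400,
                "body": json.dumps({"error": "weight_kg (number) required"})}

    return {"statusCode": 200,
            "body": json.dumps({"prediction": predict(body)})}

# Local smoke test, mimicking an API Gateway event (no AWS needed)
print(lambda_handler({"body": '{"weight_kg": 12.5}'}, None))
```

The same request can then be exercised through the API Gateway test console or Postman by POSTing the JSON body to the deployed endpoint.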