ML with Python

  Python intro, int objects in memory

  • History of popular programming languages
  • Main goals of the course
  • Compilation and interpretation
  • Size of int in Python and some examples
   Slides
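
  As a quick illustration of the last bullet: CPython ints are full-blown objects whose size grows with the value. The exact byte counts below are typical for a 64-bit CPython 3 build and may differ across versions and platforms.

    import sys

    # Ints are arbitrary-precision objects, so memory grows with magnitude
    print(sys.getsizeof(0))         # typically 24-28 bytes
    print(sys.getsizeof(1))         # typically 28 bytes
    print(sys.getsizeof(2 ** 100))  # larger: more 30-bit digits are stored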

  Python types, mutables and immutables

  • Sizes of objects in Python
  • Main uses of the language
  • Dive into mutables and immutables
  • Algorithmic complexity of standard types
   Slides
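
  A minimal example of the mutable/immutable distinction: mutating a list is visible through every name bound to it, while "changing" a tuple always builds a new object.

    # Lists are mutable: both names refer to the same object
    a = [1, 2, 3]
    b = a
    b.append(4)
    print(a, a is b)   # [1, 2, 3, 4] True

    # Tuples are immutable: concatenation creates a new object
    t = (1, 2, 3)
    u = t + (4,)
    print(t, t is u)   # (1, 2, 3) False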

Homework 1: Introduction to Python with PyCharm

Variables, Strings, Data structures, Conditional expressions, Loops, Functions, Classes and objects, Modules and packages, File input and output.

  Indexing, slices, comprehensions and collections

  • Indexing, slices, unpacking
  • List and dict comprehensions
  • The collections module and its algorithmic complexities
  • Garbage collection and reference counting
   Slides
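
  A few of these constructs in one self-contained snippet (inputs are illustrative):

    from collections import Counter, deque

    xs = list(range(10))
    print(xs[2:8:2])                    # slice with a step: [2, 4, 6]
    print({x: x * x for x in xs[:3]})   # dict comprehension: {0: 0, 1: 1, 2: 4}
    first, *rest = xs                   # unpacking with a star target
    print(first, rest[-1])              # 0 9

    # collections: Counter counts in O(n), deque appends/pops in O(1) at both ends
    print(Counter("abracadabra").most_common(2))   # [('a', 5), ('b', 2)]
    d = deque([1, 2, 3])
    d.appendleft(0)
    print(d)                            # deque([0, 1, 2, 3])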

  Numpy and Pandas

  • Numpy basics
  • Broadcasting and Vectorization
  • Pandas basics
  • DataFrame object, join and merge
   Slides
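
  A short sketch of broadcasting and an SQL-style merge; the toy arrays and frames are illustrative, not from the lecture.

    import numpy as np
    import pandas as pd

    # Broadcasting: a (3, 1) column and a (1, 4) row combine into a (3, 4) array
    col = np.arange(3).reshape(3, 1)
    row = np.arange(4).reshape(1, 4)
    print((col + row).shape)   # (3, 4)

    # DataFrame merge: an SQL-style join on a key column
    left = pd.DataFrame({"id": [1, 2, 3], "x": ["a", "b", "c"]})
    right = pd.DataFrame({"id": [2, 3, 4], "y": [20, 30, 40]})
    print(left.merge(right, on="id", how="inner"))   # only ids 2 and 3 survive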

Homework 2: Python Libraries - NumPy

Array Basics, Array Indexing and Slicing, Transposing, Sorting and Concatenating, Comparison and Search, Array Math, Arrays of String and Unicode Values.

  Machine learning intro

  • Main concepts of machine learning: supervised learning from precedents; objects, features, and answers; a model of algorithms and a learning method; empirical risk; overfitting
  • Overfitting prevention: hold-out, leave-one-out, cross-validation
  • Chronology of significant events in machine learning
   Slides
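
  A minimal scikit-learn sketch of the three validation schemes; the iris dataset and logistic-regression model are stand-ins, not the course's own examples.

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import (LeaveOneOut, cross_val_score,
                                         train_test_split)

    X, y = load_iris(return_X_y=True)
    model = LogisticRegression(max_iter=1000)

    # Hold-out: a single random split
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    print(model.fit(X_tr, y_tr).score(X_te, y_te))

    # 5-fold cross-validation: average quality over several splits
    print(cross_val_score(model, X, y, cv=5).mean())

    # Leave-one-out: n splits of size 1 (expensive but nearly unbiased)
    print(cross_val_score(model, X, y, cv=LeaveOneOut()).mean())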

  Matrix differentiation

  • Recap of function differentiation
  • Differential
  • Matrix differential properties
  • Hessian
   Slides
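
  As a sanity check of the machinery, the standard identity "the gradient of x^T A x is (A + A^T) x" can be verified numerically with central finite differences (the matrix below is arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(4, 4))
    x = rng.normal(size=4)

    f = lambda v: v @ A @ v          # quadratic form f(x) = x^T A x
    analytic = (A + A.T) @ x         # known gradient of the quadratic form

    eps = 1e-6
    numeric = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                        for e in np.eye(4)])
    print(np.allclose(analytic, numeric, atol=1e-4))   # True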

Homework 3: Gateway to Pandas

Understanding Pandas data structures: DataFrame and Series. Data summarizing, filtering, sorting. Simple implementation of your own TED Talks recommendation model to deepen your knowledge and proficiency with Pandas.

  Linear Models, Stochastic Gradient Descent

  • Linear models of regression and classification
  • Stochastic gradient methods (SG, SAG), suitable for any model and loss function
  • Approximation of the threshold loss function
  • Regularization solves the multicollinearity problem and also reduces overfitting
  • Likelihood maximization and minimization of the empirical risk are different views on the same optimization problem
   Slides
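
  A from-scratch sketch of stochastic gradient descent for linear regression with squared loss and L2 regularization; all hyperparameters are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    true_w = np.array([2.0, -1.0, 0.5])
    y = X @ true_w + 0.1 * rng.normal(size=200)

    w = np.zeros(3)
    lr, lam = 0.01, 1e-3                 # learning rate and L2 coefficient
    for epoch in range(50):
        for i in rng.permutation(len(X)):
            # Gradient of (x_i @ w - y_i)^2 + lam * ||w||^2 on one object
            grad = 2 * (X[i] @ w - y[i]) * X[i] + 2 * lam * w
            w -= lr * grad
    print(w)   # close to [2.0, -1.0, 0.5]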

  Linear models practice

  • Analytical and numerical approaches to solve linear regression
  • from sklearn.linear_model import LinearRegression
  • Choosing the learning rate
   ipynb
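
  A short comparison of the two approaches: the sklearn estimator named above versus the analytical least-squares solution of the normal equations (toy data).

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 2))
    y = X @ np.array([3.0, -2.0]) + 1.0 + 0.05 * rng.normal(size=100)

    # Library solution
    model = LinearRegression().fit(X, y)
    print(model.coef_, model.intercept_)

    # Analytical solution via least squares, with an explicit bias column
    Xb = np.hstack([X, np.ones((100, 1))])
    w = np.linalg.lstsq(Xb, y, rcond=None)[0]
    print(w)   # same coefficients, intercept last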

  Logical Rules and Decision Trees

  • Definition of a logical regularity
  • Searching for and modifying local rules, the Pareto front
  • Decision trees: definition, construction, and use
   Slides
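
  A minimal scikit-learn illustration: a shallow decision tree is exactly a set of nested threshold rules, which export_text prints directly (the dataset is a stand-in).

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = load_iris(return_X_y=True)
    tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

    # The learned logical rules as nested if/else thresholds
    print(export_text(tree))
    print(tree.score(X, y))   # training accuracy of the depth-2 tree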

Homework 4: Linear models theory

Logistic regression, decision trees.

  Metric methods practice

  • K-nearest neighbors classifier (KNN)
  • Parzen Window Method
  • Potential Function Method
  • Nadaraya-Watson Estimator
   ipynb
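
  A compact NumPy sketch of the Nadaraya-Watson estimator with a Gaussian kernel; the 1-D regression data and bandwidth are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    x_train = np.sort(rng.uniform(0, 2 * np.pi, 50))
    y_train = np.sin(x_train) + 0.1 * rng.normal(size=50)

    def nadaraya_watson(x_query, h=0.3):
        # Gaussian kernel weights; prediction is a weighted average of answers
        w = np.exp(-0.5 * ((x_query[:, None] - x_train[None, :]) / h) ** 2)
        return (w * y_train).sum(axis=1) / w.sum(axis=1)

    x_query = np.linspace(0.5, 5.5, 5)
    print(nadaraya_watson(x_query))   # approximately sin(x_query)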

  Ensembles, gradient boosting and random forest

  • Simple and weighted voting, mixture of experts
  • Boosting, bagging, and the random subspace method (RSM)
  • XGBoost, CatBoost, LightGBM
  • Random forest
   Slides

  Ensembles practice

  • Bagging, Boosting
  • GradBoost, XGBoost, CatBoost, LightGBM
  • Random forest
   ipynb
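
  A minimal comparison using scikit-learn's built-in ensembles; XGBoost, CatBoost, and LightGBM are separate packages with a very similar fit/predict interface.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import (GradientBoostingClassifier,
                                  RandomForestClassifier)
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    # Bagging of deep trees vs. boosting of shallow trees
    for model in (RandomForestClassifier(n_estimators=100, random_state=0),
                  GradientBoostingClassifier(n_estimators=100, random_state=0)):
        print(type(model).__name__, cross_val_score(model, X, y, cv=5).mean())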

  Intro to neural networks and backpropagation

  • Rise of neural networks
  • Expressive power of neural networks
  • Backpropagation algorithm
   Slides

  Backpropagation practice

  • MNIST dataset
  • Computational graph
  • Micrograd by Andrej Karpathy
   ipynb
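
  A heavily condensed micrograd-style sketch (not Karpathy's actual code): a scalar Value records the computational graph and backpropagates gradients by the chain rule.

    class Value:
        """A scalar with autograd: tracks children and a local backward rule."""
        def __init__(self, data, children=()):
            self.data, self.grad = data, 0.0
            self._children, self._backward = children, lambda: None

        def __add__(self, other):
            out = Value(self.data + other.data, (self, other))
            def _backward():
                self.grad += out.grad
                other.grad += out.grad
            out._backward = _backward
            return out

        def __mul__(self, other):
            out = Value(self.data * other.data, (self, other))
            def _backward():
                self.grad += other.data * out.grad
                other.grad += self.data * out.grad
            out._backward = _backward
            return out

        def backward(self):
            # Topological order, then apply the chain rule from the output back
            order, seen = [], set()
            def build(v):
                if v not in seen:
                    seen.add(v)
                    for c in v._children:
                        build(c)
                    order.append(v)
            build(self)
            self.grad = 1.0
            for v in reversed(order):
                v._backward()

    a, b = Value(2.0), Value(3.0)
    c = a * b + a          # c = a*b + a
    c.backward()
    print(a.grad, b.grad)  # dc/da = b + 1 = 4.0, dc/db = a = 2.0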

  Intro to language modelling: bigrams

  • Makemore by Andrej Karpathy
  • Bigram language modelling
   ipynb
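
  The core of the count-based bigram model, sketched on a toy corpus; the lecture uses a large names dataset, and "." as the start/end token follows makemore's convention.

    from collections import Counter, defaultdict

    words = ["emma", "olivia", "ava", "isabella", "mia"]   # tiny stand-in corpus

    # Count character bigrams, with "." marking start and end of a word
    counts = defaultdict(Counter)
    for w in words:
        chars = ["."] + list(w) + ["."]
        for c1, c2 in zip(chars, chars[1:]):
            counts[c1][c2] += 1

    # Normalizing the counts gives next-character probabilities
    probs = {c1: {c2: n / sum(cnt.values()) for c2, n in cnt.items()}
             for c1, cnt in counts.items()}
    print(probs["."])   # distribution over first characters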

  Intro to language modelling: Multi-Layer Perceptron

  • The MLP language model paper (Bengio et al., 2003)
  • Cross entropy loss
  • Some results
   ipynb
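
  A small PyTorch check that cross-entropy is the mean negative log-probability of the correct class (the logits below are made up):

    import torch
    import torch.nn.functional as F

    logits = torch.tensor([[2.0, 0.5, -1.0],
                           [0.1, 0.2, 3.0]])
    targets = torch.tensor([0, 2])

    # F.cross_entropy fuses log-softmax and negative log-likelihood
    manual = -F.log_softmax(logits, dim=1)[torch.arange(2), targets].mean()
    print(manual, F.cross_entropy(logits, targets))   # same value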

  Activations, Gradients, BatchNorm

  • Logits visualization and dead neurons
  • BatchNorm
  • Gradients and weights plots
   ipynb
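
  A from-scratch sketch of the BatchNorm forward pass (training mode only, no running statistics); the shapes are illustrative.

    import torch

    def batchnorm_forward(x, gamma, beta, eps=1e-5):
        # Normalize each feature over the batch, then scale and shift
        mean = x.mean(dim=0, keepdim=True)
        var = x.var(dim=0, unbiased=False, keepdim=True)
        x_hat = (x - mean) / torch.sqrt(var + eps)
        return gamma * x_hat + beta

    x = torch.randn(32, 8)                     # batch of 32, 8 features
    out = batchnorm_forward(x, torch.ones(8), torch.zeros(8))
    print(out.mean(dim=0).abs().max())         # ~0: each feature is centered
    print(out.std(dim=0).mean())               # ~1: unit scale per feature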

  Building a WaveNet

  • PyTorch-ifying the previous code
  • Dilated causal convolutional layers
  • Brief preview of convolutions
   ipynb
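
  A minimal sketch of one dilated causal convolution in PyTorch (the class name and shapes are illustrative, not the WaveNet architecture itself):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CausalConv1d(nn.Module):
        """Left-pads so the output at time t sees only inputs up to t;
        dilation spaces the taps, growing the receptive field with depth."""
        def __init__(self, channels, kernel_size=2, dilation=1):
            super().__init__()
            self.pad = (kernel_size - 1) * dilation
            self.conv = nn.Conv1d(channels, channels, kernel_size,
                                  dilation=dilation)

        def forward(self, x):          # x: (batch, channels, time)
            return self.conv(F.pad(x, (self.pad, 0)))

    x = torch.randn(1, 4, 16)
    print(CausalConv1d(4, dilation=2)(x).shape)   # length preserved: (1, 4, 16)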

  Convolutional neural networks

  • Brief history of computer vision
  • The progress of convolutional neural nets
  • Details of AlexNet model
   Slides

  Building GPT from scratch

  • Attention is all you need
  • Math trick in self-attention
  • Layer normalization and dropout
   ipynb
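
  The lecture's "math trick" in brief: a lower-triangular mask plus softmax makes each position aggregate only over its own prefix (shapes are illustrative).

    import torch
    import torch.nn.functional as F

    T, C = 4, 8                                # sequence length, embedding size
    x = torch.randn(T, C)

    # Lower-triangular mask: token t may only attend to tokens 0..t
    tril = torch.tril(torch.ones(T, T))
    scores = torch.zeros(T, T).masked_fill(tril == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)        # uniform over the allowed prefix

    out = weights @ x                          # row t: running mean of x[:t+1]
    print(torch.allclose(out[2], x[:3].mean(dim=0)))   # True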

  Bayesian methods intro

  • Conditional probability and Bayes' theorem
  • Comparison of Frequentist and Bayesian approaches
  • Markov Chain Monte Carlo (MCMC) and Gibbs sampling
   Slides

  Bayesian methods practice

  • Conjugate distributions
  • Maximum Likelihood Estimation
  • Metropolis-Hastings algorithm
   ipynb
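
  A minimal random-walk Metropolis-Hastings sampler targeting a standard normal; the proposal scale and burn-in length are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)

    def log_target(x):
        # Unnormalized log-density of the standard normal target
        return -0.5 * x ** 2

    samples, x = [], 0.0
    for _ in range(20000):
        proposal = x + rng.normal(scale=1.0)   # symmetric random-walk proposal
        # Accept with probability min(1, p(proposal) / p(x))
        if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
            x = proposal
        samples.append(x)

    samples = np.array(samples[5000:])         # drop burn-in
    print(samples.mean(), samples.std())       # approximately 0 and 1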

If you want to use the materials (e.g., figures) in your paper or report and to cite this course, you can do so with the following BibTeX entry:

@misc{avalur2023mlCourse,
  title={ML with Python},
  url={https://avalur.github.io/ml_with_python.html},
  author={Alexander Avdiushenko},
  year={2023},
  month={Sep}
}