ML with Python

  Python intro, int objects in memory

  • History of popular programming languages
  • Main goals of the course
  • Compilation and interpretation
  • Size of int in Python and some examples
   Slides
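
  As a quick illustration of the last bullet: CPython ints are full-blown objects whose size grows with the value. The exact byte counts below are typical for a 64-bit CPython 3 build and may differ across versions and platforms.

    import sys

    # Ints are arbitrary-precision objects, so memory grows with magnitude
    print(sys.getsizeof(0))         # typically 24-28 bytes
    print(sys.getsizeof(1))         # typically 28 bytes
    print(sys.getsizeof(2 ** 100))  # larger: more 30-bit digits are stored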

  Python types, mutables and immutables

  • Sizes of objects in Python
  • Main uses of the language
  • Dive into mutables and immutables
  • Algorithmic complexity of standard types
   Slides
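
  A minimal example of the mutable/immutable distinction: mutating a list is visible through every name bound to it, while "changing" a tuple always builds a new object.

    # Lists are mutable: both names refer to the same object
    a = [1, 2, 3]
    b = a
    b.append(4)
    print(a, a is b)   # [1, 2, 3, 4] True

    # Tuples are immutable: concatenation creates a new object
    t = (1, 2, 3)
    u = t + (4,)
    print(t, t is u)   # (1, 2, 3) False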

Homework 1: Introduction to Python with PyCharm

Variables, Strings, Data structures, Conditional expressions, Loops, Functions, Classes and objects, Modules and packages, File input and output.

  Indexing, slices, comprehensions and collections

  • Indexing, slices, unpacking
  • List and dict comprehensions
  • The collections module and its algorithmic complexities
  • Garbage collection and reference counting
   Slides
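
  A few of these constructs in one self-contained snippet (inputs are illustrative):

    from collections import Counter, deque

    xs = list(range(10))
    print(xs[2:8:2])                    # slice with a step: [2, 4, 6]
    print({x: x * x for x in xs[:3]})   # dict comprehension: {0: 0, 1: 1, 2: 4}
    first, *rest = xs                   # unpacking with a star target
    print(first, rest[-1])              # 0 9

    # collections: Counter counts in O(n), deque appends/pops in O(1) at both ends
    print(Counter("abracadabra").most_common(2))   # [('a', 5), ('b', 2)]
    d = deque([1, 2, 3])
    d.appendleft(0)
    print(d)                            # deque([0, 1, 2, 3])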

  Numpy and Pandas

  • Numpy basics
  • Broadcasting and Vectorization
  • Pandas basics
  • DataFrame object, join and merge
   Slides
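
  A short sketch of broadcasting and an SQL-style merge; the toy arrays and frames are illustrative, not from the lecture.

    import numpy as np
    import pandas as pd

    # Broadcasting: a (3, 1) column and a (1, 4) row combine into a (3, 4) array
    col = np.arange(3).reshape(3, 1)
    row = np.arange(4).reshape(1, 4)
    print((col + row).shape)   # (3, 4)

    # DataFrame merge: an SQL-style join on a key column
    left = pd.DataFrame({"id": [1, 2, 3], "x": ["a", "b", "c"]})
    right = pd.DataFrame({"id": [2, 3, 4], "y": [20, 30, 40]})
    print(left.merge(right, on="id", how="inner"))   # only ids 2 and 3 survive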

Homework 2: Python Libraries - NumPy

Array Basics, Array Indexing and Slicing, Transposing, Sorting and Concatenating, Comparison and Search, Array Math, Arrays of String and Unicode Values.

  Machine learning intro

  • Main concepts of machine learning: supervised learning from precedents; objects, features, and answers; a model of algorithms and a learning method; empirical risk; overfitting
  • Overfitting prevention: hold-out, leave-one-out, cross-validation
  • Chronology of significant events in machine learning
   Slides
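
  A minimal scikit-learn sketch of the three validation schemes; the iris dataset and logistic-regression model are stand-ins, not the course's own examples.

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import (LeaveOneOut, cross_val_score,
                                         train_test_split)

    X, y = load_iris(return_X_y=True)
    model = LogisticRegression(max_iter=1000)

    # Hold-out: a single random split
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    print(model.fit(X_tr, y_tr).score(X_te, y_te))

    # 5-fold cross-validation: average quality over several splits
    print(cross_val_score(model, X, y, cv=5).mean())

    # Leave-one-out: n splits of size 1 (expensive but nearly unbiased)
    print(cross_val_score(model, X, y, cv=LeaveOneOut()).mean())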

  Matrix differentiation

  • Recap of function differentiation
  • Differential
  • Matrix differential properties
  • Hessian
   Slides
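
  As a sanity check of the machinery, the standard identity "the gradient of x^T A x is (A + A^T) x" can be verified numerically with central finite differences (the matrix below is arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(4, 4))
    x = rng.normal(size=4)

    f = lambda v: v @ A @ v          # quadratic form f(x) = x^T A x
    analytic = (A + A.T) @ x         # known gradient of the quadratic form

    eps = 1e-6
    numeric = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                        for e in np.eye(4)])
    print(np.allclose(analytic, numeric, atol=1e-4))   # True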

Homework 3: Gateway to Pandas

Understanding Pandas data structures: DataFrame and Series. Data summarizing, filtering, sorting. Simple implementation of your own TED Talks recommendation model to deepen your knowledge and proficiency with Pandas.

  Linear Models, Stochastic Gradient Descent

  • Linear models of regression and classification
  • Stochastic gradient methods (SG, SAG), suitable for any model and loss function
  • Approximation of the threshold loss function
  • Regularization solves the multicollinearity problem and also reduces overfitting
  • Likelihood maximization and minimization of the empirical risk are different views on the same optimization problem
   Slides
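
  A from-scratch sketch of stochastic gradient descent for linear regression with squared loss and L2 regularization; all hyperparameters are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    true_w = np.array([2.0, -1.0, 0.5])
    y = X @ true_w + 0.1 * rng.normal(size=200)

    w = np.zeros(3)
    lr, lam = 0.01, 1e-3                 # learning rate and L2 coefficient
    for epoch in range(50):
        for i in rng.permutation(len(X)):
            # Gradient of (x_i @ w - y_i)^2 + lam * ||w||^2 on one object
            grad = 2 * (X[i] @ w - y[i]) * X[i] + 2 * lam * w
            w -= lr * grad
    print(w)   # close to [2.0, -1.0, 0.5]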

  Linear models practice

  • Analytical and numerical approaches to solve linear regression
  • from sklearn.linear_model import LinearRegression
  • Choosing the learning rate
   ipynb
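
  A short comparison of the two approaches: the sklearn estimator named above versus the analytical least-squares solution of the normal equations (toy data).

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 2))
    y = X @ np.array([3.0, -2.0]) + 1.0 + 0.05 * rng.normal(size=100)

    # Library solution
    model = LinearRegression().fit(X, y)
    print(model.coef_, model.intercept_)

    # Analytical solution via least squares, with an explicit bias column
    Xb = np.hstack([X, np.ones((100, 1))])
    w = np.linalg.lstsq(Xb, y, rcond=None)[0]
    print(w)   # same coefficients, intercept last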

  Logical Rules and Decision Trees

  • Definition of a logical regularity
  • Searching for and modifying local rules, the Pareto front
  • Decision trees: definition, construction, and use
   Slides
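
  A minimal scikit-learn illustration: a shallow decision tree is exactly a set of nested threshold rules, which export_text prints directly (the dataset is a stand-in).

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = load_iris(return_X_y=True)
    tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

    # The learned logical rules as nested if/else thresholds
    print(export_text(tree))
    print(tree.score(X, y))   # training accuracy of the depth-2 tree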

Homework 4: Linear models theory

Logistic regression, decision trees.

  Metric methods practice

  • K-nearest neighbors classifier (KNN)
  • Parzen Window Method
  • Potential Function Method
  • Nadaraya-Watson Estimator
   ipynb
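
  A compact NumPy sketch of the Nadaraya-Watson estimator with a Gaussian kernel; the 1-D regression data and bandwidth are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    x_train = np.sort(rng.uniform(0, 2 * np.pi, 50))
    y_train = np.sin(x_train) + 0.1 * rng.normal(size=50)

    def nadaraya_watson(x_query, h=0.3):
        # Gaussian kernel weights; prediction is a weighted average of answers
        w = np.exp(-0.5 * ((x_query[:, None] - x_train[None, :]) / h) ** 2)
        return (w * y_train).sum(axis=1) / w.sum(axis=1)

    x_query = np.linspace(0.5, 5.5, 5)
    print(nadaraya_watson(x_query))   # approximately sin(x_query)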

  Ensembles, gradient boosting and random forest

  • Simple and weighted voting, mixture of experts
  • Boosting, bagging, and the random subspace method (RSM)
  • XGBoost, CatBoost, LightGBM
  • Random forest
   Slides

  Ensembles practice

  • Bagging, Boosting
  • GradBoost, XGBoost, CatBoost, LightGBM
  • Random forest
   ipynb
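
  A minimal comparison using scikit-learn's built-in ensembles; XGBoost, CatBoost, and LightGBM are separate packages with a very similar fit/predict interface.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import (GradientBoostingClassifier,
                                  RandomForestClassifier)
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    # Bagging of deep trees vs. boosting of shallow trees
    for model in (RandomForestClassifier(n_estimators=100, random_state=0),
                  GradientBoostingClassifier(n_estimators=100, random_state=0)):
        print(type(model).__name__, cross_val_score(model, X, y, cv=5).mean())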

  Intro to neural networks and backpropagation

  • Rise of neural networks
  • Expressive power of neural networks
  • Backpropagation algorithm
   Slides

  Backpropagation practice

  • MNIST dataset
  • Computational graph
  • Micrograd by Andrej Karpathy
   ipynb
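
  A heavily condensed micrograd-style sketch (not Karpathy's actual code): a scalar Value records the computational graph and backpropagates gradients by the chain rule.

    class Value:
        """A scalar with autograd: tracks children and a local backward rule."""
        def __init__(self, data, children=()):
            self.data, self.grad = data, 0.0
            self._children, self._backward = children, lambda: None

        def __add__(self, other):
            out = Value(self.data + other.data, (self, other))
            def _backward():
                self.grad += out.grad
                other.grad += out.grad
            out._backward = _backward
            return out

        def __mul__(self, other):
            out = Value(self.data * other.data, (self, other))
            def _backward():
                self.grad += other.data * out.grad
                other.grad += self.data * out.grad
            out._backward = _backward
            return out

        def backward(self):
            # Topological order, then apply the chain rule from the output back
            order, seen = [], set()
            def build(v):
                if v not in seen:
                    seen.add(v)
                    for c in v._children:
                        build(c)
                    order.append(v)
            build(self)
            self.grad = 1.0
            for v in reversed(order):
                v._backward()

    a, b = Value(2.0), Value(3.0)
    c = a * b + a          # c = a*b + a
    c.backward()
    print(a.grad, b.grad)  # dc/da = b + 1 = 4.0, dc/db = a = 2.0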

  Intro to language modelling: bigrams

  • Makemore by Andrej Karpathy
  • Bigram language modelling
   ipynb
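
  The core of the count-based bigram model, sketched on a toy corpus; the lecture uses a large names dataset, and "." as the start/end token follows makemore's convention.

    from collections import Counter, defaultdict

    words = ["emma", "olivia", "ava", "isabella", "mia"]   # tiny stand-in corpus

    # Count character bigrams, with "." marking start and end of a word
    counts = defaultdict(Counter)
    for w in words:
        chars = ["."] + list(w) + ["."]
        for c1, c2 in zip(chars, chars[1:]):
            counts[c1][c2] += 1

    # Normalizing the counts gives next-character probabilities
    probs = {c1: {c2: n / sum(cnt.values()) for c2, n in cnt.items()}
             for c1, cnt in counts.items()}
    print(probs["."])   # distribution over first characters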

  Intro to language modelling: Multi-Layer Perceptron

  • The MLP language model paper (Bengio et al., 2003)
  • Cross entropy loss
  • Some results
   ipynb
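
  A small PyTorch check that cross-entropy is the mean negative log-probability of the correct class (the logits below are made up):

    import torch
    import torch.nn.functional as F

    logits = torch.tensor([[2.0, 0.5, -1.0],
                           [0.1, 0.2, 3.0]])
    targets = torch.tensor([0, 2])

    # F.cross_entropy fuses log-softmax and negative log-likelihood
    manual = -F.log_softmax(logits, dim=1)[torch.arange(2), targets].mean()
    print(manual, F.cross_entropy(logits, targets))   # same value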

  Activations, Gradients, BatchNorm

  • Logits visualization and dead neurons
  • BatchNorm
  • Gradients and weights plots
   ipynb
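
  A from-scratch sketch of the BatchNorm forward pass (training mode only, no running statistics); the shapes are illustrative.

    import torch

    def batchnorm_forward(x, gamma, beta, eps=1e-5):
        # Normalize each feature over the batch, then scale and shift
        mean = x.mean(dim=0, keepdim=True)
        var = x.var(dim=0, unbiased=False, keepdim=True)
        x_hat = (x - mean) / torch.sqrt(var + eps)
        return gamma * x_hat + beta

    x = torch.randn(32, 8)                     # batch of 32, 8 features
    out = batchnorm_forward(x, torch.ones(8), torch.zeros(8))
    print(out.mean(dim=0).abs().max())         # ~0: each feature is centered
    print(out.std(dim=0).mean())               # ~1: unit scale per feature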

  Building a WaveNet

  • PyTorch-ifying the previous code
  • Dilated causal convolutional layers
  • Brief preview of convolutions
   ipynb
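
  A minimal sketch of one dilated causal convolution in PyTorch (the class name and shapes are illustrative, not the WaveNet architecture itself):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CausalConv1d(nn.Module):
        """Left-pads so the output at time t sees only inputs up to t;
        dilation spaces the taps, growing the receptive field with depth."""
        def __init__(self, channels, kernel_size=2, dilation=1):
            super().__init__()
            self.pad = (kernel_size - 1) * dilation
            self.conv = nn.Conv1d(channels, channels, kernel_size,
                                  dilation=dilation)

        def forward(self, x):          # x: (batch, channels, time)
            return self.conv(F.pad(x, (self.pad, 0)))

    x = torch.randn(1, 4, 16)
    print(CausalConv1d(4, dilation=2)(x).shape)   # length preserved: (1, 4, 16)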

  Convolutional neural networks

  • Brief history of computer vision
  • The progress of convolutional neural nets
  • Details of AlexNet model
   Slides

  Building GPT from scratch

  • Attention is all you need
  • Math trick in self-attention
  • Layer normalization and dropout
   ipynb
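
  The lecture's "math trick" in brief: a lower-triangular mask plus softmax makes each position aggregate only over its own prefix (shapes are illustrative).

    import torch
    import torch.nn.functional as F

    T, C = 4, 8                                # sequence length, embedding size
    x = torch.randn(T, C)

    # Lower-triangular mask: token t may only attend to tokens 0..t
    tril = torch.tril(torch.ones(T, T))
    scores = torch.zeros(T, T).masked_fill(tril == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)        # uniform over the allowed prefix

    out = weights @ x                          # row t: running mean of x[:t+1]
    print(torch.allclose(out[2], x[:3].mean(dim=0)))   # True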

  Bayesian methods intro

  • Conditional probability and Bayes' theorem
  • Comparison of Frequentist and Bayesian approaches
  • Markov Chain Monte Carlo (MCMC) and Gibbs sampling
   Slides

  Bayesian methods practice

  • Conjugate distributions
  • Maximum Likelihood Estimation
  • Metropolis-Hastings algorithm
   ipynb
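
  A minimal random-walk Metropolis-Hastings sampler targeting a standard normal; the proposal scale and burn-in length are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)

    def log_target(x):
        # Unnormalized log-density of the standard normal target
        return -0.5 * x ** 2

    samples, x = [], 0.0
    for _ in range(20000):
        proposal = x + rng.normal(scale=1.0)   # symmetric random-walk proposal
        # Accept with probability min(1, p(proposal) / p(x))
        if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
            x = proposal
        samples.append(x)

    samples = np.array(samples[5000:])         # drop burn-in
    print(samples.mean(), samples.std())       # approximately 0 and 1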

If you want to use the materials (e.g., figures) in your paper or report and to cite this course, you can do so with the following BibTeX entry:

@misc{avalur2023mlCourse,
  title={ML with Python},
  url={https://avalur.github.io/ml_with_python.html},
  author={Alexander Avdiushenko},
  year={2023},
  month={Sep}
}