ML is built using the tools of mathematical statistics, numerical methods, mathematical analysis, optimization methods, probability theory, and various techniques for working with data in digital form.
Before ML, we had to build mathematical models by hand, for example the Navier-Stokes equations of fluid dynamics:
\[\begin{cases} \frac{\partial \rho}{\partial t} + \frac{\partial(\rho u_{i})}{\partial x_{i}} = 0 \\ \\ \frac{\partial (\rho u_{i})}{\partial t} + \frac{\partial[\rho u_{i}u_{j}]}{\partial x_{j}} = -\frac{\partial p}{\partial x_{i}} + \frac{\partial \tau_{ij}}{\partial x_{j}} + \rho f_{i} \end{cases} \]
In machine learning there is no pre-set model with equations; instead, the dependence is reconstructed from a training sample.
$X$ — set of objects
$Y$ — set of answers
$y: X \to Y$ — unknown dependence (target function)
The task is to find, based on the training sample $\{x_1,\dots,x_\ell\} = X_\ell \subset X$
with known answers $y_i=y(x_i)$,
an algorithm $a: X \to Y$,
which is a decision function that approximates $y$ over the entire set $X$
The whole ML course is about this.
A feature is a mapping $f_j: X \to D_j$, $\ j = 1, \dots, n$
The vector $(f_1(x), \dots, f_n(x))$ is called the feature description of the object $x$
Types of features, depending on $D_j$:
binary: $D_j = \{0, 1\}$
nominal: $D_j$ is a finite set
ordinal: $D_j$ is a finite ordered set
quantitative: $D_j = \mathbb{R}$
Data matrix (objects as rows, features as columns):
$F = ||f_j(x_i)||_{\ell\times n} = \left[ {\begin{array}{ccc}
f_1(x_1) & \dots & f_n(x_1) \\
\vdots & \ddots & \vdots \\
f_1(x_\ell) & \dots & f_n(x_\ell)
\end{array} } \right]$
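As a sketch of how such a matrix could be assembled in practice (the code below is illustrative; the function and variable names are my own, not fixed by the course), each row is an object and each column a feature value:

```python
# Illustrative sketch (hypothetical names): build the data matrix
# F = ||f_j(x_i)|| of shape (l, n) from a list of feature functions.
import numpy as np

def data_matrix(objects, features):
    """Rows are objects x_i, columns are feature values f_j(x_i)."""
    return np.array([[f(x) for f in features] for x in objects])

# Toy example: objects are real numbers, features are {1, x, x^2}.
objects = [-2.0, -1.0, 0.0, 1.0, 2.0]                    # x_1, ..., x_l
features = [lambda x: 1.0, lambda x: x, lambda x: x**2]  # f_1, f_2, f_3
F = data_matrix(objects, features)
print(F.shape)  # (5, 3): l = 5 objects, n = 3 features
```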
Answers: $Y = \mathbb{R}\ $ or $\ Y = \mathbb{R}^m$ (regression)
A model (predictive model) — a parametric family of functions
$A = \{g(x, \theta) | \theta \in \Theta\}$,
where $g: X \times \Theta \to Y$ — a fixed function, $\Theta$ — a set of allowable values of parameter $\theta$
Linear model with vector of parameters $\theta = (\theta_1, \dots, \theta_n), \Theta = \mathbb{R}^n$:
$g(x, \theta) = \sum\limits_{j=1}^n \theta_jf_j(x)$ — for regression and ranking, $Y = \mathbb{R}$
$g(x, \theta) = \mathrm{sign}\left(\sum\limits_{j=1}^n \theta_jf_j(x)\right)$ — for classification, $Y = \{-1, +1\}$
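A minimal NumPy sketch of these two linear models, assuming a data matrix $F$ as above; the data and parameter values here are made up purely for illustration:

```python
# Illustrative sketch (made-up data and parameters): predictions of the
# linear model g(x, theta) = sum_j theta_j * f_j(x) via the data matrix F.
import numpy as np

rng = np.random.default_rng(0)
F = rng.normal(size=(50, 3))         # assumed feature matrix, l = 50, n = 3
theta = np.array([0.5, -1.0, 2.0])   # parameter vector theta in R^3

g_reg = F @ theta            # regression / ranking: answers in R
g_cls = np.sign(F @ theta)   # classification: answers in {-1, +1}
                             # (np.sign gives 0 only if the sum is exactly 0)
```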
Example: $X = Y = \mathbb{R}$, $\ell = 50$ objects,
$n = 3$ features: $\{1, x, x^2\}$ or $\{1, x, \sin x\}$
The method $\mu$ constructs an algorithm $a = \mu(X_\ell, Y_\ell)$ from the sample $(X_\ell, Y_\ell) = (x_i, y_i)_{i=1}^\ell$
$\boxed{ \left[ {\begin{array}{ccc} f_1(x_1) & \dots & f_n(x_1) \\ \dots & \dots & \dots \\ f_1(x_\ell) & \dots & f_n(x_\ell) \end{array} } \right] \xrightarrow{y} \left[ {\begin{array}{c} y_1 \\ \dots \\ y_\ell \end{array} }\right] \thinspace} \xrightarrow{\mu} a$
The algorithm $a$ produces answers $a(x_i^\prime)$ for new objects $x_i^\prime$
$\mathcal{L}(a, x)$ — loss function: the magnitude of the error of the algorithm $a \in A$ on the object $x \in X$
Empirical risk — the quality functional of the algorithm $a$ on $X^\ell$:
$Q(a, X^\ell) = \frac{1}{\ell} \sum\limits_{i=1}^\ell \mathcal{L}(a, x_i)$
The empirical risk minimization method:
$\mu(X^\ell) = \arg\min\limits_{a \in A} Q(a, X^\ell)$
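A hedged sketch of empirical risk minimization as a learning method $\mu$: the model $g$ (linear) and the quadratic loss below are placeholder choices made only so the example runs; none of the names are prescribed by the text.

```python
# Sketch of empirical risk minimization: mu(X^l) = argmin over theta of Q.
# Linear model g and quadratic loss are placeholder choices; data is synthetic.
import numpy as np
from scipy.optimize import minimize

def g(F, theta):                    # g(x_i, theta) = <f(x_i), theta>
    return F @ theta

def empirical_risk(theta, F, y):    # Q(a, X^l) = (1/l) * sum_i L(a, x_i)
    return np.mean((g(F, theta) - y) ** 2)

rng = np.random.default_rng(0)
F = rng.normal(size=(50, 3))                                  # synthetic sample
y = F @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

result = minimize(empirical_risk, x0=np.zeros(3), args=(F, y))
theta_hat = result.x                # mu(X^l): parameters of the learned algorithm a
print(theta_hat)
```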
If $Y = \mathbb{R}$ and $\mathcal{L}$ is quadratic, this is the least squares method:
$$\mu(X^\ell) = \arg\min\limits_{\theta} \sum\limits_{i=1}^{\ell} (g(x_i, \theta) - y_i)^2$$
Example: the target dependence $y(x) = \frac{1}{1+25x^2}$ on the interval $x \in \left[-2, 2\right]$
Feature description $x \to (1, x, x^2, \dots, x^n)$
$a(x, \theta) = \theta_0 + \theta_1 x + \dots + \theta_n x^n$ — a polynomial of degree $n$
$Q(\theta, X^\ell) = \sum\limits_{i=1}^\ell (\theta_0 + \theta_1 x_i + \dots + \theta_n x_i^n - y_i)^2 \to \min\limits_{\theta_0,\dots,\theta_n}$
Training sample: $X^\ell = \{x_i = 4\frac{i-1}{\ell-1} - 2 | i = 1, \dots, \ell \}$
Test sample: $X^k = \{x_i = 4\frac{i-0.5}{\ell-1} - 2 | i = 1, \dots, \ell-1 \}$
What happens to $Q(\theta, X^\ell)$ and $Q(\theta, X^k)$ as $n$ increases?
Overfitting: when the test error is much larger than the training error, $Q(\theta, X^k) \gg Q(\theta, X^\ell)$
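This experiment can be reproduced with a short script. The code below follows the training and test grids and the target $y(x) = \frac{1}{1+25x^2}$ defined above; the particular degrees tried and all names are my own illustrative choices.

```python
# Sketch of the polynomial-fitting experiment described above.
import numpy as np

def target(x):                      # y(x) = 1 / (1 + 25 x^2)
    return 1.0 / (1.0 + 25.0 * x ** 2)

l = 50
x_train = 4 * (np.arange(1, l + 1) - 1) / (l - 1) - 2    # training grid on [-2, 2]
x_test  = 4 * (np.arange(1, l) - 0.5) / (l - 1) - 2      # test grid (midpoints)
y_train, y_test = target(x_train), target(x_test)

def design(x, n):                   # feature description (1, x, ..., x^n)
    return np.vander(x, n + 1, increasing=True)

for n in (1, 2, 3, 5, 10, 20, 40):
    # Least squares fit: minimize the quadratic empirical risk over theta.
    theta, *_ = np.linalg.lstsq(design(x_train, n), y_train, rcond=None)
    q_train = np.mean((design(x_train, n) @ theta - y_train) ** 2)
    q_test  = np.mean((design(x_test, n) @ theta - y_test) ** 2)
    print(f"n={n:2d}  Q_train={q_train:.2e}  Q_test={q_test:.2e}")
```

As $n$ grows, the training error keeps shrinking, while the test error eventually starts to grow again; that gap is the overfitting described above.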
1997: IBM Deep Blue defeats world chess champion Garry Kasparov
2004: self-driving cars competition — DARPA Grand Challenge
2006: Google Translate launched
2011: Apple launches Siri, the result of 40 years of DARPA-funded research through the CALO (Cognitive Assistant that Learns and Organizes) project
2011-2015: ImageNet — image classification error falls from 25% to 3.5%, versus about 5% for humans
2015: Elon Musk and Sam Altman co-found OpenAI, pledging $1 billion in funding
2016: Google DeepMind's AlphaGo defeats world Go champion Lee Sedol
2018: A painting generated by an AI sells at a Christie's auction for $432,500
2020: AlphaFold 2 predicts the structure of proteins with over 90% accuracy for about two-thirds of the proteins in the dataset
2021: OpenAI releases DALL-E, an AI system that generates images from textual descriptions
2022: OpenAI releases ChatGPT (Generative Pre-trained Transformer), which becomes the fastest-growing consumer software application in history
2024: AI gets en-Nobel-ed: John Hopfield and Geoffrey Hinton receive the Nobel Prize in Physics; David Baker, Demis Hassabis, and John Jumper receive the Nobel Prize in Chemistry
2025: another release from OpenAI, the o3 model
Thank you for your attention!