1. For $f: \mathbb{R} \to \mathbb{R}$

\[ Df(x)[h] = f'(x) \cdot h \]

2. For $f: \mathbb{R}^n \to \mathbb{R}$

\[ Df(x)[h] = \sum\limits_i \frac{\partial f}{\partial x_i} \cdot h_i = \left< \nabla f(x), h \right> \]

where $\nabla f$ is the gradient of $f$.

3. For $f: \text{Matrix}_{m\times n} \to \mathbb{R}$

\[ Df(X)[H] = \sum\limits_{i,j} \frac{\partial f}{\partial x_{ij}} \cdot h_{ij} = \mathrm{tr}(\nabla f(X)^T H) = \left< \nabla f(X), H \right> \]

4. For $f: \mathbb{R}^m \to \mathbb{R}^m$, defined as $f(x) = (g(x_1), \dots, g(x_m))$

\[ f(x+h)-f(x) = (\dots, g(x_i+h_i) - g(x_i) ,\dots) \approx \\ (\dots, g'(x_i) \cdot h_i ,\dots) = f'(x) \odot h \]

where $f'(x) = (g'(x_1), \dots, g'(x_m))$ and $\odot$ is the elementwise product.

5. For $f: \text{Matrix}_{m\times n} \to \text{Matrix}_{m\times k}$, defined as $f(X) = XW$

\[ f(X+H)-f(X) = HW = Df(X)[H] \]

Here the increment is already a linear function of $H$.

Warning: there is no $\nabla f(x)$ in items 4 and 5, and you don't actually need one.
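
The identities in items 2 to 5 can be sanity-checked numerically. The sketch below is only an illustration and is not part of the original notes: it assumes NumPy, and the test functions ($\sum_i x_i^3$, $\mathrm{tr}(X^TX)$, $g = \tanh$) and all variable names are hypothetical choices made for the example. Each differential is compared against a central finite difference.

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 1e-6  # finite-difference step (an arbitrary small value)

# Item 2: f(x) = sum(x_i^3), so grad f(x) = 3 x^2 (example function, chosen for illustration).
def f_vec(x):
    return np.sum(x ** 3)

x = rng.standard_normal(4)
h = rng.standard_normal(4)
fd_vec = (f_vec(x + eps * h) - f_vec(x - eps * h)) / (2 * eps)
df_vec = np.dot(3 * x ** 2, h)            # <grad f(x), h>

# Item 3: f(X) = tr(X^T X), so grad f(X) = 2X.
def f_mat(X):
    return np.trace(X.T @ X)

X = rng.standard_normal((3, 2))
H = rng.standard_normal((3, 2))
fd_mat = (f_mat(X + eps * H) - f_mat(X - eps * H)) / (2 * eps)
df_mat = np.trace((2 * X).T @ H)          # tr(grad f(X)^T H)

# Item 4: elementwise map with g = tanh, so g'(t) = 1 - tanh(t)^2.
fd_elem = (np.tanh(x + eps * h) - np.tanh(x - eps * h)) / (2 * eps)
df_elem = (1 - np.tanh(x) ** 2) * h       # f'(x) elementwise-times h

# Item 5: f(X) = XW is linear in X, so the increment equals HW exactly.
W = rng.standard_normal((2, 5))
increment = (X + H) @ W - X @ W

print(abs(fd_vec - df_vec), abs(fd_mat - df_mat))
print(np.max(np.abs(fd_elem - df_elem)), np.max(np.abs(increment - H @ W)))
```

All four printed discrepancies should be close to zero, up to finite-difference and rounding error.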
The chain rule for a composition $v(u(x))$ follows from applying the linear approximation twice:

\[ D (v \circ u)(x) [h] \approx v(u(x+h)) - v(u(x)) \approx \\ Dv (u(x)) [u(x+h)-u(x)] \approx Dv (u(x)) [D u(x)[h]] \]
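
As a quick illustration of the chain rule, the sketch below composes two differentials and compares the result with a finite difference. It assumes NumPy; the maps $u(x) = \sin(x)$ (elementwise) and $v(y) = \left< y, y \right>$, along with the helper names `Du` and `Dv`, are hypothetical examples rather than anything from the notes.

```python
import numpy as np

def u(x):
    return np.sin(x)              # elementwise map R^n -> R^n

def Du(x, h):
    return np.cos(x) * h          # Du(x)[h] = g'(x) elementwise-times h

def v(y):
    return np.dot(y, y)           # scalar-valued map R^n -> R

def Dv(y, k):
    return 2 * np.dot(y, k)       # Dv(y)[k] = <2y, k>

x = np.array([0.3, -1.2, 2.0])
h = np.array([1.0, 0.5, -0.7])
eps = 1e-6

# Chain rule: differential of the composition is Dv(u(x))[Du(x)[h]].
chain = Dv(u(x), Du(x, h))

# Central finite difference of the composition along direction h.
fd = (v(u(x + eps * h)) - v(u(x - eps * h))) / (2 * eps)

print(chain, fd)
```

Note that no Jacobian matrix is ever formed: the inner differential `Du(x, h)` is simply passed as the direction to the outer one.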
1. $f(x) = \left< a, x \right>$ (dot product)
2. $f(x) = \left< Ax, x \right>$
3. $f(X) = X^{-1}$
4. $f(X) = \det X$
5. $f(x) = \| Ax-b \|^2$
6. $f(X) = \log \det X$
7. $f(X) = \mathrm{tr}(AX^TX)$
8. $f(X) = \det(AX^{-1}B)$, where $A$ and $B$ are not necessarily square
9. $ \begin{cases} \| X - A \|_{fro} \to \min \\ X = X^T \end{cases}$ where $\| Y \|_{fro} = \sqrt{\mathrm{tr}(Y^TY)}$
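Let's write the Lagrangian for the squared norm:

$$ \mathcal{L} = \mathrm{tr}\left( (X-A)^T(X-A) \right) + \sum\limits_{ij} \lambda_{ij}(x_{ji}-x_{ij}) = \\ = \mathrm{tr}\left( (X-A)^T(X-A) \right) + \mathrm{tr}\left( \Lambda^T (X^T-X) \right) $$

Its differential is

$$ D\mathcal{L} = \mathrm{tr}\left( (DX)^TX+X^TDX - A^TDX - (DX)^TA + \Lambda^T(DX)^T - \Lambda^T DX\right) $$

Using $\mathrm{tr}(M) = \mathrm{tr}(M^T)$ and the cyclic property of the trace, every term can be brought to the form $\mathrm{tr}\left( [\,\cdot\,] (DX)^T \right)$, and the matrix in brackets is the gradient:

$$ D\mathcal{L} = \mathrm{tr}\left( \nabla \mathcal{L} \, (DX)^T \right) \Rightarrow \nabla \mathcal{L} = X+X-A-A+\Lambda^T-\Lambda $$

Setting the gradient to zero and transposing (using $X = X^T$):

$$ \nabla \mathcal{L} = 2X-2A+\Lambda^T-\Lambda = 0, \quad \nabla \mathcal{L}^T = 2X-2A^T+\Lambda-\Lambda^T = 0 $$

Adding the two equations eliminates $\Lambda$:

$$ \nabla \mathcal{L} + \nabla \mathcal{L}^T = 0 \Rightarrow 4X = 2A+2A^T \Rightarrow \boxed{X = \frac{A+A^T}{2}} $$

The Hessian is a square matrix of second-order partial derivatives of a scalar-valued function.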
\[ f(x+h) - f(x) = Df(x)[h] + \frac12 D^2f(x)[h, h] + o(\|h\|^2) \]

\[ D^2f(x)[h, h] = h^T \nabla^2 f(x) h \]

where $\nabla^2 f(x)$ is the Hessian.

Let $f(x) = \| Ax-b \|^2$. When is the solution $x$ of $f(x) = 0$ unique?
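
One way to explore this question numerically is sketched below. It is an illustration under stated assumptions, not the notes' intended answer: it assumes NumPy, uses a small hand-picked matrix `A` and vector `b` (both hypothetical), and relies on the standard facts that $\nabla f(x) = 2A^T(Ax-b)$ and $\nabla^2 f(x) = 2A^TA$ for this $f$. Because $f$ is quadratic, the second-order expansion holds exactly, and the Hessian is positive definite precisely when $A$ has full column rank, which is the usual uniqueness criterion.

```python
import numpy as np

# Hypothetical example data: A is 5x3 with linearly independent columns.
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, -2.0, 0.5, 3.0, 0.0])

def f(x):
    r = A @ x - b
    return r @ r                      # ||Ax - b||^2

def grad(x):
    return 2 * A.T @ (A @ x - b)      # gradient of f

hess = 2 * A.T @ A                    # Hessian of f (constant in x)

x = np.array([0.2, -0.4, 1.5])
h = np.array([1.0, 2.0, -1.0])

# For a quadratic f the expansion has no remainder term.
lhs = f(x + h) - f(x)
rhs = grad(x) @ h + 0.5 * h @ hess @ h

# Positive definiteness of the Hessian vs. full column rank of A.
min_eig = np.linalg.eigvalsh(hess).min()
full_rank = np.linalg.matrix_rank(A) == A.shape[1]

print(abs(lhs - rhs), min_eig, full_rank)
```

Replacing a column of `A` with a copy of another column makes the smallest eigenvalue drop to (numerically) zero, which is a useful experiment when thinking about the uniqueness question.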