Gradients, Jacobian Matrices, and the Chain Rule Review
We will now review some of the recent material regarding gradients, Jacobian matrices, and the chain rule for functions from $\mathbb{R}^n$ to $\mathbb{R}^m$. Unless otherwise stated, let $S \subseteq \mathbb{R}^n$, $\mathbf{c} \in S$, and $\mathbf{f} : S \to \mathbb{R}^m$.
- On The Gradient of a Differentiable Function from Rn to R page we said that if $f : S \to \mathbb{R}$ and all of the partial derivatives of $f$ at $\mathbf{c}$, $D_1 f(\mathbf{c})$, $D_2 f(\mathbf{c})$, …, $D_n f(\mathbf{c})$ exist, then the Gradient of $f$ at $\mathbf{c}$ is defined as:
\begin{align} \quad \nabla f(\mathbf{c}) = \left ( D_1 f(\mathbf{c}), D_2 f(\mathbf{c}), ..., D_n f(\mathbf{c}) \right ) \end{align}
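- We noted that the gradient of $f$ at $\mathbf{c}$ is defined whenever $f$ is differentiable at $\mathbf{c}$. It is defined even when we only know that all of the directional derivatives of $f$ at $\mathbf{c}$ exist, because each partial derivative $D_k f(\mathbf{c})$ is the directional derivative of $f$ at $\mathbf{c}$ in the direction of the $k$-th standard basis vector.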
- We then looked at a nice theorem which states that if $f$ is differentiable at $\mathbf{c}$ then there exists a unit vector $\mathbf{u} \in \mathbb{R}^n$ for which:
\begin{align} \quad \mid f'(\mathbf{c}, \mathbf{u}) \mid = \| \nabla f(\mathbf{c}) \| \end{align}
- In particular, we saw that if $\nabla f(\mathbf{c}) \neq \mathbf{0}$ and $\mathbf{u}$ is the unit vector in the same direction as $\nabla f(\mathbf{c})$, then the equality above holds. We also noted that $\mid f'(\mathbf{c}, \mathbf{u}) \mid$ never exceeds $\| \nabla f(\mathbf{c}) \|$ for any unit vector $\mathbf{u}$. Hence the maximum rate of change of $f$ at $\mathbf{c}$ occurs in the direction of $\nabla f(\mathbf{c})$.
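The steepest-ascent statement can be checked numerically. The sketch below is illustrative only. The function $f(x, y) = x^2 y + \sin y$, the point $\mathbf{c} = (1, 2)$, the step size `h`, and the helper names `gradient` and `directional_derivative` are assumptions made for this example. All derivatives are approximated with central differences rather than computed exactly. The code scans unit vectors $\mathbf{u} = (\cos t, \sin t)$ and compares the largest $\mid f'(\mathbf{c}, \mathbf{u}) \mid$ with $\| \nabla f(\mathbf{c}) \|$.

```python
import math

def gradient(f, c, h=1e-6):
    """Central-difference approximation of the gradient of f: R^n -> R at c."""
    grad = []
    for k in range(len(c)):
        plus, minus = list(c), list(c)
        plus[k] += h
        minus[k] -= h
        grad.append((f(plus) - f(minus)) / (2 * h))
    return grad

def directional_derivative(f, c, u, h=1e-6):
    """Central-difference approximation of the directional derivative f'(c, u)."""
    plus = [ci + h * ui for ci, ui in zip(c, u)]
    minus = [ci - h * ui for ci, ui in zip(c, u)]
    return (f(plus) - f(minus)) / (2 * h)

def f(x):
    # Illustrative choice (not from the text): f(x, y) = x^2 y + sin(y).
    return x[0] ** 2 * x[1] + math.sin(x[1])

c = [1.0, 2.0]
g = gradient(f, c)
norm_g = math.hypot(*g)

# Largest |f'(c, u)| over 3600 evenly spaced unit vectors u = (cos t, sin t).
best = max(
    abs(directional_derivative(f, c, [math.cos(t), math.sin(t)]))
    for t in (2 * math.pi * i / 3600 for i in range(3600))
)

# The unit vector pointing along the gradient attains ||grad f(c)||.
u_star = [gi / norm_g for gi in g]
attained = directional_derivative(f, c, u_star)

print(norm_g, best, attained)
```

Here $\nabla f(1, 2) = (4, 1 + \cos 2)$, so `norm_g` is about $4.04$. The value `best` should not exceed it beyond rounding error, and `attained` should match it.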
- On The Jacobian Matrix of Differentiable Functions from Rn to Rm page we said that if $\mathbf{f}$ is differentiable at $\mathbf{c}$ then the Jacobian Matrix of $\mathbf{f}$ at $\mathbf{c}$ is defined to be the $m \times n$ matrix given by:
\begin{align} \mathbf{D} \mathbf{f} (\mathbf{c}) = \begin{bmatrix} D_1 f_1 (\mathbf{c}) & D_2 f_1 (\mathbf{c}) & \cdots & D_n f_1 (\mathbf{c}) \\ D_1 f_2 (\mathbf{c}) & D_2 f_2 (\mathbf{c}) & \cdots & D_n f_2 (\mathbf{c}) \\ \vdots & \vdots & \ddots & \vdots \\ D_1 f_m (\mathbf{c}) & D_2 f_m (\mathbf{c}) & \cdots & D_n f_m (\mathbf{c}) \\ \end{bmatrix} = \begin{bmatrix} \nabla f_1(\mathbf{c})\\ \nabla f_2(\mathbf{c})\\ \vdots\\ \nabla f_m(\mathbf{c}) \end{bmatrix} \end{align}
- As with the gradient, the Jacobian matrix of $\mathbf{f}$ at $\mathbf{c}$ exists provided that all of the partial derivatives of all of the component functions of $\mathbf{f}$ exist at $\mathbf{c}$. We proved that if $\mathbf{f}$ is differentiable at $\mathbf{c}$, then the total derivative of $\mathbf{f}$ at $\mathbf{c}$ is the linear map whose matrix is the Jacobian matrix of $\mathbf{f}$ at $\mathbf{c}$, i.e.:
\begin{align} \quad \mathbf{f}'(\mathbf{c}) = \mathbf{D} \mathbf{f} (\mathbf{c}) \end{align}
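One concrete way to see the identification $\mathbf{f}'(\mathbf{c}) = \mathbf{D} \mathbf{f}(\mathbf{c})$ is to check that the Jacobian gives the linear approximation required by differentiability: $\| \mathbf{f}(\mathbf{c} + \mathbf{v}) - \mathbf{f}(\mathbf{c}) - \mathbf{D}\mathbf{f}(\mathbf{c}) \mathbf{v} \| / \| \mathbf{v} \| \to 0$ as $\mathbf{v} \to \mathbf{0}$. The following is a minimal Python sketch under our own assumptions. The example map $\mathbf{f}(x, y) = (xy, x + y^2, e^x)$, the point $(0.5, -1)$, and the helpers `jacobian` and `matvec` are hypothetical. The Jacobian is approximated column by column with central differences, and each row of `J` approximates the gradient of one component function, matching the row-of-gradients form above.

```python
import math

def jacobian(f, c, step=1e-6):
    """Central-difference approximation of the m x n Jacobian of f: R^n -> R^m at c.
    Row i approximates the gradient of the component f_i at c."""
    m, n = len(f(c)), len(c)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        plus, minus = list(c), list(c)
        plus[j] += step
        minus[j] -= step
        fp, fm = f(plus), f(minus)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * step)
    return J

def matvec(A, v):
    """Multiply the matrix A (a list of rows) by the vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def f(x):
    # Illustrative map f: R^2 -> R^3, f(x, y) = (x y, x + y^2, e^x).
    return [x[0] * x[1], x[0] + x[1] ** 2, math.exp(x[0])]

c = [0.5, -1.0]
J = jacobian(f, c)
fc = f(c)

# Remainder ratio ||f(c + v) - f(c) - J v|| / ||v|| for shrinking v = t * (0.6, 0.8).
ratios = []
for t in (1e-1, 1e-2, 1e-3):
    v = [0.6 * t, 0.8 * t]  # chosen so that ||v|| = t
    fv = f([ci + vi for ci, vi in zip(c, v)])
    Jv = matvec(J, v)
    r = math.sqrt(sum((a - b - d) ** 2 for a, b, d in zip(fv, fc, Jv)))
    ratios.append(r / t)

print(J)
print(ratios)
```

The ratios shrink roughly in proportion to $t$, which is what one expects for a smooth map whose remainder is of second order in $\| \mathbf{v} \|$.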
- We also noted that if $m = 1$, i.e., $f : S \to \mathbb{R}$ is a real-valued function, then the Jacobian matrix of $f$ at $\mathbf{c}$ is the $1 \times n$ matrix whose single row is the gradient of $f$ at $\mathbf{c}$. Therefore, if $f$ is differentiable at $\mathbf{c}$, then applying the Jacobian of $f$ at $\mathbf{c}$ to $\mathbf{v} \in \mathbb{R}^n$ gives the directional derivative of $f$ at $\mathbf{c}$ in the direction of $\mathbf{v}$, and this equals $\nabla f(\mathbf{c}) \cdot \mathbf{v}$. This gives another proof that $f'(\mathbf{c}, \mathbf{v}) = \nabla f(\mathbf{c}) \cdot \mathbf{v}$.
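A quick numerical check of $f'(\mathbf{c}, \mathbf{v}) = \nabla f(\mathbf{c}) \cdot \mathbf{v}$ for a vector $\mathbf{v}$ that is not a unit vector. This is only a sketch. The function $f(x, y, z) = xyz + z^2$, its hand-computed gradient `grad_f`, the point $\mathbf{c} = (1, 2, 3)$, and $\mathbf{v} = (2, -1, 0.5)$ are illustrative choices. The limit defining $f'(\mathbf{c}, \mathbf{v})$ is replaced by a central difference quotient with a small step `t`.

```python
def f(x):
    # Illustrative real-valued function f(x, y, z) = x y z + z^2.
    return x[0] * x[1] * x[2] + x[2] ** 2

def grad_f(x):
    # Partial derivatives of this particular f, computed by hand.
    return [x[1] * x[2], x[0] * x[2], x[0] * x[1] + 2 * x[2]]

def directional_derivative(f, c, v, t=1e-6):
    """Central-difference approximation of f'(c, v) = lim (f(c + t v) - f(c)) / t."""
    plus = [ci + t * vi for ci, vi in zip(c, v)]
    minus = [ci - t * vi for ci, vi in zip(c, v)]
    return (f(plus) - f(minus)) / (2 * t)

c = [1.0, 2.0, 3.0]
v = [2.0, -1.0, 0.5]  # deliberately not a unit vector

lhs = directional_derivative(f, c, v)             # f'(c, v), approximated
rhs = sum(g * vi for g, vi in zip(grad_f(c), v))  # grad f(c) . v

print(lhs, rhs)
```

Here $\nabla f(\mathbf{c}) = (6, 3, 8)$, so the dot product is $12 - 3 + 4 = 13$, and the difference quotient should agree with it up to a tiny numerical error.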
- On the A Bound for the Total Derivative of a Function from Rn to Rm page we proved that if $\mathbf{f}$ is differentiable at $\mathbf{c}$, then for all $\mathbf{v} \in \mathbb{R}^n$ we have that:
\begin{align} \quad \| \mathbf{f}'(\mathbf{c})(\mathbf{v}) \| \leq M \| \mathbf{v} \| \end{align}
- where $\displaystyle{M = \sum_{k=1}^{m} \| \nabla f_k(\mathbf{c}) \| }$. This generalizes the inequality $\mid f'(\mathbf{c}, \mathbf{u}) \mid \leq \| \nabla f(\mathbf{c}) \|$ that we established earlier for real-valued functions. When $m = 1$ we have $M = \| \nabla f(\mathbf{c}) \|$, and taking $\mathbf{v} = \mathbf{u}$ to be a unit vector recovers that inequality.
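One way to see why the bound holds: the $k$-th component of $\mathbf{f}'(\mathbf{c})(\mathbf{v})$ is $\nabla f_k(\mathbf{c}) \cdot \mathbf{v}$. By the Cauchy-Schwarz inequality, $\| \mathbf{f}'(\mathbf{c})(\mathbf{v}) \| \leq \sum_{k=1}^{m} \mid \nabla f_k(\mathbf{c}) \cdot \mathbf{v} \mid \leq M \| \mathbf{v} \|$. The Python sketch below tests the inequality on random vectors for a hypothetical $2 \times 3$ Jacobian `J`. Its entries are made up for illustration and are not derived from any function in the text. The helpers `norm` and `matvec` and the random sampling are likewise our own choices.

```python
import math
import random

def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))

def matvec(A, v):
    """Multiply the matrix A (a list of rows) by the vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

# Made-up 2 x 3 matrix standing in for f'(c); row k plays the role of grad f_k(c).
J = [[1.0, -2.0, 0.5],
     [3.0, 0.0, -1.0]]

# M = sum over k of ||grad f_k(c)||
M = sum(norm(row) for row in J)

random.seed(0)
worst_ratio = 0.0
for _ in range(10_000):
    v = [random.uniform(-1.0, 1.0) for _ in range(3)]
    nv = norm(v)
    if nv > 0:
        # Largest observed ||f'(c)(v)|| / ||v|| so far.
        worst_ratio = max(worst_ratio, norm(matvec(J, v)) / nv)

print(M, worst_ratio)
```

In this example the largest observed ratio stays well below $M$. The bound is generally not sharp, since $M$ adds the row norms instead of combining them in a Euclidean way.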
- On The Chain Rule for Compositions of Differentiable Functions from Rn to Rm page we saw that if $\mathbf{f}$ and $\mathbf{g}$ are functions for which the composition $\mathbf{h} = \mathbf{f} \circ \mathbf{g}$ makes sense, $\mathbf{g}$ is differentiable at $\mathbf{a}$, and $\mathbf{f}$ is differentiable at $\mathbf{b} = \mathbf{g}(\mathbf{a})$, then $\mathbf{h}$ is differentiable at $\mathbf{a}$ and:
\begin{align} \quad \mathbf{h}'(\mathbf{a}) = \mathbf{f}'(\mathbf{b}) \circ \mathbf{g}'(\mathbf{a}) = \mathbf{f}'(\mathbf{g}(\mathbf{a})) \circ \mathbf{g}'(\mathbf{a}) \end{align}
- On The Matrix Form of the Chain Rule for Compositions of Differentiable Functions from Rn to Rm page we stated the chain rule in terms of matrices. That is, if $\mathbf{f}$ and $\mathbf{g}$ are functions such that the composition $\mathbf{h} = \mathbf{f} \circ \mathbf{g}$ is well-defined and if $\mathbf{g}$ is differentiable at $\mathbf{a}$ and $\mathbf{f}$ is differentiable at $\mathbf{b} = \mathbf{g}(\mathbf{a})$ then $\mathbf{h}$ is differentiable at $\mathbf{a}$ and:
\begin{align} \quad [\mathbf{D} \mathbf{h} (\mathbf{a})] = [\mathbf{D} \mathbf{f} (\mathbf{b})][\mathbf{D} \mathbf{g}(\mathbf{a})] = [\mathbf{D} \mathbf{f} (\mathbf{g}(\mathbf{a}))][\mathbf{D} \mathbf{g}(\mathbf{a})] \end{align}
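The matrix form can be checked numerically by comparing a finite-difference Jacobian of $\mathbf{h} = \mathbf{f} \circ \mathbf{g}$ with the product of the Jacobians of $\mathbf{f}$ and $\mathbf{g}$. This is a minimal Python sketch with illustrative assumptions. The maps $\mathbf{g}(x_1, x_2) = (x_1 x_2, \sin x_1, x_2^2)$ and $\mathbf{f}(y_1, y_2, y_3) = (y_1 + y_2 y_3, e^{y_1} - y_3)$, the point $\mathbf{a} = (0.3, 1.2)$, and the helpers `jacobian` and `matmul` are chosen for the example. Every Jacobian is a central-difference approximation, so the two sides agree only up to small rounding and truncation errors.

```python
import math

def jacobian(F, c, step=1e-6):
    """Central-difference approximation of the Jacobian of F at c (row i = component i)."""
    m, n = len(F(c)), len(c)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        plus, minus = list(c), list(c)
        plus[j] += step
        minus[j] -= step
        Fp, Fm = F(plus), F(minus)
        for i in range(m):
            J[i][j] = (Fp[i] - Fm[i]) / (2 * step)
    return J

def matmul(A, B):
    """Product of an (m x r) matrix A and an (r x n) matrix B, both lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def g(x):
    # Illustrative g: R^2 -> R^3.
    return [x[0] * x[1], math.sin(x[0]), x[1] ** 2]

def f(y):
    # Illustrative f: R^3 -> R^2.
    return [y[0] + y[1] * y[2], math.exp(y[0]) - y[2]]

def h(x):
    # The composition h = f o g.
    return f(g(x))

a = [0.3, 1.2]
b = g(a)
Dh = jacobian(h, a)                               # [Dh(a)], 2 x 2
product = matmul(jacobian(f, b), jacobian(g, a))  # [Df(b)][Dg(a)]

print(Dh)
print(product)
```

Note the shapes: $[\mathbf{D}\mathbf{f}(\mathbf{b})]$ is $2 \times 3$ and $[\mathbf{D}\mathbf{g}(\mathbf{a})]$ is $3 \times 2$, so their product is the $2 \times 2$ matrix $[\mathbf{D}\mathbf{h}(\mathbf{a})]$.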
- Furthermore, suppose $\mathbf{g} : S \to \mathbb{R}^m$ and $\mathbf{f} : R(\mathbf{g}) \to \mathbb{R}^p$ satisfy the hypotheses above, and write them in coordinates as:
\begin{align} \quad (x_1, x_2, ..., x_n) \xrightarrow{\mathbf{g}} (y_1, y_2, ..., y_m) \xrightarrow{\mathbf{f}} (z_1, z_2, ..., z_p) \end{align}
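- Then for all $k \in \{ 1, 2, ..., p \}$ and all $j \in \{ 1, 2, ..., n \}$, comparing the $(k, j)$ entries of both sides of the matrix form of the chain rule gives the following. Here each $\frac{\partial z_k}{\partial y_i}$ is evaluated at $\mathbf{b}$, and each $\frac{\partial y_i}{\partial x_j}$, as well as $\frac{\partial z_k}{\partial x_j}$, is evaluated at $\mathbf{a}$: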
\begin{align} \quad \frac{\partial z_k}{\partial x_j} = \sum_{i=1}^{m} \frac{\partial z_k}{\partial y_i} \frac{\partial y_i}{\partial x_j} \end{align}
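- As a small worked example, with functions chosen only for illustration, take $n = m = 2$ and $p = 1$, with $(y_1, y_2) = \mathbf{g}(x_1, x_2) = (x_1 x_2, \; x_1 + x_2)$ and $z = f(y_1, y_2) = y_1^2 + y_2$. The formula above with $j = 1$ gives:
\begin{align} \quad \frac{\partial z}{\partial x_1} = \frac{\partial z}{\partial y_1} \frac{\partial y_1}{\partial x_1} + \frac{\partial z}{\partial y_2} \frac{\partial y_2}{\partial x_1} = (2 y_1)(x_2) + (1)(1) = 2 x_1 x_2^2 + 1 \end{align}
- This agrees with differentiating $z = x_1^2 x_2^2 + x_1 + x_2$ directly with respect to $x_1$.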