# Gradients, Jacobian Matrices, and the Chain Rule Review

We will now review some of the recent material regarding gradients, Jacobian matrices, and the chain rule for functions from $\mathbb{R}^n$ to $\mathbb{R}^m$. Let $S \subseteq \mathbb{R}^n$, $\mathbf{c} \in S$, and $\mathbf{f} : S \to \mathbb{R}^m$ unless otherwise stated.

• On The Gradient of a Differentiable Function from Rn to R page we said that if $f : S \to \mathbb{R}$ and all of the partial derivatives of $f$ at $\mathbf{c}$, namely $D_1 f(\mathbf{c})$, $D_2 f(\mathbf{c})$, …, $D_n f(\mathbf{c})$, exist, then the gradient of $f$ at $\mathbf{c}$ is defined as:
(1)
\begin{align} \quad \nabla f(\mathbf{c}) = \left ( D_1 f(\mathbf{c}), D_2 f(\mathbf{c}), ..., D_n f(\mathbf{c}) \right ) \end{align}
• We noted that if $f$ is differentiable at $\mathbf{c}$, or even if only the directional derivatives of $f$ at $\mathbf{c}$ all exist, then the gradient of $f$ at $\mathbf{c}$ is defined.
• We then looked at a nice theorem which states that if $f$ is differentiable at $\mathbf{c}$ then there exists a unit vector $\mathbf{u} \in \mathbb{R}^n$ for which:
(2)
\begin{align} \quad \mid f'(\mathbf{c}, \mathbf{u}) \mid = \| \nabla f(\mathbf{c}) \| \end{align}
• In particular, we saw that if $\mathbf{u}$ is the unit vector in the same direction as $\nabla f(\mathbf{c})$ then the equality above will hold. We also noted that $\mid f'(\mathbf{c}, \mathbf{u}) \mid$ never exceeds $\| \nabla f(\mathbf{c}) \|$ for any unit vector $\mathbf{u}$, and so the maximum rate of change of $f$ at $\mathbf{c}$ occurs in the direction of $\nabla f(\mathbf{c})$.
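The two facts above can be checked numerically. The following is a minimal sketch, not part of the original pages: the function $f(x, y) = x^2 + 3xy$, the point $\mathbf{c} = (1, 2)$, and the helper names `gradient` and `directional_derivative` are all assumptions chosen for illustration, and the derivatives are approximated by central differences.

```python
import math

def f(x):
    # Hypothetical example function f(x, y) = x^2 + 3xy (an assumption for illustration).
    return x[0] ** 2 + 3.0 * x[0] * x[1]

def gradient(func, c, h=1e-6):
    """Approximate (D_1 f(c), ..., D_n f(c)) by central differences."""
    grad = []
    for k in range(len(c)):
        forward = list(c)
        backward = list(c)
        forward[k] += h
        backward[k] -= h
        grad.append((func(forward) - func(backward)) / (2.0 * h))
    return grad

def directional_derivative(func, c, u, h=1e-6):
    """Approximate f'(c, u) with a central difference along the direction u."""
    forward = [ci + h * ui for ci, ui in zip(c, u)]
    backward = [ci - h * ui for ci, ui in zip(c, u)]
    return (func(forward) - func(backward)) / (2.0 * h)

c = [1.0, 2.0]
grad = gradient(f, c)  # the exact gradient is (2x + 3y, 3x) = (8, 3) at c
grad_norm = math.sqrt(sum(g * g for g in grad))

# Unit vector pointing in the same direction as the gradient.
u_star = [g / grad_norm for g in grad]
rate_along_gradient = directional_derivative(f, c, u_star)

# Sample unit vectors around the circle and record |f'(c, u)| for each.
rates = []
for i in range(360):
    theta = 2.0 * math.pi * i / 360
    u = [math.cos(theta), math.sin(theta)]
    rates.append(abs(directional_derivative(f, c, u)))
max_rate = max(rates)
```

Up to rounding error, `rate_along_gradient` matches `grad_norm`, and no sampled direction gives a larger value of $\mid f'(\mathbf{c}, \mathbf{u}) \mid$.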
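• We then defined the Jacobian matrix of $\mathbf{f} = (f_1, f_2, ..., f_m)$ at $\mathbf{c}$ as the $m \times n$ matrix whose rows are the gradients of the component functions of $\mathbf{f}$ at $\mathbf{c}$:
(3)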
\begin{align} \mathbf{D} \mathbf{f} (\mathbf{c}) = \begin{bmatrix} D_1 f_1 (\mathbf{c}) & D_2 f_1 (\mathbf{c}) & \cdots & D_n f_1 (\mathbf{c}) \\ D_1 f_2 (\mathbf{c}) & D_2 f_2 (\mathbf{c}) & \cdots & D_n f_2 (\mathbf{c}) \\ \vdots & \vdots & \ddots & \vdots \\ D_1 f_m (\mathbf{c}) & D_2 f_m (\mathbf{c}) & \cdots & D_n f_m (\mathbf{c}) \\ \end{bmatrix} = \begin{bmatrix} \nabla f_1(\mathbf{c})\\ \nabla f_2(\mathbf{c})\\ \vdots\\ \nabla f_m(\mathbf{c}) \end{bmatrix} \end{align}
• As with the gradient, the Jacobian of $\mathbf{f}$ at $\mathbf{c}$ exists provided that all of the partial derivatives of all of the component functions of $\mathbf{f}$ exist at $\mathbf{c}$. We proved that if $\mathbf{f}$ is indeed differentiable at $\mathbf{c}$ then the total derivative of $\mathbf{f}$ at $\mathbf{c}$ is precisely the Jacobian matrix of $\mathbf{f}$ at $\mathbf{c}$, i.e.:
(4)
\begin{align} \quad \mathbf{f}'(\mathbf{c}) = \mathbf{D} \mathbf{f} (\mathbf{c}) \end{align}
• We also noted that if $m = 1$, i.e., $f : S \to \mathbb{R}$ ($f$ is a real-valued function), then the Jacobian of $f$ at $\mathbf{c}$ is precisely the gradient of $f$ at $\mathbf{c}$. Therefore, if $f$ is differentiable at $\mathbf{c}$ then the Jacobian of $f$ at $\mathbf{c}$ evaluated at $\mathbf{v} \in \mathbb{R}^n$ gives us the directional derivative of $f$ at $\mathbf{c}$ in the direction of $\mathbf{v}$ and is equal to $\nabla f(\mathbf{c}) \cdot \mathbf{v}$. This gives another proof that $f'(\mathbf{c}, \mathbf{v}) = \nabla f(\mathbf{c}) \cdot \mathbf{v}$.
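To make the Jacobian concrete, here is a minimal sketch that builds it column by column from central differences and then applies it to a vector. The function $\mathbf{f}(x, y) = (x^2 y, \sin x + y, xy)$, the point $\mathbf{c}$, the vector $\mathbf{v}$, and the helper names `jacobian` and `apply_matrix` are assumptions made only for this example.

```python
import math

def f(x):
    # Hypothetical example f : R^2 -> R^3 (an assumption for illustration).
    return [x[0] ** 2 * x[1], math.sin(x[0]) + x[1], x[0] * x[1]]

def jacobian(func, c, h=1e-6):
    """Approximate the m x n matrix [D_j f_i(c)] by central differences."""
    m = len(func(c))
    n = len(c)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        forward = list(c)
        backward = list(c)
        forward[j] += h
        backward[j] -= h
        f_forward = func(forward)
        f_backward = func(backward)
        for i in range(m):
            J[i][j] = (f_forward[i] - f_backward[i]) / (2.0 * h)
    return J

def apply_matrix(J, v):
    """Matrix-vector product: row i is the dot product of grad f_i(c) with v."""
    return [sum(a * b for a, b in zip(row, v)) for row in J]

c = [1.0, 2.0]
J = jacobian(f, c)

# Analytic Jacobian of the example, whose rows are the gradients of f_1, f_2, f_3.
exact = [
    [2.0 * c[0] * c[1], c[0] ** 2],
    [math.cos(c[0]), 1.0],
    [c[1], c[0]],
]

# Compare J v with a difference quotient of f along v.
v = [0.5, -1.0]
h = 1e-6
Jv = apply_matrix(J, v)
f_plus = f([ci + h * vi for ci, vi in zip(c, v)])
f_minus = f([ci - h * vi for ci, vi in zip(c, v)])
quotient = [(p - q) / (2.0 * h) for p, q in zip(f_plus, f_minus)]
```

Each row of `J` approximates the gradient of one component function, and `Jv` agrees with `quotient` up to rounding error, as expected for a differentiable $\mathbf{f}$.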
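• We also saw that if $\mathbf{f}$ is differentiable at $\mathbf{c}$ then for all $\mathbf{v} \in \mathbb{R}^n$:
(5)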
\begin{align} \quad \| \mathbf{f}'(\mathbf{c})(\mathbf{v}) \| \leq M \| \mathbf{v} \| \end{align}
• Here $\displaystyle{M = \sum_{k=1}^{m} \| \nabla f_k(\mathbf{c}) \| }$. Notice that this is simply a generalization of the inequality $\mid f'(\mathbf{c}, \mathbf{u}) \mid \leq \| \nabla f(\mathbf{c}) \|$ that we established earlier for real-valued functions and unit vectors $\mathbf{u}$.
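The bound can be spot-checked on random vectors. This is only a sketch under stated assumptions: the Jacobian below is the hand-computed Jacobian of the hypothetical function $\mathbf{f}(x, y) = (x^2 y, \sin x + y, xy)$ at $\mathbf{c} = (1, 2)$, and `jacobian_at` and `norm` are illustrative helper names. A random check is evidence, not a proof.

```python
import math
import random

def jacobian_at(c):
    # Analytic Jacobian of the hypothetical f(x, y) = (x^2 y, sin x + y, x y).
    x, y = c
    return [[2.0 * x * y, x ** 2], [math.cos(x), 1.0], [y, x]]

def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(t * t for t in v))

c = [1.0, 2.0]
J = jacobian_at(c)

# M is the sum of the norms of the rows, i.e. of the gradients of the components.
M = sum(norm(row) for row in J)

random.seed(0)
worst_ratio = 0.0
for _ in range(1000):
    v = [random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0)]
    if norm(v) == 0.0:
        continue  # skip the zero vector, where the ratio is undefined
    Jv = [sum(a * b for a, b in zip(row, v)) for row in J]
    worst_ratio = max(worst_ratio, norm(Jv) / norm(v))
```

In this sketch `worst_ratio` stays below `M`, consistent with $\| \mathbf{f}'(\mathbf{c})(\mathbf{v}) \| \leq M \| \mathbf{v} \|$. The bound is generally not tight.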
• On The Chain Rule for Compositions of Differentiable Functions from Rn to Rm page we saw that if $\mathbf{f}$ and $\mathbf{g}$ are functions for which the composition $\mathbf{h} = \mathbf{f} \circ \mathbf{g}$ makes sense, and if $\mathbf{g}$ is differentiable at $\mathbf{a}$ and $\mathbf{f}$ is differentiable at $\mathbf{b} = \mathbf{g}(\mathbf{a})$, then $\mathbf{h}$ is differentiable at $\mathbf{a}$ and:
(6)
\begin{align} \quad \mathbf{h}'(\mathbf{a}) = \mathbf{f}'(\mathbf{b}) \circ \mathbf{g}'(\mathbf{a}) = \mathbf{f}'(\mathbf{g}(\mathbf{a})) \circ \mathbf{g}'(\mathbf{a}) \end{align}
• On The Matrix Form of the Chain Rule for Compositions of Differentiable Functions from Rn to Rm page we stated the chain rule in terms of matrices. That is, if $\mathbf{f}$ and $\mathbf{g}$ are functions such that the composition $\mathbf{h} = \mathbf{f} \circ \mathbf{g}$ is well-defined and if $\mathbf{g}$ is differentiable at $\mathbf{a}$ and $\mathbf{f}$ is differentiable at $\mathbf{b} = \mathbf{g}(\mathbf{a})$ then $\mathbf{h}$ is differentiable at $\mathbf{a}$ and:
(7)
\begin{align} \quad [\mathbf{D} \mathbf{h} (\mathbf{a})] = [\mathbf{D} \mathbf{f} (\mathbf{b})][\mathbf{D} \mathbf{g}(\mathbf{a})] = [\mathbf{D} \mathbf{f} (\mathbf{g}(\mathbf{a}))][\mathbf{D} \mathbf{g}(\mathbf{a})] \end{align}
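The matrix form of the chain rule can be illustrated with a small numerical check. In this sketch the functions $\mathbf{g}(x_1, x_2) = (x_1 x_2, x_1 + x_2, x_1^2)$ and $\mathbf{f}(y_1, y_2, y_3) = (y_1 y_2, y_2 + y_3^2)$, the point $\mathbf{a} = (1, 2)$, and all helper names are assumptions chosen for illustration. The Jacobians of $\mathbf{f}$ and $\mathbf{g}$ are written out by hand, and their product is compared with a finite-difference Jacobian of $\mathbf{h} = \mathbf{f} \circ \mathbf{g}$.

```python
def g(x):
    # Hypothetical g : R^2 -> R^3.
    return [x[0] * x[1], x[0] + x[1], x[0] ** 2]

def f(y):
    # Hypothetical f : R^3 -> R^2.
    return [y[0] * y[1], y[1] + y[2] ** 2]

def h(x):
    # The composition h = f o g : R^2 -> R^2.
    return f(g(x))

def Dg(x):
    # Analytic 3 x 2 Jacobian of g.
    return [[x[1], x[0]], [1.0, 1.0], [2.0 * x[0], 0.0]]

def Df(y):
    # Analytic 2 x 3 Jacobian of f.
    return [[y[1], y[0], 0.0], [0.0, 1.0, 2.0 * y[2]]]

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def numeric_jacobian(func, c, step=1e-6):
    """Central-difference approximation of the Jacobian of func at c."""
    m, n = len(func(c)), len(c)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        forward, backward = list(c), list(c)
        forward[j] += step
        backward[j] -= step
        f_forward, f_backward = func(forward), func(backward)
        for i in range(m):
            J[i][j] = (f_forward[i] - f_backward[i]) / (2.0 * step)
    return J

a = [1.0, 2.0]
b = g(a)                        # b = g(a) = (2, 3, 1)
product = matmul(Df(b), Dg(a))  # [Df(b)][Dg(a)], a 2 x 2 matrix
numeric = numeric_jacobian(h, a)
```

Note the order of the factors: $[\mathbf{D} \mathbf{f}]$ is evaluated at $\mathbf{b} = \mathbf{g}(\mathbf{a})$, not at $\mathbf{a}$, and it multiplies $[\mathbf{D} \mathbf{g}(\mathbf{a})]$ from the left so that the matrix dimensions are compatible.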
• Furthermore, if $\mathbf{g} : S \to \mathbb{R}^m$ and $\mathbf{f} : R(\mathbf{g}) \to \mathbb{R}^p$ are such that:
(8)
\begin{align} \quad (x_1, x_2, ..., x_n) \to_{\mathbf{g}} (y_1, y_2, ..., y_m) \to_{\mathbf{f}} (z_1, z_2, ..., z_p) \end{align}
• Then for all $k \in \{ 1, 2, ..., p \}$ and for all $j \in \{ 1, 2, ..., n \}$ we have that:
(9)
\begin{align} \quad \frac{\partial z_k}{\partial x_j} = \sum_{i=1}^{m} \frac{\partial z_k}{\partial y_i} \frac{\partial y_i}{\partial x_j} \end{align}
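• As a quick illustration (the functions here are chosen only for the example and do not come from the pages reviewed above), take $n = m = 2$ and $p = 1$ with $y_1 = x_1 x_2$, $y_2 = x_1 + x_2$, and $z_1 = y_1 y_2$. Then the formula above with $k = 1$ and $j = 1$ gives:
(10)
\begin{align} \quad \frac{\partial z_1}{\partial x_1} = \frac{\partial z_1}{\partial y_1} \frac{\partial y_1}{\partial x_1} + \frac{\partial z_1}{\partial y_2} \frac{\partial y_2}{\partial x_1} = y_2 \cdot x_2 + y_1 \cdot 1 = (x_1 + x_2) x_2 + x_1 x_2 = 2 x_1 x_2 + x_2^2 \end{align}
• This agrees with differentiating $z_1 = x_1^2 x_2 + x_1 x_2^2$ directly with respect to $x_1$.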