# The Matrix Form of the Chain Rule for Compositions of Differentiable Functions from Rn to Rm

Recall from the page The Chain Rule for Compositions of Differentiable Functions from Rn to Rm that if $S \subseteq \mathbb{R}^n$ is open, $\mathbf{a} \in S$, $\mathbf{g} : S \to \mathbb{R}^m$, and $\mathbf{f}$ is another function such that the composition $\mathbf{h} = \mathbf{f} \circ \mathbf{g}$ is well defined, then the following holds. If $\mathbf{g}$ is differentiable at $\mathbf{a}$ with total derivative $\mathbf{g}'(\mathbf{a})$ and $\mathbf{f}$ is differentiable at $\mathbf{b} = \mathbf{g}(\mathbf{a})$ with total derivative $\mathbf{f}'(\mathbf{b}) = \mathbf{f}'(\mathbf{g}(\mathbf{a}))$, then $\mathbf{h}$ is differentiable at $\mathbf{a}$ and:

(1)
\begin{align} \quad \mathbf{h}'(\mathbf{a}) = \mathbf{f}'(\mathbf{b}) \circ \mathbf{g}'(\mathbf{a}) = \mathbf{f}'(\mathbf{g}(\mathbf{a})) \circ \mathbf{g}'(\mathbf{a}) \end{align}

Also recall from the page The Jacobian Matrix of Differentiable Functions from Rn to Rm that if a function is differentiable at a point, then the total derivative of that function at that point is represented by the Jacobian matrix of that function at that point. Therefore, suppose that the composition $\mathbf{h} = \mathbf{f} \circ \mathbf{g}$ is well defined, that $\mathbf{g}$ is differentiable at $\mathbf{a}$ with total derivative $\mathbf{g}'(\mathbf{a}) = \mathbf{D} \mathbf{g}(\mathbf{a})$, and that $\mathbf{f}$ is differentiable at $\mathbf{b} = \mathbf{g}(\mathbf{a})$ with total derivative $\mathbf{f}'(\mathbf{b}) = \mathbf{D} \mathbf{f} (\mathbf{b})$ (i.e., $\mathbf{f}'(\mathbf{g}(\mathbf{a})) = \mathbf{D} \mathbf{f} (\mathbf{g}(\mathbf{a}))$). From linear algebra, the matrix of a composition of two linear maps is equal to the product of the matrices of those linear maps, and so the chain rule (1) becomes:

(2)
\begin{align} \quad \mathbf{D} \mathbf{h} (\mathbf{a}) = [\mathbf{D} \mathbf{f} (\mathbf{b})][\mathbf{D} \mathbf{g}(\mathbf{a})] = [\mathbf{D} \mathbf{f} (\mathbf{g}(\mathbf{a}))] [\mathbf{D} \mathbf{g} (\mathbf{a})] \end{align}
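The matrix identity above can be checked numerically. The following is a minimal sketch using only the Python standard library. The functions `g` and `f`, the point `a`, and the helper names `jacobian_g`, `jacobian_f`, `matmul` and `numerical_jacobian` are all illustrative assumptions and do not come from the text above. It multiplies the hand-computed Jacobian matrices $\mathbf{D} \mathbf{f}(\mathbf{b})$ and $\mathbf{D} \mathbf{g}(\mathbf{a})$ and compares the product with a central finite-difference approximation of $\mathbf{D} \mathbf{h}(\mathbf{a})$.

```python
# Illustrative example (assumed, not from the text): g : R^2 -> R^3 and f : R^3 -> R^2.

def g(x):
    x1, x2 = x
    return [x1 * x2, x1 + x2, x1 ** 2]

def f(y):
    y1, y2, y3 = y
    return [y1 + y2 * y3, y1 * y3]

def h(x):
    # The composition h = f o g : R^2 -> R^2.
    return f(g(x))

def jacobian_g(x):
    # 3 x 2 matrix of partial derivatives of g, computed by hand.
    x1, x2 = x
    return [[x2, x1],
            [1.0, 1.0],
            [2.0 * x1, 0.0]]

def jacobian_f(y):
    # 2 x 3 matrix of partial derivatives of f, computed by hand.
    y1, y2, y3 = y
    return [[1.0, y3, y2],
            [y3, 0.0, y1]]

def matmul(A, B):
    # Plain matrix product of nested lists: (p x m) times (m x n).
    return [[sum(A[k][i] * B[i][j] for i in range(len(B)))
             for j in range(len(B[0]))]
            for k in range(len(A))]

def numerical_jacobian(func, x, eps=1e-6):
    # Central finite-difference approximation of the Jacobian of func at x.
    rows = len(func(x))
    J = [[0.0] * len(x) for _ in range(rows)]
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus, f_minus = func(x_plus), func(x_minus)
        for k in range(rows):
            J[k][j] = (f_plus[k] - f_minus[k]) / (2.0 * eps)
    return J

a = [1.0, 2.0]
b = g(a)                                      # b = g(a) = [2.0, 3.0, 1.0]
chain = matmul(jacobian_f(b), jacobian_g(a))  # [Df(b)][Dg(a)]
numeric = numerical_jacobian(h, a)            # approximates Dh(a)

print(chain)    # [[9.0, 2.0], [6.0, 1.0]]
print(numeric)  # agrees with chain up to finite-difference error
```

Note that the order of the factors matters: $\mathbf{D} \mathbf{f}(\mathbf{b})$ is $2 \times 3$ and $\mathbf{D} \mathbf{g}(\mathbf{a})$ is $3 \times 2$ in this sketch, so the product is the $2 \times 2$ matrix expected for $\mathbf{h} : \mathbb{R}^2 \to \mathbb{R}^2$.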

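Furthermore, suppose that $S \subseteq \mathbb{R}^n$ is open, $\mathbf{g} : S \to \mathbb{R}^m$ and $\mathbf{f} : R(\mathbf{g}) \to \mathbb{R}^p$, where $R(\mathbf{g})$ denotes the range of $\mathbf{g}$, so that:

(3)
\begin{align} \quad (x_1, x_2, ..., x_n) \to_{\mathbf{g}} (y_1, y_2, ..., y_m) \to_{\mathbf{f}} (z_1, z_2, ..., z_p) \end{align}

If $\mathbf{g}$ is differentiable at $\mathbf{a}$ and $\mathbf{f}$ is differentiable at $\mathbf{b} = \mathbf{g}(\mathbf{a})$, then comparing the entry in row $k$ and column $j$ on both sides of (2) shows that for all $k \in \{ 1, 2, ..., p \}$ and for all $j \in \{ 1, 2, ..., n \}$ we have the following, where each $\frac{\partial z_k}{\partial y_i}$ is evaluated at $\mathbf{b}$ and each $\frac{\partial y_i}{\partial x_j}$ is evaluated at $\mathbf{a}$: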

(4)
\begin{align} \quad \frac{\partial z_k}{\partial x_j} = \sum_{i=1}^{m} \frac{\partial z_k}{\partial y_i} \frac{\partial y_i}{\partial x_j} \end{align}
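As a small worked illustration of the componentwise formula (the functions here are chosen purely for illustration and do not come from the text above), take $n = m = 2$ and $p = 1$ with $y_1 = x_1 x_2$, $y_2 = x_1 + x_2$ and $z_1 = y_1^2 + y_2$. Applying the sum over $i$ gives:

\begin{align} \quad \frac{\partial z_1}{\partial x_1} = \frac{\partial z_1}{\partial y_1} \frac{\partial y_1}{\partial x_1} + \frac{\partial z_1}{\partial y_2} \frac{\partial y_2}{\partial x_1} = (2y_1)(x_2) + (1)(1) = 2x_1x_2^2 + 1 \end{align}

\begin{align} \quad \frac{\partial z_1}{\partial x_2} = \frac{\partial z_1}{\partial y_1} \frac{\partial y_1}{\partial x_2} + \frac{\partial z_1}{\partial y_2} \frac{\partial y_2}{\partial x_2} = (2y_1)(x_1) + (1)(1) = 2x_1^2x_2 + 1 \end{align}

These agree with differentiating the substituted expression $z_1 = x_1^2 x_2^2 + x_1 + x_2$ directly with respect to $x_1$ and $x_2$.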