# The Chain Rule for Compositions of Differentiable Functions from Rn to Rm

Let $S \subseteq \mathbb{R}^n$ be open, $\mathbf{a} \in S$, and $\mathbf{g} : S \to \mathbb{R}^p$. If $\mathbf{f} : D(\mathbf{f}) \to \mathbb{R}^m$ is a function with $R(\mathbf{g}) \subseteq D(\mathbf{f}) \subseteq \mathbb{R}^p$ (recall that "$D(\mathbf{f})$" denotes the domain of $\mathbf{f}$ and "$R(\mathbf{g})$" denotes the range of $\mathbf{g}$), then we can consider the composition $\mathbf{h} = \mathbf{f} \circ \mathbf{g} : S \to \mathbb{R}^m$ defined for all $\mathbf{x} \in S$ by:

(1)
\begin{align} \quad \mathbf{h}(\mathbf{x}) = \mathbf{f}(\mathbf{g}(\mathbf{x})) \end{align}

If $\mathbf{g}$ is differentiable at $\mathbf{a}$ and $\mathbf{f}$ is differentiable at $\mathbf{b} = \mathbf{g}(\mathbf{a})$, then the chain rule from single-variable differential calculus generalizes as follows.

 Theorem 1 (The Chain Rule): Let $S \subseteq \mathbb{R}^n$ be open, $\mathbf{a} \in S$, and $\mathbf{g} : S \to \mathbb{R}^p$. Let $\mathbf{f}$ be defined such that the composition $\mathbf{h} = \mathbf{f} \circ \mathbf{g}$ is well defined and let $\mathbf{b} = \mathbf{g}(\mathbf{a})$. If $\mathbf{g}$ is differentiable at $\mathbf{a}$ with total derivative $\mathbf{g}'(\mathbf{a})$ and if $\mathbf{f}$ is differentiable at $\mathbf{b}$ with total derivative $\mathbf{f}'(\mathbf{b})$ then $\mathbf{h}$ is differentiable at $\mathbf{a}$ with total derivative $\mathbf{h}'(\mathbf{a}) = \mathbf{f}'(\mathbf{b}) \circ \mathbf{g}'(\mathbf{a}) = \mathbf{f}'(\mathbf{g}(\mathbf{a})) \circ \mathbf{g}'(\mathbf{a})$.
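
In coordinates, the theorem is often read as a statement about Jacobian matrices. The following is a sketch, assuming each total derivative is identified with its Jacobian matrix with respect to the standard bases. The notation $J_{\mathbf{h}}(\mathbf{a})$, the coordinate names $u_k$, and the example functions are not part of the statement above and are introduced only for illustration. Under that identification, composition of linear maps becomes the product of an $m \times p$ matrix with a $p \times n$ matrix:

\begin{align} \quad J_{\mathbf{h}}(\mathbf{a}) = J_{\mathbf{f}}(\mathbf{b}) \, J_{\mathbf{g}}(\mathbf{a}), \qquad \frac{\partial h_i}{\partial x_j}(\mathbf{a}) = \sum_{k=1}^{p} \frac{\partial f_i}{\partial u_k}(\mathbf{b}) \, \frac{\partial g_k}{\partial x_j}(\mathbf{a}) \end{align}

For instance, take $n = p = 2$ and $m = 1$ with $\mathbf{g}(x_1, x_2) = (x_1 x_2, \: x_1 + x_2)$ and $f(u_1, u_2) = u_1^2 + u_2$, so that $h(x_1, x_2) = x_1^2 x_2^2 + x_1 + x_2$. Then:

\begin{align} \quad J_{f}(\mathbf{g}(\mathbf{x})) \, J_{\mathbf{g}}(\mathbf{x}) = \begin{bmatrix} 2x_1x_2 & 1 \end{bmatrix} \begin{bmatrix} x_2 & x_1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 2x_1x_2^2 + 1 & 2x_1^2x_2 + 1 \end{bmatrix} \end{align}

This agrees with the partial derivatives obtained by differentiating $h$ directly.
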
• Proof: Let $\mathbf{g}$ be differentiable at $\mathbf{a}$ and let $\mathbf{f}$ be differentiable at $\mathbf{b} = \mathbf{g}(\mathbf{a})$. Then:
(2)
\begin{align} \quad \mathbf{h}(\mathbf{a} + \mathbf{y}) - \mathbf{h}(\mathbf{a}) &= \mathbf{f}(\mathbf{g}(\mathbf{a} + \mathbf{y})) - \mathbf{f}(\mathbf{g}(\mathbf{a})) \\ &= \mathbf{f}(\mathbf{g}(\mathbf{a}) + \mathbf{g}(\mathbf{a} + \mathbf{y}) - \mathbf{g}(\mathbf{a})) - \mathbf{f}(\mathbf{g}(\mathbf{a})) \end{align}
• Let $\mathbf{b} = \mathbf{g}(\mathbf{a})$ and $\mathbf{v} = \mathbf{g}(\mathbf{a} + \mathbf{y}) - \mathbf{g}(\mathbf{a})$. Then:
(3)
\begin{align} \quad \mathbf{h}(\mathbf{a} + \mathbf{y}) - \mathbf{h}(\mathbf{a}) &= \mathbf{f}(\mathbf{b} + \mathbf{v}) - \mathbf{f}(\mathbf{b}) \end{align}
• Now since $\mathbf{g}$ is differentiable at $\mathbf{a}$ there exists a linear function $\mathbf{T}_{\mathbf{a}} = \mathbf{g}'(\mathbf{a})$ such that:
(4)
\begin{align} \quad \mathbf{g}(\mathbf{a} + \mathbf{y}) &= \mathbf{g}(\mathbf{a}) + \mathbf{T}_{\mathbf{a}} (\mathbf{y}) + \| \mathbf{y} \| \mathbf{E}_{\mathbf{a}} (\mathbf{y}) \\ &= \mathbf{g}(\mathbf{a}) + \mathbf{g}'(\mathbf{a})(\mathbf{y}) + \| \mathbf{y} \| \mathbf{E}_{\mathbf{a}} (\mathbf{y}) \end{align}
• Where $\mathbf{E}_{\mathbf{a}} (\mathbf{y}) \to \mathbf{0}$ as $\mathbf{y} \to \mathbf{0}$. Since $\mathbf{v} = \mathbf{g}(\mathbf{a} + \mathbf{y}) - \mathbf{g}(\mathbf{a})$ we see that:
(5)
\begin{align} \quad \mathbf{v} = \mathbf{g}'(\mathbf{a}) (\mathbf{y}) + \| \mathbf{y} \| \mathbf{E}_{\mathbf{a}} (\mathbf{y}) \quad (*) \end{align}
• Where once again, $\mathbf{E}_{\mathbf{a}} (\mathbf{y}) \to \mathbf{0}$ as $\mathbf{y} \to \mathbf{0}$.
• Similarly, since $\mathbf{f}$ is differentiable at $\mathbf{b}$ there exists a linear function $\mathbf{T}_{\mathbf{b}} = \mathbf{f}'(\mathbf{b})$ such that:
(6)
\begin{align} \quad \mathbf{f}(\mathbf{b} + \mathbf{v}) &= \mathbf{f}(\mathbf{b}) + \mathbf{T}_{\mathbf{b}}(\mathbf{v}) + \| \mathbf{v} \| \mathbf{E}_{\mathbf{b}} (\mathbf{v}) \\ &= \mathbf{f}(\mathbf{b}) + \mathbf{f}'(\mathbf{b})(\mathbf{v}) + \| \mathbf{v} \| \mathbf{E}_{\mathbf{b}} (\mathbf{v}) \end{align}
• Where $\mathbf{E}_{\mathbf{b}} (\mathbf{v}) \to \mathbf{0}$ as $\mathbf{v} \to \mathbf{0}$. Equivalently:
(7)
\begin{align} \quad \mathbf{f}(\mathbf{b} + \mathbf{v}) - \mathbf{f}(\mathbf{b}) = \mathbf{f}'(\mathbf{b})(\mathbf{v}) + \| \mathbf{v} \| \mathbf{E}_{\mathbf{b}}(\mathbf{v}) \quad (**) \end{align}
• Where once again, $\mathbf{E}_{\mathbf{b}} (\mathbf{v}) \to \mathbf{0}$ as $\mathbf{v} \to \mathbf{0}$.
• Using $(*)$ with $(**)$ gives us:
(8)
\begin{align} \quad \mathbf{f}(\mathbf{b} + \mathbf{v}) - \mathbf{f}(\mathbf{b}) &= \mathbf{f}'(\mathbf{b})(\mathbf{g}'(\mathbf{a}) (\mathbf{y}) + \| \mathbf{y} \| \mathbf{E}_{\mathbf{a}} (\mathbf{y})) + \| \mathbf{v} \| \mathbf{E}_{\mathbf{b}} (\mathbf{v}) \\ &= [\mathbf{f}'(\mathbf{b}) \circ \mathbf{g}'(\mathbf{a})] (\mathbf{y}) + \| \mathbf{y} \| \mathbf{f}'(\mathbf{b})(\mathbf{E}_{\mathbf{a}}(\mathbf{y})) + \| \mathbf{v} \| \mathbf{E}_{\mathbf{b}} (\mathbf{v}) \end{align}
• Let $\mathbf{E}$ be a new function defined by:
(9)
\begin{align} \quad \mathbf{E}(\mathbf{y}) = \left\{\begin{matrix} \mathbf{f}'(\mathbf{b})(\mathbf{E}_{\mathbf{a}}(\mathbf{y})) + \frac{\| \mathbf{v} \|}{\|\mathbf{y} \|}\mathbf{E}_{\mathbf{b}}(\mathbf{v}) & \mathrm{if} \: \mathbf{y} \neq \mathbf{0} \\ \mathbf{0} & \mathrm{if} \: \mathbf{y} = \mathbf{0} \end{matrix}\right. \end{align}
• Then $\mathbf{h}(\mathbf{a} + \mathbf{y}) - \mathbf{h}(\mathbf{a}) = \mathbf{f}(\mathbf{b} + \mathbf{v}) - \mathbf{f}(\mathbf{b}) = [\mathbf{f}'(\mathbf{b}) \circ \mathbf{g}'(\mathbf{a})](\mathbf{y}) + \| \mathbf{y} \| \mathbf{E}(\mathbf{y})$. The function $\mathbf{f}'(\mathbf{b}) \circ \mathbf{g}'(\mathbf{a})$ is a linear map (since it is a composition of linear maps), so all that remains to show is that $\mathbf{E}(\mathbf{y}) \to \mathbf{0}$ as $\mathbf{y} \to \mathbf{0}$.
• First, note that since $\mathbf{E}_{\mathbf{a}}(\mathbf{y}) \to \mathbf{0}$ as $\mathbf{y} \to \mathbf{0}$ we have that $\mathbf{f}'(\mathbf{b})(\mathbf{E}_{\mathbf{a}}(\mathbf{y})) \to \mathbf{0}$ as $\mathbf{y} \to \mathbf{0}$ by the proposition on the A Bound for the Total Derivative of a Function from Rn to Rm page.
• Now by $(*)$ we have that:
(10)
\begin{align} \quad \| \mathbf{v} \| & = \| \mathbf{g}'(\mathbf{a})(\mathbf{y}) + \| \mathbf{y} \| \mathbf{E}_{\mathbf{a}}(\mathbf{y}) \| \\ & \leq \| \mathbf{g}'(\mathbf{a})(\mathbf{y}) \| + \| \mathbf{y} \| \| \mathbf{E}_{\mathbf{a}} (\mathbf{y}) \| \end{align}
• From the page referenced above, we have that $\| \mathbf{g}'(\mathbf{a})(\mathbf{y}) \| \leq M \| \mathbf{y} \|$ where $\displaystyle{M = \sum_{k=1}^{p} \| \nabla g_k (\mathbf{a}) \|}$ (the sum runs over the $p$ component functions of $\mathbf{g}$), so:
(11)
\begin{align} \quad \| \mathbf{v} \| & \leq \| \mathbf{y} \| M + \| \mathbf{y} \| \| \mathbf{E}_{\mathbf{a}} (\mathbf{y}) \| \end{align}
• Divide both sides by $\| \mathbf{y} \|$ to get:
(12)
\begin{align} \quad \frac{\| \mathbf{v} \|}{\| \mathbf{y} \|} \leq M + \| \mathbf{E}_{\mathbf{a}} (\mathbf{y}) \| \end{align}
• So $\displaystyle{\frac{\| \mathbf{v} \|}{\| \mathbf{y} \|}}$ remains bounded as $\mathbf{y} \to \mathbf{0}$ (since $\| \mathbf{E}_{\mathbf{a}} (\mathbf{y}) \| \to 0$). The same inequality gives $\| \mathbf{v} \| \leq \| \mathbf{y} \| (M + \| \mathbf{E}_{\mathbf{a}} (\mathbf{y}) \|) \to 0$, so $\mathbf{v} \to \mathbf{0}$ as $\mathbf{y} \to \mathbf{0}$, and hence $\mathbf{E}_{\mathbf{b}}(\mathbf{v}) \to \mathbf{0}$ as $\mathbf{y} \to \mathbf{0}$. Therefore the product $\displaystyle{\frac{\| \mathbf{v} \|}{\| \mathbf{y} \|}\mathbf{E}_{\mathbf{b}} (\mathbf{v}) \to \mathbf{0}}$ as $\mathbf{y} \to \mathbf{0}$.
• Hence $\mathbf{E}(\mathbf{y}) \to \mathbf{0}$ as $\mathbf{y} \to \mathbf{0}$. Therefore, $\mathbf{h}$ is differentiable at $\mathbf{a}$ and has total derivative:
(13)
\begin{align} \quad \mathbf{h}'(\mathbf{a}) = \mathbf{f}'(\mathbf{b})\circ \mathbf{g}'(\mathbf{a}) = \mathbf{f}'(\mathbf{g}(\mathbf{a})) \circ \mathbf{g}'(\mathbf{a}) \quad \blacksquare \end{align}
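
The conclusion of the theorem can be sanity-checked numerically. The sketch below is only an illustration and proves nothing. It assumes two hypothetical maps, `g` from $\mathbb{R}^2$ to $\mathbb{R}^2$ and `f` from $\mathbb{R}^2$ to $\mathbb{R}^1$, approximates Jacobian matrices by central differences, and compares the Jacobian of the composition with the product of the two Jacobians. It also evaluates the ratio $\| \mathbf{h}(\mathbf{a} + \mathbf{y}) - \mathbf{h}(\mathbf{a}) - [\mathbf{f}'(\mathbf{b}) \circ \mathbf{g}'(\mathbf{a})](\mathbf{y}) \| / \| \mathbf{y} \|$, which is $\| \mathbf{E}(\mathbf{y}) \|$ in the notation of the proof, along one shrinking direction. All function names, the point `a`, and the step sizes are choices made for this example.

```python
import math

def g(x):
    """Inner map g : R^2 -> R^2 (hypothetical example)."""
    x1, x2 = x
    return [x1 * x2, x1 + x2]

def f(u):
    """Outer map f : R^2 -> R^1 (hypothetical example)."""
    u1, u2 = u
    return [u1 ** 2 + u2]

def h(x):
    """Composition h = f o g : R^2 -> R^1."""
    return f(g(x))

def jacobian(func, point, eps=1e-6):
    """Central-difference approximation of the Jacobian matrix of func at point."""
    rows, cols = len(func(point)), len(point)
    J = [[0.0] * cols for _ in range(rows)]
    for j in range(cols):
        plus, minus = list(point), list(point)
        plus[j] += eps
        minus[j] -= eps
        f_plus, f_minus = func(plus), func(minus)
        for i in range(rows):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return J

def matmul(A, B):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def norm(v):
    """Euclidean norm of a vector given as a list."""
    return math.sqrt(sum(c * c for c in v))

a = [1.5, -2.0]   # point of differentiation (arbitrary choice)
b = g(a)          # b = g(a), where f is differentiated

# Left side: Jacobian of h at a. Right side: J_f(b) times J_g(a).
J_h = jacobian(h, a)
J_product = matmul(jacobian(f, b), jacobian(g, a))
max_gap = max(abs(J_h[i][j] - J_product[i][j])
              for i in range(len(J_h)) for j in range(len(J_h[0])))

def remainder_ratio(t):
    """||h(a + y) - h(a) - J y|| / ||y|| for y = (t, t), with J = J_product."""
    y = [t, t]
    shifted = [a[0] + y[0], a[1] + y[1]]
    linear = [sum(row[j] * y[j] for j in range(len(y))) for row in J_product]
    h_shifted, h_base = h(shifted), h(a)
    diff = [h_shifted[i] - h_base[i] - linear[i] for i in range(len(linear))]
    return norm(diff) / norm(y)

ratios = [remainder_ratio(t) for t in (1e-1, 1e-2, 1e-3)]

print("J_h(a)         =", J_h)
print("J_f(b) J_g(a)  =", J_product)
print("max entry gap  =", max_gap)
print("remainder ratios for shrinking y:", ratios)
```

The two printed matrices should agree up to finite-difference error, and the remainder ratios should shrink as $\mathbf{y}$ does, which is the behaviour the proof establishes for $\mathbf{E}(\mathbf{y})$. Only one direction of approach is sampled here, whereas the theorem covers every $\mathbf{y} \to \mathbf{0}$.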