General Chain Rule?

sponsoredwalk · 07-01-2011 3:12pm #1

I'm looking at the proof of the multivariable chain rule & just a little bit curious about something.
The way I know the single variable chain rule proof, you start from the derivative:

[latex] f'(x) \ = \ \lim_{ \Delta x \to 0} \frac{ \Delta y}{ \Delta x} [/latex]

and manipulate it as follows:

[latex] \lim_{ \Delta x \to 0} \frac{ \Delta y}{ \Delta x} \ - \ f'(x) \ = \ 0 [/latex]

[latex] \frac{ \Delta y}{ \Delta x} \ - \ f'(x) \ = \ \epsilon (x) [/latex], where ε(x) → 0 as Δx → 0

[latex] \Delta y \ = \ f'(x) \Delta x \ + \ \epsilon (x) \Delta x [/latex]

and you work off that function to prove the single variable version.
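To make the error term concrete, here is a minimal numerical sketch. The choice f(x) = sin(x) at x = 1 and the helper name `epsilon` are assumptions for illustration only. It checks that Δy = f'(x)Δx + εΔx holds for every Δx and that ε shrinks as Δx → 0.

```python
import math

def epsilon(f, fprime, x, dx):
    """Error term eps = dy/dx - f'(x), so that dy = f'(x)*dx + eps*dx."""
    dy = f(x + dx) - f(x)
    return dy / dx - fprime(x)

# Hypothetical example: f(x) = sin(x), f'(x) = cos(x), at x = 1.0
x0 = 1.0
for dx in (1e-1, 1e-2, 1e-3, 1e-4):
    eps = epsilon(math.sin, math.cos, x0, dx)
    dy = math.sin(x0 + dx) - math.sin(x0)
    # The identity dy = f'(x)*dx + eps*dx holds (up to rounding) for every dx
    assert math.isclose(dy, math.cos(x0) * dx + eps * dx, rel_tol=1e-9)
    print(f"dx={dx:g}  eps={eps:.3e}")
```

The printed values of eps should shrink roughly in proportion to dx, which is the "ε → 0" statement the proof relies on.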
The multivariable version uses a function:

[latex] \Delta z \ = \ f_x(x,y) \Delta x \ + \ f_y(x,y) \Delta y \ + \ \epsilon_1 (x) \Delta x \ + \ \epsilon_2 (x) \Delta y [/latex]

which I can see is analogous to the single variable version, but I'm having
trouble deriving it, to be honest. But assuming that I'm okay with this
function, I wonder about the proof.

The special case is to just divide by Δt & take the limit:

[latex] \frac{dz}{dt} \ = \ \lim_{ \Delta t \to 0} \frac{ \Delta z}{ \Delta t} \ = \ \lim_{ \Delta t \to 0} \ [ \ f_x(x,y) \frac{ \Delta x}{ \Delta t} \ + \ f_y(x,y) \frac{ \Delta y}{ \Delta t} \ + \ \epsilon_1 (x) \frac{ \Delta x}{ \Delta t} \ + \ \epsilon_2 (x) \frac{ \Delta y}{ \Delta t} \ ] \ = \ \ f_x(x,y) \frac{ dx}{ dt} \ + \ f_y(x,y) \frac{ d y}{ dt} [/latex]
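As a sanity check on this special case, the sketch below compares the difference quotient Δz/Δt with f_x dx/dt + f_y dy/dt. The functions f, x(t), y(t) are made up for illustration and are not from any book; any smooth choice should behave the same way.

```python
import math

# Hypothetical z = f(x, y) with hand-computed partial derivatives
def f(x, y):
    return x * x * y + math.sin(y)

def f_x(x, y):
    return 2 * x * y

def f_y(x, y):
    return x * x + math.cos(y)

# Hypothetical paths x(t), y(t) and their derivatives
def x_of(t): return math.cos(t)
def y_of(t): return t * t
def dx_dt(t): return -math.sin(t)
def dy_dt(t): return 2 * t

def chain_rule(t):
    """dz/dt computed as f_x * dx/dt + f_y * dy/dt."""
    x, y = x_of(t), y_of(t)
    return f_x(x, y) * dx_dt(t) + f_y(x, y) * dy_dt(t)

def difference_quotient(t, dt):
    """Delta z / Delta t for a finite step dt."""
    return (f(x_of(t + dt), y_of(t + dt)) - f(x_of(t), y_of(t))) / dt

t0 = 0.7
for dt in (1e-2, 1e-4, 1e-6):
    print(dt, difference_quotient(t0, dt), chain_rule(t0))
```

As dt shrinks, the two printed numbers should agree to more and more digits.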

and if f(x,y) has both x & y as functions of two variables
z = f(x,y) = f [ x(s,t),y(s,t) ]
then you follow the exact same idea if you're taking the partial w.r.t.
s or t.

The general chain rule would just be a natural extension of this right? i.e.

z = f(x₁,x₂,...,xᵢ) = f [ x₁(t₁,t₂,...,tᵢ),x₂(t₁,t₂,...,tᵢ),...,xᵢ(t₁,t₂,...,tᵢ) ]

and the partial w.r.t. tᵥ is the exact same idea:

[latex] \frac{\partial z}{\partial t_v} \ = \ f_{x_1}[x_1(t_1,t_2,...,t_v,...,t_i),x_2(t_1,t_2,...,t_v,...,t_i),...] \ \frac{\partial x_1}{\partial t_v} \ + \ f_{x_2}[x_1(t_1,t_2,...,t_v,...,t_i),x_2(t_1,t_2,...,t_v,...,t_i),...] \ \frac{\partial x_2}{\partial t_v} \ + \ ... \ + \ f_{x_i}[x_1(t_1,t_2,...,t_v,...,t_i),x_2(t_1,t_2,...,t_v,...,t_i),...] \ \frac{\partial x_i}{\partial t_v}[/latex]

obviously the notation can be shortened

but that's it right?
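A rough numerical check of the general formula, under assumptions: the functions below are invented for illustration, all partials are estimated with central differences rather than computed exactly, and the example deliberately uses three x's but only two t's to show that the counts need not match.

```python
import math

def partial(g, point, k, h=1e-6):
    """Central-difference estimate of the partial of g w.r.t. argument k."""
    up = list(point)
    up[k] += h
    down = list(point)
    down[k] -= h
    return (g(*up) - g(*down)) / (2 * h)

def chain_rule_partial(f, xs, t, v):
    """Sum over k of f_{x_k}(x(t)) * (partial of x_k w.r.t. t_v)."""
    x_point = [x(*t) for x in xs]
    return sum(partial(f, x_point, k) * partial(xs[k], t, v)
               for k in range(len(xs)))

# Hypothetical example: z = f(x1, x2, x3), each x_k a function of (t1, t2)
def f(a, b, c): return a * b + math.exp(c)
def x1_of(t1, t2): return t1 + t2
def x2_of(t1, t2): return t1 * t2
def x3_of(t1, t2): return math.sin(t1)

xs = [x1_of, x2_of, x3_of]

def z(t1, t2):
    """The composite function, differentiated directly for comparison."""
    return f(*(x(t1, t2) for x in xs))

t = [0.3, 1.2]
for v in range(len(t)):
    print(v, chain_rule_partial(f, xs, t, v), partial(z, t, v))
```

For each v the two printed values should match closely: the first is the sum from the formula, the second differentiates the composite directly.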

Assuming that proof to be correct I'm wondering about the function

[latex] \Delta z \ = \ f_x(x,y) \Delta x \ + \ f_y(x,y) \Delta y \ + \ \epsilon_1 (x) \Delta x \ + \ \epsilon_2 (x) \Delta y [/latex]

I mean rather than just saying it's analogous in different dimensions
shouldn't there be a way to derive it from the very similar arguments
involving tangent planes?

Start with the vector equation N • (X - X₀) = 0 to derive the plane.
N•(X - X₀) = 0
(A,B,C)•[(x - x₀),(y - y₀),(z - z₀)] = 0
A(x - x₀) + B(y - y₀) + C(z - z₀) = 0
z - z₀ = (-A/C)(x - x₀) + (-B/C)(y - y₀)
f(x,y) = f(x₀,y₀) + (-A/C)(x - x₀) + (-B/C)(y - y₀)
f(x,y) = f(x₀,y₀) + (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)

Now, I understand that this is the description of the tangent plane that
passes through the point (x₀,y₀,f(x₀,y₀)) & can be used to approximate a function for
all (x,y) close to (x₀,y₀)
f(x,y) ≈ f(x₀,y₀) + (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)
I say this to make sure I have the correct understanding: when I derived
f(x,y) = f(x₀,y₀) + (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)
above I was deriving a linear tangent plane equation, but for any differentiable function
at the point (x₀,y₀) we can use this equation to find the tangent plane
through the point (x₀,y₀,f(x₀,y₀)), and we can also linearly approximate the
function for all (x,y) close to (x₀,y₀), just like the single variable tangent line.

It is the extra terms of Taylor's formula that turn
f(x,y) ≈ f(x₀,y₀) + ... into f(x,y) = f(x₀,y₀) + ...
That's been confusing me & I'd really appreciate confirmation that I've
got the logic right now.
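The gap between "≈" and "=" can be seen numerically. In this sketch (the function f(x,y) = eˣ sin y and the base point are assumptions chosen only for illustration), the gap between f and its tangent plane, divided by the distance from (x₀,y₀), shrinks as the point moves closer.

```python
import math

# Hypothetical function with hand-computed partials
def f(x, y): return math.exp(x) * math.sin(y)
def f_x(x, y): return math.exp(x) * math.sin(y)
def f_y(x, y): return math.exp(x) * math.cos(y)

def tangent_plane(x, y, x0, y0):
    """Height of the tangent plane at (x0, y0), evaluated at (x, y)."""
    return f(x0, y0) + f_x(x0, y0) * (x - x0) + f_y(x0, y0) * (y - y0)

def relative_error(h, x0=0.5, y0=1.0):
    """|f - plane| divided by the distance moved, stepping h in both x and y."""
    x, y = x0 + h, y0 + h
    return abs(f(x, y) - tangent_plane(x, y, x0, y0)) / math.hypot(h, h)

for h in (1e-1, 1e-2, 1e-3):
    print(h, relative_error(h))
```

The ratio is never exactly zero for this f, which is why the plane alone only gives "≈"; the leftover is what the extra terms account for.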

But how do we turn f(x,y) - f(x₀,y₀) = (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀) into
[latex] \Delta z \ = \ f_x(x,y) \Delta x \ + \ f_y(x,y) \Delta y \ + \ \epsilon_1 (x) \Delta x \ + \ \epsilon_2 (x) \Delta y [/latex]

in a more linear fashion than just saying it should work

Fringe · 07-01-2011 5:07pm

You can generalise the derivative as the linear operator f'(a) of f at a such that
f(a + h) = f(a) + f'(a)h + R(h)
where ||R(h)||/||h|| goes to zero as h goes to zero. This definition can then be applied to any normed space.

For the chain rule, take functions g:U -> V and f:V -> W. Then the composition is f o g. Now if you assume that g and f are differentiable, then the chain rule comes from applying the definition.

sponsoredwalk · 07-01-2011 5:47pm

Fringe wrote: »

You can generalise the derivative as the linear operator f'(a) of f at a such that
f(a + h) = f(a) + f'(a)h + R(h)
where ||R(h)||/||h|| goes to zero as h goes to zero. This definition can then be applied to any normed space.

For the chain rule, take functions g:U -> V and f:V -> W. Then the composition is f o g. Now if you assume that g and f are differentiable, then the chain rule comes from applying the definition.

The book I'm reading doesn't have this form of the proof. I have this one in
a more advanced book & will come to it soon; I just want to understand the
more elementary one here first. It seems alright to me, I think, but there's
also the question of turning the tangent plane equation

f(x,y) - f(x₀,y₀) = (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)
f(x₀ + Δx,y₀ + Δy) - f(x₀,y₀) = (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)
Δz= (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀)

into

Δz= (∂f/∂x)(x - x₀) + (∂f/∂y)(y - y₀) + ε₁(x)Δx + ε₂(y)Δy
Δz= (∂f/∂x)Δx + (∂f/∂y)Δy + ε₁(x)Δx + ε₂(y)Δy
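One common way to get the ε terms does not start from the tangent plane at all: split Δz into an x-step followed by a y-step and compare each step's difference quotient with the partial at (x,y). The sketch below uses a made-up f and one particular choice of ε₁, ε₂ (they are not unique, and each depends on both Δx and Δy). With this choice the formula for Δz is an exact identity, and the ε's go to 0 when the partials are continuous.

```python
# Hypothetical function with hand-computed partials
def f(x, y): return x**3 * y + y**2
def f_x(x, y): return 3 * x**2 * y
def f_y(x, y): return x**3 + 2 * y

def error_terms(x, y, dx, dy):
    """Split dz into an x-step (taken at height y+dy) and a y-step (taken at x)."""
    eps1 = (f(x + dx, y + dy) - f(x, y + dy)) / dx - f_x(x, y)
    eps2 = (f(x, y + dy) - f(x, y)) / dy - f_y(x, y)
    return eps1, eps2

x0, y0 = 1.0, 2.0
for h in (1e-1, 1e-2, 1e-3):
    e1, e2 = error_terms(x0, y0, h, h)
    dz = f(x0 + h, y0 + h) - f(x0, y0)
    # dz = f_x*dx + f_y*dy + eps1*dx + eps2*dy, exactly (up to rounding)
    rebuilt = f_x(x0, y0) * h + f_y(x0, y0) * h + e1 * h + e2 * h
    print(h, e1, e2, dz - rebuilt)
```

The last printed column should be at rounding level for every h, while e1 and e2 shrink with h.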

sponsoredwalk · 07-01-2011 7:34pm

You can't derive from a linear tangent plane a function with error terms,
the whole thing is conceptually null & void from the beginning!!!

I'm okay, should have recognized this from the start

Remove this
abortion of a thread
