Let be a subspace of and let be a vector in In this section, we will learn to compute the closest vector to in The vector is called the orthogonal projection of onto This is exactly what we will use to almost solve matrix equations, as discussed in the introduction to Chapter 7.
We begin by fixing some notation.
Let be a subspace of and let be a vector in We denote the closest vector to on by
To say that is the closest vector to on means that the difference is orthogonal to the vectors in
In other words, if then we have where is in and is in The first order of business is to prove that the closest vector always exists.
Let be a subspace of and let be a vector in Then we can write uniquely as
Let so by this fact in Section 7.2. Let be a basis for and let be a basis for We showed in the proof of this fact in Section 7.2 that is linearly independent, so it forms a basis for Therefore, we can write
where and Since is orthogonal to the vector is the closest vector to on so this proves that such a decomposition exists.
As for uniqueness, suppose that
for in and in Rearranging gives
Since and are subspaces, the left side of the equation is in and the right side is in Therefore, is in and in so it is orthogonal to itself, which implies Hence and which proves uniqueness.
Let be a subspace of and let be a vector in The expression
for in and in is called the orthogonal decomposition of with respect to and the closest vector is the orthogonal projection of onto
Since is the closest vector on to the distance from to the subspace is the length of the vector from to i.e., the length of To restate:
Closest vector and distance
Let be a subspace of and let be a vector in
The orthogonal projection is the closest vector to in
Now we turn to the problem of computing and Of course, since really all we need is to compute The following theorem gives a method for computing the orthogonal projection onto a column space. To compute the orthogonal projection onto a general subspace, usually it is best to rewrite the subspace as the column space of a matrix, as in this important note in Section 3.3.
Let be an matrix, let and let be a vector in Then the matrix equation
in the unknown vector is consistent, and is equal to for any solution
Let be the orthogonal decomposition with respect to By definition lies in and so there is a vector in with Choose any such vector We know that lies in which is equal to by this important note in Section 7.2. We thus have
This exactly means that is consistent. If is any solution to then by reversing the above logic, we conclude that
Example(Orthogonal projection onto a line)
Let be a line in and let be a vector in By the theorem, to find we must solve the matrix equation where we regard as an matrix (the column space of this matrix is exactly ). But and so is a solution of and hence
In the special case where we are projecting a vector in onto a line our formula for the projection can be derived very directly and simply. The vector is a multiple of say This multiple is chosen so that is perpendicular to as in the following picture.
In other words,
Using the distributive property for the dot product and isolating the variable gives us that
When is a matrix with more than one column, computing the orthogonal projection of onto means solving the matrix equation In other words, we can compute the closest vector by solving a system of linear equations. To be explicit, we state the theorem as a recipe:
Recipe: Compute an orthogonal decomposition
Let be a subspace of Here is a method to compute the orthogonal decomposition of a vector with respect to
Rewrite as the column space of a matrix In other words, find a a spanning set for and let be the matrix with those columns.
Compute the matrix and the vector
Form the augmented matrix for the matrix equation in the unknown vector and row reduce.
This equation is always consistent; choose one solution Then
In the context of the above recipe, if we start with a basis of then it turns out that the square matrix is automatically invertible! (It is always the case that is square and the equation is consistent, but need not be invertible in general.)
Let be an matrix with linearly independent columns and let Then the matrix is invertible, and for all vectors in we have
We will show that which implies invertibility by the invertible matrix theorem in Section 6.1. Suppose that Then so by the theorem. But (the orthogonal decomposition of the zero vector is just so and therefore is in Since the columns of are linearly independent, we have so as desired.
Let be a vector in and let be a solution of Then so
The corollary applies in particular to the case where we have a subspace of and a basis for To apply the corollary, we take to be the matrix with columns
In this subsection, we change perspective and think of the orthogonal projection as a function of This function turns out to be a linear transformation with many nice properties, and is a good example of a linear transformation which is not originally defined as a matrix transformation.
We have to verify the defining properties of linearity in Section 4.3. Let be vectors in and let and be their orthogonal decompositions. Since and are subspaces, the sums and are in and respectively. Therefore, the orthogonal decomposition of is so
Now let be a scalar. Then is in and is in so the orthogonal decomposition of is and therefore,
Since satisfies the two defining properties in Section 4.3, it is a linear transformation.
Any vector in is in the range of because for such vectors. On the other hand, for any vector in the output is in so is the range of
We compute the standard matrix of the orthogonal projection in the same way as for any other transformation: by evaluating on the standard coordinate vectors. In this case, this means projecting the standard coordinate vectors onto the subspace.
We emphasize that the properties of projection matrices would be very hard to prove in terms of matrices. By translating all of the statements into statements about linear transformations, they become much more transparent. For example, consider the projection matrix we found in this example. Just by looking at the matrix it is not at all obvious that when you square the matrix you get the same matrix back.
As we saw in this example, if you are willing to compute bases for and then this provides a third way of finding the standard matrix for projection onto indeed, if is a basis for and is a basis for then
where the middle matrix in the product is the diagonal matrix with ones and zeros on the diagonal. However, since you already have a basis for it is faster to multiply out the expression as in the corollary.