← mapVector Spaces

Orthogonality & Gram–Schmidt

requiresBasis & Dimension ⚠The Dot Product ⚠

⚗ Dr. Möbius, from the lab

Bases are great. Orthonormal bases are obscene luxury — perpendicular axes, each of length one, where finding coordinates stops being algebra and becomes a single dot product. The Federation of Boring Textbook Authors treats this as a footnote. I treat it as the holy grail. Today we build the machine that takes any sorry pile of vectors and grinds it into a perfect set of right angles: Gram–Schmidt. Each step is the same brutal move — subtract the shadow, keep the remainder.

THE BIG IDEA

Orthogonal vectors are automatically independent; an orthonormal basis gives coordinates by dot products alone; and Gram–Schmidt converts any basis into an orthonormal one by repeatedly subtracting projections.

The inner product, and why right angles are independence for free

Recall from Dot Product the inner product on $\mathbb{R}^n$ : $u \cdot v = u_1 v_1 + \cdots + u_n v_n$ . It measures length, $\|v\| = \sqrt{v \cdot v}$ , and angle — in particular $u \cdot v = 0$ exactly when $u$ and $v$ are orthogonal (perpendicular). A set is orthogonal if every pair of distinct vectors in it dots to zero.

Here's the first gift, and it's a good one. An orthogonal set of nonzero vectors is automatically linearly independent. Proof — one clean line of dotting, no row reduction required. Suppose $\{v_1, \dots, v_k\}$ are nonzero and mutually orthogonal, and

$c_1 v_1 + c_2 v_2 + \cdots + c_k v_k = 0.$

Dot both sides with $v_i$ . Every cross term $v_j \cdot v_i$ ( $j \ne i$ ) vanishes by orthogonality, leaving

$c_i (v_i \cdot v_i) = 0.$

Since $v_i \ne 0$ , we have $v_i \cdot v_i = \|v_i\|^2 > 0$ , so $c_i = 0$ . This works for every $i$ , so all coefficients are zero — independent. $\blacksquare$ No row reduction, no determinant. Perpendicularity is a stronger, cleaner reason for independence than the general case ever gives you.

Orthonormal bases: coordinates for free

Normalize each vector to length one (divide by its norm) and you have an orthonormal set: mutually perpendicular and unit length, $q_i \cdot q_j = 0$ for $i \ne j$ and $q_i \cdot q_i = 1$ . An orthonormal basis $\{q_1, \dots, q_n\}$ is the luxury tier, and here's why.

In a general basis, finding the coordinates of $v$ means solving a linear system $Pc = v$ . In an orthonormal basis, you skip all of that. Write $v = \sum c_i q_i$ and dot with $q_j$ :

$v \cdot q_j = \sum_i c_i (q_i \cdot q_j) = c_j,$

because every term dies except $i = j$ , where $q_j \cdot q_j = 1$ . So

$c_j = v \cdot q_j.$

The coordinate is just a dot product. No system, no inverse, no elimination. Fuck Gaussian elimination — a single multiply-and-sum is all you need. That's the whole reason orthonormal bases are worth the trouble of building — they turn coordinate-finding from a computation into a reflex.

Projection onto a subspace: the closest point

Take a subspace $W$ with orthonormal basis $\{q_1, \dots, q_m\}$ . The orthogonal projection of a vector $b$ onto $W$ is

$\operatorname{proj}_W(b) = (b \cdot q_1)\,q_1 + \cdots + (b \cdot q_m)\,q_m.$

Each term $(b \cdot q_i) q_i$ is the shadow of $b$ along one axis; sum the shadows. For a single direction $a$ (not necessarily unit), the formula is the un-normalized version you've seen:

$\operatorname{proj}_a(b) = \frac{b \cdot a}{a \cdot a}\, a.$

The deep fact: the projection is the closest point in $W$ to $b$ . Among all vectors in the subspace, $\operatorname{proj}_W(b)$ minimizes the distance $\|b - w\|$ , because the error $b - \operatorname{proj}_W(b)$ comes out perpendicular to $W$ , and a perpendicular dropped to a subspace is the shortest segment to it. Minimizing distance to a subspace is exactly the least-squares problem — the backbone of fitting lines to noisy data — and it's nothing but a projection. Drag the vectors and watch the projection slide to the nearest point on the line, the error arrow staying perpendicular:

dot product & projection

u·v = 11θ = 42.3°proj_v(u) = (2.59, 0.65) acute

Gram–Schmidt: subtract the shadow, repeat

Now the main event. Given any basis $\{v_1, v_2, v_3, \dots\}$ — possibly a skewed, ugly, embarrassing mess — Gram–Schmidt manufactures an orthogonal basis spanning the same space. The move is always the same: take the next vector, subtract off its shadow on everything you've already built, keep the remainder. What's left is perpendicular to all the earlier ones by construction. Simple as hell, powerful as anything in this course.

$u_1 = v_1,$ $u_2 = v_2 - \frac{v_2 \cdot u_1}{u_1 \cdot u_1}\, u_1,$ $u_3 = v_3 - \frac{v_3 \cdot u_1}{u_1 \cdot u_1}\, u_1 - \frac{v_3 \cdot u_2}{u_2 \cdot u_2}\, u_2,$

and so on. Each subtracted term is a projection — a shadow — and removing it leaves a vector orthogonal to that direction. At the end, normalize each $u_i$ to get an orthonormal basis. Worked in full in Example 3. It's tedious arithmetic, not deep, but get one sign wrong and the orthogonality silently dies — so always check the dot products are zero at the end. That check is free and it has saved more homework than coffee has. Do not skip it.

Orthogonal matrices: the rigid motions

Stack an orthonormal basis into the columns of a matrix $Q$ . Then $Q^T Q = I$ — the $(i,j)$ entry of $Q^T Q$ is exactly $q_i \cdot q_j$ , which is $1$ on the diagonal and $0$ off it. A square matrix with $Q^T Q = I$ is called orthogonal, and it has a superpower: it preserves lengths and angles. For any $v$ ,

$\|Qv\|^2 = (Qv) \cdot (Qv) = v^T Q^T Q v = v^T I v = v \cdot v = \|v\|^2.$

Lengths are untouched; since dot products are preserved too, so are angles. Orthogonal matrices are exactly the rigid motions of space — rotations and reflections, the transformations that move the world without bending, stretching, or shearing it. Their determinant is $\pm 1$ ( $+1$ rotation, $-1$ reflection). And $Q^{-1} = Q^T$ for free — inverting becomes transposing.

This is the last piece of equipment before the summit. The spectral theorem says every real symmetric matrix can be diagonalized by an orthogonal $Q$ — perpendicular axes, no distortion, no shit. You just built every single tool that sentence needs. Go do the gauntlet, then climb.

🔬 SPECIMENS (worked examples)

Worked example 1 — coordinates by dot product▸

The set $\left\{ q_1 = \tfrac{1}{\sqrt2}\begin{pmatrix} 1 \\ 1 \end{pmatrix},\ q_2 = \tfrac{1}{\sqrt2}\begin{pmatrix} 1 \\ -1 \end{pmatrix} \right\}$ is an orthonormal basis of $\mathbb{R}^2$ . Find the coordinates of $v = \begin{pmatrix} 3 \\ 5 \end{pmatrix}$ in this basis.

First confirm it's orthonormal: $q_1 \cdot q_2 = \tfrac12(1\cdot1 + 1\cdot(-1)) = 0$ (perpendicular), and $\|q_1\|^2 = \tfrac12(1+1) = 1$ , same for $q_2$ (unit length). Good — so coordinates are just dot products.

$c_1 = v \cdot q_1 = \tfrac{1}{\sqrt2}(3 + 5) = \tfrac{8}{\sqrt2} = 4\sqrt2.$ $c_2 = v \cdot q_2 = \tfrac{1}{\sqrt2}(3 - 5) = \tfrac{-2}{\sqrt2} = -\sqrt2.$

So $[v] = \begin{pmatrix} 4\sqrt2 \\ -\sqrt2 \end{pmatrix}$ in this basis. Check by reassembling: $4\sqrt2 \cdot \tfrac{1}{\sqrt2}\begin{pmatrix}1\\1\end{pmatrix} - \sqrt2\cdot\tfrac{1}{\sqrt2}\begin{pmatrix}1\\-1\end{pmatrix} = 4\begin{pmatrix}1\\1\end{pmatrix} - \begin{pmatrix}1\\-1\end{pmatrix} = \begin{pmatrix}3\\5\end{pmatrix}$ . $\checkmark$ No system solved — two dot products and we're done.

Worked example 2 — projection finds the closest point▸

Find the point on the line $W = \operatorname{span}\left\{\begin{pmatrix}1\\1\end{pmatrix}\right\}$ closest to $b = \begin{pmatrix} 4 \\ 0 \end{pmatrix}$ , and confirm the error is perpendicular to the line.

The closest point is the projection of $b$ onto the line. With $a = \begin{pmatrix}1\\1\end{pmatrix}$ :

$\operatorname{proj}_a(b) = \frac{b \cdot a}{a \cdot a}\, a = \frac{(4)(1) + (0)(1)}{(1)(1)+(1)(1)}\begin{pmatrix}1\\1\end{pmatrix} = \frac{4}{2}\begin{pmatrix}1\\1\end{pmatrix} = \begin{pmatrix} 2 \\ 2 \end{pmatrix}.$

So the closest point on the line is $\begin{pmatrix}2\\2\end{pmatrix}$ .

Confirm perpendicularity of the error. The error vector is

$b - \operatorname{proj}_a(b) = \begin{pmatrix}4\\0\end{pmatrix} - \begin{pmatrix}2\\2\end{pmatrix} = \begin{pmatrix} 2 \\ -2 \end{pmatrix}.$

Dot it with the line's direction: $\begin{pmatrix}2\\-2\end{pmatrix}\cdot\begin{pmatrix}1\\1\end{pmatrix} = 2 - 2 = 0$ . $\checkmark$ The error is perpendicular to $W$ , which is precisely why $\begin{pmatrix}2\\2\end{pmatrix}$ is the closest point — drop a perpendicular and you've found the shortest route to the line. That's least-squares in miniature.

Worked example 3 — Gram–Schmidt run in full: three ugly vectors get orthogonalized▸

Apply Gram–Schmidt to $v_1 = \begin{pmatrix}1\\1\\0\end{pmatrix},\ v_2 = \begin{pmatrix}1\\0\\1\end{pmatrix},\ v_3 = \begin{pmatrix}0\\1\\1\end{pmatrix}$ to produce an orthogonal basis of $\mathbb{R}^3$ .

Step 1. $u_1 = v_1 = \begin{pmatrix}1\\1\\0\end{pmatrix}$ , with $u_1 \cdot u_1 = 2$ .

Step 2 — subtract $v_2$ 's shadow on $u_1$ . $v_2 \cdot u_1 = (1)(1)+(0)(1)+(1)(0) = 1$ , so

$u_2 = v_2 - \frac{1}{2}u_1 = \begin{pmatrix}1\\0\\1\end{pmatrix} - \frac12\begin{pmatrix}1\\1\\0\end{pmatrix} = \begin{pmatrix}1/2\\-1/2\\1\end{pmatrix}.$

Scale by $2$ for cleaner arithmetic (scaling preserves direction and orthogonality): $u_2 = \begin{pmatrix}1\\-1\\2\end{pmatrix}$ , with $u_2 \cdot u_2 = 1 + 1 + 4 = 6$ . Check $u_2 \cdot u_1 = 1 - 1 + 0 = 0$ . $\checkmark$

Step 3 — subtract $v_3$ 's shadows on $u_1$ and $u_2$ . $v_3 \cdot u_1 = (0)(1)+(1)(1)+(1)(0) = 1$ , so the $u_1$ shadow coefficient is $\tfrac{1}{2}$ . $v_3 \cdot u_2 = (0)(1)+(1)(-1)+(1)(2) = 1$ , so the $u_2$ shadow coefficient is $\tfrac{1}{6}$ .

$u_3 = \begin{pmatrix}0\\1\\1\end{pmatrix} - \frac12\begin{pmatrix}1\\1\\0\end{pmatrix} - \frac16\begin{pmatrix}1\\-1\\2\end{pmatrix} = \begin{pmatrix}0 - \tfrac12 - \tfrac16 \\ 1 - \tfrac12 + \tfrac16 \\ 1 - 0 - \tfrac13\end{pmatrix} = \begin{pmatrix}-2/3\\2/3\\2/3\end{pmatrix}.$

Scale by $\tfrac32$ : $u_3 = \begin{pmatrix}-1\\1\\1\end{pmatrix}$ .

Verify orthogonality of the final set $\left\{\begin{pmatrix}1\\1\\0\end{pmatrix}, \begin{pmatrix}1\\-1\\2\end{pmatrix}, \begin{pmatrix}-1\\1\\1\end{pmatrix}\right\}$ : $u_3 \cdot u_1 = -1 + 1 + 0 = 0$ , $u_3 \cdot u_2 = -1 - 1 + 2 = 0$ , $u_2 \cdot u_1 = 0$ (above). All zero. $\checkmark$

To make it orthonormal, divide each by its norm: $\|u_1\| = \sqrt2$ , $\|u_2\| = \sqrt6$ , $\|u_3\| = \sqrt3$ . Done — a perfect set of right angles built from a skewed one.

☠ KNOWN HAZARDS

Confusing "orthogonal" with "orthonormal". Orthogonal = perpendicular (dot to zero). Orthonormal = perpendicular and unit length. The free-coordinate formula $c_j = v\cdot q_j$ needs the normal part; without unit length you'd divide by $\|q_j\|^2$ .
Normalizing too early in Gram–Schmidt. Subtract all the shadows first, building the orthogonal $u_i$ ; normalize only at the very end. Normalizing mid-stream just dumps error-prone square roots into every subsequent computation. Don't make your own life harder.
Forgetting to verify the dot products. Gram–Schmidt is sign-sensitive arithmetic. One slip and your "orthogonal" set isn't. Always confirm $u_i \cdot u_j = 0$ for the pairs at the end — it's a free, decisive check.
Thinking "orthogonal matrix" means its entries are zeros and ones. No — it means its columns form an orthonormal set. A $45^\circ$ rotation matrix is orthogonal and full of $\tfrac{1}{\sqrt2}$ 's.

TL;DR

▸
An orthogonal set of nonzero vectors is automatically independent (dot the dependence relation with each $v_i$ ; everything but one term dies).
▸
In an orthonormal basis, coordinates are free dot products: $c_j = v \cdot q_j$ — no system to solve.
▸
Projection onto a subspace is the sum of shadows $\sum (b\cdot q_i)q_i$ , and it's the closest point in the subspace to $b$ (least squares). The error is perpendicular to the subspace.
▸
Gram–Schmidt: $u_k = v_k - \sum_{j<k}\frac{v_k \cdot u_j}{u_j \cdot u_j}u_j$ — subtract the shadow on everything built so far, then normalize.
▸
Orthogonal matrices ( $Q^TQ = I$ ) preserve lengths and angles — rotations and reflections — with $Q^{-1} = Q^T$ and $\det Q = \pm 1$ .

unlocks

The Spectral Theorem →