← mapVector Spaces

Orthogonality & Gram–Schmidt

⚗ Dr. Möbius, from the lab

Bases are great. Orthonormal bases are obscene luxury — perpendicular axes, each of length one, where finding coordinates stops being algebra and becomes a single dot product. The Federation of Boring Textbook Authors treats this as a footnote. I treat it as the holy grail. Today we build the machine that takes any sorry pile of vectors and grinds it into a perfect set of right angles: Gram–Schmidt. Each step is the same brutal move — subtract the shadow, keep the remainder.

THE BIG IDEA

Orthogonal vectors are automatically independent; an orthonormal basis gives coordinates by dot products alone; and Gram–Schmidt converts any basis into an orthonormal one by repeatedly subtracting projections.

The inner product, and why right angles are independence for free

Recall from Dot Product the inner product on Rn\mathbb{R}^n: uv=u1v1++unvnu \cdot v = u_1 v_1 + \cdots + u_n v_n. It measures length, v=vv\|v\| = \sqrt{v \cdot v}, and angle — in particular uv=0u \cdot v = 0 exactly when uu and vv are orthogonal (perpendicular). A set is orthogonal if every pair of distinct vectors in it dots to zero.

Here's the first gift, and it's a good one. An orthogonal set of nonzero vectors is automatically linearly independent. Proof — one clean line of dotting, no row reduction required. Suppose {v1,,vk}\{v_1, \dots, v_k\} are nonzero and mutually orthogonal, and

c1v1+c2v2++ckvk=0.c_1 v_1 + c_2 v_2 + \cdots + c_k v_k = 0.

Dot both sides with viv_i. Every cross term vjviv_j \cdot v_i (jij \ne i) vanishes by orthogonality, leaving

ci(vivi)=0.c_i (v_i \cdot v_i) = 0.

Since vi0v_i \ne 0, we have vivi=vi2>0v_i \cdot v_i = \|v_i\|^2 > 0, so ci=0c_i = 0. This works for every ii, so all coefficients are zero — independent. \blacksquare No row reduction, no determinant. Perpendicularity is a stronger, cleaner reason for independence than the general case ever gives you.

Orthonormal bases: coordinates for free

Normalize each vector to length one (divide by its norm) and you have an orthonormal set: mutually perpendicular and unit length, qiqj=0q_i \cdot q_j = 0 for iji \ne j and qiqi=1q_i \cdot q_i = 1. An orthonormal basis {q1,,qn}\{q_1, \dots, q_n\} is the luxury tier, and here's why.

In a general basis, finding the coordinates of vv means solving a linear system Pc=vPc = v. In an orthonormal basis, you skip all of that. Write v=ciqiv = \sum c_i q_i and dot with qjq_j:

vqj=ici(qiqj)=cj,v \cdot q_j = \sum_i c_i (q_i \cdot q_j) = c_j,

because every term dies except i=ji = j, where qjqj=1q_j \cdot q_j = 1. So

cj=vqj.c_j = v \cdot q_j.

The coordinate is just a dot product. No system, no inverse, no elimination. Fuck Gaussian elimination — a single multiply-and-sum is all you need. That's the whole reason orthonormal bases are worth the trouble of building — they turn coordinate-finding from a computation into a reflex.

Projection onto a subspace: the closest point

Take a subspace WW with orthonormal basis {q1,,qm}\{q_1, \dots, q_m\}. The orthogonal projection of a vector bb onto WW is

projW(b)=(bq1)q1++(bqm)qm.\operatorname{proj}_W(b) = (b \cdot q_1)\,q_1 + \cdots + (b \cdot q_m)\,q_m.

Each term (bqi)qi(b \cdot q_i) q_i is the shadow of bb along one axis; sum the shadows. For a single direction aa (not necessarily unit), the formula is the un-normalized version you've seen:

proja(b)=baaaa.\operatorname{proj}_a(b) = \frac{b \cdot a}{a \cdot a}\, a.

The deep fact: the projection is the closest point in WW to bb. Among all vectors in the subspace, projW(b)\operatorname{proj}_W(b) minimizes the distance bw\|b - w\|, because the error bprojW(b)b - \operatorname{proj}_W(b) comes out perpendicular to WW, and a perpendicular dropped to a subspace is the shortest segment to it. Minimizing distance to a subspace is exactly the least-squares problem — the backbone of fitting lines to noisy data — and it's nothing but a projection. Drag the vectors and watch the projection slide to the nearest point on the line, the error arrow staying perpendicular:

dot product & projection
uv
u·v = 11θ = 42.3°proj_v(u) = (2.59, 0.65) acute

Gram–Schmidt: subtract the shadow, repeat

Now the main event. Given any basis {v1,v2,v3,}\{v_1, v_2, v_3, \dots\} — possibly a skewed, ugly, embarrassing mess — Gram–Schmidt manufactures an orthogonal basis spanning the same space. The move is always the same: take the next vector, subtract off its shadow on everything you've already built, keep the remainder. What's left is perpendicular to all the earlier ones by construction. Simple as hell, powerful as anything in this course.

u1=v1,u_1 = v_1, u2=v2v2u1u1u1u1,u_2 = v_2 - \frac{v_2 \cdot u_1}{u_1 \cdot u_1}\, u_1, u3=v3v3u1u1u1u1v3u2u2u2u2,u_3 = v_3 - \frac{v_3 \cdot u_1}{u_1 \cdot u_1}\, u_1 - \frac{v_3 \cdot u_2}{u_2 \cdot u_2}\, u_2,

and so on. Each subtracted term is a projection — a shadow — and removing it leaves a vector orthogonal to that direction. At the end, normalize each uiu_i to get an orthonormal basis. Worked in full in Example 3. It's tedious arithmetic, not deep, but get one sign wrong and the orthogonality silently dies — so always check the dot products are zero at the end. That check is free and it has saved more homework than coffee has. Do not skip it.

Orthogonal matrices: the rigid motions

Stack an orthonormal basis into the columns of a matrix QQ. Then QTQ=IQ^T Q = I — the (i,j)(i,j) entry of QTQQ^T Q is exactly qiqjq_i \cdot q_j, which is 11 on the diagonal and 00 off it. A square matrix with QTQ=IQ^T Q = I is called orthogonal, and it has a superpower: it preserves lengths and angles. For any vv,

Qv2=(Qv)(Qv)=vTQTQv=vTIv=vv=v2.\|Qv\|^2 = (Qv) \cdot (Qv) = v^T Q^T Q v = v^T I v = v \cdot v = \|v\|^2.

Lengths are untouched; since dot products are preserved too, so are angles. Orthogonal matrices are exactly the rigid motions of space — rotations and reflections, the transformations that move the world without bending, stretching, or shearing it. Their determinant is ±1\pm 1 (+1+1 rotation, 1-1 reflection). And Q1=QTQ^{-1} = Q^T for free — inverting becomes transposing.

This is the last piece of equipment before the summit. The spectral theorem says every real symmetric matrix can be diagonalized by an orthogonal QQ — perpendicular axes, no distortion, no shit. You just built every single tool that sentence needs. Go do the gauntlet, then climb.

🔬 SPECIMENS (worked examples)

Worked example 1 — coordinates by dot product

The set {q1=12(11), q2=12(11)}\left\{ q_1 = \tfrac{1}{\sqrt2}\begin{pmatrix} 1 \\ 1 \end{pmatrix},\ q_2 = \tfrac{1}{\sqrt2}\begin{pmatrix} 1 \\ -1 \end{pmatrix} \right\} is an orthonormal basis of R2\mathbb{R}^2. Find the coordinates of v=(35)v = \begin{pmatrix} 3 \\ 5 \end{pmatrix} in this basis.

First confirm it's orthonormal: q1q2=12(11+1(1))=0q_1 \cdot q_2 = \tfrac12(1\cdot1 + 1\cdot(-1)) = 0 (perpendicular), and q12=12(1+1)=1\|q_1\|^2 = \tfrac12(1+1) = 1, same for q2q_2 (unit length). Good — so coordinates are just dot products.

c1=vq1=12(3+5)=82=42.c_1 = v \cdot q_1 = \tfrac{1}{\sqrt2}(3 + 5) = \tfrac{8}{\sqrt2} = 4\sqrt2. c2=vq2=12(35)=22=2.c_2 = v \cdot q_2 = \tfrac{1}{\sqrt2}(3 - 5) = \tfrac{-2}{\sqrt2} = -\sqrt2.

So [v]=(422)[v] = \begin{pmatrix} 4\sqrt2 \\ -\sqrt2 \end{pmatrix} in this basis. Check by reassembling: 4212(11)212(11)=4(11)(11)=(35)4\sqrt2 \cdot \tfrac{1}{\sqrt2}\begin{pmatrix}1\\1\end{pmatrix} - \sqrt2\cdot\tfrac{1}{\sqrt2}\begin{pmatrix}1\\-1\end{pmatrix} = 4\begin{pmatrix}1\\1\end{pmatrix} - \begin{pmatrix}1\\-1\end{pmatrix} = \begin{pmatrix}3\\5\end{pmatrix}. \checkmark No system solved — two dot products and we're done.

Worked example 2 — projection finds the closest point

Find the point on the line W=span{(11)}W = \operatorname{span}\left\{\begin{pmatrix}1\\1\end{pmatrix}\right\} closest to b=(40)b = \begin{pmatrix} 4 \\ 0 \end{pmatrix}, and confirm the error is perpendicular to the line.

The closest point is the projection of bb onto the line. With a=(11)a = \begin{pmatrix}1\\1\end{pmatrix}:

proja(b)=baaaa=(4)(1)+(0)(1)(1)(1)+(1)(1)(11)=42(11)=(22).\operatorname{proj}_a(b) = \frac{b \cdot a}{a \cdot a}\, a = \frac{(4)(1) + (0)(1)}{(1)(1)+(1)(1)}\begin{pmatrix}1\\1\end{pmatrix} = \frac{4}{2}\begin{pmatrix}1\\1\end{pmatrix} = \begin{pmatrix} 2 \\ 2 \end{pmatrix}.

So the closest point on the line is (22)\begin{pmatrix}2\\2\end{pmatrix}.

Confirm perpendicularity of the error. The error vector is

bproja(b)=(40)(22)=(22).b - \operatorname{proj}_a(b) = \begin{pmatrix}4\\0\end{pmatrix} - \begin{pmatrix}2\\2\end{pmatrix} = \begin{pmatrix} 2 \\ -2 \end{pmatrix}.

Dot it with the line's direction: (22)(11)=22=0\begin{pmatrix}2\\-2\end{pmatrix}\cdot\begin{pmatrix}1\\1\end{pmatrix} = 2 - 2 = 0. \checkmark The error is perpendicular to WW, which is precisely why (22)\begin{pmatrix}2\\2\end{pmatrix} is the closest point — drop a perpendicular and you've found the shortest route to the line. That's least-squares in miniature.

Worked example 3 — Gram–Schmidt run in full: three ugly vectors get orthogonalized

Apply Gram–Schmidt to v1=(110), v2=(101), v3=(011)v_1 = \begin{pmatrix}1\\1\\0\end{pmatrix},\ v_2 = \begin{pmatrix}1\\0\\1\end{pmatrix},\ v_3 = \begin{pmatrix}0\\1\\1\end{pmatrix} to produce an orthogonal basis of R3\mathbb{R}^3.

Step 1. u1=v1=(110)u_1 = v_1 = \begin{pmatrix}1\\1\\0\end{pmatrix}, with u1u1=2u_1 \cdot u_1 = 2.

Step 2 — subtract v2v_2's shadow on u1u_1. v2u1=(1)(1)+(0)(1)+(1)(0)=1v_2 \cdot u_1 = (1)(1)+(0)(1)+(1)(0) = 1, so

u2=v212u1=(101)12(110)=(1/21/21).u_2 = v_2 - \frac{1}{2}u_1 = \begin{pmatrix}1\\0\\1\end{pmatrix} - \frac12\begin{pmatrix}1\\1\\0\end{pmatrix} = \begin{pmatrix}1/2\\-1/2\\1\end{pmatrix}.

Scale by 22 for cleaner arithmetic (scaling preserves direction and orthogonality): u2=(112)u_2 = \begin{pmatrix}1\\-1\\2\end{pmatrix}, with u2u2=1+1+4=6u_2 \cdot u_2 = 1 + 1 + 4 = 6. Check u2u1=11+0=0u_2 \cdot u_1 = 1 - 1 + 0 = 0. \checkmark

Step 3 — subtract v3v_3's shadows on u1u_1 and u2u_2. v3u1=(0)(1)+(1)(1)+(1)(0)=1v_3 \cdot u_1 = (0)(1)+(1)(1)+(1)(0) = 1, so the u1u_1 shadow coefficient is 12\tfrac{1}{2}. v3u2=(0)(1)+(1)(1)+(1)(2)=1v_3 \cdot u_2 = (0)(1)+(1)(-1)+(1)(2) = 1, so the u2u_2 shadow coefficient is 16\tfrac{1}{6}.

u3=(011)12(110)16(112)=(01216112+161013)=(2/32/32/3).u_3 = \begin{pmatrix}0\\1\\1\end{pmatrix} - \frac12\begin{pmatrix}1\\1\\0\end{pmatrix} - \frac16\begin{pmatrix}1\\-1\\2\end{pmatrix} = \begin{pmatrix}0 - \tfrac12 - \tfrac16 \\ 1 - \tfrac12 + \tfrac16 \\ 1 - 0 - \tfrac13\end{pmatrix} = \begin{pmatrix}-2/3\\2/3\\2/3\end{pmatrix}.

Scale by 32\tfrac32: u3=(111)u_3 = \begin{pmatrix}-1\\1\\1\end{pmatrix}.

Verify orthogonality of the final set {(110),(112),(111)}\left\{\begin{pmatrix}1\\1\\0\end{pmatrix}, \begin{pmatrix}1\\-1\\2\end{pmatrix}, \begin{pmatrix}-1\\1\\1\end{pmatrix}\right\}: u3u1=1+1+0=0u_3 \cdot u_1 = -1 + 1 + 0 = 0, u3u2=11+2=0u_3 \cdot u_2 = -1 - 1 + 2 = 0, u2u1=0u_2 \cdot u_1 = 0 (above). All zero. \checkmark

To make it orthonormal, divide each by its norm: u1=2\|u_1\| = \sqrt2, u2=6\|u_2\| = \sqrt6, u3=3\|u_3\| = \sqrt3. Done — a perfect set of right angles built from a skewed one.

☠ KNOWN HAZARDS

  • Confusing "orthogonal" with "orthonormal". Orthogonal = perpendicular (dot to zero). Orthonormal = perpendicular and unit length. The free-coordinate formula cj=vqjc_j = v\cdot q_j needs the normal part; without unit length you'd divide by qj2\|q_j\|^2.

  • Normalizing too early in Gram–Schmidt. Subtract all the shadows first, building the orthogonal uiu_i; normalize only at the very end. Normalizing mid-stream just dumps error-prone square roots into every subsequent computation. Don't make your own life harder.

  • Forgetting to verify the dot products. Gram–Schmidt is sign-sensitive arithmetic. One slip and your "orthogonal" set isn't. Always confirm uiuj=0u_i \cdot u_j = 0 for the pairs at the end — it's a free, decisive check.

  • Thinking "orthogonal matrix" means its entries are zeros and ones. No — it means its columns form an orthonormal set. A 4545^\circ rotation matrix is orthogonal and full of 12\tfrac{1}{\sqrt2}'s.

TL;DR

  • An orthogonal set of nonzero vectors is automatically independent (dot the dependence relation with each viv_i; everything but one term dies).

  • In an orthonormal basis, coordinates are free dot products: cj=vqjc_j = v \cdot q_j — no system to solve.

  • Projection onto a subspace is the sum of shadows (bqi)qi\sum (b\cdot q_i)q_i, and it's the closest point in the subspace to bb (least squares). The error is perpendicular to the subspace.

  • Gram–Schmidt: uk=vkj<kvkujujujuju_k = v_k - \sum_{j<k}\frac{v_k \cdot u_j}{u_j \cdot u_j}u_j — subtract the shadow on everything built so far, then normalize.

  • Orthogonal matrices (QTQ=IQ^TQ = I) preserve lengths and angles — rotations and reflections — with Q1=QTQ^{-1} = Q^T and detQ=±1\det Q = \pm 1.

unlocks