Variance in the direction of a vector

Status: half-baked

This all is true in any number of dimensions, but let’s say we measure two things for a bunch of people. Perhaps it is height and weight, who knows. A measurement is then a vector $x = (h, w)$. If we have many measurements we can put them in a matrix $X = (x_1 \ldots x_n)$.

Suppose we center these data by subtracting the average vector from each individual vector, so that $\tilde x_i = x_i - m$, with $m = \frac{1}{n} \sum x_i$ being the average. The total variance in these data is $\frac{1}{n} \sum \lVert \tilde x_i \rVert^2$, the average squared length of the centered vectors.
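As a small numpy sketch (the height/weight numbers below are made up for illustration), centering and the total variance look like this:

```python
import numpy as np

# Hypothetical measurements: rows are people, columns are (height, weight).
X = np.array([[170.0, 65.0],
              [180.0, 80.0],
              [160.0, 55.0],
              [175.0, 72.0],
              [165.0, 60.0]])

m = X.mean(axis=0)                         # the average vector
Xc = X - m                                 # centered measurements
total_var = (Xc ** 2).sum(axis=1).mean()   # average squared length of the centered vectors
```

Note that this equals the sum of the per-coordinate variances, since the squared length of each centered vector is just the sum of its squared components.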

The variance along a particular (unit) vector $v$ is just the average squared length of your centered vectors if you project them onto $v$. Since the projection of $\tilde x_i$ onto a unit vector $v$ has signed length $\tilde x_i \cdot v$, this variance is $\frac{1}{n} \sum (\tilde x_i \cdot v)^2$.
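In code, with an assumed helper name `variance_along` and random data standing in for real measurements, this is a one-liner:

```python
import numpy as np

rng = np.random.default_rng(0)
Xc = rng.standard_normal((100, 2))
Xc -= Xc.mean(axis=0)                  # centered data

def variance_along(Xc, v):
    """Variance of centered data Xc in the direction of v."""
    v = v / np.linalg.norm(v)          # make sure v is unit length
    proj = Xc @ v                      # signed projection lengths
    return (proj ** 2).mean()          # average squared projection
```

A sanity check: the variance along a coordinate axis should match the ordinary variance of that coordinate.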

For any orthonormal basis $v_1, v_2$ we can write $\tilde x = c v_1 + d v_2$, and because of Pythagoras we will have $\lVert \tilde x \rVert^2 = \lVert c v_1 \rVert^2 + \lVert d v_2 \rVert^2$. This means that we can always express the total variance as the sum of variances in orthogonal directions.
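This decomposition can be checked numerically; the sketch below (random centered data, an orthonormal basis obtained by rotating the coordinate axes by an arbitrary angle) confirms that the directional variances add up to the total:

```python
import numpy as np

rng = np.random.default_rng(1)
Xc = rng.standard_normal((200, 2))
Xc -= Xc.mean(axis=0)                  # centered data

# An orthonormal basis: rotate the standard axes by some angle theta.
theta = 0.7
v1 = np.array([np.cos(theta), np.sin(theta)])
v2 = np.array([-np.sin(theta), np.cos(theta)])

total = (Xc ** 2).sum(axis=1).mean()                  # total variance
along = ((Xc @ v1) ** 2).mean() + ((Xc @ v2) ** 2).mean()  # sum over directions
```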

this file last touched 2024.01.22