Eigenvector / Eigenvalue for Machine Learning

Eigenvectors and eigenvalues have many uses. They can be used to eliminate strongly correlated features (principle components) and can help reduce over-fitting. It helps to think of them as a much simpler / smaller but "close enough" copy of a larger, more complex matrix. They summarize the complexity and extract the base "meaning" from the data. This is a form of "lossy" data compression but one which can be applied to complex matrices. Because they express commonalities while rejecting disparate perturbations, they are used to reduce noise in data.

An Eigen (eye-gun; "proper" in German) Vector is a type of vector. A vector is a list of numbers (called components) which describe a direction as well as magnitude, especially as determining the position of one point in space relative to another. For example, [1,1] is a vector that says "go up one, and go right one" assuming an X, Y plane. [10,10] has the same direction, but more than 10 times the magnitude. [1,5,3] might describe the relative motion in 3D space: X, Y, and Z. Vectors are a subclass of Matrix with only one row or column. A matrix with more than one row is written with semicolons ";" separating the rows. E.g. [a, b; c, d] is a two by two matrix with a, b as the first row and c, d as the second. The first column is a, c and the second b, d.

Math operations are applied in specific ways to all the components of a vector. For example, you can multiple every component by the same value. This does not change the vectors direction, only it's magnitude. Think about the vector [1,1]. If you go up one, and over one, you have traveled at 45 degrees. If you multiply this vector by 10, producing the new vector [10,10], and now you are going 10 times further, but you are still traveling at 45 degrees. This is true of any vector. How can we be sure? The angle of a vector is arctan(rise/run). And rise/run = rise*scale/run*scale so the scale drops out. Multiplying every component of a vector by the same value is called scaling, and scaling is a type of linear transformation.

There are other types of linear transformation, such as multiplication by a matrix which is commonly used in Machine Learning, especially in Neural Nets. For example, a very simple NN which computes the logical AND function, can be made by multiplying an input vector of [1, In1, In2] by a matrix (actually a vertical vector) of [-15; 10; 10] to give a value that is less than -5 if it's false and more than 5 if true. Or to rotate a point, [X, Y] by 90 degrees clockwise, multiply it by [0,-1;1,0] or for any angle, O use [cos(O), -sin(O); sin(O), cos(O)]

With most vectors, applying linear transformations other than scaling will cause the vector to change direction, but an eigenvector will not change direction when transformed linearly. Any linear transformation only has the effect of scaling an eigenvector. We could do the same thing by multiplying the eigenvector by a value; by simply scaling it. The value that is multiplied by the eigenvector, used to scale it, is called the eigenvalue.

So, if we have a matrix, called "A", an eigenvector "x" and an eigenvalue "l" and we multiply the matrix times the eigenvector, and then subtract the eigenvalue times the eigenvector, we should get zero. (Note: To add or subtract Matrices, every value in one is paired with the one in the same position in the other and they must be the same size; same number of columns, same number of rows. See Matrix: Addition for more)

First we have to make sure that A is not invertable. A matrix is invertable if there exists some other matrix, call it "B" which, when multiplied by A, or if A is multiplied by B, equals an identity matrix of the same size. An identity matrix is just a matrix of mostly zeros, with a diagonal pattern of 1's from upper left to lower right. e.g. For a size of 3, the identity matrix is:

So a matrix, A is invertable if A*B = B*A = I_n. B is the inverse of A, sort of like it's reciprocal. It is found by a complex series of operations, that it turns out we don't need. We can know the matrix is NOT invertable if it's determinant is zero. The determinant is a slightly less complex series of operations. For a 2x2 matrix, the determinant is the sum of the opposite corners multiplied together. For M = [a, b; c, d] it's a*d+b*c. For larger matrices, we can find the determinant by computing the determinant of a bunch of 2x2 matrix subsets of the full matrix in a specified order.

If we want to solve x*A - x*l = 0 for l, first we need to factor out the x but that would give us x*(A - l) and A is a matrix, while l is a scalier (single number) so they can't be subtracted: In matrix math, you can only subtract things of the same size. Luckily, a magical property of the identity matrix is that you can multiply any matrix times it and basically not change the value; it's like multiplying by 1 so we can multiply both sides by I_n where n is the size of A.

At this point, we need to make sure that we don't find a solution for l which leads us to a solution for an x which is zero. If we could just divide both sides by (A - I_n * l) then we would have x = 0/(A - I_n * l) which is x = 0. So to make sure that can't happen, we need to make sure that (A - I_n * l)is "non-invertable"; meaning that it isn't a valid denominator (remember this is matrix math so we can't divide, we can only invert). So we do that by assuming it's determinant is zero, which says it can't be inverted.

Following the rules of matrix math, we can expand all the calculations required to solve for l. This is not simple and is best done by a computer. E.g. in Octave or MatLab using eig ¹ . We will get back multiple possible values as there are more than one Eigenvalue for a given matrix. Given an Eigenvalue, we can put it back into the equation

in place of l and then, again, follow the rules of matrix math to solve for the values of x, the eigenvector. As ever, this is best done by a computer; the eig function can also return the eigenvectors with the eigenvalues.