In the fourth installment of this series, we will quickly go over the fundamental linear algebra concepts before we start building our machine learning algorithms in Octave. Don’t worry, we’ll keep the math to a minimum and only include what is absolutely needed.
Why is linear algebra needed in machine learning?
In the previous article, I mentioned the importance of “vectorizing”. Basically, it means converting the four ordinary operations (add, subtract, multiply and divide) on large amounts of data from element-by-element loops into vector or matrix operations.
Linear algebra libraries are available for most programming languages (Octave comes with one built in). These libraries are optimized by professional developers to make each operation as efficient as possible, so using them, rather than writing for-loops or implementing such routines yourself, can drastically improve the performance of your machine learning algorithms.
In order to increase the performance of the operations, these libraries require data to be stored as matrices.
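To make the idea concrete, here is a minimal sketch of the same calculation written twice, once with a for-loop and once as a single vector operation. The data (`x` and `y`) is made up purely for illustration.

```octave
% Hypothetical data: two column vectors of 1000 numbers each.
x = (1:1000)';
y = 2 * ones(1000, 1);

% Loop version: multiply element by element and accumulate the sum.
total_loop = 0;
for i = 1:numel(x)
  total_loop = total_loop + x(i) * y(i);
end

% Vectorized version: one row-times-column product does the same work.
total_vec = x' * y;

disp(total_loop == total_vec)   % prints 1 (both equal 1001000)
```

Both versions give the same answer; the second one simply hands the whole job to the built-in linear algebra routines.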
With that being said, we will try to keep the usage of such advanced vectorizing techniques to a minimum at first and focus on creating easy-to-understand machine learning algorithms.
Once we have implemented many algorithms using simple techniques, we will then look at how to speed them up through vectorizing.
Nonetheless, you should still have a basic understanding of vector and matrix operations as they will still be used from time to time. Not to mention, you may stumble upon them when examining more advanced algorithms.
We’ll be using the Octave language we learned in the previous post to demonstrate these linear algebra operations. Don’t know it? No worries, we’ll just use it for demonstration purposes, so you don’t need to know it.
A matrix can be thought of as a container for numbers. The numbers are arranged in rows and columns.
When we talk about a matrix, we denote the number of rows it has and then the number of columns. So, for example, there could be a matrix A that has 5 rows and 3 columns, and we denote it as a 5×3 matrix.
Here’s a 5×3 Matrix:
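In Octave, such a matrix could be written as follows. The values are an assumption: they are simply the numbers 1 to 15 filled in row by row, which is consistent with the rows (1, 2, 3) and (4, 5, 6) quoted later in this post.

```octave
% A 5x3 matrix: rows are separated by semicolons.
% The values are assumed for illustration.
A = [ 1  2  3;
      4  5  6;
      7  8  9;
     10 11 12;
     13 14 15];

disp(size(A))   % prints 5 3 (rows first, then columns)
```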
Vectors are just matrices with one column. So, for example, we can have a vector, v, as below:
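A minimal Octave sketch of such a vector, with made-up values, might look like this:

```octave
% A column vector with 3 entries (values are hypothetical).
v = [1; 2; 3];

disp(size(v))   % prints 3 1, i.e. a matrix with a single column
```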
When adding a constant number to a matrix, the number is added to each element of the matrix. So, for the matrix A above, if we add 5 to the matrix, we get:
Of course, subtraction works the same way as addition: it’s just adding the negative of a number.
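Here is how both operations might look in Octave. The contents of A are an assumption (the numbers 1 to 15 filled in row by row).

```octave
% Assumed 5x3 matrix A.
A = [1 2 3; 4 5 6; 7 8 9; 10 11 12; 13 14 15];

plus_five  = A + 5;   % 5 is added to every element
minus_five = A - 5;   % the same as adding -5 to every element

disp(plus_five(1, :))    % first row:  6  7  8
disp(minus_five(1, :))   % first row: -4 -3 -2
```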
When adding or subtracting two matrices, the dimensions of both the matrices must be the same. So, for example, we can do A + A, but we can’t do A + B where B has different dimensions:
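As a rough sketch, this is what happens in Octave when the dimensions do and do not match. Both A and the 3×3 matrix B below use assumed values.

```octave
% Assumed matrices: A is 5x3, B is 3x3.
A = [1 2 3; 4 5 6; 7 8 9; 10 11 12; 13 14 15];
B = [1 2 3; 4 5 6; 7 8 9];

S = A + A;   % works: both operands are 5x3

% A + B does not work: 5x3 and 3x3 do not line up.
try
  A + B;
  add_failed = false;
catch err
  add_failed = true;
  disp(err.message)   % Octave reports that the arguments are nonconformant
end
```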
Now, when we need to multiply two matrices, there is a rule to keep in mind:
The number of columns in the first matrix must be equal to the number of rows in the second matrix.
So, for our matrix A, we can multiply it by a 3×6 matrix, for example.
Let’s multiply A by a second matrix, B, and call the result C. To get the first number of C, we take the entire first row of A (1, 2, 3) and the first column of B (1, 4, 7) and do this:
C(1, 1) = 1 * 1 + 2 * 4 + 3 * 7 = 30. This is the first element of C.
We basically multiplied the first number of the row we took from A by the first number of the column we took from B, did the same for each remaining pair of numbers (here, 3 pairs in total), and added the products together.
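The same calculation can be sketched in Octave. The full contents of A and B are assumptions: only the first row of A, (1, 2, 3), and the first column of B, (1, 4, 7), come from the text, and B is taken to be 3×3 to keep things short.

```octave
% Assumed matrices consistent with the row and column quoted above.
A = [1 2 3; 4 5 6; 7 8 9; 10 11 12; 13 14 15];
B = [1 2 3; 4 5 6; 7 8 9];

row = A(1, :);   % first row of A:    [1 2 3]
col = B(:, 1);   % first column of B: [1; 4; 7]

% Multiply the pairs and add them up: 1*1 + 2*4 + 3*7
c11 = sum(row .* col');

C = A * B;               % let Octave do the full multiplication
disp([c11, C(1, 1)])     % prints 30 30
```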
Makes sense? Maybe this picture can explain what we did:
So to get the element at position (i, k) of the resulting matrix, we take the ith row of the first matrix and the kth column of the second matrix, multiply their numbers pairwise, and add up the products.
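Written out with plain for-loops, the rule could look like the sketch below. This is only meant to show the idea; in practice you would just write `A * B`. The matrix values are assumed.

```octave
% Assumed matrices: A is 5x3, B is 3x3.
A = [1 2 3; 4 5 6; 7 8 9; 10 11 12; 13 14 15];
B = [1 2 3; 4 5 6; 7 8 9];

m = size(A, 1);   % rows of A
n = size(A, 2);   % columns of A (must equal the rows of B)
p = size(B, 2);   % columns of B

C = zeros(m, p);
for i = 1:m
  for k = 1:p
    % Row i of A times column k of B, one pair at a time.
    for j = 1:n
      C(i, k) = C(i, k) + A(i, j) * B(j, k);
    end
  end
end

disp(isequal(C, A * B))   % prints 1: same result as the built-in product
```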
Let’s take another example.
To get the element in the second row and third column of C, we will do:
From A, we take the row (4, 5, 6) and from B, we take the column (3, 6, 9). Then, the result is:
C(2, 3) = 4 * 3 + 5 * 6 + 6 * 9 = 96.
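A quick way to check this in Octave is to multiply the row by the column directly. As before, the full matrices are assumptions that merely agree with the row and column quoted above.

```octave
% Assumed matrices: second row of A is (4, 5, 6), third column of B is (3, 6, 9).
A = [1 2 3; 4 5 6; 7 8 9; 10 11 12; 13 14 15];
B = [1 2 3; 4 5 6; 7 8 9];

% A row (1x3) times a column (3x1) gives a single number.
c23 = A(2, :) * B(:, 3);   % 4*3 + 5*6 + 6*9

C = A * B;
disp([c23, C(2, 3)])       % prints 96 96
```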
Still doesn’t make sense? Don’t worry: the computer does the multiplication for you anyway, so just keep in mind the golden rule for multiplying matrices. You only have to make sure the dimensions of the matrices you provide are correct.
The Rule for Multiplying Matrices: To multiply two matrices A and B, the number of columns of A must match the number of rows of B. Hence, an x×n matrix multiplied by an n×y matrix results in an x×y matrix.
Note: Matrix multiplication is not commutative: A x B is generally not equal to B x A, and because of the rule above, B x A may not even be possible.
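The rule is easy to try out with placeholder matrices. The sketch below uses matrices filled with ones (hypothetical data), since only the dimensions matter here.

```octave
X = ones(5, 3);   % 5x3
Y = ones(3, 6);   % 3x6

% Columns of X (3) match rows of Y (3), so the product is 5x6.
disp(size(X * Y))   % prints 5 6

% The other way around, columns of Y (6) do not match rows of X (5).
try
  Y * X;
  reverse_ok = true;
catch
  reverse_ok = false;
end
disp(reverse_ok)    % prints 0
```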
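Even when both orders are possible, the results usually differ. Here is a small sketch with two made-up 2×2 matrices:

```octave
% Two hypothetical square matrices, so both products exist.
P = [1 2; 3 4];
Q = [0 1; 1 0];

disp(P * Q)   % [2 1; 4 3]
disp(Q * P)   % [3 4; 1 2]

disp(isequal(P * Q, Q * P))   % prints 0: the order matters
```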
When we want to multiply or divide a matrix by a number (a constant) X, we just multiply or divide each number in the matrix by X.
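For instance, with a small made-up matrix `M`, this could look as follows in Octave:

```octave
% Hypothetical 2x2 matrix.
M = [2 4; 6 8];

tripled = M * 3;   % every element multiplied by 3
halved  = M / 2;   % every element divided by 2

disp(tripled)   % [6 12; 18 24]
disp(halved)    % [1 2; 3 4]
```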
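The transposition of a matrix turns its rows into columns: the element at position (i, j) moves to position (j, i). You can picture it as “rotating” the matrix 90 degrees clockwise and then reversing the order of its columns. We use the superscript “T” or an apostrophe to indicate this operation.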
The below image visually demonstrates the transposition of a matrix:
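In Octave, the apostrophe does the transposing. The sketch below also checks the "rotate, then reverse" picture using the built-in `rot90` and `fliplr` functions. The values in A are assumed.

```octave
% Assumed 5x3 matrix A.
A = [1 2 3; 4 5 6; 7 8 9; 10 11 12; 13 14 15];

At = A';            % transpose: rows become columns
disp(size(At))      % prints 3 5
disp(At(1, :))      % first row is the first column of A: 1 4 7 10 13

% Rotate 90 degrees clockwise, then reverse the column order.
rotated = fliplr(rot90(A, -1));
disp(isequal(rotated, At))   % prints 1
```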
What’s the use of a transpose? Well, if we want to do something like A * A, we can’t, as it would be (5×3) * (5×3). But if we do A * A’, it becomes (5×3) * (3×5), which is definitely possible.
Hence, the transpose operator allows us to multiply a matrix by its own transpose, whatever its dimensions.
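A short Octave sketch of this, again with assumed values for A:

```octave
% Assumed 5x3 matrix A.
A = [1 2 3; 4 5 6; 7 8 9; 10 11 12; 13 14 15];

G = A * A';   % (5x3) * (3x5) gives a 5x5 matrix
H = A' * A;   % (3x5) * (5x3) gives a 3x3 matrix

disp(size(G))   % prints 5 5
disp(size(H))   % prints 3 3

% Without the transpose, the dimensions do not line up.
try
  A * A;
  plain_ok = true;
catch
  plain_ok = false;
end
disp(plain_ok)  % prints 0
```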
- We usually name matrices with upper case letters, as in matrix A, and vectors with lower case letters, as in vector a. In Octave, vectors are just matrices with one column, so it does not matter that much.
- Matrix multiplication is not commutative but is associative.
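Associativity means that the grouping of the products does not matter, as long as the order stays the same. A small check with three made-up matrices:

```octave
% Three hypothetical 2x2 matrices.
P = [1 2; 3 4];
Q = [0 1; 1 0];
R = [2 0; 1 3];

left  = (P * Q) * R;   % multiply P and Q first
right = P * (Q * R);   % multiply Q and R first

disp(isequal(left, right))   % prints 1: the grouping does not change the result
```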
Wow, we’re done already? Well, actually yes.
Although the field of linear algebra is pretty broad (it’s a course of its own in some places!), we just need the simple basics in order to get going with creating our machine learning algorithms in Octave.
In the next post, we will finally be ready to implement our very first machine learning algorithm: Linear Regression!
Stay tuned: by the end of the next post, you will have your very own machine learning algorithm that you can feed data of all sorts to decipher unseen patterns!
As always, if you didn’t understand something, leave a question in the comments below.
Until the next post!
To be the first to know when the next post comes out, make sure to follow us on Twitter at Trollgen Studios.