Section 1.5 Matrix notation
The three seemingly distinct viewpoints we've considered are unified by the concept of a matrix.
The word “matrix” comes to us from Latin, where it means womb, and it entered the English language with a variety of meanings. In mathematics, matrix (pl. matrices) always means a table containing numerical values. It is rather hard to guess how a word meaning “uterus” got morphed into one meaning “table of numbers”, but languages are funny that way…
Generally speaking, a table of numbers will have some arbitrary number of rows and of columns. There are some special cases that we'll need to talk about, but let's look at the general situation first. We'll use the variable \(m\) to refer to the number of rows in a matrix and the variable \(n\) to refer to the number of columns. We'll use upper-case letters (about 90% of the time: \(A\)) to refer to the whole table as a single entity, in which case we'll speak of \(A\) being an \(m \times n\) matrix. The entries of a matrix will usually be denoted using the corresponding lower-case letter with two subscripts. This is (hopefully) reminiscent of the doubly-indexed quantities we saw near the end of Section 1.4; the components of a linear transformation.
Example 1.5.1 matrix notation
Here are a couple of matrices:
\begin{equation*} A = \left[ \begin{array}{ccc} 1 \amp 4 \amp 9 \\ 7 \amp \pi \amp 42 \end{array} \right] \quad \mbox{and} \quad B = \left[ \begin{array}{cc} -1 \amp 11 \\ -3 \amp e \end{array} \right]\end{equation*}
Notice how we are referring to the entire tables with the variables \(A\) and \(B\)? If we need to refer to the individual entries of a matrix we'll write things like \(\aij{2}{3} = 42\) (the number in the 2nd row and 3rd column of \(A\) is 42), or \(\bij{1}{2} = 11\) (the number in the 1st row, 2nd column of \(B\) is 11).
It's also fairly common to ignore this lower-case convention! That is, you may also see things like \(A_{1\:\!3} = 9\) and \(B_{2\:\!2} = e\).
Now to the special cases. When the number of columns is \(n=1\), the matrix is known as a column vector. When the number of rows is \(m=1\), the matrix is known as a row vector. There is clearly a choice to be made as to whether the things we have been referring to as (merely) “vectors” are going to be represented as column vectors, or as row vectors. Here's a surprising thing! Your Calculus teachers and I (up until now) have been lying to you. When we wrote vectors as (for example) \(\vec{v} = \langle 1, 2, 3 \rangle\), it was only for convenience. A row of numbers fits more easily on the page than a column does. For a variety of reasons it makes sense to treat vectors as columns of numbers, not rows.
There is an operation known as transposition that changes row vectors into column vectors and vice versa. The transpose of a matrix is indicated by a superscript T: the rows of the transposed matrix are the columns of the original matrix, and its columns are the original matrix's rows. This idea (interchanging rows and columns) is surprisingly important and we'll be using it quite a bit in the future. For the moment let's just notice that it gives us a nice way to write a column vector — with the typographical advantage that the components appear in a row!
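For example, transposing the \(2 \times 3\) matrix \(A\) from Example 1.5.1 gives a \(3 \times 2\) matrix whose rows are \(A\)'s columns:
\begin{equation*} A^T = \left[ \begin{array}{cc} 1 \amp 7 \\ 4 \amp \pi \\ 9 \amp 42 \end{array} \right]\end{equation*}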
To summarize what the last few paragraphs have said: It is technically not right to write \(\vec{v} = \langle 1, 2, 3 \rangle\); we should really write \(\vec{v} = \left[ \begin{array}{c} 1 \\ 2\\ 3 \end{array} \right]\), but that takes up too much vertical space, so instead we write \(\vec{v} = [ 1 \; 2 \; 3 ]^T\). This may all seem like too high a price to pay for accuracy, but it will pay future dividends if we start thinking now about rows and columns and how to switch between them.
If we only had row and column vectors to worry about we'd probably find some other way to distinguish them — maybe there'd be red vectors and blue vectors!
By convention there is no need to refer to the entries of a row or column vector using double indices — one of them would always be 1 so we can omit it. When we have more general matrices, where \(m\) and \(n\) are both greater than \(1\), the roles of rows and columns are more evident and two indices will be necessary to refer to the entries.
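For instance, for the vector \(\vec{v} = [ 1 \; 2 \; 3 ]^T\) from above, we would simply write \(v_2 = 2\) rather than bothering with a second subscript.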
One useful way to think about matrices is the following: When we write down a system of equations, a lot of the symbols that we write are redundant. If we eliminate all of the stuff that is utterly predictable we are left with a table of numbers — in other words, a matrix. So one way to think of matrices is that they are highly abbreviated ways of referring to a system of linear equations. In this scheme the rows of the matrix correspond to the individual equations in the system and the columns contain all the
coefficients that multiply a given variable. A short example will probably help:
Example 1.5.3 Converting a linear system to matrix form
Consider the following system of \(3\) equations in \(4\) variables.
\begin{equation*}
\begin{alignedat}{8}
x_1 \amp {}+{} \amp x_2 \amp \amp \amp {}+{} \amp 3 x_4 \amp {}={} \amp 101 \\
2 x_1 \amp {}-{} \amp x_2 \amp {}+{} \amp x_3 \amp {}+{} \amp x_4 \amp {}={} \amp 102 \\
\amp \amp 3 x_2 \amp {}-{} \amp x_3 \amp {}+{} \amp 2x_4 \amp {}={} \amp 103
\end{alignedat}
\end{equation*}
Now we'll take one step backwards before proceeding two steps forward. If a variable appears, but has no coefficient, that just means the coefficient is \(1\). If a variable doesn't appear at all, that means the coefficient is \(0\).
Finally, if we see subtraction we can always replace it by addition (by putting a minus sign on the coefficient). So, let's re-express this system in a fully anal-retentive way…
\begin{equation*}
\begin{alignedat}{8}
1x_1 \amp {}+{} \amp 1x_2 \amp {}+{} \amp 0 x_3 \amp {}+{} \amp 3 x_4 \amp {}={} \amp 101 \\
2x_1 \amp {}+{} \amp {-1}x_2 \amp {}+{} \amp 1 x_3 \amp {}+{} \amp 1 x_4 \amp {}={} \amp 102 \\
0x_1 \amp {}+{} \amp 3x_2 \amp {}+{} \amp {-1} x_3 \amp {}+{} \amp 2 x_4 \amp {}={} \amp 103
\end{alignedat}
\end{equation*}
Okay, so now the promised two steps forward. First, notice that in every equation in the system every variable is present and the variables all appear in the same order. If we were only given the lists of coefficients we'd easily be able to reconstruct the equations. So, we're going to eject all of the plus signs and all of the variables with all of those subscripts. We just won't deign to write them down! Sometimes it's a good idea to imagine their presence, but it certainly isn't necessary. Also, the equals signs that separate the left- and right-hand sides of the equations always come just before the very last number. There really isn't a lot of information conveyed by the appearance of those equals signs, but we usually keep a slight vestige of them around — a thin vertical line separates the last column of the matrix form from everything else. So, with no further ado, here is the matrix form of this system:
\begin{equation*}
\left[ \begin{array}{rrrr|r}
1 \amp 1 \amp 0 \amp 3 \amp 101 \\
2 \amp {-1} \amp 1 \amp 1 \amp 102 \\
0 \amp 3 \amp {-1} \amp 2 \amp 103
\end{array} \right]
\end{equation*}
In the previous example the final matrix we wrote is actually known as the augmented matrix of the system. Sometimes it is a good idea to separate out the part of the matrix that appears to the left of the thin vertical line. That part is known as the coefficient matrix of the system. This isn't just pedantry! In many real-world applications we need to solve bunches of linear systems that all have the same coefficient matrix — so they only differ in the final column (a.k.a. the augmented column) of their augmented matrices. We can take advantage of such a situation, essentially solving all of the systems while only doing the work of solving the first one!
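For instance, the coefficient matrix of the system in Example 1.5.3 is just the part of the augmented matrix to the left of the vertical bar:
\begin{equation*} \left[ \begin{array}{rrrr} 1 \amp 1 \amp 0 \amp 3 \\ 2 \amp {-1} \amp 1 \amp 1 \\ 0 \amp 3 \amp {-1} \amp 2 \end{array} \right]\end{equation*}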
Matrix notation was probably invented purely out of laziness. When we use the Re-ordering, Scaling and Combining operations that we introduced in Section 1.2, we find ourselves having to re-copy the entire
system over and over. By switching to matrix notation we get a considerable savings in effort. The operations that we
originally developed to use on equations now become operations that one can apply to the rows of a matrix — a.k.a. row operations — which we will study in much greater depth in Section 2.4. Regardless of the origins of matrix notation, nowadays we don't think of matrices only in terms of being abbreviations for linear systems. They have taken on a life of their own!
There are two features of matrices that we'll explore in the remainder of this section. The first is that matrices may be thought of as “funny shaped” vectors. The second is that, under certain conditions, we can multiply matrices. If you've already studied multi-variable calculus (and perhaps even if you haven't) you'll have run into the dot product (a.k.a. scalar product) and the cross product (a.k.a. vector product) in \(\Reals^3\). No matter what the dimension of the space, there is always a dot product. On the other hand, there is usually nothing analogous to the cross product — it depends on a very special coincidence, an odd fact about the space \(\Reals^3\). The dot product is a way of multiplying vectors, but the product is not a vector. On the other hand, the cross product does result in a vector. Matrices (as “funny shaped” vectors) give us a way of multiplying vectors and getting other vectors.
The most important thing with vectors is that we need to be able to add them. The second most important thing is that we should know how to scale them.
If \(A\) and \(B\) are matrices, what would it mean to add them? As was the case with vectors, it doesn't make any sense to add them unless they are the same size. With vectors they needed to have the same number of components in order to even think about adding them. With matrices the restriction is even stronger; they need to have the same number of rows and of columns. Provided that that restriction is met, we just add the corresponding entries.
Definition 1.5.4 matrix addition
If \(A\) and \(B\) are both \(m \times n\) matrices, their sum, \(A+B\), is also an \(m \times n\) matrix. For all integers \(i\) and \(j\) satisfying \(1 \leq i \leq m\) and \(1 \leq j \leq n\), the entry in the \(i\)th row and \(j\)th column of \(A+B\) is \(\aij{i}{j} + \bij{i}{j}\).
Scaling also works in much the same way as it did with vectors. If we multiply a scalar and a matrix, every entry of the matrix is multiplied by the scalar.
Definition 1.5.5 matrix scaling
If \(A\) is an \(m \times n\) matrix, and \(s\) is a real number, the scalar product, \(sA\), is also an \(m \times n\) matrix. For all integers \(i\) and \(j\) satisfying \(1 \leq i \leq m\) and \(1 \leq j \leq n\), the entry in the \(i\)th row and \(j\)th column of \(sA\) is \(s\cdot\aij{i}{j}\).
Example 1.5.6 vector properties of matrices
Let \(A = \left[ \begin{array}{cc} 1 \amp -1 \\ -1 \amp 2 \end{array}\right]\) and \(B = \left[ \begin{array}{cc} 0 \amp 1 \\ 2 \amp 3 \end{array}\right]\). These matrices are both \(2 \times 2\) so their sum is defined.
\begin{equation*} A+B = \left[ \begin{array}{cc} 1 \amp 0 \\ 1 \amp 5 \end{array}\right]\end{equation*}
Let's also provide an example of scaling. If we scale the matrix \(A\) by a factor of \(3\) we get
\begin{equation*} 3A = \left[ \begin{array}{cc} 3 \amp -3 \\ -3 \amp 6 \end{array}\right]\end{equation*}
Here's one for you to try. Suppose that \(A\) and \(B\) are the following \(2 \times 3\) matrices:
\begin{equation*}A = \left[ \begin{array}{ccc} 1 \amp 2 \amp 3 \\ 4 \amp 5 \amp 6 \end{array}\right] \quad \mbox{and} \quad B = \left[ \begin{array}{ccc} 3 \amp 5 \amp 7 \\ 4 \amp 6 \amp 8 \end{array}\right]\end{equation*}
What is \(5A-2B\)?
Solution
\begin{equation*} 5A-2B = \left[\begin{array}{ccc} -1 \amp 0 \amp 1 \\ 12 \amp 13 \amp 14 \end{array}\right].\end{equation*}
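If the arithmetic isn't clear: scaling gives
\begin{equation*} 5A = \left[ \begin{array}{ccc} 5 \amp 10 \amp 15 \\ 20 \amp 25 \amp 30 \end{array}\right] \quad \mbox{and} \quad 2B = \left[ \begin{array}{ccc} 6 \amp 10 \amp 14 \\ 8 \amp 12 \amp 16 \end{array}\right],\end{equation*}
and subtracting the corresponding entries produces the matrix above.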
So that was nice! Once we know how to add matrices and how to multiply them by scalars, we can form linear combinations. Next we'll look at multiplying our funny shaped vectors…
The easiest example (and also a very instructive example) of multiplying vectors is the product of a
row and a column vector. Provided they have the same number of entries, a row vector times a column vector produces
a \(1 \times 1\) matrix — also known as a real number. You have almost certainly seen this before! The dot product of two vectors is actually a row/column matrix product. In fact, in many settings authors will
write \(\vec{x}^T\vec{y}\) rather than \(\vec{x} \cdot \vec{y}\) when referring to the dot product. As you move towards more advanced math the tendency will be to call this the “inner product” rather than the “dot product”. One reason to make the change (other than that it sounds more sophisticated) is that there is also an “outer product” of vectors, which is what you get if you multiply a column times a row. As we'll see shortly, \(\vec{x}^T\vec{y}\) and
\(\vec{x}\vec{y}^T\) are extremely different! Anyway, we need to do this row/column product as a component of the general matrix product computation, so let's proceed to over-explain it by some huge factor…
If you've ever done the challenge where you rub your belly in a circular motion while simultaneously patting your head, then this shouldn't be too difficult. What you need to do is trace across the entries of a row with your left index finger, while simultaneously tracing down the entries of a column with your right index finger. As you encounter the entries you multiply them and keep a running tally of the sum of these products.
Example 1.5.8 an inner product
Suppose
\begin{equation*}\vec{x} = \left[ \begin{array}{c} 3 \\ 1 \\ -2 \\ 5 \end{array} \right] \quad \mbox{and} \quad \vec{y} = \left[ \begin{array}{c} -1 \\ 6 \\ 4 \\ 7 \end{array} \right] \end{equation*}
then the inner product of these two vectors (\(\vec{x}^T\vec{y}\)) is the following row/column matrix computation:
\begin{equation*}
\begin{aligned}
\vec{x}^T \vec{y} \quad \amp = \quad \left[ \begin{array}{cccc} 3 \amp 1 \amp -2 \amp 5 \end{array} \right] \cdot \left[ \begin{array}{c} -1 \\ 6 \\ 4 \\ 7 \end{array} \right] \\
\amp = \quad 3\cdot (-1) + 1\cdot 6 + (-2)\cdot 4 + 5\cdot 7 \quad = \quad 30
\end{aligned}
\end{equation*}
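By contrast, the outer product \(\vec{x}\vec{y}^T\) mentioned above multiplies a \(4 \times 1\) column by a \(1 \times 4\) row. The result is not a number at all but a \(4 \times 4\) matrix, whose entries are all of the possible products of an entry of \(\vec{x}\) with an entry of \(\vec{y}\):
\begin{equation*}\vec{x}\vec{y}^T \quad = \quad \left[ \begin{array}{c} 3 \\ 1 \\ -2 \\ 5 \end{array} \right] \left[ \begin{array}{cccc} -1 \amp 6 \amp 4 \amp 7 \end{array} \right] \quad = \quad \left[ \begin{array}{cccc} -3 \amp 18 \amp 12 \amp 21 \\ -1 \amp 6 \amp 4 \amp 7 \\ 2 \amp -12 \amp -8 \amp -14 \\ -5 \amp 30 \amp 20 \amp 35 \end{array} \right]\end{equation*}
That is one sense in which \(\vec{x}^T\vec{y}\) and \(\vec{x}\vec{y}^T\) are extremely different.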
Notice that if the vectors had different lengths (I mean “lengths” as in “number of entries”) the process we've described wouldn't work out so well… One of your fingers would run out of entries before the other! This is our first example of an idea known as conformability. Suppose we have a row vector of length \(m\) (that is, a \(1 \times m\) matrix) and a column vector of length \(n\) (in other words, an \(n \times 1\) matrix). They are conformable if \(m=n\); if \(m\neq n\) they are not conformable, and the matrix product can't be computed.
The general rule for computing matrix products involves doing this row/column product multiple times. Suppose \(A\) is a \(p \times q\) matrix and \(B\) is an \(r \times s\) matrix. The product \(AB\) will be a \(p \times s\) matrix, but it can only be computed if \(q=r\). The entry in the \(i\)-th row and \(j\)-th column of the result is obtained using the \(i\)-th row of \(A\) and the \(j\)-th column of \(B\). When you physically write the sizes of the multiplicands next to one another, the inner two numbers must match and the outer two numbers tell you the size of the result!
Definition 1.5.9 matrix conformability
Suppose \(A\) is a \(p \times q\) matrix and \(B\) is an \(r \times s\) matrix. If \(q=r\) these matrices are conformable and the product \(AB\) can be computed.
Definition 1.5.11 matrix product
Suppose we are given two matrices, \(A\) and \(B\), that are conformable for matrix multiplication; further, suppose that \(A\) is \(m \times n\) and \(B\) is \(n \times p\). The matrix product \(AB\) will be an \(m \times p\) matrix. The entry in the \(i\)-th row and \(j\)-th column of \(AB\) is
\begin{equation*} \left(AB\right)_{ij} \; = \; \sum_{k=1}^n A_{ik}\cdot B_{kj}\end{equation*}
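To make the definition concrete, here is a small worked product (the matrices are chosen purely for illustration). A \(2 \times 3\) matrix and a \(3 \times 2\) matrix are conformable, the result is \(2 \times 2\), and each entry is the inner product of a row of the first matrix with a column of the second:
\begin{equation*} \left[ \begin{array}{ccc} 1 \amp 2 \amp 0 \\ 3 \amp -1 \amp 4 \end{array} \right] \left[ \begin{array}{cc} 2 \amp 1 \\ 0 \amp 5 \\ 1 \amp -2 \end{array} \right] \; = \; \left[ \begin{array}{cc} 1\cdot 2 + 2\cdot 0 + 0\cdot 1 \amp 1\cdot 1 + 2\cdot 5 + 0\cdot (-2) \\ 3\cdot 2 + (-1)\cdot 0 + 4\cdot 1 \amp 3\cdot 1 + (-1)\cdot 5 + 4\cdot (-2) \end{array} \right] \; = \; \left[ \begin{array}{cc} 2 \amp 11 \\ 10 \amp -10 \end{array} \right]\end{equation*}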