In this section we'll look much more closely at the “systems of equations” approach to linear algebra.
First a few words about notation. When there are seventeen variables in a problem it becomes really awkward to use different letters for each variable. When there are a thousand variables it's impossible! We will follow the almost universal convention that the letter \(x\) will be used for the variables, with a subscript to identify which one. If we were to translate the problem from Section 1.1 into this notation it would become \begin{gather*} x_1 + x_2 = 42 \\ x_1 - x_2 = 6. \end{gather*}
A linear combination of some set of numbers \(\{ x_1, x_2, \ldots, x_n \}\) is created by multiplying each of the \(x\)'s by constants and then adding everything up. Of course if the constants are \(1\) or \(-1\) (as in the previous example) we tend to forget that they're there!
Example 1.2.1
Consider \(x_1 + 2x_2 + 3x_3 + 4x_4 + 5x_5\). This is a linear combination of the five variables \(\{x_1, x_2, x_3, x_4, x_5\}\). The constants (\(1, 2, 3, 4,\) and \(5\)) are called the coefficients of the linear combination.
An equation is linear if it has the form of a linear combination set equal to some value on the right-hand side -- or if it can be put into that form. For example \begin{gather*} x_1 + 2x_2 + 3x_3 + 4x_4 + 5x_5 = 15 \end{gather*} is a linear equation in five variables.
Also, \begin{gather*} x_1 + 3x_3 = x_2 + x_4 \end{gather*} is a linear equation (in four variables) because we can manipulate it into the form \begin{gather*} x_1 - x_2 + 3x_3 - x_4 = 0. \end{gather*}
A system of equations is just a collection of linear equations.
The notation for systems of equations gets a bit complicated when we try to write them in general (that is, without particular values given for the various constants involved). There are three sorts of things that need names in such a system: the variables, the coefficients of the variables, and the numbers on the right-hand sides. There is a convention that is fairly universal for the names and numbering of these elements. The variables are \(x\)'s with subscripts, the right-hand sides are \(b\)'s with subscripts, and the coefficients are \(a\)'s with two subscripts (we need to indicate both the equation that a given coefficient is in and the variable it is multiplying).
For example, here is how we would write the general form of a system of three equations in four unknowns: \begin{equation*} \begin{alignedat}{5} \aij{1}{1} x_1 \amp {}+{} \amp \aij{1}{2} x_2 \amp {}+{} \amp \aij{1}{3} x_3 \amp {}+{} \amp \aij{1}{4} x_4 \amp {}={} \amp b_1 \\ \aij{2}{1} x_1 \amp {}+{} \amp \aij{2}{2} x_2 \amp {}+{} \amp \aij{2}{3} x_3 \amp {}+{} \amp \aij{2}{4} x_4 \amp {}={} \amp b_2 \\ \aij{3}{1} x_1 \amp {}+{} \amp \aij{3}{2} x_2 \amp {}+{} \amp \aij{3}{3} x_3 \amp {}+{} \amp \aij{3}{4} x_4 \amp {}={} \amp b_3 \\ \end{alignedat} \end{equation*} Notice that the indices on the \(x\)'s run from 1 to 4, the indices on the \(b\)'s run from 1 to 3, and that there are a total of 12 coefficients.
Suppose you have $\(10,000\) that you want to invest in the stock market. After some research you've found three companies that you think will be good investments. SolarCity Corp (SCTY) is trading at about $\(20\) per share. SunPower (SPWR), a solar energy company, is trading at about $\(9\) per share. First Trust Global Wind Energy (FAN) is at about $\(12\) per share. One equation you can immediately write down is \begin{gather*} 20 x_1 + 9 x_2 + 12 x_3 = 10000, \end{gather*} where \(x_1\) is the number of shares of SCTY we will buy, \(x_2\) is the number of shares of SPWR, and \(x_3\) is the number of shares of FAN.
If we said nothing further, we'd have just this one equation and there are many possible sets of values for the variables that satisfy it. Notice that there are two broad categories of companies represented in our stock picks -- solar energy and wind power. Perhaps we'd be wise to split our investment between them based on some rational theory. For the sake of argument let's say that we've been advised to use a 60/40 split between solar and wind, so $\(6,000\) goes to solar and $\(4,000\) to wind. What was previously a single equation is now two: \begin{gather*} 20 x_1 + 9 x_2 = 6000,\\ 12 x_3 = 4000. \end{gather*}
Notice that the second equation uniquely determines the value of \(x_3\) but that the other variables still have a bit of freedom. (For instance, notice that we could set either \(x_1\) or \(x_2\) to \(0\), and the other variable's value would then be uniquely determined. Or, of course, we could have some mixture where our $\(6,000\) is split up between the two companies.) As it happens, these two companies are competitors and there is some probability that one will succeed and the other will fail. A wise investor tries to guess what that probability is and “hedge” their bets on the market. For the sake of argument let's say we think SCTY is three times more likely to come out the winner in this competition. You might be inclined to just buy only the SCTY stock, but that's not what a hedging strategy would indicate -- you should mix your investments in a proportion that reflects the probabilities involved. As an equation in the \(x\)'s we have \begin{gather*} 20 x_1 = 3 \cdot 9 x_2. \end{gather*}
At this point we've obtained a system of 3 equations in 3 variables which, after manipulating the last one a little bit, looks like the following. \begin{gather*} 20 x_1 + 9 x_2 = 6000\\ 12 x_3 = 4000\\ 20 x_1 - 27 x_2 = 0 \end{gather*}
It is usually a good idea to format your systems so that the variables in each equation line up in columns. \begin{equation*} \begin{alignedat}{4} 20 x_1 \amp {}+{} \amp 9 x_2 \amp \amp \amp {}={} \amp 6000 \\ \amp \amp \amp \amp 12 x_3 \amp {}={} \amp 4000 \\ 20 x_1 \amp {}-{} \amp 27 x_2 \amp \amp \amp {}={} \amp 0 \end{alignedat} \end{equation*}
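Because the system is in such a tidy form, we can find the solution by back-substitution. Here is a quick check in Python (a sketch of ours, not part of the text; the standard-library `Fraction` class keeps the arithmetic exact):

```python
from fractions import Fraction as F

# Back-substitution on the stock system, in exact arithmetic.
x3 = F(4000, 12)     # from 12 x3 = 4000
x2 = F(6000, 36)     # 20 x1 = 27 x2, so 20 x1 + 9 x2 = 6000 becomes 36 x2 = 6000
x1 = F(27, 20) * x2  # from 20 x1 - 27 x2 = 0
print(x1, x2, x3)    # 225 500/3 1000/3
```

Since shares are bought in whole numbers, an actual purchase would round the fractional values, giving something like \(225\), \(167\), and \(333\) shares.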
A bit more formalism is appropriate now. We'll start with some definitions.
A linear system, also known as a system of linear equations is a collection of \(m\) equations in \(n\) unknowns of the form \begin{equation*} \begin{alignedat}{5} a_{11} x_1 \amp {}+{} \amp a_{12} x_2 \amp {}+{} \amp \amp {}\cdots {} \amp \amp {}+{} a_{1n} x_n \amp {}={} \amp b_1 \\ a_{21} x_1 \amp {}+{} \amp a_{22} x_2 \amp {}+{} \amp \amp {}\cdots {} \amp \amp {}+{} a_{2n} x_n \amp {}={} \amp b_2 \\ \amp \amp \amp \amp \amp \vdots \amp \amp \amp \amp \\ a_{m1} x_1 \amp {}+{} \amp a_{m2} x_2 \amp {}+{} \amp \amp {}\cdots {} \amp \amp {}+{} a_{mn} x_n \amp {}={} \amp b_m \\ \end{alignedat} \end{equation*}
Note that the doubly-indexed quantities (\(a_{ij}\)) as well as the singly-indexed quantities (\(b_i\)) are real numbers and that the \(n\) variables are indicated by \(x\)'s (with subscripts).
The use of variables with multiple indices in the above definition bears comment. First of all, note that we are trying to deal with the general situation where there is an unknown number of equations (\(m\)) in an unknown number of variables (\(n\)). Let's consider the \(b\)'s first — these are the constants that appear on the right-hand sides of the equations, so there are \(m\) of them. The situation for the \(a\)'s is more complicated. The \(a\)'s are the coefficients: the constant numbers that the variables are multiplied by, and each carries two indices. The first index tells us which equation we are in. The second index matches with the subscript on the variable. For example \(\aij{14}{23}\) would be the coefficient of \(x_{23}\) in the \(14\)th equation in a system.
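The double-index scheme maps naturally onto a nested array. In the following Python sketch (the numbers and the helper name `coeff` are ours, purely for illustration), a hypothetical system of 3 equations in 4 unknowns is stored so that \(a_{ij}\) lives at `a[i-1][j-1]`:

```python
# A hypothetical 3-equation, 4-unknown system stored as nested lists.
# a[i - 1][j - 1] holds the coefficient a_ij: Python lists count from 0,
# while the textbook's subscripts start at 1.
a = [
    [1,  2,  3,  4],   # coefficients in equation 1
    [5,  6,  7,  8],   # coefficients in equation 2
    [9, 10, 11, 12],   # coefficients in equation 3
]
b = [10, 20, 30]       # right-hand sides b_1, b_2, b_3

def coeff(i, j):
    """Return a_ij, the coefficient of x_j in equation i (1-indexed)."""
    return a[i - 1][j - 1]

print(coeff(2, 3))  # 7  (the coefficient of x_3 in the 2nd equation)
```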
What does it mean to say we have found an “answer” to a system of equations? Essentially, it is this: we have found a set of values for the variables that “work” in all of the equations. Sometimes people say that this set of values “satisfies” the equations. To be completely clear, what is meant is that if one substitutes these values for the variables in the equations of the system, all of them (the equations) will be true. It is convenient to regard such a set of values as a vector. For example the solution we obtained in Example 1.2.3 would be regarded as the vector \(\langle 225, 167, 333 \rangle\).
Given a system of \(m\) linear equations in \(n\) unknowns, the solution set of the system is the set of all vectors of length \(n\) that satisfy all \(m\) of the equations in the system.
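The definition above is easy to turn into a mechanical check. This Python sketch (the function name `satisfies` is our own) decides whether a candidate vector belongs to the solution set by substituting it into every equation:

```python
def satisfies(coeffs, rhs, x):
    """Check whether the vector x satisfies every equation
    sum_j coeffs[i][j] * x[j] == rhs[i]."""
    return all(
        sum(aij * xj for aij, xj in zip(row, x)) == bi
        for row, bi in zip(coeffs, rhs)
    )

# The tiny system from earlier in the section: x1 + x2 = 42, x1 - x2 = 6.
coeffs = [[1, 1], [1, -1]]
rhs = [42, 6]
print(satisfies(coeffs, rhs, [24, 18]))  # True
print(satisfies(coeffs, rhs, [21, 21]))  # False (second equation fails)
```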
Two linear systems are called equivalent if and only if they have identical solution sets.
The equivalence of linear systems is an example of what is known as an equivalence relation. Equivalence relations are used in theoretical mathematics when we are trying to capture the notion that two things — while not actually equal — are similar enough that we can treat them as being sort of a junior version of equal…
For a relationship to earn the title “equivalence relation” it must have a short list of properties. These properties are certainly true of the ordinary equals sign:
A relation is reflexive iff all elements are related to themselves.
A relation is symmetric iff whenever \(x\) and \(y\) are a pair of elements that are related, then \(y\) and \(x\) are also a pair that are related. (I.e. the order can always be reversed.)
A relation is transitive iff whenever \(x\) is related to \(y\) and \(y\) is related to \(z\), then \(x\) is also related to \(z\). Perhaps you've heard the phrase “Two things that are equal to a third must be equal to each other.” That's the essence of transitivity.
There really is much more that we should say about equivalence relations in general and the consequences that ensue when we can show that some relation is an equivalence relation. We refer the interested reader to chapter 6 in GIAM. In the remainder of this book we are going to see how very useful the notion of equivalence of linear systems can be. Hopefully this will give you some indication of how useful equivalence relations in general can be!
One final word about equivalence relations (in general) and the equivalence of linear systems (in particular): It is customary, when introducing this notion, to ask students to come up with a proof that shows that some given relation (in this case, equivalence of linear systems) is indeed an equivalence relation. Such proofs are actually relatively straightforward, but relax, we're going to let you off the hook this time! Showing that equivalence of linear systems is an equivalence relation is actually too easy. What one needs to do is show that it has each of the three properties: reflexivity, symmetry and transitivity. Each of those is an almost immediate consequence of the way this equivalence is defined. We define two systems to be equivalent if and only if they have the same solution set. In other words, equivalence is defined in terms of set equality. Set equality is definitely an equivalence relation, so it has the three properties. Finally, the arguments that show that equivalence of linear systems has the three properties all have the same form: in order to show that the equivalence of linear systems has a property we use the fact that set equality has that property. This is called inheritance.
The general idea is this: there are lots and lots of different linear systems that are equivalent. They all have the same solution set. Some of these systems are in a nice form that allows us to see what the solution set is. Others are not. We need to transform the latter into the former!
More specifically, there are three operations that can be applied to linear systems which do not have any effect on solution sets. We can apply these three operations in any way we like! We'll just be transforming our linear system into a slightly different one that is equivalent to the original. Finally, you'll see that it is pretty easy to strategize a bit and transform difficult linear systems into the nice sort (where the solution set is very evident) using these three operations.
The three operations go by many names; we'll refer to them as Reordering, Scaling and Combining. In the next few paragraphs we'll discuss each of them in turn and explain why they don't have an effect on the solution set of a system.
Reordering means what it sounds like. The solution set is determined by checking whether a given solution vector satisfies all of the equations. It is pretty clear that the order that the equations are listed in is of little importance. In many treatments of linear algebra an operation called “swapping” is used instead — swapping two equations is a special (particularly simple) instance of reordering and any more general reordering can be accomplished by a succession of swaps.
Scaling is another operation where it is fairly obvious that there will be no effect on solution sets. Scaling involves multiplying both sides of an equation by some non-zero constant. Very often that non-zero constant will be the reciprocal of the coefficient of one of the variables; scaling by such a constant is useful in solving for that variable. Perhaps it is clear that multiplying both sides of an equation by the same thing will have no impact on what values of the variables satisfy the equation… But why does the constant need to be non-zero? Multiplying both sides of any equation by \(0\) will produce a new equation that looks like \(0 = 0\) which is certainly true! In fact, of course, that's what the problem is; if the equation was previously false for some vector of variable values (thus it served to exclude that vector from the solution set) it will now be true. So vectors of variable values that previously were not in the solution set will now be in it — that's the sort of thing we are trying to avoid!
Combining (a.k.a. replacement) is the most difficult of the three operations and, as you might guess, it is also the most useful. Combining consists of adding a multiple of some other equation to a given one. Another way to think of this is that we replace some equation by itself plus a multiple of some other equation; this is probably why some people call this operation Replacement.
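The three operations can be sketched in code. In the following Python fragment (the representation and the helper names are our own choices) a system is a list of `(coefficients, right-hand side)` pairs; note the guard against scaling by zero:

```python
from fractions import Fraction

# A system is a list of equations; each equation is a pair
# ([coefficients], right-hand side).  These helpers sketch the three
# operations named in the text: Reordering, Scaling, and Combining.

def reorder(system, new_order):
    """Return the same equations listed in a new order (a permutation)."""
    return [system[i] for i in new_order]

def scale(system, i, c):
    """Multiply both sides of equation i by a NONZERO constant c."""
    assert c != 0, "scaling by zero can enlarge the solution set"
    coeffs, rhs = system[i]
    out = list(system)
    out[i] = ([c * a for a in coeffs], c * rhs)
    return out

def combine(system, i, j, c):
    """Replace equation i by itself plus c times equation j."""
    ci, bi = system[i]
    cj, bj = system[j]
    out = list(system)
    out[i] = ([a + c * a2 for a, a2 in zip(ci, cj)], bi + c * bj)
    return out

# The Section 1.1 system: x + y = 42, x - y = 6.
sys0 = [([Fraction(1), Fraction(1)], Fraction(42)),
        ([Fraction(1), Fraction(-1)], Fraction(6))]
sys1 = combine(sys0, 0, 1, 1)          # add equation 2 to equation 1: 2x = 48
sys2 = scale(sys1, 0, Fraction(1, 2))  # divide by 2: x = 24
```

Each helper returns a new system rather than mutating the old one, mirroring the idea that each operation transforms one system into an equivalent one.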
When we added the equations \(x+y=42\) and \(x-y=6\), obtaining the new equation \(2x=48\) back in Section 1.1 we were really doing a “Combining” operation. By the way, when we divided both sides of that new equation by \(2\) we were “Scaling.”
We'll close this section by giving an example — using the three operations to find the solution of a small linear system.
There is a unique solution to the following system of 3 equations in 3 unknowns. What is it? \begin{equation*} \begin{alignedat}{4} x_1 \amp {}+{} \amp x_2 \amp {}+{} \amp x_3 \amp {}={} \amp 21 \\ 2 x_1 \amp {}-{} \amp x_2 \amp {}+{} \amp x_3 \amp {}={} \amp 12 \\ x_1 \amp {}+{} \amp 3 x_2 \amp {}-{} \amp x_3 \amp {}={} \amp 17 \end{alignedat} \end{equation*}
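For readers who want to experiment, here is a Python sketch (the helper `combine` and the particular elimination order are our own choices, not the text's worked solution) that applies Combining steps to this system and then back-substitutes to recover the unique solution:

```python
from fractions import Fraction as F

# Rows are (coefficients, right-hand side); Fraction keeps arithmetic exact.
eq1 = ([F(1), F(1),  F(1)],  F(21))
eq2 = ([F(2), F(-1), F(1)],  F(12))
eq3 = ([F(1), F(3),  F(-1)], F(17))

def combine(eq_a, eq_b, c):
    """Return eq_a plus c times eq_b (the Combining operation)."""
    (ca, ba), (cb, bb) = eq_a, eq_b
    return ([x + c * y for x, y in zip(ca, cb)], ba + c * bb)

r2 = combine(eq2, eq1, F(-2))   # eliminate x1 from eq 2: 0, -3,   -1 | -30
r3 = combine(eq3, eq1, F(-1))   # eliminate x1 from eq 3: 0,  2,   -2 |  -4
r3 = combine(r3, r2, F(2, 3))   # eliminate x2 from eq 3: 0,  0, -8/3 | -24

x3 = r3[1] / r3[0][2]                                        # back-substitute
x2 = (r2[1] - r2[0][2] * x3) / r2[0][1]
x1 = (eq1[1] - eq1[0][1] * x2 - eq1[0][2] * x3) / eq1[0][0]
print(x1, x2, x3)  # 5 7 9
```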