Section 1.1 Getting started

The first problem we're going to look at is fairly trivial. I bet you can solve this in your head:

I'm thinking of two numbers \(x\) and \(y\). Their sum is 42, and their difference is 6. What are they?

This word problem can be instantly translated into a pair of equations. Later, when we have more sophisticated problems there may be many more unknown quantities and there may be many more equations. Here we are dealing with a system of equations having 2 equations in 2 unknowns. \begin{gather*} x+y=42\\ x-y=6 \end{gather*}

This one is about as easy as a system of two equations in two variables can get. Actually, that's not quite true. The easiest form for a system of two equations in two unknowns is if they basically just are statements of the answer, like: \begin{gather*} x=24\\ y=18. \end{gather*} Solving a system of equations just means (somehow) transforming it from something like the first form to something like this latter form.

There are a small number of simple procedures that we can apply to systems without affecting their solutions. We can use these operations to convert almost any system into one that looks like that latter form (each equation just states what the value of some variable is). We'll get around to the full story in Section 1.2, but for now, notice that if we add the two original equations together (adding equations means adding left sides and adding right sides separately) we get something that only involves \(x\). And of course, once we know one of the variables it isn't very hard to find the other.
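Carried out in full for our particular system, adding the equations and then substituting back looks like this (a worked sketch of this one example, not yet the general procedure): \begin{gather*} (x+y)+(x-y)=42+6\\ 2x=48\\ x=24\\ y=42-x=42-24=18 \end{gather*}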

For this example problem, finding the solution was very easy. There are more difficult systems where finding the solution by hand would be challenging, so we are going to want to become familiar with some kind of computer tools for automating these things. In this book, we'll be using Sage, a free, open-source computer algebra system developed by William Stein. Here is a sample of how Sage can be used to solve a system of equations:
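For comparison, the add-the-equations idea can also be sketched in plain Python, the language Sage is built on. This is not Sage's own solver interface; the helper name `solve_by_adding` is made up for illustration, and it only handles systems of the special form \(x+y=\text{total}\), \(x-y=\text{difference}\).

```python
from fractions import Fraction


def solve_by_adding(total, difference):
    """Solve x + y = total, x - y = difference by elimination.

    Hypothetical helper for this one family of systems only.
    """
    # Adding the two equations cancels y: 2x = total + difference.
    x = Fraction(total + difference, 2)
    # Substitute x back into x + y = total to recover y.
    y = total - x
    return x, y


x, y = solve_by_adding(42, 6)
print(x, y)  # 24 18
```

Using `Fraction` keeps the arithmetic exact even when the sum and difference do not give whole-number answers.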

We glossed over a small but important issue in the above. How do we know that our answer was the only answer? And for that matter, is it necessarily true that there must be an answer to some system of equations? These are what are known as existence and uniqueness questions: Does there exist an answer to our problem? (Existence.) And, if there is an answer, how do we know it is the only answer? (Uniqueness.) There are systems of equations where each of the possible behaviors is exhibited: no solutions, a unique solution, and lots of solutions.
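For two linear equations in two unknowns, the three behaviors can be told apart with a little arithmetic. The sketch below is one way to do it, assuming the system is written as \(a_1x+b_1y=c_1\), \(a_2x+b_2y=c_2\) with integer (or otherwise exact) coefficients and that at least one equation actually involves a variable. The function name `classify` and the two extra example systems are invented for illustration.

```python
def classify(a1, b1, c1, a2, b2, c2):
    """Classify the system a1*x + b1*y = c1, a2*x + b2*y = c2.

    Assumes exact coefficients and that the coefficients
    a1, b1, a2, b2 are not all zero.
    """
    # A nonzero determinant means the two lines cross exactly once.
    det = a1 * b2 - a2 * b1
    if det != 0:
        return "unique"
    # Otherwise the left-hand sides are proportional; the system is
    # solvable only if the right-hand sides match that proportion.
    consistent = (a1 * c2 - a2 * c1 == 0) and (b1 * c2 - b2 * c1 == 0)
    return "infinitely many" if consistent else "none"


print(classify(1, 1, 42, 1, -1, 6))   # unique
print(classify(1, 1, 42, 1, 1, 6))    # none
print(classify(1, 1, 42, 2, 2, 84))   # infinitely many
```

The second system asks for a sum that is both 42 and 6, which is impossible. The third states the same fact twice, so any pair summing to 42 works.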

Exercise 1.1.1

That was a linear algebra problem seen from the “systems of equations” perspective. We still need to look at the “vector equations” and “transformations” viewpoints. So next we'll look at a question of the vector flavor. We're going to think about playing chess, not on a board, but on the infinite \(x\)-\(y\) plane.

Consider the piece known as a bishop. If you're not familiar with chess, this is the piece that can move in the diagonal directions. Think of the bishop as having two moves that it can do (but it can do them any number of times). It can do a move we'll refer to as UR: move one unit in the \(x\) direction while simultaneously moving one unit in the \(y\) direction. By doing this multiple times the bishop can travel in the upper-right direction. It also has a move that allows it to travel along the other diagonal: move one unit in the \(x\) direction while simultaneously moving negative one unit in the \(y\) direction. We'll call that move LR (for “lower right”).

For those who are familiar with chess, you'll know that bishops are forever trapped on the same color square — one of your bishops is always on black and the other always on white. This means that some “bishop moving questions” won't have solutions — for example, a bishop sitting at the origin, \((0, 0)\), can never move to \((0, 5)\); those squares have opposite colors! To get around this limitation we're going to let our bishops make fractional moves. For instance if it starts at the origin and makes \(1/2\) of the upper-right move then it will arrive at \((1/2, 1/2)\). Now, getting a little stranger, we're going to also allow our bishops to make negative moves. Maybe we should think of a negative move as “undoing” a regular move…

In any case negative moves allow us to move the bishop in the opposite directions along the diagonals. Finally, we may as well give our bishops the freedom to move any amount — that is, any real number can be used as a so-called scalar, shrinking or stretching either of the two basic moves. Got it? We can do things like \(\pi \cdot UR\) and \(\sqrt{2} \cdot LR\).
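Here is a minimal Python sketch of these generalized moves, assuming we store each move as a pair of numbers. The helper names `scale` and `move_from` are made up for illustration.

```python
import math

UR = (1, 1)    # one unit in x, one unit in y
LR = (1, -1)   # one unit in x, negative one unit in y


def scale(c, move):
    """Stretch or shrink a move by the scalar c, component by component."""
    return (c * move[0], c * move[1])


def move_from(position, step):
    """Apply a (possibly scaled) move to a starting position."""
    return (position[0] + step[0], position[1] + step[1])


print(move_from((0, 0), scale(0.5, UR)))  # (0.5, 0.5)
print(move_from((1, 1), scale(-1, UR)))   # (0, 0)
print(scale(math.pi, UR))                 # pi units along the UR diagonal
```

The second line shows a negative move undoing a regular one: starting where a single UR would leave us, \(-1 \cdot UR\) brings us back to the origin.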

So, after all that setup, here's the question: If a bishop starts at \((0,0)\), can it make some number of UR and LR moves and wind up at \((42,6)\)? If so, how many URs and how many LRs?

The things we've been calling UR and LR are vectors. If you ask someone from the physical sciences to define a vector they'll say “it's a thing that has both a magnitude and a direction”. (Which is fine as far as it goes.) Meteorology provides some nice examples. A weather map often shows a lot of basic data about the conditions at various places: wind, temperature, barometric pressure and humidity are common. Of these, only the wind is a vector quantity; it needs to be specified with both a magnitude and a direction (e.g. 15 mph out of the northeast). The others all just have magnitudes.

There is a different way of thinking about what a vector is, that is preferable in many circumstances. A vector is the difference between two positions. Let me put this another way: a vector gives you a set of directions to go from one point to another. (I mean “directions” in the sense of the things someone tells you if you ask “How do I get to the Kwik-E-Mart from here?”)

If you are currently at the point \((3,4)\) and you want to move to the point \((5,12)\) you need to increase your \(x\)-coordinate by 2 units and you must increase your \(y\)-coordinate by 8 units. We just described the vector \(\langle 2, 8 \rangle\); the numbers \(2\) and \(8\) are known as the components of the vector. Note that this is different in a not-so-subtle way from the point \((2,8)\). The point is stationary; the vector is there to describe a change. If you start at the origin and follow the directions specified by the vector \(\langle 2, 8 \rangle\) you will of course wind up at the point \((2,8)\), but if you start at some other point, it's equally obvious that you won't! Sometimes people will talk about “position vectors” in this sort of context: the position vector \(\langle x,y \rangle\) goes from the origin to the point \((x,y)\). Generally, it is preferable to keep the distinction between points and vectors clear. When you treat a vector as a position vector (i.e. think of it as a point) you are losing something. Ordinarily a vector is free; it can be slid around from one point to another so long as its components aren't changed.
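The point-versus-vector distinction can be sketched in a few lines of Python. Both points and vectors are stored as plain pairs here, which is exactly the kind of blurring the paragraph above warns about, so the hypothetical function names `displacement` and `follow` carry the meaning.

```python
def displacement(start, end):
    """Vector (as a pair of components) that leads from start to end."""
    return (end[0] - start[0], end[1] - start[1])


def follow(start, vector):
    """Point reached by following the vector's directions from start."""
    return (start[0] + vector[0], start[1] + vector[1])


v = displacement((3, 4), (5, 12))
print(v)                  # (2, 8)
print(follow((0, 0), v))  # (2, 8)  -- from the origin we land on the point (2, 8)
print(follow((3, 4), v))  # (5, 12) -- from anywhere else we do not
```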

Here's how solving the vector variant of our problem might look in Sage:
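As a rough stand-in written in plain Python rather than with Sage's own vector tools, the bishop question amounts to finding scalars \(a\) and \(b\) with \(a \cdot UR + b \cdot LR\) equal to the target. Comparing components gives \(a+b\) for the \(x\)-coordinate and \(a-b\) for the \(y\)-coordinate, which is the system we started the section with. The function name `bishop_moves` is invented for this sketch.

```python
from fractions import Fraction

UR = (1, 1)
LR = (1, -1)


def bishop_moves(target):
    """Return scalars (a, b) such that a*UR + b*LR equals target."""
    tx, ty = target
    # Components give a + b = tx and a - b = ty; add and subtract them.
    a = Fraction(tx + ty, 2)
    b = Fraction(tx - ty, 2)
    return a, b


a, b = bishop_moves((42, 6))
landing = (a * UR[0] + b * LR[0], a * UR[1] + b * LR[1])
print(a, b)                # 24 18
print(landing == (42, 6))  # True
print(bishop_moves((0, 5)))
```

The last call asks about \((0,5)\), the square an ordinary bishop at the origin can never reach; with fractional and negative moves allowed, it returns \(a = 5/2\) and \(b = -5/2\).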

So, at this point we've looked at a simple linear algebra problem from the systems of equations perspective and from the vector equations perspective. The final perspective we want to illustrate is that of linear transformations.

Basically, a linear transformation is a function that takes vectors as inputs and spits out vectors as outputs. You're probably familiar with the following sort of diagram for functions.

In Multivariable Calculus you may also encounter functions that are diagramed like so:

The first is a real-valued function of two variables — think of it as taking a vector as input and returning a scalar. The second is a vector-valued function of a single real variable. The mapping that gives temperature as a function of position on a metal plate is an example of the first sort. When we represent the position of a particle moving around in space (as a function of time) we are using the second sort.

Linear transformations are functions where there are vectors on both the input and the output side.

Moreover, linear transformations are linear, which means the components of the output are computed in a very simplistic way from the components of the inputs. The only things that are allowed are adding things up and multiplying by constants.

So let's give an example of a linear transformation. This will be a function that takes a vector \(\langle x, y \rangle\) as input, and returns a vector \(\langle u, v \rangle\) as output. We will compute \(u\) and \(v\) (the components of the output vector) from \(x\) and \(y\) (the components of the input vector) by “adding things up and multiplying by constants”: \begin{gather*} u = x+y\\ v = x-y \end{gather*}

By convention, people usually call a linear transformation \(T\) and use a notation that looks just like Euler notation for functions (because in fact, that's what it is!) \begin{gather*} T( \langle x,y \rangle ) = \langle u, v \rangle . \end{gather*} There are two kinds of problems one can ask: maybe you know the input vector and you'd like to find the output vector, or vice versa. When you've got the input it's very easy to find the output! You just plug in. The more interesting question is the reverse one: suppose you know that \(\langle u, v \rangle = \langle 42, 6 \rangle\); how can you arrive at the solution \( \langle x,y \rangle = \langle 24, 18 \rangle \)? We'll be looking at this kind of thing in more depth in Section 1.4.
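Both directions can be sketched in Python for this particular \(T\). The forward direction is just plugging in. The reverse direction below is worked out by hand for this one transformation (adding \(u = x+y\) and \(v = x-y\) gives \(u+v = 2x\), subtracting gives \(u-v = 2y\)) and is not a general method. The name `T_inverse` is an assumption made for illustration.

```python
def T(vec):
    """The example transformation: <x, y> goes to <x + y, x - y>."""
    x, y = vec
    return (x + y, x - y)


def T_inverse(out):
    """Recover <x, y> from <u, v> for this specific T."""
    u, v = out
    # u + v = 2x and u - v = 2y, so halve each combination.
    return ((u + v) / 2, (u - v) / 2)


print(T((24, 18)))         # (42, 6)
print(T_inverse((42, 6)))  # (24.0, 18.0)

# "Adding things up" is respected: T of a sum is the sum of the outputs.
p, q = (1, 2), (3, 5)
total = (p[0] + q[0], p[1] + q[1])
outputs = (T(p)[0] + T(q)[0], T(p)[1] + T(q)[1])
print(T(total) == outputs)  # True
```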