A transformation is a function whose inputs and outputs are vectors. In order to discuss concepts like the range and domain of a transformation, we'll need some terminology for sets of vectors. The set of all possible vectors of some type is known as a vector space. At first we are going to be looking at the most basic and fundamental sorts of vector spaces, those where the vectors are ordered tuples of real numbers, but be advised that later we will see that there are many other sorts of vectors!
Definition 1.4.1 Real Euclidean spaces
Given a positive integer \(n\) we define the real Euclidean space of dimension \(n\) (denoted \(\Reals^n\)) to
be the set of all ordered \(n\)-tuples of real numbers.
\begin{gather*}
\Reals^n \; = \; \{ \langle v_1, v_2, \ldots , v_n \rangle \, \suchthat \, \forall i, 1 \leq i \leq n, \, v_i \in \Reals \}
\end{gather*}
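Since a vector in \(\Reals^n\) is just an ordered \(n\)-tuple of real numbers, it is easy to model on a computer. Here is a minimal sketch in Python (our own illustration; the names are not part of the text's formal development) representing vectors as tuples of floats:
```python
# A vector in R^n is modeled as an ordered tuple of real numbers.
v = (1.0, -2.5, 3.0)   # an element of R^3
w = (0.0, 0.0)         # an element of R^2

# The dimension n of the space a vector belongs to is just the tuple's length.
print(len(v))  # 3
print(len(w))  # 2
```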
Recall that the domain of a function is the set from which the inputs come. The set where the outputs may appear is known as the codomain of the function. The codomain must be contrasted with the range, which is the set of outputs that actually do occur. We are going to be presuming a certain familiarity with the basic terminology used with functions. You can skip over the following list of (informal) definitions if you are already familiar.
- domain
The set of all inputs for a function. The domain is sometimes specified while defining the function, but if it isn't, the convention is to use the biggest possible set for the domain.
- codomain
The set where the outputs of a function lie.
- range
The set of outputs that actually occur. (The range is a subset of the codomain; it may or may not be all of it.)
- image
If an element, \(x\), of the domain is given, we refer to \(f(x)\) as the image of \(x\).
- pre-image
If we have some \(y\) (an output) in mind, any \(x\) (an input) such that \(f(x) = y\) is called a pre-image of \(y\).
There is a bit of an asymmetry in the way we speak of the various sets that are related to a function. On the output side we have the codomain and the range. On the input side we have only the domain. There is no agreed-upon name for a set that contains the domain; we simply insist that the function must be defined for every element of the domain (which basically sidesteps the issue). For the ordinary functions that one sees in calculus, the codomain is the real numbers; the range and domain are generally subsets of the real numbers. And so, the situation isn't terribly complex. When we are dealing with transformations things are harder. The domain and codomain of a transformation are generally real Euclidean spaces (potentially of different dimensions), so we will usually want to spell out what sorts of vectors are expected as inputs and what sorts of vectors we will see as outputs; only then do we get around to the heart of the matter: how do we compute the output from the input? We'll introduce the notation for a transformation via an example and then treat the general case.
Example 1.4.2 an example transformation
Let's look at a transformation that takes vectors of length 6 as inputs, and outputs vectors of length 3. We'll refer to the input vector as \(\vec{x}\) and, as usual, its components will be \(x\)'s with subscripts:
\(\vec{x} = \langle x_1, x_2, x_3, x_4, x_5, x_6 \rangle\). Similarly, the output will be \(\vec{y} = \langle y_1, y_2, y_3 \rangle\). This is only an example, so we'll just make up the rules that determine those output components from the input components; the point here is simply to demonstrate how one should write such a thing, which is as follows:
\begin{gather*}
T:\Reals^6 \longrightarrow \Reals^3\\
T(\langle x_1, x_2, x_3, x_4, x_5, x_6 \rangle) \quad = \quad \langle x_1, x_3, x_5 \rangle .
\end{gather*}
So this transformation just picks out the odd-numbered components of \(\vec{x}\) and puts them in \(\vec{y}\).
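To make this concrete, here is a small Python sketch (our own illustration, not part of the text's formal development) of the transformation \(T\) above:
```python
def T(x):
    """The transformation of Example 1.4.2: map a vector in R^6
    to the vector in R^3 made of its odd-numbered components."""
    x1, x2, x3, x4, x5, x6 = x
    return (x1, x3, x5)

print(T((1, 2, 3, 4, 5, 6)))  # (1, 3, 5)
```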
The most important transformations for us in this context are the linear ones. In a linear transformation, the components of the output vector are computed from the components of the input vector by “multiplying by constants and adding everything up.” Because of the simple way that the outputs are computed, there is really nothing that can go wrong! With ordinary functions from \(\Reals\) to \(\Reals\) we usually look at the rule for computing the output and recognize certain values that must be eliminated from the domain, typically where one sees “division by zero” or “square root of a negative” errors. No such problem can arise with linear transformations: the domain will always be a real Euclidean space of some dimension. Similarly, the codomain will be a real Euclidean space, one whose dimension is simply the number of components in the output vectors. The dimensions of the domain and codomain are easy to think about; they are just the numbers of components in the input and output vectors. The range of a linear transformation is slightly more complicated. The output vectors that actually occur will certainly have the number of components specified by the codomain, but do all such vectors necessarily appear as outputs? In general, no.
The notation for a linear transformation first spells out the domain and codomain and then gives the rule(s) for computing the output. Thus the domain and codomain are known in advance; we need to do a little extra work to figure out the range.
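As a tiny illustration of why the range can be smaller than the codomain, consider the (hypothetical, made up for this sketch) linear transformation \(S:\Reals^2 \longrightarrow \Reals^2\) with \(S(\langle x_1, x_2 \rangle) = \langle x_1, x_1 \rangle\). Its codomain is all of \(\Reals^2\), but every output has equal components, so a vector such as \(\langle 1, 2 \rangle\) never occurs as an output:
```python
def S(x):
    """S: R^2 -> R^2, S(<x1, x2>) = <x1, x1>.
    The codomain is R^2, but the range is only the set of
    vectors whose two components are equal."""
    x1, x2 = x
    return (x1, x1)

# Every output lies on the "diagonal" of R^2 ...
print(S((3.0, 7.0)))   # (3.0, 3.0)
# ... so <1, 2> is in the codomain but not in the range.
```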
Before proceeding further we'll give some formal definitions.
Definition 1.4.3 Transformations
Given positive integers \(m\) and \(n\), a transformation from \(\Reals^m\) to \(\Reals^n\) is a function, \(T\), that takes vectors of length \(m\) as inputs and returns vectors of length \(n\). We write
\begin{gather*}
T:\Reals^m \longrightarrow \Reals^n\\
T(\vec{x}) \quad = \quad \vec{y},
\end{gather*}
where the components of the vector \(\vec{y}\) will need to be specified in terms of the components of \(\vec{x}\).
Definition 1.4.4 Domain of a transformation
The domain of a transformation, \(T\), is denoted by \(\Dom{T}\) and is generally a subset of \(\Reals^m\) (provided \(T\) is defined as above).
\begin{gather*}
\Dom{T} \; = \; \{ \vec{x} \in \Reals^m \suchthat T(\vec{x}) \, \mbox{is defined} \}
\end{gather*}
Definition 1.4.5 Codomain of a transformation
The codomain of a transformation, \(T\), is denoted by \(\Cod{T}\) and is equal to \(\Reals^n\) (provided \(T\) is defined as above).
Definition 1.4.6 Linearity
A transformation \(T\) is linear if and only if given any two elements \(\vec{u},\vec{v} \in \Dom{T}\) and any two real numbers \(\alpha\) and \(\beta\) we have
\begin{equation*} T(\alpha \vec{u} + \beta \vec{v}) \; = \; \alpha T(\vec{u}) + \beta T(\vec{v}).\end{equation*}
Linearity is a really important concept! We will be using the definition above over and over again. Let's try to nail down our understanding of this definition by translating it into ordinary language: A transformation is linear if and only if when you apply it to a linear combination of vectors, the result is equal to what you get if you form the same linear combination of the images of those vectors. More succinctly: “The image of a linear combination is the same linear combination of the images.” My advice (seriously!) is to treat that last phrasing like a mantra — repeat it to yourself until you fully absorb the meaning and it becomes second nature to you.
Look back at the formal definition of linearity and notice what it looks like symbolically: it appears as if the transformation \(T\) distributes over the sum and as if the scalars can be moved to the outside of the \(T\)'s.
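One way to build intuition for the definition is to spot-check it numerically. The sketch below (our own illustration) checks the defining equation for the transformation \(T\) of Example 1.4.2 at one choice of vectors and scalars; of course, a finite number of checks can suggest linearity but never prove it:
```python
def T(x):
    # The transformation from Example 1.4.2: pick out the
    # odd-numbered components of a length-6 vector.
    return (x[0], x[2], x[4])

def add(u, v):
    # Componentwise vector addition.
    return tuple(ui + vi for ui, vi in zip(u, v))

def scale(a, u):
    # Multiply a vector by a scalar, componentwise.
    return tuple(a * ui for ui in u)

u = (1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
v = (6.0, 5.0, 4.0, 3.0, 2.0, 1.0)
alpha, beta = 2.0, -3.0

lhs = T(add(scale(alpha, u), scale(beta, v)))     # T(alpha*u + beta*v)
rhs = add(scale(alpha, T(u)), scale(beta, T(v)))  # alpha*T(u) + beta*T(v)
print(lhs == rhs)  # True
```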
An alternative definition of linearity is sometimes given which splits these two properties apart. This form is often useful in formulating a proof that some transformation is linear (because it separates the argument into simpler parts).
Definition 1.4.7 Linearity (alternate definition)
A transformation \(T\) is linear if and only if given any two elements \(\vec{u},\vec{v} \in \Dom{T}\) and any real number \(\alpha\), both of the following hold:
\begin{equation*} T(\vec{u} + \vec{v}) \; = \; T(\vec{u}) + T(\vec{v}),\end{equation*}
and
\begin{equation*} T(\alpha \vec{u}) = \alpha T(\vec{u}). \end{equation*}
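In the same spirit, the two conditions of the alternate definition can be spot-checked separately. A proof would still have to argue for arbitrary \(\vec{u}\), \(\vec{v}\), and \(\alpha\), but a sketch like the following (again our own illustration, repeating the same helpers as before so it stands alone) shows what each condition is saying:
```python
def T(x):
    # The transformation from Example 1.4.2.
    return (x[0], x[2], x[4])

def add(u, v):
    return tuple(ui + vi for ui, vi in zip(u, v))

def scale(a, u):
    return tuple(a * ui for ui in u)

u = (1.0, 0.0, 2.0, 0.0, 3.0, 0.0)
v = (0.0, 1.0, 0.0, 2.0, 0.0, 3.0)

# Condition 1: T distributes over vector addition.
print(T(add(u, v)) == add(T(u), T(v)))  # True

# Condition 2: scalars pass through T.
print(T(scale(5.0, u)) == scale(5.0, T(u)))  # True
```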
Before we can go any further we have a small moral obligation to take care of. Since we've just presented two definitions for a concept, we have a duty to verify that they actually define the same concept. If we claim that two things are the same when they really aren't, we're making a false equivalence. One of the hallmarks of a good critical thinker is that they won't be taken in by false equivalences. So, what do you think? Are the two definitions definitely the same idea, or are there transformations that are linear by one definition but not by the other?
Theorem 1.4.8 The two definitions of linearity are equivalent
Consider a given transformation \(T\) from \(\Reals^m\) to \(\Reals^n\). Let \(\vec{u}\) and \(\vec{v}\) be arbitrary vectors in \(\Reals^m\), also let \(\alpha\) and \(\beta\) be arbitrary real numbers. Then
\begin{gather*}
T(\alpha \vec{u} + \beta \vec{v}) \; = \; \alpha T(\vec{u}) + \beta T(\vec{v})
\end{gather*}
if and only if
\begin{gather*}
T(\vec{u} + \vec{v}) \; = \; T(\vec{u}) + T(\vec{v}) \quad \mbox{and} \quad T(\alpha \vec{u}) = \alpha T(\vec{u}).
\end{gather*}
Proof
(⇒)
In this part of the proof we will be presuming the first statement (the definition of linearity given first) and showing that the second statement must be true.
Assume that \(T\) is a transformation and that for every pair of vectors \(\vec{u}\) and \(\vec{v}\), and every pair of real numbers \(\alpha\) and \(\beta\),
\begin{equation*}T(\alpha \vec{u} + \beta \vec{v}) \; = \; \alpha T(\vec{u}) + \beta T(\vec{v}).\end{equation*}
If we set \(\alpha = \beta = 1\) we get
\begin{equation*}T(\vec{u} + \vec{v}) \; = \; T(\vec{u}) + T(\vec{v}).\end{equation*}
Similarly, if we leave \(\alpha\) arbitrary but set \(\beta = 0\) we get
\begin{equation*}T(\alpha \vec{u}) = \alpha T(\vec{u}).\end{equation*}
(⇐)
In this part of the proof we will be working in the reverse direction, so we assume that both
\begin{equation*}T(\vec{u} + \vec{v}) \; = \; T(\vec{u}) + T(\vec{v}) \quad \mbox{and} \quad T(\alpha \vec{u}) = \alpha T(\vec{u})\end{equation*} hold.
It's important to realize that the hypotheses we are using above are generic statements. When we write
\(T(\alpha \vec{u}) = \alpha T(\vec{u})\), the particular scalar \(\alpha\) and vector \(\vec{u}\) are beside the point; we are asserting a general rule about how \(T\) interacts with scaled vectors.
Any other scalar times any other vector will work the same way. So, for example, that hypothesis will also let us deduce that \begin{equation*}T(\beta \vec{v}) = \beta T(\vec{v}).\end{equation*}
Consider \(T(\alpha \vec{u} + \beta \vec{v})\). Using our first hypothesis (the one that shows how \(T\) distributes over sums) we get
\begin{equation*}T(\alpha \vec{u} + \beta \vec{v}) \; = \; T(\alpha \vec{u}) + T(\beta \vec{v}).\end{equation*}
Using the second hypothesis (twice) we get
\begin{equation*}T(\alpha \vec{u}) + T(\beta \vec{v}) \; = \; \alpha T(\vec{u}) + \beta T(\vec{v}).\end{equation*}
Finally, putting these pieces together we have
\begin{equation*}T(\alpha \vec{u} + \beta \vec{v}) \; = \; \alpha T(\vec{u}) + \beta T(\vec{v})\end{equation*}
which is the desired result.
Definition 1.4.9 Linear transformations
Given positive integers \(m\) and \(n\), a linear transformation from \(\Reals^m\) to \(\Reals^n\) is a transformation, \(T\), that takes vectors of length \(m\) as inputs, returns vectors of length \(n\), and is linear. We write
\begin{gather*}
T:\Reals^m \longrightarrow \Reals^n\\
T(\vec{x}) \quad = \quad \vec{y},
\end{gather*}
where the components of the vector \(\vec{y}\) will need to be specified in terms of the components of \(\vec{x}\).
There is an interesting connection between our use of the word “linear” in talking about linear transformations
and our use of it in talking about linear combinations. When a transformation is linear, the functions that determine the output's components in terms of the input's components must be linear combinations. And vice versa: if the component functions are linear combinations, then the transformation will be linear.
The content of the previous paragraph may not be surprising from a linguistic perspective; mathematicians wouldn't use the same word if the underlying concepts were really different, would they? From a mathematical perspective it's a bit less obvious. Indeed, this is the sort of thing that mathematicians call a theorem. We'll state this theorem now, but we'll leave the proof to a later chapter.
Theorem 1.4.10 coefficients of a linear transformation
Given a transformation \(T: \Reals^m \longrightarrow \Reals^n\), \(T\) is linear if and only if,
for all input vectors \(\vec{x}\), the components of \(T(\vec{x})\) can be expressed as particular linear combinations of the components of \(\vec{x}\) (the same combinations for every input).
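The theorem also explains what goes wrong when a component function is not a linear combination. For instance, consider the (hypothetical, made up for this sketch) transformation \(N:\Reals^2 \longrightarrow \Reals^1\) with \(N(\langle x_1, x_2 \rangle) = \langle x_1^2 \rangle\); squaring a component is not a linear combination, and a single numerical check shows \(N\) is not linear:
```python
def N(x):
    # N: R^2 -> R^1, with component function x1**2,
    # which is NOT a linear combination of x1 and x2.
    x1, x2 = x
    return (x1 ** 2,)

u = (1.0, 0.0)
# Homogeneity fails: N(2u) would have to equal 2*N(u) if N were linear.
print(N((2.0, 0.0)))                  # (4.0,)
print(tuple(2.0 * c for c in N(u)))   # (2.0,)
```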
In order to fully specify a linear transformation we need to give values for all of the constants that are used in the linear combinations where the \(y_i\)'s are written in terms of the \(x_i\)'s. For each of the \(n\) components of \(\vec{y}\), we will need \(m\) numbers (as many as there are components in \(\vec{x}\)). In other words, we must specify \(mn\) constants.
Definition 1.4.11 components of a linear transformation
Given \(mn\) real numbers, \(\aij{1}{1}, \ldots, \aij{n}{m}\), we say they are the components of a
linear transformation \(T\),
\begin{gather*}
T:\Reals^m \longrightarrow \Reals^n\\
T(\vec{x}) \quad = \quad \vec{y},
\end{gather*}
provided
\begin{gather*}
y_1 = \aij{1}{1} x_1 + \ldots + \aij{1}{m} x_m \\
\vdots \\
y_n = \aij{n}{1} x_1 + \ldots + \aij{n}{m} x_m .
\end{gather*}
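The definition above is essentially a recipe for computing \(\vec{y}\): each \(y_i\) is the linear combination of \(x_1, \ldots, x_m\) whose coefficients are the \(\aij{i}{j}\)'s with first index \(i\). A minimal Python sketch (our own illustration, storing the components in a nested list) might look like this:
```python
def make_linear_transformation(a):
    """Given an n-by-m nested list a, where a[i][j] holds the
    coefficient of x_(j+1) in the formula for y_(i+1), return
    the linear transformation T with those components."""
    def T(x):
        return tuple(sum(row[j] * x[j] for j in range(len(x)))
                     for row in a)
    return T

# The transformation from Example 1.4.2, written via its
# 3-by-6 array of components (mostly zeros):
a = [[1, 0, 0, 0, 0, 0],
     [0, 0, 1, 0, 0, 0],
     [0, 0, 0, 0, 1, 0]]
T = make_linear_transformation(a)
print(T((1, 2, 3, 4, 5, 6)))  # (1, 3, 5)
```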