In this section we begin the material from Chapters 3 and 4 of Artin.
We have defined the notion of a binary operation on a set $X$; this is a function from $X \times X$ to $X$. There is another kind of operation that occurs frequently, where we combine elements from sets $X$ and $Y$ and obtain an element of $Y$. The classic example, which is the model for our definition below, is the multiplication of a vector by a scalar. In this case, we take a number $r \in \mathbb{R}$ and a vector $v \in \mathbb{R}^n$, and we obtain a new vector $rv = r \cdot v \in \mathbb{R}^n$. An operation of this sort is a function $X \times Y \to Y$. Artin calls this an external law of composition. We will call it an operation of $X$ on $Y$ or an action of $X$ on $Y$, and we will adopt the convention of writing the result of applying the function to $(x,y)$ as either $x.y$ or simply $xy$.
Let $R$ be a ring and let $(M,+)$ be an Abelian group. We say $M$ is a (left) module over $R$ (or an $R$-module) if there is an operation of $R$ on $M$ satisfying the following conditions for all $a,b \in R$, $m,n \in M$. The first condition is an associative law, while the final two conditions are distributive laws.
When $R$ is a field, we use the name vector space instead of module. We refer to the operation of $R$ on $M$ as scalar multiplication.
We will give some general examples of modules for rings which are not necessarily fields, but after that we will only consider vector spaces. When R is not a field, the theory of modules is quite a bit more complicated.
Example 1.
Let $R$ be any ring. Then $R^n$, whether regarded as the set of $n \times 1$ column vectors, the set of $1 \times n$ row vectors, or the set of $n$-tuples from $R$, is an $R$-module with componentwise operations. Thus $rx$ is just the product of the scalar $r$ and the matrix $x$ that we used in Section 1 on matrices. The properties required for $R^n$ to be an $R$-module all follow from the corresponding properties in $R$.
The preceding example is a special case of the following.
Given any positive integers $m,n$, the set $M_{m \times n}(R)$ of all $m \times n$ matrices over $R$ is an $R$-module with the usual definition of scalar multiplication: $r.A = rA$.
Let $R \subseteq S$ be rings of numbers. Then $S$ is an $R$-module when we use the multiplication in $S$, that is, $r.s = rs$.
Lemma 1.
Let $M$ be an $R$-module. Then for all $r \in R$, $m \in M$, we have the following.
Because $M$ is an Abelian group, we know cancellation holds in $M$ with respect to $+$. Cancellation with respect to the operation $.$ does not always hold, but we do have the following.
Lemma 2.
Let $F$ be a field and let $V$ be a vector space over $F$.
(2) If $a,b \in F$, $v \in V$ and $v \ne 0$, then $av = bv$ implies $a = b$.
(3) If $a \in F$, $v,w \in V$ and $a \ne 0$, then $av = aw$ implies $v = w$.
(2) and (3) are corollaries of (1) - Exercise.
We will be interested in a much more general kind of cancellation. We say $v_1,\dots,v_n \in V$ are linearly independent if whenever $\sum_{i=1}^n a_i v_i = 0$ for $a_1,\dots,a_n \in F$, we have $a_i = 0$ for each $i = 1,\dots,n$. We say $v_1,\dots,v_n \in V$ are linearly dependent if they are not linearly independent, that is, if there exist $a_1,\dots,a_n \in F$, not all of which are $0$, such that $a_1v_1+\dots+a_nv_n = 0$.
Lemma 2 says that any single nonzero $v \in V$ is linearly independent. If we consider the ordinary plane $\mathbb{R}^2$, we see that two vectors $v,w$ are linearly independent iff they do not lie on the same line through $0 = (0,0)$. However, any three vectors in $\mathbb{R}^2$ are linearly dependent. The reason for this last claim is that if two vectors $v,w$ in $\mathbb{R}^2$ are independent, then any vector $u \in \mathbb{R}^2$ can be written as $u = av+bw$ for some $a,b \in \mathbb{R}$, whence $av+bw+(-1)u = 0$. (If no two of the three vectors are independent, then the three are certainly dependent.) Thus the condition of linear independence is related to another condition, that of expressing other vectors as a linear combination of the given vectors. Our goal in this section is to explore this link. This will lead us to the notion of a basis and the dimension of a vector space.
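The claims about the plane can be checked by machine. The following is only a sketch, under the assumption that we work with rational coordinates (Python's `Fraction`) and test independence by Gaussian elimination; the helper names `rank` and `is_linearly_independent` are ours and do not come from Artin or from these notes.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of equal-length vectors, by Gaussian elimination over Q."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivot rows found so far
    for col in range(ncols):
        # look for a row at or below position r with a nonzero entry in this column
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # clear the entries below the pivot
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def is_linearly_independent(vectors):
    """Vectors are independent exactly when the rank equals their number."""
    return rank(vectors) == len(vectors)

# Two vectors not on a common line through the origin are independent.
print(is_linearly_independent([(1, 2), (3, 4)]))
# Two vectors on the same line through the origin are dependent.
print(is_linearly_independent([(1, 2), (2, 4)]))
# Any three vectors in the plane are dependent.
print(is_linearly_independent([(1, 0), (0, 1), (1, 1)]))
```

Exact rational arithmetic is used so that the test for a zero entry is reliable; with floating point numbers one would need a tolerance instead.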
Before we give the necessary formal definitions, let us consider the ideas discussed in the last paragraph. When we deal with ordinary $n$-space $\mathbb{R}^n$, it is usually crucial for us to know we have a set of co-ordinate axes. For example, in three dimensions, we express points, functions, and so on, in terms of the co-ordinates $(x,y,z)$, which in turn are defined in terms of the $x,y,z$-axes. Every point has a unique set of co-ordinates relative to these axes. If $\mathbf{i},\mathbf{j},\mathbf{k}$ are the normal unit vectors, then the point $(x,y,z)$ corresponds to the vector $x\mathbf{i}+y\mathbf{j}+z\mathbf{k}$.
A basis of a vector space V can be thought of as the same thing - a set of vectors that define co-ordinate axes, such that every vector can be written as a unique combination of the basis vectors.
If $v_1,\dots,v_n$ are elements of the vector space $V$ over the field $F$, a linear combination of $v_1,\dots,v_n$ is a sum of the form $\sum_{i=1}^n a_i v_i$ for some $a_1,\dots,a_n \in F$. We will be ambiguous here: the term linear combination can refer either to the sum or to the vector that is the result of that sum. Thus we will refer to $a_1,\dots,a_n$ as the coefficients in the linear combination, even though different coefficients could yield the same vector. We leave it to the reader to resolve the ambiguity in any given situation.
A trivial linear combination is one in which every coefficient is $0$. We say $v_1,\dots,v_n$ are linearly independent if the only linear combination yielding $0$ is the trivial one. (This is the same as the definition given earlier in this section.) We say $v_1,\dots,v_n$ span $V$ if every element of $V$ can be written as a linear combination of $v_1,\dots,v_n$. We say $v_1,\dots,v_n$ form a basis of $V$ if every element of $V$ can be written uniquely as a linear combination of $v_1,\dots,v_n$. (This means every $v \in V$ satisfies $v = \sum_{i=1}^n a_i v_i$ for some $a_1,\dots,a_n \in F$, and the $n$-tuple $(a_1,\dots,a_n)$ is unique.)
Thus a basis is a set of elements of V that can serve as a set of co-ordinate axes.
Example 2.
More generally, the matrix units $E_{ij}$, $1 \le i \le m$, $1 \le j \le n$, form a basis for the vector space $M_{m \times n}(F)$ of $m \times n$ matrices over $F$.
Let $F = \mathbb{R}$ and $V = \mathbb{C}$. Then $V$ is a vector space over $F$, and every element of $V$ can be written uniquely in the form $a+bi = a \cdot 1 + b \cdot i$ for some $a,b \in F$. This says precisely that $1,i$ is a basis for $V$ over $F$. It is not the only basis: there are uncountably many bases. Another example of a basis is $2+3i, 5-7i$.
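To make the last claim concrete, here is a small sketch that computes the co-ordinates of a complex number relative to a basis of $\mathbb{C}$ over $\mathbb{R}$. It assumes integer or rational inputs and uses Cramer's rule on the $2 \times 2$ system coming from the real and imaginary parts; the function name `coordinates` and the encoding of $p+qi$ as the pair `(p, q)` are our own choices for illustration.

```python
from fractions import Fraction

def coordinates(z, basis):
    """Return (a, b) with a*(p+qi) + b*(r+si) = z, where basis = ((p, q), (r, s)).

    Complex numbers are encoded as pairs (real part, imaginary part).
    Raises ValueError if the two given numbers do not form a basis.
    """
    (p, q), (r, s) = basis
    z_re, z_im = z
    # Real part:      a*p + b*r = z_re
    # Imaginary part: a*q + b*s = z_im
    det = p * s - q * r
    if det == 0:
        raise ValueError("the two numbers are linearly dependent over R")
    a = Fraction(z_re * s - z_im * r, det)
    b = Fraction(p * z_im - q * z_re, det)
    return a, b

# Co-ordinates of 1 = 1 + 0i relative to the basis 2+3i, 5-7i.
a, b = coordinates((1, 0), ((2, 3), (5, -7)))
print(a, b)  # 7/29 3/29
```

Indeed $\frac{7}{29}(2+3i) + \frac{3}{29}(5-7i) = \frac{29}{29} + \frac{0}{29}i = 1$. The determinant $2 \cdot (-7) - 3 \cdot 5 = -29$ is nonzero, which is why the co-ordinates exist and are unique.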
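A basis is exactly a linearly independent spanning set; that is, $v_1,\dots,v_n$ form a basis of $V$ if and only if they are linearly independent and span $V$. To see this, suppose $v_1,\dots,v_n$ form a basis. Clearly they span $V$. We always have $0 = \sum_{i=1}^n 0v_i$, so if $\sum_{i=1}^n a_i v_i = 0$, then $a_i = 0$ for all $i$ by uniqueness. Thus $v_1,\dots,v_n$ are linearly independent.
Conversely, if $v_1,\dots,v_n$ span $V$, any $v \in V$ can be written $v = \sum_{i=1}^n a_i v_i$ for some scalars $a_i \in F$. If also $v = \sum_{i=1}^n b_i v_i$, then $0 = \sum_{i=1}^n (a_i-b_i)v_i$. Hence if $v_1,\dots,v_n$ are linearly independent, we conclude that $a_i = b_i$ for each $i$. This proves $v_1,\dots,v_n$ form a basis.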
Remark 1.
Above we have applied the terms linearly independent, span, basis to a group of objects $v_1,\dots,v_n$ and so we have used plural language (``are'', ``span'', ``form''). We frequently think of $v_1,\dots,v_n$ as the set $X = \{v_1,\dots,v_n\}$, in which case we use the singular: $X$ is linearly independent, $X$ spans $V$, $X$ is a basis. In what follows we will mix these modes of usage.
There is still another way in which we regard a basis. If our goal is to put a co-ordinate system on $V$, then we will presumably associate the $n$-tuple $(a_1,\dots,a_n)$ to the vector $\sum_{i=1}^n a_i v_i$. This implies an ordering. Thus when we wish to use explicit co-ordinates, we need to speak of an ordered basis, which is an $n$-tuple $(v_1,\dots,v_n)$. As usual, we generally leave it to the reader to figure out what we are talking about at any given moment!
Once we decide to apply the terms ``linearly independent'', ``span'', and ``basis'' to sets, it becomes natural to allow infinite sets, and hence to modify the definitions slightly. Thus if $X$ is a set, a linear combination of elements of $X$ is a sum (or its result) $\sum_{i=1}^n a_i v_i$ for some finite collection $v_1,\dots,v_n$ of distinct elements of $X$ and some $a_1,\dots,a_n \in F$. Linear independence, spanning, and basis are defined solely in terms of such finite linear combinations.
Our goal for the rest of this section is straightforward.
We wish to show that every vector space has a basis, and
that any two bases have the same number of elements. This
common number will be called the dimension of the
vector space.
In pursuit of this goal, it is convenient to introduce other notions, which fortuitously are natural and useful in their own right.
If $W \subseteq V$, we say $W$ is a subspace of $V$ if it is a subgroup under $+$ (i.e., is closed under $+$ and $-$ and contains $0$) and it is closed under scalar multiplication, i.e., $a \in F$, $w \in W$ implies $aw \in W$.
Lemma 4. Let $W \subseteq V$. Then $W$ is a subspace of $V$ if and only if (a) $W \ne \emptyset$; (b) if $w,w' \in W$, then $w+w' \in W$; and (c) if $a \in F$, $w \in W$, then $aw \in W$.
What must be shown is that if $w \in W$, then $-w \in W$. We leave this as an exercise.
If $V$ is the plane $\mathbb{R}^2$, then the subspaces of $V$ are precisely $\{0\}$, $V$, and all of the lines through the origin. It is obvious that these are subspaces. It is also obvious that if a subspace contains a nonzero vector $v$, then it contains the line through $v$ and the origin. Thus all that remains is to convince yourself that if a subspace contains two vectors that do not lie on the same line through the origin, it contains the whole plane.
In general, a subspace of the $n$-dimensional vector space $\mathbb{R}^n$ is a flat or linear space, containing the origin, of dimension at most $n$. In fact, let us define $X \subseteq \mathbb{R}^n$ to be a linear subset if $X$ contains the entire line through any two points of $X$. The subspaces of $\mathbb{R}^n$ are precisely the linear subsets containing the origin. We will say more about this later.
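Let $X \subseteq V$. The span of $X$ is defined to be the set of all linear combinations of finitely many elements from $X$. Thus
$$\operatorname{span} X = \{\, a_1x_1+\dots+a_nx_n \mid a_1,\dots,a_n \in F,\ x_1,\dots,x_n \in X \,\}.$$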
Lemma 5. Let $V$ be a vector space over $F$ and let $X \subseteq V$. Then $\operatorname{span} X$ is the smallest subspace of $V$ containing $X$.
We need to show two things. First, that $\operatorname{span} X$ is a subspace of $V$ (plainly $X \subseteq \operatorname{span} X$) and second, that if $W \supseteq X$ is a subspace of $V$, then $\operatorname{span} X \subseteq W$.
We leave both as exercises.
If $W$ is a subspace of $V$ and $X \subseteq V$ and $\operatorname{span} X = W$, then we say $X$ spans $W$.
In terms of our vector analogy, the subspace spanned by a set of vectors is the smallest linear space through the origin containing all the vectors.
The following lemma links linear independence and spanning.
Lemma 6.
Let $V$ be a vector space over $F$, let $X \subseteq V$ be linearly independent, and let $y \in V$. Then the following statements are true.
First, suppose $y \in \operatorname{span} X$. Then we have $y = \sum_{i=1}^n a_i x_i$ for some $a_1,\dots,a_n \in F$, $x_1,\dots,x_n \in X$. Thus $a_1x_1+\dots+a_nx_n+(-1)y = 0$ is a nontrivial linear combination from $X \cup \{y\}$ that yields $0$, so $X \cup \{y\}$ is linearly dependent.
Next suppose $X \cup \{y\}$ is linearly dependent, i.e., suppose that there is a non-trivial linear combination of elements from $X \cup \{y\}$ that equals $0$. Since $X$ is linearly independent, this combination must involve $y$ in a non-trivial way. That is, there must exist $x_1,\dots,x_n \in X$ and $a_1,\dots,a_n,a \in F$ with $a \ne 0$ such that $a_1x_1+\dots+a_nx_n+ay = 0$. We can solve this equation for $y$ and we find $y = \sum_{i=1}^n -(a_i/a)x_i \in \operatorname{span} X$.
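The last step of the proof, solving a dependency relation for $y$, is easy to carry out numerically. The sketch below assumes rational scalars; the helper names `solve_for_y` and `combine` and the sample vectors are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def solve_for_y(coeffs, a):
    """Given a_1 x_1 + ... + a_n x_n + a y = 0 with a != 0, return the
    coefficients -(a_i / a) that express y in terms of x_1, ..., x_n."""
    if a == 0:
        raise ValueError("the coefficient of y must be nonzero")
    return [-Fraction(c) / Fraction(a) for c in coeffs]

def combine(coeffs, vectors):
    """Form the linear combination sum_i coeffs[i] * vectors[i]."""
    length = len(vectors[0])
    return [sum(c * v[k] for c, v in zip(coeffs, vectors)) for k in range(length)]

# Hypothetical data in Q^2: the relation 4*x1 + 6*x2 + (-2)*y = 0.
xs = [(1, 0), (0, 1)]
y = (2, 3)
b = solve_for_y([4, 6], -2)   # coefficients expressing y through x1, x2
print(b)                      # [Fraction(2, 1), Fraction(3, 1)]
print(combine(b, xs))         # recovers y
```

The division by $a$ is the one place where the argument uses that $F$ is a field; over a general ring the coefficient of $y$ need not be invertible.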
We are now ready to pursue our goal of showing bases exist
and the cardinality of a basis of V is uniquely
determined by V. We begin with one of the key results.
Theorem 1 Let $V$ be a vector space over a field $F$ and let $X,Y \subseteq V$. If $X$ is linearly independent and $Y$ spans $V$, then there is a subset $Y'$ of $Y$ such that $X \cup Y'$ is a basis of $V$.
Assume Y is finite: we will discuss the case where Y is infinite in an appendix to this section.
Choose a subset $Y'$ of $Y$ with the following two properties: (1) $X \cup Y'$ is linearly independent, and (2) $Y'$ is the largest subset of $Y$ satisfying condition (1). Note that $\emptyset$ satisfies condition (1), so there are subsets of $Y$ satisfying (1). Since $Y$ is finite, there is a largest such subset. (It is here where we have to use more advanced techniques if $Y$ is infinite.)
We claim $X' = X \cup Y'$ is a basis of $V$. It is linearly independent by definition, so we must show $X'$ spans $V$. We will first show that $Y \subseteq \operatorname{span} X'$.
Let $y \in Y$ and suppose $y \notin \operatorname{span} X'$. Then $y \notin X'$ and by Lemma 6, $X' \cup \{y\}$ is linearly independent. But if we set $Y'' = Y' \cup \{y\}$, we have $Y' \subsetneq Y''$ and $X \cup Y''$ linearly independent. This contradicts our choice of $Y'$. It follows that $Y \subseteq \operatorname{span} X'$.
Now $\operatorname{span} X'$ is a subspace containing $Y$, so by Lemma 5, $\operatorname{span} X'$ contains the subspace $\operatorname{span} Y = V$. Thus $X'$ spans $V$, and we have proven that $X'$ is a basis of $V$.
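When $Y$ is finite, the set $Y'$ can actually be computed. The following sketch is not the argument of the proof itself: instead of choosing a largest $Y'$ at once, it adds elements of $Y$ one at a time, keeping $y$ exactly when it lies outside the span of what has been collected so far. The result is a subset that cannot be enlarged, which is all the proof uses. We assume vectors with rational entries; the names `rank` and `extend_to_basis` are ours.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of equal-length vectors, by Gaussian elimination over Q."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivot rows found so far
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def extend_to_basis(X, Y):
    """Given a linearly independent list X and a spanning list Y, return a
    sublist Y1 of Y such that X + Y1 is a basis."""
    current = list(X)
    chosen = []
    for y in Y:
        # y lies outside span(current) exactly when adding it raises the rank
        if rank(current + [y]) > rank(current):
            current.append(y)
            chosen.append(y)
    return chosen

# Expanding an independent set: X = {(1,1,0)}, Y = standard basis of Q^3.
print(extend_to_basis([(1, 1, 0)], [(1, 0, 0), (0, 1, 0), (0, 0, 1)]))
# Contracting a spanning set: take X empty.
print(extend_to_basis([], [(1, 2), (2, 4), (0, 1)]))
```

In the first call the vector $(0,1,0)$ is skipped because it already lies in the span of $(1,1,0)$ and $(1,0,0)$; in the second, $(2,4)$ is skipped as a multiple of $(1,2)$.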
Corollary 1. Let $V$ be a vector space. Then any linearly independent subset of $V$ can be expanded to a basis, and any subset that spans $V$ can be contracted to a basis.
If $X$ is linearly independent, we can take $Y = V$ and apply Theorem 1.
If $Y$ spans $V$, we can take $X = \emptyset$ and apply Theorem 1.
Remark 2. There is one problem with the proof of the preceding corollary, and hence with the proof of the next corollary. In the proof, we used the set $V$ as a spanning set, but $V$ is likely to be infinite. Thus we are forced to confront the ``infinite case'' we tried to avoid. One way to avoid this problem is to prove all results only for finitely spanned vector spaces, that is, vector spaces which have a finite spanning set. (Such vector spaces are precisely the finite dimensional vector spaces.) The reader may either make this restriction or read the appendix to this section, where the ``infinite'' problem is discussed.
Corollary 2. Every vector space has a basis.
Apply Corollary 1 either to the linearly independent set $\emptyset$ or the spanning set $V$.
Problem 1
Let $X$ be a subset of a vector space $V$. Show that the following statements are equivalent.
$X$ is a maximal linearly independent subset of $V$. (That is, $X$ is linearly independent and if $X \subsetneq Y \subseteq V$, then $Y$ is linearly dependent.)
$X$ is a minimal spanning set in $V$. (That is, $X$ spans $V$ and if $Y \subsetneq X$, then $Y$ does not span $V$.)
Our next goal is to compare the sizes of bases. This requires a lemma, which is related to Theorem 1, but is not quite the same.
Lemma 7 [Exchange Lemma].
Let $V$ be a vector space, let $X \subseteq V$ be a linearly independent set, and let $Y \subseteq V$ span $V$.
(2) If $X,Y$ are bases, then either $X = Y$ or we can choose $x,y$ as in (1) such that $X',Y'$ are bases.
The linear independence of $X'$ is all that we will actually need below. However, we claimed we could choose $y$ so that $Y'$ spans $V$; to do this, we must be a little more careful.
We can still take any $x \in X \setminus Y$. Since $Y$ spans $V$, we can write $x = \sum_{i=1}^n a_i y_i$ for some $y_1,\dots,y_n \in Y$ and some nonzero $a_1,\dots,a_n \in F$. Since $X$ is linearly independent, there must be at least one $y_j$ such that $y_j \notin \operatorname{span} X_1$. Set $y = y_j$ for this $j$, and $Y' = (Y \setminus \{y_j\}) \cup \{x\}$.
Then $X'$ is linearly independent as above. We can write $y = y_j = (1/a_j)x+\sum_{i \ne j} -(a_i/a_j)y_i$, so $y \in \operatorname{span} Y'$. Obviously for any other $z \in Y$, we have $z \in \operatorname{span} Y'$. It follows (as in the proof of Theorem 1) that $Y'$ spans $V$.
(2) Exercise.
Theorem 2 Let $X,Y$ be subsets of a vector space $V$ and suppose that $X$ is linearly independent and $Y$ spans $V$. Then $|X| \le |Y|$.
Again we will assume $Y$ is finite, say $|Y| = n$; we will discuss the infinite case in the appendix to this section.
Suppose the theorem is false and that $V,X,Y$ give us a counterexample. Keeping $Y,V$ fixed and changing $X$ if necessary, we can assume that $|X \cap Y|$ is as large as possible, that is, if $X' \subseteq V$ is such that $V,X',Y$ give us a counterexample, then $|X' \cap Y| \le |X \cap Y|$. (This is possible because all the numbers involved are no greater than $n = |Y|$.)
If $X \not\subseteq Y$, then by the Exchange Lemma, there are $x \in X \setminus Y$, $y \in Y \setminus X$ such that $X' = (X \setminus \{x\}) \cup \{y\}$ is linearly independent. Since $|X'| = |X|$, the triple $V,X',Y$ is still a counterexample. Moreover, $X' \cap Y = (X \cap Y) \cup \{y\}$ is strictly larger than $X \cap Y$. This contradicts our choice of $X$.
Thus we must have $X \subseteq Y$ and so we can conclude that $|X| \le |Y|$, contradicting the assumption that $V,X,Y$ is a counterexample.
Corollary 3. Any two bases of a vector space have the same number of elements. That is, if $V$ is a vector space over a field $F$ and $X,Y$ are bases of $V$, then $|X| = |Y|$.
By Theorem 2, we have $|X| \le |Y|$ and $|Y| \le |X|$. Thus $|X| = |Y|$.
This result can also be proved directly using part (2) of the Exchange Lemma.
We define the dimension of a vector space $V$ to be the size of any (and hence every) basis of $V$, and we denote it $\dim V$.
Thus for example, $\dim F^n = n$ and $\dim M_{m \times n}(F) = mn$. We have $\dim_{\mathbb{R}} \mathbb{C} = 2$. (If there are different fields in use, we sometimes write $\dim_F V$ to make clear that $V$ is a vector space over $F$.)
Corollary 4.
Let $V$ be a vector space over a field $F$ and let $n = \dim V$ be finite.
Any subset of $n$ elements that spans $V$ is a basis.
(2) This proof is similar to the proof of (1).
Note that this proof would fail if dimV were infinite, and in that case, the corollary is not true.
This last result is very useful in deciding whether a given set is a basis. For example, we know $F^2$ has dimension $2$, since it has the standard basis $e_1,e_2$. Thus by Corollary 4, two vectors $v,w \in F^2$ form a basis iff they are linearly independent. It is easy to tell when two vectors are linearly independent. We conclude that $v,w \in F^2$ form a basis iff neither $v$ nor $w$ is a multiple of the other.
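The criterion depends on the field $F$. As an illustration, the sketch below takes $F = \mathbb{Z}/p\mathbb{Z}$ for a prime $p$ and uses the standard fact (not proved in these notes) that neither of $v = (v_1,v_2)$, $w = (w_1,w_2)$ is a multiple of the other exactly when $v_1w_2 - v_2w_1 \ne 0$ in $F$. The function name is hypothetical, and the code assumes that the caller supplies a prime $p$.

```python
def is_basis_of_plane_mod_p(v, w, p):
    """Decide whether v, w form a basis of F^2, where F = Z/pZ and p is prime.

    Uses the 2x2 determinant v1*w2 - v2*w1 reduced mod p.
    """
    det = (v[0] * w[1] - v[1] * w[0]) % p
    return det != 0

# Over Z/5Z we have 3*(1,2) = (3,6) = (3,1), so the vectors are dependent.
print(is_basis_of_plane_mod_p((1, 2), (3, 1), 5))  # False
# Over Z/7Z the same integer pairs are independent.
print(is_basis_of_plane_mod_p((1, 2), (3, 1), 7))  # True
```

The same pair of integer vectors is a basis over one field and not over another, which is why the field must be kept in mind when speaking of linear independence.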
The following result is another very useful application.
Corollary 5.
Let $F$ be a field and let $A \in M_n(F)$. Then the following conditions are equivalent.
Recall from Section 1 that (1) holds if and only if the equation $Ax = b$ can be solved for any $b$. If $x = (x_1,\dots,x_n)^T$ and if $A_i$ is column $i$ of $A$, then $Ax = \sum_{i=1}^n x_i A_i$. Thus $Ax$ is a linear combination of the columns of $A$, and so the statement that $Ax = b$ can always be solved is equivalent to the statement that the columns of $A$ span $F^n$. This proves (1) is equivalent to (4).
A similar proof shows (1) is equivalent to (7), and this completes the proof.
Here is another nice application of bases. In Artin, this result is used instead of the Exchange Lemma to prove Theorem . We get it as a corollary.
Corollary 6.
A homogeneous system of $m$ linear equations in $n$ unknowns always has a nonzero solution if $n > m$. Put in matrix terms, if $A$ is an $m \times n$ matrix with $n > m$, then there is an $x \in F^n$ with $x \ne 0$ but $Ax = 0$.
As in the proof of Corollary 5, the product $Ax$ is a linear combination of the columns of $A$. There are $n$ of these columns, and they are elements of $F^m$, a vector space of dimension $m < n$. Thus the set of columns must be linearly dependent, that is, some non-trivial linear combination of them must be $0$. This says $Ax = 0$ for some nonzero $x$.
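The corollary only asserts that a nonzero solution exists. One way to actually find one, sketched below under the assumption of rational entries, is to bring $A$ to reduced row echelon form, set one free variable to $1$, and read off the pivot variables. This is ordinary Gaussian elimination rather than the dimension argument above, and the function name `nonzero_solution` is ours.

```python
from fractions import Fraction

def nonzero_solution(A):
    """Return a nonzero x with A x = 0, or None if x = 0 is the only solution."""
    rows = [[Fraction(x) for x in row] for row in A]
    n = len(rows[0])
    pivot_cols = []
    r = 0  # index of the next pivot row
    for col in range(n):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot here, so this column gives a free variable
        rows[r], rows[pivot] = rows[pivot], rows[r]
        lead = rows[r][col]
        rows[r] = [a / lead for a in rows[r]]  # scale the pivot to 1
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivot_cols.append(col)
        r += 1
    free = [c for c in range(n) if c not in pivot_cols]
    if not free:
        return None
    # Set the first free variable to 1, the other free variables to 0,
    # and solve each pivot row for its pivot variable.
    x = [Fraction(0)] * n
    x[free[0]] = Fraction(1)
    for row_index, col in enumerate(pivot_cols):
        x[col] = -rows[row_index][free[0]]
    return x

# Two equations in three unknowns: a nonzero solution must exist.
A = [[1, 2, 3], [4, 5, 6]]
x = nonzero_solution(A)
print(x)  # [Fraction(1, 1), Fraction(-2, 1), Fraction(1, 1)]
```

There can be at most $m$ pivot columns, so when $n > m$ at least one column is free and the function never returns `None`; this is the computational counterpart of the statement that $n$ vectors in $F^m$ are dependent.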
APPENDIX: Infinite-dimensional Vector Spaces
At two points in this section we made the assumption that
spanning sets or bases were finite. In this appendix we
will briefly discuss the general case.
The first place where the finiteness assumption was used was in the proof of Theorem . We had a linearly independent set $X \subseteq V$ and a spanning set $Y \subseteq V$, and we needed the existence of a largest subset $Y'$ of $Y$ such that $X \cup Y'$ remained linearly independent. In the proof what we needed for ``largest'' was that if $Y' \subsetneq Y'' \subseteq Y$, then $X \cup Y''$ is linearly dependent. We usually express this by saying that $Y' \subseteq Y$ is maximal with respect to the property that $X \cup Y'$ is linearly independent. When $Y$ is finite, we know such maximal sets exist because we can take a subset $Y'$ satisfying this property that has as many elements as possible. When $Y$ is infinite, however, there may be larger and larger subsets in a never-ending chain.
Instead, we have to appeal to a fundamental principle of ``infinite'' mathematics, Zorn's Lemma. This lemma asserts the existence of objects without giving any means of constructing them, and so it is viewed with disfavor by some. If one is willing to use it, however, it is extremely powerful. (Indeed, many results cannot be proven without Zorn's Lemma.) We will state it below but not prove it. The proof involves the Axiom of Choice and some form of transfinite induction - in fact, Zorn's Lemma is equivalent to the Axiom of Choice.
We first state a special version for sets, and then discuss the more general form.
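[Special Zorn's Lemma for Sets] Let $\mathcal{S}$ be a non-empty collection of sets. A non-empty subset $\mathcal{C}$ of $\mathcal{S}$ is said to be a chain if for any $A,B \in \mathcal{C}$, either $A \subseteq B$ or $B \subseteq A$. Suppose that whenever $\mathcal{C}$ is a chain in $\mathcal{S}$, the union $\bigcup_{C \in \mathcal{C}} C$ is an element of $\mathcal{S}$. Then $\mathcal{S}$ contains a maximal element.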
Sorry, you'll have to look this one up.
Let us show that this applies to our situation. We are given $X,Y \subseteq V$ where $X$ is linearly independent and $Y$ spans $V$. We let $\mathcal{S}$ be the collection of all subsets $Y'$ of $Y$ such that $X \cup Y'$ is linearly independent. We need to show the hypothesis of Zorn's Lemma applies, and then we will be able to conclude that $\mathcal{S}$ contains a maximal element $Y'$, which is exactly what we want.
Let $\mathcal{C}$ be a chain in $\mathcal{S}$, and put $Z = \bigcup_{C \in \mathcal{C}} C$. Clearly $Z \subseteq Y$; we must show $X \cup Z$ is linearly independent. Suppose not: then there are $x_1,\dots,x_n \in X$, $y_1,\dots,y_m \in Z$ such that some non-trivial linear combination of all these elements is $0$. Each $y_i \in C_i$ for some $C_i \in \mathcal{C} \subseteq \mathcal{S}$. Since $\mathcal{C}$ is a chain, there is an index $j$, $1 \le j \le m$, such that $C_i \subseteq C_j$ for all $i = 1,\dots,m$. The elements $x_1,\dots,x_n,y_1,\dots,y_m$ all lie in $X \cup C_j$, and since $C_j \in \mathcal{S}$, these elements must be linearly independent. This contradicts the choice of these elements and shows that $X \cup Z$ must be linearly independent after all. Thus $Z \in \mathcal{S}$, as required.
The only properties of sets that occur in Zorn's Lemma above are their properties relative to the partial order $\subseteq$. Thus it is reasonable to try to formulate Zorn's Lemma in a more general setting, and it turns out that it is both true and useful.
Let $\le$ be a partial order on a set $S$. We say $C \subseteq S$ is a chain if $C$ is linearly ordered under $\le$, that is, if for any $c,d \in C$, either $c \le d$ or $d \le c$. We say $x \in S$ is an upper bound for $C \subseteq S$ if $c \le x$ for every $c \in C$. Finally, we say $x \in S$ is a maximal element (or simply maximal) if $x \le s$ for $s \in S$ implies $s = x$. (Note that we do not require $s \le x$ for all $s \in S$. We are not assuming $S$ is totally ordered. In particular, $S$ may have many maximal elements - or it may have none at all.)
If we take for $S$ our collection of sets $\mathcal{S}$ and we take for $\le$ the inclusion relation $\subseteq$, then a chain in $S$ relative to $\subseteq$ is precisely a chain in $\mathcal{S}$ in the sense we defined above. Moreover, $\bigcup_{C \in \mathcal{C}} C$ is an upper bound for any $\mathcal{C} \subseteq \mathcal{S}$. Our hypothesis in the special Zorn's Lemma for sets was that this union is in $\mathcal{S}$ for any chain $\mathcal{C}$, and hence every chain in $\mathcal{S}$ has an upper bound in $\mathcal{S}$. This is a special case of the general form of Zorn's Lemma.
[General Zorn's Lemma] Let $\le$ be a partial order on a non-empty set $S$ and suppose that every chain in $S$ has an upper bound. Then $S$ has a maximal element.
This version is potentially far more powerful than the version we gave for sets, but we can prove the general version using the special one. Let $\mathcal{S}$ be the set of all chains in $S$. If $\mathcal{C}$ is a chain in $\mathcal{S}$, then $\mathcal{C}$ is a collection of chains in $S$ that are linearly ordered under $\subseteq$. It follows that $\bigcup_{C \in \mathcal{C}} C$ is a chain in $S$. (Good exercise.) Thus $\bigcup_{C \in \mathcal{C}} C \in \mathcal{S}$.
By the special version of Zorn's Lemma for sets, it follows that $\mathcal{S}$ contains a maximal element $C$, that is, a chain $C$ that cannot be added onto. By our hypothesis for the general Zorn's Lemma, this chain $C$ has an upper bound $x$. We claim $x$ is a maximal element in $S$.
If this is false, there is an element $y \in S$ with $x < y$. But then $s \le x < y$ for every $s \in C$, so $y \notin C$ and $C \cup \{y\}$ is a chain that properly contains $C$. This is impossible, and so $x$ must be maximal.
The other place we appealed to finiteness was in our proof of Theorem 2. We were given $X,Y \subseteq V$ where $X$ is linearly independent and $Y$ spans $V$ and we wished to show $|X| \le |Y|$. We showed this under the hypothesis that $Y$ is finite, using the Exchange Lemma, by fixing $V,Y$ and taking a counterexample $V,X,Y$ with $X \cap Y$ maximal.
By Zorn's Lemma (verify!) we can find a maximal $X$ such that $V,X,Y$ is a counterexample, but can we find one with $X \cap Y$ maximal? It is not clear how to order our counterexamples $X$ so that $X \cap Y$ is maximized. For example, it seems quite possible to have counterexamples $X,X'$ with $X \subsetneq X'$ but $X \cap Y = X' \cap Y$ (if there are any counterexamples at all!).
We can instead solve this particular problem by a counting argument. We need two facts about infinite sets. If $S$ is a set, let $\mathcal{F}(S)$ denote the set of all finite subsets of $S$. The first fact we need is that if $S$ is infinite, then $|S| = |\mathcal{F}(S)|$.
Suppose $f:S \to T$. The second fact we need is that if $S$ is infinite and $|S| > |T|$, then there exists a $t \in T$ such that $f^{-1}(t) = \{ s \in S \mid f(s) = t \}$ is infinite.
Both of these facts are consequences of the fact that for non-empty sets $A,B$, if at least one of them is infinite, then $|A \times B| = \max(|A|,|B|)$. (If $\aleph,\beth$ are infinite cardinals, then $\aleph+\beth = \aleph \cdot \beth = \max(\aleph,\beth)$.) This implies, for example, that if $\aleph$ is an infinite cardinal, a countable union of sets of cardinality $\aleph$ has cardinality $\aleph$.
Assume these two facts and assume $Y$ is infinite. Since $Y$ spans $V$, for every $x \in X$, there is a finite subset $Z \subseteq Y$ such that $x = \sum_{z \in Z} a_z z$ for some scalars $a_z \in F$. For each $x$, pick a particular set $Z$ - or shrink $Y$ to a basis, in which case $Z$ is unique if we assume each $a_z \ne 0$ - and define a function $f:X \to \mathcal{F}(Y)$ by letting $f(x)$ be the chosen set $Z$.
Since $Y$ is infinite, our first fact above tells us $|\mathcal{F}(Y)| = |Y|$. If $|X| > |Y|$, our second fact tells us there is a finite $Z \subseteq Y$ and an infinite $X' \subseteq X$ with $f(x) = Z$ for all $x \in X'$. Thus each element of $X'$ lies in the finite dimensional subspace of $V$ spanned by $Z$. Since $X' \subseteq X$, we know $X'$ is linearly independent. Thus we have an infinite linearly independent set contained in a vector space of finite dimension. We proved this is impossible in the finite case of Theorem 2.