In this section we begin the material from Chapters 3 and 4 of Artin. We have defined the notion of a binary operation on a set X; this is a function from X × X to X. There is another kind of operation that occurs frequently, where we combine elements from sets X and Y and obtain an element of Y. The classic example, which is the model for our definition below, is the multiplication of a vector by a scalar. In this case, we take a number r ∈ R and a vector v ∈ R^n, and we obtain a new vector rv = r · v ∈ R^n. An operation of this sort is a function X × Y → Y. Artin calls this an external law of composition. We will call it an operation of X on Y or an action of X on Y, and we will adopt the convention of writing the result of applying the function to (x, y) as either x.y or simply xy.

Let R be a ring and let (M, +) be an Abelian group. We say M is a (left) module over R (or an R-module) if there is an operation of R on M satisfying the following conditions for all a, b ∈ R, m, n ∈ M. The first condition is an associative law, while the final two conditions are distributive laws.
  1. a.(b.m) = (ab).m;
  2. 1_R.m = m;
  3. (a + b).m = a.m + b.m;
  4. a.(m + n) = a.m + a.n.
When R is a field, we use the name vector space instead of module. We refer to the operation of R on M as scalar multiplication. We will give some general examples of modules for rings which are not necessarily fields, but after that we will only consider vector spaces. When R is not a field, the theory of modules is quite a bit more complicated.

Example 7.1.

  1. Let R = Z and let M be an Abelian group. Then there is a unique way to make M into a Z-module. To see this, note that the condition that 1m = m for all m ∈ M together with distributivity implies nm equals the sum of m with itself n times for any positive integer n. We will shortly see that (-r)m = -(rm) always holds, and so it follows that there is only one way to define nm that might make M into a Z-module. This definition does make M into a Z-module: in fact, the axioms (1)-(4) above are just the laws of exponents for M written in additive form.
  2. Let R be any ring. Then R^n, whether regarded as the set of n × 1 column vectors, the set of 1 × n row vectors, or the set of n-tuples from R, is an R-module with componentwise operations. Thus rx is just the product of the scalar r and the matrix x that we used in Section 1 on matrices. The properties required for R^n to be an R-module all follow from the corresponding properties in R.
  3. The preceding example is a special case of the following. Given any positive integers m, n, the set M_{m×n}(R) of all m × n matrices over R is an R-module with the usual definition of scalar multiplication: r.A = rA.
  4. Let R ⊆ S be rings of numbers. Then S is an R-module when we use the multiplication in S, that is, r.s = rs.
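The first example can be checked by brute force on a small case. The sketch below is only an illustration under stated assumptions: the group Z/6Z, the helper name act, and the range of integers tested are all choices made here, not part of the notes. It defines nm by repeated addition (and negation for n < 0) and tests the four module axioms.

```python
from itertools import product

N = 6  # modulus of the example group Z/6Z (an arbitrary choice for illustration)

def act(r, m):
    """Action of the integer r on m in Z/NZ: add m to itself |r| times, negating if r < 0."""
    total = 0
    for _ in range(abs(r)):
        total = (total + m) % N
    return total if r >= 0 else (-total) % N

scalars = range(-4, 5)   # a finite sample of integers, enough for a sanity check
elements = range(N)      # all elements of Z/NZ

# Axiom 1: a.(b.m) = (ab).m
axiom1 = all(act(a, act(b, m)) == act(a * b, m)
             for a, b, m in product(scalars, scalars, elements))
# Axiom 2: 1.m = m
axiom2 = all(act(1, m) == m for m in elements)
# Axiom 3: (a + b).m = a.m + b.m
axiom3 = all(act(a + b, m) == (act(a, m) + act(b, m)) % N
             for a, b, m in product(scalars, scalars, elements))
# Axiom 4: a.(m + n) = a.m + a.n
axiom4 = all(act(a, (m + n) % N) == (act(a, m) + act(a, n)) % N
             for a, m, n in product(scalars, elements, elements))

print(axiom1, axiom2, axiom3, axiom4)
```

A finite check like this proves nothing in general, but it shows concretely what each axiom asserts for the Z-module structure on an Abelian group.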

Let's record some expected, and easily proven, properties of modules.

Lemma 7.2. Let M be an R-module. Then for all r ∈ R, m ∈ M, we have the following.

  1. 0_R m = 0_M.
  2. r 0_M = 0_M.
  3. (-1)m = -m.

Proof. Exercise. Hints: to show (1), write 0m = (0 + 0)m = 0m + 0m and conclude 0m = 0. The proof of (2) is similar. Use (1) to prove (3). Q.E.D.

Because M is an Abelian group, we know cancellation holds in M , with respect to +. Cancellation with respect to . doesn't always hold, but we do have the following.

Lemma 7.3. Let F be a field and let V be a vector space over F .

  1. If a ∈ F, v ∈ V and av = 0, then a = 0 or v = 0.
  2. If a, b ∈ F, v ∈ V and v ≠ 0, then av = bv implies a = b.
  3. If a ∈ F, v, w ∈ V and a ≠ 0, then av = aw implies v = w.

Proof. (1) If a ≠ 0 and av = 0, then v = 1v = (a^(-1)a)v = a^(-1)(av) = a^(-1)0 = 0.

(2) and (3) are corollaries of (1) -- Exercise. Q.E.D.

We will be interested in a much more general kind of cancellation. We say v1, . . . , vn ∈ V are linearly independent if whenever a1v1 + · · · + anvn = 0 for a1, . . . , an ∈ F, we have ai = 0 for each i = 1, . . . , n. We say v1, . . . , vn ∈ V are linearly dependent if they are not linearly independent, that is, if there exist a1, . . . , an ∈ F, not all of which are 0, such that a1v1 + · · · + anvn = 0.

Lemma 7.3 says that any single nonzero v ∈ V is linearly independent. If we consider the ordinary plane R^2, we see that two vectors v, w are linearly independent iff they do not lie on the same line through 0 = (0, 0). However, any three vectors in R^2 are linearly dependent. The reason for this last claim is that if two vectors v, w in R^2 are independent, then any vector u ∈ R^2 can be written as u = av + bw for some a, b ∈ R, whence av + bw + (-1)u = 0. Thus the condition of linear independence is related to another condition, that of expressing other vectors as a linear combination of the given vectors. Our goal in this section is to explore this link. This will lead us to the notion of a basis and the dimension of a vector space.

Before we give the necessary formal definitions, let us consider the ideas discussed in the last paragraph. When we deal with ordinary n-space R^n, it is usually crucial for us to know we have a set of co-ordinate axes. For example, in three dimensions, we express points, functions, and so on, in terms of the co-ordinates (x, y, z), which in turn are defined in terms of the x, y, z-axes. Every point has a unique set of co-ordinates relative to these axes. If i, j, k are the normal unit vectors, then the point (x, y, z) corresponds to the vector xi + yj + zk.

A basis of a vector space V can be thought of as the same thing -- a set of vectors that define co-ordinate axes, such that every vector can be written as a unique combination of the basis vectors.

If v1, . . . , vn are elements of the vector space V over the field F, a linear combination of v1, . . . , vn is a sum of the form a1v1 + · · · + anvn for some a1, . . . , an ∈ F. We will be ambiguous here: the term linear combination can refer either to the sum or to the vector that is the result of that sum. Thus we will refer to a1, . . . , an as the coefficients in the linear combination, even though different coefficients could yield the same vector. We leave it to the reader to make our ambiguity clear in any given situation.

A trivial linear combination is one in which every coefficient is 0. We say v1, . . . , vn are linearly independent if the only linear combination yielding 0 is the trivial one. (This is the same as the definition given earlier in this section.) We say v1, . . . , vn span V if every element of V can be written as a linear combination of v1, . . . , vn. We say v1, . . . , vn form a basis of V if every element of V can be written uniquely as a linear combination of v1, . . . , vn. (This means every v ∈ V can be written as v = a1v1 + · · · + anvn for some a1, . . . , an ∈ F, and the n-tuple (a1, . . . , an) is unique.)

Thus a basis is a set of elements of V that can serve as a set of co-ordinate axes.

Example 7.4.

  1. The quintessential example of a basis is the set of vectors e1 = (1, 0, . . . , 0), . . . , en = (0, 0, . . . , 1) in F^n.
  2. More generally, the matrix units Eij, 1 ≤ i ≤ m, 1 ≤ j ≤ n, form a basis for the vector space M_{m×n}(F) of m × n matrices over F.
  3. Let F = R and V = C. Then V is a vector space over F, and every element of V can be written uniquely in the form a + bi = a·1 + b·i for some a, b ∈ F. This says precisely that 1, i is a basis for V over F. It is not the only basis: there are uncountably many bases. Another example of a basis is 2 + 3i, 5 - 7i.
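The claim that 2 + 3i, 5 - 7i is a basis of C over R can be checked numerically. The sketch below is an illustration only: it identifies x + yi with the pair (x, y), and the helper names det2 and coordinates are introduced here for the purpose. It uses Cramer's rule with exact fractions to find the unique real coefficients expressing 1 and i in terms of the two proposed basis elements.

```python
from fractions import Fraction

def det2(v, w):
    """Determinant of the 2 x 2 matrix whose columns are v and w."""
    return v[0] * w[1] - v[1] * w[0]

def coordinates(u, v, w):
    """Return (a, b) with u = a*v + b*w, assuming v and w are linearly independent."""
    d = det2(v, w)
    if d == 0:
        raise ValueError("v and w are linearly dependent")
    # Cramer's rule for the 2 x 2 linear system a*v + b*w = u.
    return Fraction(det2(u, w), d), Fraction(det2(v, u), d)

# Identify the complex number x + yi with the pair (x, y).
v, w = (2, 3), (5, -7)
a, b = coordinates((1, 0), v, w)   # coordinates of the complex number 1
c, d = coordinates((0, 1), v, w)   # coordinates of the complex number i
print(a, b)  # 7/29 3/29
print(c, d)  # 5/29 -2/29
```

Since 1 and i are both combinations of 2 + 3i and 5 - 7i, every complex number is, and the nonzero determinant (here -29) is what makes the coefficients unique.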

Lemma 7.5. Let v1, . . . , vn ∈ V. Then v1, . . . , vn form a basis for V if and only if they are linearly independent and span V.

Proof. Suppose v1, . . . , vn form a basis. Clearly they span V. We always have 0 = 0v1 + · · · + 0vn, so if a1v1 + · · · + anvn = 0, then ai = 0 for all i by uniqueness. Thus v1, . . . , vn are linearly independent.

Conversely, if v1, . . . , vn span V, any v ∈ V can be written v = a1v1 + · · · + anvn for some scalars ai ∈ F. If also v = b1v1 + · · · + bnvn, then 0 = (a1 - b1)v1 + · · · + (an - bn)vn. Hence if v1, . . . , vn are linearly independent, we conclude that ai = bi for each i. This proves v1, . . . , vn form a basis. Q.E.D.

Remark 7.6. Above we have applied the terms linearly independent, span, basis to a group of objects v1, . . . , vn and so we have used plural language ("are", "span", "form"). We frequently think of v1, . . . , vn as the set X = {v1, . . . , vn}, in which case we use the singular: X is linearly independent, X spans V, X is a basis. In what follows we will mix these modes of usage. There is still another way in which we regard a basis. If our goal is to put a co-ordinate system on V, then we will presumably associate the n-tuple (a1, . . . , an) to the vector a1v1 + · · · + anvn. This implies an ordering. Thus when we wish to use explicit co-ordinates, we need to speak of an ordered basis, which is an n-tuple (v1, . . . , vn). As usual, we generally leave it to the reader to figure out what we are talking about at any given moment!

Once we decide to apply the terms "linearly independent", "span", and "basis" to sets, it becomes natural to allow infinite sets, and hence to modify the definitions slightly. Thus if X is a set, a linear combination of elements of X is a sum (or its result) a1v1 + · · · + anvn for some finite collection v1, . . . , vn of distinct elements of X and some a1, . . . , an ∈ F. Linear independence, spanning, and basis are defined solely in terms of such finite linear combinations.

Our goal for the rest of this section is straightforward. We wish to show that every vector space has a basis, and that any two bases have the same number of elements. This common number will be called the dimension of the vector space.

In pursuit of this goal, it is convenient to introduce other notions, which fortuitously are natural and useful in their own right.

If W ⊆ V, we say W is a subspace of V if it is a subgroup under + (i.e., is closed under + and - and contains 0) and it is closed under scalar multiplication, i.e., a ∈ F, w ∈ W implies aw ∈ W.

Lemma 7.7. Let W ⊆ V. Then W is a subspace of V if and only if (a) W ≠ Ø; (b) If w, w' ∈ W, then w + w' ∈ W; and (c) If a ∈ F, w ∈ W, then aw ∈ W.

Proof. What must be shown is that if w ∈ W, then -w ∈ W. We leave this as an exercise. Q.E.D.

If V is the plane R^2, then the subspaces of V are precisely {0}, V, and all of the lines through the origin. It is obvious that these are subspaces. It is also obvious that if a subspace contains a nonzero vector v, then it contains the line through v and the origin. Thus all that remains is to convince yourself that if a subspace contains two vectors that do not lie on the same line through the origin, it contains the whole plane.

In general, a subspace of the n-dimensional vector space R^n is a flat or linear space, containing the origin, of dimension at most n. In fact, let us define X ⊆ R^n to be a linear subset if X contains the entire line through any two points of X. The subspaces of R^n are precisely the linear subsets containing the origin. We will say more about this later.

Let X ⊆ V. The span of X is defined to be the set of all linear combinations of finitely many elements from X. Thus span X = { a1v1 + · · · + anvn | a1, . . . , an ∈ F, v1, . . . , vn ∈ X, n ≥ 0 }. In this last equation, we allowed the possibility that n = 0. We will make the convention that a sum of 0 terms is 0_V. This convention is only necessary to deal with the case X = Ø, and so our convention amounts to defining span Ø = {0}.

Lemma 7.8. Let V be a vector space over F and let X ⊆ V. Then span X is the smallest subspace of V containing X.

Proof. We need to show two things. First, that span X is a subspace of V (plainly X ⊆ span X) and second, that if W ⊇ X is a subspace of V, then span X ⊆ W.

We leave both as exercises. Q.E.D.

If W is a subspace of V and X ⊆ V and span X = W, then we say X spans W.

In terms of our vector analogy, the subspace spanned by a set of vectors is the smallest linear space through the origin containing all the vectors.

The following lemma links linear independence and spanning.

Lemma 7.9. Let V be a vector space over F, let X ⊆ V be linearly independent, and let y ∈ V. Then the following statements are true.

  1. If y ∉ X, then X ∪ {y} is linearly independent iff y ∉ span X.
  2. If y ∉ X, then X ∪ {y} is linearly dependent iff y ∈ span X.

Proof. (1) and (2) are plainly equivalent; we will prove (2).

First, suppose y ∈ span X. Then we have y = a1x1 + · · · + anxn for some a1, . . . , an ∈ F, x1, . . . , xn ∈ X. Thus a1x1 + · · · + anxn + (-1)y = 0 is a nontrivial linear combination from X ∪ {y} that yields 0, so X ∪ {y} is linearly dependent.

Next suppose X ∪ {y} is linearly dependent, i.e., suppose that there is a non-trivial linear combination of elements from X ∪ {y} that equals 0. Since X is linearly independent, this combination must involve y in a non-trivial way. That is, there must exist x1, . . . , xn ∈ X and a1, . . . , an, a ∈ F with a ≠ 0 such that a1x1 + · · · + anxn + ay = 0. We can solve this equation for y and we find y = -(a1/a)x1 - · · · - (an/a)xn ∈ span X. Q.E.D.

We are now ready to pursue our goal of showing bases exist and the cardinality of a basis of V is uniquely determined by V . We begin with one of the key results.

Theorem 7.10. Let V be a vector space over a field F and let X, Y ⊆ V. If X is linearly independent and Y spans V, then there is a subset Y' of Y such that X ∪ Y' is a basis of V.

Proof. Assume Y is finite: we will discuss the case where Y is infinite in an appendix to this section.

Choose a subset Y ' of Y with the following two properties: (1) X È Y ' is linearly independent, and (2) Y ' is the largest subset of Y satisfying condition (1). Note that Ø satisfies condition (1), so there are subsets of Y satisfying (1). Since Y is finite, there is a largest such subset. (It is here where we have to use more advanced techniques if Y is infinite.)

We claim X' = X ∪ Y' is a basis of V. It is linearly independent by definition, so we must show X' spans V. We will first show that Y ⊆ span X'.

Let y ∈ Y and suppose y ∉ span X'. Then y ∉ X' and by Lemma 7.9, X' ∪ {y} is linearly independent. But if we set Y'' = Y' ∪ {y}, we have Y' ⊊ Y'' ⊆ Y and X ∪ Y'' linearly independent. This contradicts our choice of Y'. It follows that Y ⊆ span X'.

Now span X' is a subspace containing Y , so by Lemma 7.8, span X' contains the subspace span Y = V . Thus X' spans V , and we have proven that X' is a basis of V . Q.E.D.
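For finite sets of vectors with rational entries, the construction in this proof can be carried out mechanically. The sketch below is an illustration under assumptions: it works over Q with vectors given as tuples, the names rank, is_independent and extend_to_basis are introduced here, and it uses a greedy pass through Y rather than choosing a largest subset outright. The greedy pass produces a subset Y' that cannot be enlarged without destroying independence, which is the property the proof actually uses.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors over Q, via Gaussian elimination on exact fractions."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def is_independent(vectors):
    """A list of vectors is linearly independent iff its rank equals its length."""
    return rank(vectors) == len(vectors)

def extend_to_basis(X, Y):
    """Given independent X and spanning Y, add elements of Y greedily while independence holds."""
    basis = list(X)
    for y in Y:
        if is_independent(basis + [y]):
            basis.append(y)
    return basis

X = [(1, 1, 0)]
Y = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
basis = extend_to_basis(X, Y)
print(basis)  # [(1, 1, 0), (1, 0, 0), (0, 0, 1)]
```

Here (0, 1, 0) is skipped because it already lies in the span of (1, 1, 0) and (1, 0, 0), exactly the situation described by Lemma 7.9.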

Corollary 7.11. Let V be a vector space. Then any linearly independent subset of V can be expanded to a basis, and any subset that spans V can be contracted to a basis.

Proof. If X is linearly independent, we can take Y = V and apply Theorem 7.10.

If Y spans V , we can take X = Ø and apply Theorem 7.10. Q.E.D.

Remark 7.12. There is one problem with the proof of the preceding corollary, and hence with the proof of the next corollary. In the proof, we used the set V as a spanning set, but V is likely to be infinite. Thus we are forced to confront the "infinite case" we tried to avoid. One way to avoid this problem is to prove all results only for finitely spanned vector spaces, that is, vector spaces which have a finite spanning set. (Such vector spaces are precisely the finite dimensional vector spaces.) The reader may either make this restriction or read the appendix to this section, where the "infinite" problem is discussed.

Corollary 7.13. Every vector space has a basis.

Proof. Apply Corollary 7.11 either to the linearly independent set Ø or the spanning set V . Q.E.D.

Problem 7.A. Let X be a subset of a vector space V . Show that the following statements are equivalent.

  1. X is a basis of V .
  2. X is a maximal linearly independent subset of V. (That is, X is linearly independent and if X ⊊ Y ⊆ V, then Y is linearly dependent.)
  3. X is a minimal spanning set in V. (That is, X spans V and if Y ⊊ X, then Y does not span V.)

Our next goal is to compare the sizes of bases. This requires a lemma, which is related to Theorem 7.10, but is not quite the same.

Lemma 7.14 (Exchange Lemma). Let V be a vector space, let X ⊆ V be a linearly independent set, and let Y ⊆ V span V.

  1. Either X ⊆ Y or there are x ∈ X \ Y, y ∈ Y \ X such that if we create sets X' and Y' by exchanging x and y, that is X' = (X \ {x}) ∪ {y} and Y' = (Y \ {y}) ∪ {x}, then X' is linearly independent and Y' spans V.
  2. If X, Y are bases, then either X = Y or we can choose x, y as in (1) such that X', Y' are bases.

Proof. (1) Suppose X ⊈ Y: then there exists an x ∈ X \ Y. Since X is linearly independent, so is X1 = X \ {x}, and moreover, x ∉ span X1 by Lemma 7.9. As in the proof of Theorem 7.10, this implies there exists a y ∈ Y such that y ∉ span X1. Again by Lemma 7.9, we see that X' = X1 ∪ {y} is linearly independent.

The linear independence of X' is all that we will actually need below. However, we claimed we could choose y so that Y' spans V; to do this, we must be a little more careful.

We can still take any x ∈ X \ Y. Since Y spans V, we can write x = a1y1 + · · · + anyn for some y1, . . . , yn ∈ Y and some nonzero a1, . . . , an ∈ F. Since X is linearly independent, we have x ∉ span X1, so there must be at least one yj such that yj ∉ span X1. Set y = yj for this j, and Y' = (Y \ {yj}) ∪ {x}.

Then X' is linearly independent as above. We can write y = yj = (1/aj)x - Σ_{i≠j} (ai/aj)yi, so y ∈ span Y'. Obviously for any other z ∈ Y, we have z ∈ span Y'. It follows (as in the proof of Theorem 7.10) that Y' spans V.

(2) Exercise. Q.E.D.

Theorem 7.15. Let X, Y be subsets of a vector space V and suppose that X is linearly independent and Y spans V. Then |X| ≤ |Y|.

Proof. Again we will assume Y is finite, say |Y | = n; we will discuss the infinite case in the appendix to this section.

Suppose the theorem is false and that V, X, Y give us a counterexample. Keeping Y, V fixed and changing X if necessary, we can assume that |X ∩ Y| is as large as possible, that is, if X' ⊆ V is such that V, X', Y give us a counterexample, then |X' ∩ Y| ≤ |X ∩ Y|. (This is possible because all the numbers involved are no greater than n = |Y|.)

If X ⊈ Y, then by the Exchange Lemma, there are x ∈ X \ Y, y ∈ Y \ X such that X' = (X \ {x}) ∪ {y} is linearly independent. Since |X'| = |X|, the triple V, X', Y is again a counterexample. Moreover, X' ∩ Y = (X ∩ Y) ∪ {y} is strictly larger than X ∩ Y. This contradicts our choice of X.

Thus we must have X ⊆ Y and so we can conclude that |X| ≤ |Y|, which contradicts the assumption that V, X, Y is a counterexample. Q.E.D.

Corollary 7.16. Any two bases of a vector space have the same number of elements. That is, if V is a vector space over a field F and X, Y are bases of V, then |X| = |Y|.

Proof. By Theorem 7.15, we have |X| ≤ |Y| and |Y| ≤ |X|. Thus |X| = |Y|.

This result can also be proved directly using part (2) of the Exchange Lemma. Q.E.D.

We define the dimension of a vector space V to be the size of any (and hence every) basis of V , and we denote it dim V .

Thus for example, dim F^n = n and dim M_{m×n}(F) = mn. We have dim_R C = 2. (If there are different fields in use, we sometimes write dim_F V to make clear that V is a vector space over F.)

Corollary 7.17. Let V be a vector space over a field F and let n = dim V be finite.

  1. Any linearly independent subset of V with n elements is a basis.
  2. Any subset of n elements that spans V is a basis.

Proof. (1) Let X be a linearly independent set of n elements. By Corollary 7.11, there is a basis X' ⊇ X. But |X'| = dim V = n = |X|, so we must have X = X'.

(2) This proof is similar to the proof of (1).

Note that this proof would fail if dim V were infinite, and in that case, the corollary is not true. Q.E.D.

This last result is very useful in deciding whether a given set is a basis. For example, we know F^2 has dimension 2, since it has the standard basis e1, e2. Thus by Corollary 7.17, two vectors v, w ∈ F^2 form a basis iff they are linearly independent. It is easy to tell when two vectors are linearly independent. We conclude that v, w ∈ F^2 form a basis iff neither v nor w is a multiple of the other.

The following result is another very useful application.

Corollary 7.18. Let F be a field and let A ∈ M_n(F). Then the following conditions are equivalent.

  1. A is invertible.
  2. The columns of A form a basis for F^n.
  3. The columns of A are linearly independent.
  4. The columns of A span F^n.
  5. The rows of A form a basis for F^n.
  6. The rows of A are linearly independent.
  7. The rows of A span F^n.

Proof. Since F^n is a vector space of dimension n -- we know the standard basis has n elements -- the equivalence of (2),(3),(4) and the equivalence of (5),(6),(7) follow immediately from Corollary 7.17.

Recall from Section 1 that (1) holds if and only if the equation Ax = b can be solved for any b. If x = (x1, . . . , xn)^T is a column vector and if Ai is column i of A, then Ax = x1A1 + · · · + xnAn. Thus Ax is a linear combination of the columns of A, and so the statement that Ax = b can always be solved is equivalent to the statement that the columns of A span F^n. This proves (1) is equivalent to (4).

A similar proof shows (1) is equivalent to (7), and this completes the proof. Q.E.D.

Here is another nice application of bases. In Artin, this result is used instead of the Exchange Lemma to prove Theorem 7.15. We get it as a corollary.

Corollary 7.19. A homogeneous system of m linear equations in n unknowns always has a nonzero solution if n > m. Put in matrix terms, if A is an m × n matrix with n > m, then there is an x ∈ F^n with x ≠ 0 but Ax = 0.

Proof. As in the proof of Corollary 7.18, the product Ax is a linear combination of the columns of A. There are n of these columns, and they are elements of F^m, a vector space of dimension m < n. Thus the set of columns must be linearly dependent, that is, some non-trivial linear combination of them must be 0. This says Ax = 0 for some nonzero x. Q.E.D.
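The proof above is an existence argument, but for a concrete matrix a nonzero solution can be computed by row reduction. The following sketch assumes rational entries and a matrix given as a list of rows; the function name nonzero_null_vector is introduced here for illustration. It reduces A to reduced row echelon form on the pivot columns, sets one free variable to 1, and solves for the pivot variables.

```python
from fractions import Fraction

def nonzero_null_vector(A):
    """Return a nonzero x with Ax = 0, for an m x n matrix A (list of rows) with n > m."""
    m, n = len(A), len(A[0])
    if n <= m:
        raise ValueError("expected more unknowns than equations")
    rows = [[Fraction(entry) for entry in row] for row in A]
    pivot_cols = []
    r = 0  # index of the next pivot row
    for c in range(n):
        pivot = next((i for i in range(r, m) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        pivot_value = rows[r][c]
        rows[r] = [entry / pivot_value for entry in rows[r]]
        # Clear column c in every other row.
        for i in range(m):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivot_cols.append(c)
        r += 1
        if r == m:
            break
    # At most m columns are pivots, so some column is free because n > m.
    free = next(c for c in range(n) if c not in pivot_cols)
    x = [Fraction(0)] * n
    x[free] = Fraction(1)
    for i, c in enumerate(pivot_cols):
        x[c] = -rows[i][free]
    return x

A = [[1, 2, 3],
     [4, 5, 6]]
x = nonzero_null_vector(A)
print(x)
```

For this A the computed vector is (1, -2, 1), which expresses the dependence A1 - 2A2 + A3 = 0 among the three columns of A.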

APPENDIX: Infinite-dimensional Vector Spaces

At two points in this section we made the assumption that spanning sets or bases were finite. In this appendix we will briefly discuss the general case.

The first place where the finiteness assumption was used was in the proof of Theorem 7.10. We had a linearly independent set X ⊆ V and a spanning set Y ⊆ V, and we needed the existence of a largest subset Y' of Y such that X ∪ Y' remained linearly independent. In the proof what we needed for "largest" was that if Y' ⊊ Y'' ⊆ Y, then X ∪ Y'' is linearly dependent. We usually express this by saying that Y' ⊆ Y is maximal with respect to the property that X ∪ Y' is linearly independent. When Y is finite, we know such maximal sets exist because we can take a subset Y' satisfying this property that has as many elements as possible. When Y is infinite, however, there may be larger and larger subsets in a never-ending chain.

Instead, we have to appeal to a fundamental principle of "infinite" mathematics, Zorn's Lemma. This lemma asserts the existence of objects without giving any means of constructing them, and so it is viewed with disfavor by some. If one is willing to use it, however, it is extremely powerful. (Indeed, many results cannot be proven without Zorn's Lemma.) We will state it below but not prove it. The proof involves the Axiom of Choice and some form of transfinite induction -- in fact, Zorn's Lemma is equivalent to the Axiom of Choice.

We first state a special version for sets, and then discuss the more general form.

Lemma (Special Zorn's Lemma for Sets). Let S be a non-empty collection of sets. A non-empty subset C of S is said to be a chain if for any A, B ∈ C, either A ⊆ B or B ⊆ A. Suppose that whenever C is a chain in S, the union of all the sets in C is an element of S. Then S contains a maximal element.

Proof. Sorry, you'll have to look this one up. Q.E.D.

Let us show that this applies to our situation. We are given X, Y ⊆ V where X is linearly independent and Y spans V. We let S be the collection of all subsets Y' of Y such that X ∪ Y' is linearly independent; note that S is non-empty because Ø belongs to it. We need to show the hypothesis of Zorn's Lemma applies, and then we will be able to conclude that S contains a maximal element Y', which is exactly what we want.

Let C be a chain in S, and let Z be the union of all the sets in C. Clearly Z ⊆ Y; we must show X ∪ Z is linearly independent. Suppose not: then there are x1, . . . , xn ∈ X, y1, . . . , ym ∈ Z such that some non-trivial linear combination of all these elements is 0. Each yi ∈ Ci for some Ci ∈ C ⊆ S. Since C is a chain, there is an index j, 1 ≤ j ≤ m, such that Ci ⊆ Cj for all i = 1, . . . , m. The elements x1, . . . , xn, y1, . . . , ym all lie in X ∪ Cj, and since Cj ∈ S, these elements must be linearly independent. This contradicts the choice of these elements and shows that X ∪ Z must be linearly independent after all. Thus Z ∈ S, as required.

The only properties of sets that occur in Zorn's Lemma above are their properties relative to the partial order ⊆. Thus it is reasonable to try to formulate Zorn's Lemma in a more general setting, and it turns out that it is both true and useful.

Let ≤ be a partial order on a set S. We say C ⊆ S is a chain if C is linearly ordered under ≤, that is, if for any c, d ∈ C, either c ≤ d or d ≤ c. We say x ∈ S is an upper bound for C ⊆ S if c ≤ x for every c ∈ C. Finally, we say x ∈ S is a maximal element (or simply maximal) if x ≤ s for s ∈ S implies s = x. (Note that we do not require s ≤ x for all s ∈ S. We are not assuming S is totally ordered. In particular, S may have many maximal elements -- or it may have none at all.)

If we take for S our collection of sets and we take for ≤ the inclusion relation ⊆, then a chain in S relative to ⊆ is precisely a chain in S in the sense we defined above. Moreover, the union of all the sets in C is an upper bound for any chain C ⊆ S. Our hypothesis in the special Zorn's Lemma for sets was that this union is in S for any chain C, and hence every chain in S has an upper bound in S. This is a special case of the general form of Zorn's Lemma.

Lemma (General Zorn's Lemma). Let ≤ be a partial order on a non-empty set S and suppose that every chain in S has an upper bound. Then S has a maximal element.

Proof. This version is potentially far more powerful than the version we gave for sets, but we can prove the general version using the special one. Let S* be the set of all chains in S. If C* is a chain in S*, then C* is a collection of chains in S that are linearly ordered under ⊆. It follows that the union of all the chains in C* is a chain in S. (Good exercise.) Thus this union is an element of S*.

By the special version of Zorn's Lemma for sets, it follows that S* contains a maximal element C, that is, a chain C in S that cannot be added onto. By our hypothesis for the general Zorn's Lemma, this chain C has an upper bound x. We claim x is a maximal element in S.

If this is false, there is an element y ∈ S with x < y. But then s < y for every s ∈ C, so C ∪ {y} is a chain that properly contains C. This is impossible, and so x must be maximal. Q.E.D.

The other place we appealed to finiteness was in our proof of Theorem 7.15. We were given X, Y ⊆ V where X is linearly independent and Y spans V and we wished to show |X| ≤ |Y|. We showed this under the hypothesis that Y is finite, using the Exchange Lemma, by fixing V, Y and taking a counterexample V, X, Y with X ∩ Y maximal.

By Zorn's Lemma (verify!) we can find a maximal X such that V, X, Y is a counterexample, but can we find one with X ∩ Y maximal? It is not clear how to order our counterexamples X so that X ∩ Y is maximized. For example, it seems quite possible to have counterexamples X, X' with X ⊊ X' but X ∩ Y = X' ∩ Y (if there are any counterexamples at all!).

We can instead solve this particular problem by a counting argument. We need two facts about infinite sets. If S is a set, let F(S) denote the set of all finite subsets of S. The first fact we need is that if S is infinite, then |S| = |F(S)|.

Suppose f : S → T. The second fact we need is that if S is infinite and |S| > |T|, then there exists a t ∈ T such that f^(-1)(t) = { s ∈ S | f(s) = t } is infinite.

Both of these facts are consequences of the fact that for non-empty sets A, B, if at least one of them is infinite, then |A × B| = max(|A|, |B|). (If ℵ, ℶ are infinite cardinals, then ℵ + ℶ = ℵ · ℶ = max(ℵ, ℶ).) This implies, for example, that if ℵ is an infinite cardinal, a countable union of sets of cardinality ℵ has cardinality ℵ.

Assume these two facts and assume Y is infinite. Since Y spans V, for every x ∈ X, there is a finite subset Z ⊆ Y such that x = Σ_{z ∈ Z} a_z z for some scalars a_z ∈ F. For each x, pick a particular set Z -- or shrink Y to a basis in which case Z is unique if we assume each a_z ≠ 0 -- and define a function f : X → F(Y) by letting f(x) be the chosen set Z.

Since Y is infinite, our first fact above tells us |F(Y)| = |Y|. If |X| > |Y|, our second fact tells us there is a finite Z ⊆ Y and an infinite X' ⊆ X with f(x) = Z for all x ∈ X'. Thus each element of X' lies in the finite dimensional subspace of V spanned by Z. Since X' ⊆ X, we know X' is linearly independent. Thus we have an infinite linearly independent set contained in a vector space of finite dimension. We proved this is impossible in the finite case of Theorem 7.15.