Groups of Matrices (Also, Extended Euclidean Algorithm, Row Equivalence, Determinants, Inverses)

11. Groups of Matrices

Motivation

The focus now shifts to group ${GL(n, R)}$, the general linear group of ${n\times n}$ invertible matrices over a number system (field) ${R}$. In particular, ${GL(2,\mathbb{Z})}$ is crucial to the theory of modular forms to be introduced later.

To get a multiplicative group from a set of matrices, the rules of matrix multiplication argue that only square matrices will do. The identity matrix becomes the neutral element, written ${I}$ for identity. ${I}$ was introduced in the prior section, shown by example to contain ${1's}$ on the principal diagonal and ${0's}$ elsewhere, where ${0}$ and ${1}$ are the additive and multiplicative identities in the field ${R}$. To complete the group, a multiplicative inverse of each matrix must be provided. As was shown in the prior chapter, matrix ${A}$ has an inverse when there is a matrix ${A^{-1}}$ such that ${AA^{-1}=A^{-1}A=I}$. ${A^{-1}}$ is called the inverse of ${A}$.

A square matrix ${A}$ is invertible just when its determinant, det${A\in R}$, has a multiplicative inverse in ${R}$, which it will when it is non-zero, which it is when the rows of ${A}$ are linearly independent. Letting ${d=\rm{det}A}$, ${d}$ has an inverse when there is ${c\in R : cd=1}$.

Consider as an example the group of ${2\times 2}$ invertible matrices ${GL(2,\mathbb{F}_2)}$, of which there are just six:

$\displaystyle \begin{bmatrix}1&0\\{0}&1\end{bmatrix}\ \ \begin{bmatrix}1&1\\{0}&1\end{bmatrix}\ \ \begin{bmatrix}1&0\\1&1\end{bmatrix}\ \ \begin{bmatrix}0&1\\1&1\end{bmatrix}\ \ \begin{bmatrix}0&1\\1&0\end{bmatrix}\ \ \begin{bmatrix}1&1\\1&0\end{bmatrix}$

That each of these six have inverses is verified using the formula for the inverse of a ${2\times 2}$ matrix

$\displaystyle A=\begin{bmatrix}a&b\\c&d\end{bmatrix},\ \ \ A^{-1}=\begin{bmatrix}\frac{d}{ad-bc}&\frac{-b}{ad-bc}\\\frac{-c}{ad-bc}&\frac{a}{ad-bc}\end{bmatrix}\ \ {\rm where\ det} A = ad-bc$

In the general case, the set of invertible ${n\times n}$ matrices form a group. Matrix multiplication is associative, but not commutative. The proof that ${A(BC)=(AB)C}$ is somewhat messy, requiring one to consider individual columns of the matrices. The other group properties are straightforward. To show closure under multiplication, it is sufficient to show that ${(AB)^{-1}=B^{-1}A^{-1}}$:

$\displaystyle ((AB)B^{-1})A^{-1})=(A(BB^{-1})A^{-1}=(AI)A^{-1}=AA^{-1}=I$

Similarly, ${(B^{-1}A^{-1})(AB)=I}$.

The group is called the general linear group because a general determinant is allowed; the determinant is not required to be ${1}$.

The group ${GL(2,\mathbb{Z})}$ consists of ${2\times 2}$ matrices whose determinants are either ${1}$ or ${-1}$ (the only elements in ${\mathbb{Z}}$ that have multiplicative inverses in ${\mathbb{Z}}$). Since now matrix ${A}$ has det${A=ad-bc=1}$, ${A^{-1}=\begin{bmatrix}d&-b\\-c&a\end{bmatrix}}$. To construct matrices in ${GL(2,\mathbb{Z})}$, pick the top row ${{a,b}}$ relatively prime. Then the Euclidean algorithm (see supplement) can provide choices of ${{c, d}}$ for matrices of ${GL(2,\mathbb{Z})}$ having top row ${{a,b}}$. A heuristic approach constructs two lists, one of multiples of ${a}$ and the other of multiples of ${b}$, and then picks pairs, one from each row, which differ by ${1}$. Their multipliers then become ${c}$ and ${d}$. This approach further demonstrates the existence of an infinite number of such linear combinations, and hence, matrices.

11.1. Supplement: The Extended Euclidean Algorithm (GCD as a Linear Combination)

The Euclidean algorithm is a method for calculating the greatest common divisor of two integers, written ${(a,b)}$, by successive divisions. Observing that ${a=qb+r}$ for integer quotient ${q}$ and remainder ${r}$, the algorithm steps to successively reduce the remainders ${r_{i-1}}$ modulo ${r_{i}}$, producing ${r_{i+1}}$, until ${0}$ is obtained. Then the last non-zero remainder is ${(a,b)}$. Picking the larger integer as ${a}$:

$\displaystyle a=q_0b+r_0$

$\displaystyle b=q_1r_0+r_1$

$\displaystyle r_0=q_2r_1+r_2$

$\displaystyle r_1=q_3r_2+r_3$

$\displaystyle \vdots$

${(a,b)}$ can also be represented as a linear combination of terms ${ax+by}$ for some integers ${x, y}$. There are set-theoretic and induction proofs of this statement, but the entire generality is not needed here. The linear combination is not unique; there are infinitely many ${x,y}$ combinations satisfying the equality. The extended Euclidean algorithm is used to determine values of ${x}$ and ${y}$.

In the particular case of ${a}$ and ${b}$ coprime, another way of saying ${(a,b)=1}$, there are values of ${x}$ and ${y}$ such that ${ax-by=1}$ per the prior paragraph. This is equivalent to the linear congruence ${ax\equiv 1\pmod b}$, which can be solved for ${x}$ since ${a}$ and ${b}$ are coprime. Note that ${x}$ is ${a^{-1}\pmod b}$, so the following is also an existence statement and algorithm for ${a^{-1}}$. To demonstrate the existence of such an ${ax}$ congruence, consider the residues ${\pmod b}$ of the sequence of ${b-1}$ multiples of ${a: {a, 2a, \cdots, (b-1)a}}$. Observe that if ${x}$ goes through a complete set of residues ${\pmod b}$, then so does ${ax}$, because ${ax_1\equiv ax_2\pmod b}$ implies, when ${a}$ and ${b}$ are coprime, that ${x_1\equiv x_2\pmod b}$, a contradiction. Therefore, there is ${x}$ for which ${xa \equiv 1 \pmod b}$. In other words, ${ax = by + 1}$, for some ${y}$.

The case at hand, finding values for elements of ${GL(2,\mathbb{Z})}$ involves picking coprime ${a}$ and ${b}$ for the first row, and then solving ${ax_1-bx_2=1}$ for the values of ${x_i}$, which will become the 2nd row values of the matrix. The extended Euclidean algorithm computes these ${x_i}$ coefficients stepwise during each step of the algorithm. There is an algebraic formula for each step as above, but more feeling for the process can be derived from applying the algorithm to the example matrix: beginning with ${a=6, b=11}$, find coefficients ${x_i}$ such that ${6x_1-11x_2=1}$. The first two columns below show the Euclidean algorithm applied to finding ${(6,11)}$. The second two columns show the auxiliary calculations provided by the extended algorithm to compute the coefficients of ${a}$ and ${b}$. These columns start with a unit vector used to accumulate changes that will result in an integer pair, the two needed coefficients. Each column is labeled with the name of the variable involved:

$\displaystyle \begin{array}{llll}b&a&b&a\\ \scriptstyle 11&\scriptstyle 6&\scriptstyle (1,0)&\scriptstyle (0,1)\\ \scriptstyle 11-1*6=5&\scriptstyle 6&\scriptstyle (1,0)-1*(0,1)=(1,-1)&\scriptstyle (0,1)\\ \scriptstyle 5&\scriptstyle 6-1*5=1&\scriptstyle (1,-1)&\scriptstyle (0,1)-1*(1,-1)=(-1,2)\\ \scriptstyle 5-5*1=0&\scriptstyle 1&\scriptstyle (1,-1)-5*(-1,2)=(6, -11)&\scriptstyle (-1,2) \end{array}$

Thus, ${1=-1*11+2*6}$ and a matrix in ${GL(2,\mathbb{Z})}$ with 6, 11 in row 1 is:

$\displaystyle \begin{bmatrix}6&11\\1&2\end{bmatrix}$

11.2. Supplement: Row Equivalence, Determinants, Inverses

Elementary row operations are defined for matrices: row transposition, row multiplication by a scalar, and addition of a row to any other. Two matrices are called row-equivalent if one can be derived from the other by some succession of elementary row operations, as in the Gaussian elimination method. A row operation can be captured as a transformation of the identity matrix, ${I\rightarrow E}$, where E is called an elementary matrix. A sequence of such transformations can be represented by a product of such elementary matrices: ${E_sE_{s-1}\cdots E_1}$. For example, there are only five elementary ${2\times 2}$ matrices, representing three types of transformation: shear, 1-dimensional elongation/compression, and reflection. Thus in the example, any one-to-one, homogeneous linear transformation of the plane can be represented as some product of these three types of elementary transformation.

Let ${M_n}$ be the set of all ${n\times n}$ matrices over number system ${R}$ (most generally a commutative ring with identity, here specialized to a field at no immediate loss of generality). Consider any function ${D: M_n\rightarrow R}$ which satisfies the following properties, where ${A\in M_n}$:

${n}$-linear Condition: ${D(A)}$ is linear on each row (or column), when other rows are held unchanged. Write ${D(A)=D(\alpha_1,\cdots,\alpha_n)}$, where ${\alpha_i}$ is the ${i}$th row of ${A}$. Further, since only the ${i}$th row is being operated on, the others held fixed, the abbreviated form ${D(\alpha_i)}$ is a convenience. ${D(A)}$ is ${n}$-linear means ${D(c\alpha_i+\alpha_i^{\prime})=cD(\alpha_i)+D(\alpha_i^{\prime})}$. ${n}$-linearity implies ${D(A)}$ is unchanged by linear combinations of rows. Letting ${\alpha_i^{\prime}=\alpha_j}$, another row of ${A}$, then ${D(\alpha_i+c\alpha_j)=D(\alpha_i)+cD(\alpha_j)}$. But since the row ${\alpha_j}$ appears in both the ${i}$th and ${j}$th positions in the rightmost term, that term is ${0}$, and hence ${D(\alpha_i+c\alpha_j)=D(\alpha_i)}$. Thus, the elimination method of elementary row operations, described above for solving a system of linear equations, does not change the determinant of its matrix.

Alternating Condition: ${D(A)=0}$ if two adjacent columns are equal. Equivalently*, the sign of ${D(A)}$ changes when two adjacent columns (or rows) are exchanged. (*The two statements are logically mutually consequential only over fields where ${1+1\ne 0}$).

Faithful Condition: ${D(I)=1}$, where ${I}$ is the ${n\times n}$ identity matrix and ${1}$ is the unit in ${R}$.

The object of interest is the image of such ${D(A)}$, called the determinant of ${A}$, det${A}$. Any function ${D(A)}$ satisfying the above three conditions also satisfies ${D(A)=}$ (det${A)D(I)}$, so det${A}$ is uniquely fixed by these conditions. ${D(A)}$ and det${A}$ are used interchangeably below.

There are existence proofs for ${D}$. There are algebraic formulae for det${A}$, historically due to Leibniz and Laplace. There are also geometric interpretations of det${A}$. In 2-space, consider ${A}$ as the ${2\times 2}$ matrix representation of a linear transformation of the unit square into a parallelogram, where the matrix expresses the vertices. Then ${| \hbox{\rm{det}} A|}$ is the area of the parallelogram, where the absolute value is due to the sign depending on the order in which the vertices are recorded in the matrix. The area of a triangle is expressed similarly, as 1/2 of a parallelogram. In higher dimension spaces, a similar situation holds. For example, in 3-space, consider ${A}$ as a ${3\times 3}$ matrix representing a linear transformation of the unit cube into a parallelopiped. Then ${| \hbox{\rm{det}} A|}$ is the volume of the transformed solid. Also of geometric interest, when a row or column of a matrix is multiplied by a scalar ${s}$, then its determinant is multiplied by ${s^n}$, as one would expect for a volume measure in n-space.

${D}$ is called an alternating function because it satisfies both parts of the Alternating Condition. Related to this, the definition of ${D}$ involves permutations of degree ${n}$. Consider this version of the Leibniz formula:

$\displaystyle D(A) = \sum_{\sigma \in S_n} \hbox{\rm{sgn}}(\sigma) \prod_{i=1}^n A_{i,\sigma(i)}$

${D}$ is seen to have ${n!}$ sums each with ${n}$ products. Recall ${S_n}$ is the symmetric group, the ${n!}$ permutations of the integers from ${1}$ to ${n}$. ${\sigma}$ is one element of this group. Recall also that ${\sigma}$ is even or odd depending on whether an even or odd number of transpositions is involved; ${\hbox{\rm{sgn}}(\sigma)}$ returns ${(-1)^m}$, where ${m}$ is the number of transpositions in ${\sigma}$. ${A_{i,\sigma(i)}}$ represents one of set of ${n}$ matrix entries with permuted subscripts whose product is formed. For example, when ${n = 4}$ and ${\sigma = (1, 4, 3, 2)}$, then ${\hbox{\rm{sgn}}(\sigma) = -1}$ (one transposition) and the product of the matrix entries is ${A_{11}A_{24}A_{33}A_{42}}$. The alternating nature of D is revealed by the alternating sign function, as subscripts (rows) get transposed. Note in the formula that some authors use det${P}$ rather then ${\hbox{\rm{sgn}}(\sigma)}$, where ${P}$ is the ${n\times n}$ identity matrix permuted by the current ${\sigma}$. This is another way to specify the sign of each summand.

Given the number of computations required to calculate the determinant of a sizable matrix by the brute force of Leibniz’s formula, it is best to finesse the situation if possible. Three such finesses follow.

By Gaussian elimination, a square invertible matrix ${A}$ can be placed in upper-triangular form, ${U}$, where ${U}$ has ${0}$‘s below the diagonal. The diagonal entries are called the ${n}$ pivots ${p_i}$ of ${U}$ (and ${A}$), and det${A=(-1)^mp_1p_2\cdots p_n}$, the product of the pivots, where ${m}$ is the number of row transpositions needed to produce ${U}$ from ${A}$.

If an ${n\times n}$ matrix ${A}$ has a block of ${0}$‘s isolated in the lower left (for instance, but the usual place), such that it has a block representation:

$\displaystyle A=\begin{bmatrix}B&C\\{0}&D\end{bmatrix}$

where ${B}$ is an ${r\times r}$ matrix, ${D}$ is an ${s\times s}$ matrix, ${C}$ is an ${r\times s}$ matrix, and ${0}$ denotes the ${(n-s)\times r}$ zero matrix, then det${A=(}$det${B)(}$det${D)}$.

Every entry in ${A}$, say ${\alpha_{ij}}$, has a cofactor, ${(-1)^{i+j}}$ times the determinant of the ${(n-1)\times (n-1)}$ matrix that is ${A}$ minus the row and column occupied by ${\alpha_{ij}}$. The dot product of any row or column with its ${n}$ cofactors is equal to det${A}$.

Determinants arise naturally in two common operations: solving a system of linear equations and computing the inverse of a matrix. Discussion of determinants in these contexts provides additional insight into the nature of such operations. But both these types of operation are most efficiently computed using Gaussian elimination rather than determinant evaluation.

Cramer’s rule is a method of using determinants to solve the non-homogeneous equation ${A\mathbf{x}=\mathbf{b}}$. Let ${B_j}$ be the matrix obtained by replacing the ${j}$th column of ${A}$ by ${\mathbf{b}}$. Then the solution vector is specified by

$\displaystyle x_i=\frac{\hbox{\rm{det}}B_i}{\hbox{\rm{det}}A}$

The inverse of a matrix is expressible in terms of its matrix of cofactors and its determinant. If ${C}$ is the cofactor matrix for ${A}$, then

$\displaystyle A^{-1}=\frac{C^T}{\hbox{\rm{det}}A}$

Gauss-Jordan elimination is more efficient than the above formula for finding ${A^{-1}}$. Gauss-Jordan is an extended version of Gaussian elimination, which applies the machinery of elimination to the ${n\times 2n}$ block matrix ${\begin{bmatrix}A&&I\end{bmatrix}}$ (analogously to the way the extended Euclidean algorithm maintains parallel computations). Operating on ${A}$ and ${I}$ simultaneously takes advantage of the property: Gaussian elimination operations used to reduce ${A\rightarrow I}$, when applied to ${I}$ in the same sequence, give ${A^{-1}}$. The reasoning is that ${A}$ is row-equivalent to ${I}$, so that ${I=E_sE_{s-1}\cdots E_1A}$. Multiplying by ${A^{-1}}$, ${E_sE_{s-1}\cdots E_1I=A^{-1}}$, the stated property. Gauss-Jordan keeps reducing and eliminating until the ${n\times 2n}$ block matrix ${\begin{bmatrix}I&&A^{-1}\end{bmatrix}}$ results.

Summarizing some attributes of linear independence, the rows of a square ${n\times n}$ matrix ${A}$ over ${R}$ are linearly independent if:

The only solution to ${A\mathbf{x}=\mathbf{0}}$ is ${\mathbf{0}}$.
The rows form a basis for ${row(A)=V^n(R)}$.
${A=E_sE_{s-1}\cdots E_1}$ for some set of elementary matrices.
${A}$ is row-equivalent to ${I}$.
The rank of ${A}$ is ${n}$.
${A^{-1}}$ exists.
${\hbox{\rm{det}}A\ne 0}$; det${A}$ has an inverse in ${R}$; ${A}$ is non-singular.