Lecture 14 Summary
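Discussed how to use the power method to get multiple eigenvalues and eigenvectors of Hermitian matrices by "deflation": once an eigenvector has been found, its component is projected out at each step, and because the eigenvectors are orthogonal the iteration then converges to the next one. Discussed how, in principle, the QR factorization of Aⁿ for large n will give the eigenvectors and eigenvalues in descending order of magnitude, but how this is killed by roundoff errors: every column of Aⁿ is dominated by the eigenvector of largest |λ|, so Aⁿ is extremely ill-conditioned and the information about the other eigenvectors is lost.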
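A minimal NumPy sketch of the deflated power method, assuming a real symmetric matrix whose leading eigenvalues have distinct magnitudes. The function name `power_method_deflated`, the fixed iteration count, and the random starting vectors are illustrative choices, not taken from the lecture.

```python
import numpy as np

def power_method_deflated(A, k, iters=2000, seed=0):
    """Approximate the k eigenvalues of largest magnitude of a real symmetric
    matrix A, and their eigenvectors, by the power method with deflation:
    previously found eigenvectors are projected out at every step."""
    rng = np.random.default_rng(seed)
    m = A.shape[0]
    vals, vecs = [], []
    for _ in range(k):
        x = rng.standard_normal(m)
        for _ in range(iters):
            # Eigenvectors of a symmetric matrix are orthogonal, so removing
            # the ones already found leaves the next-largest as dominant.
            for q in vecs:
                x = x - (q @ x) * q
            x = A @ x
            x = x / np.linalg.norm(x)
        for q in vecs:  # remove roundoff-level components that crept back in
            x = x - (q @ x) * q
        x = x / np.linalg.norm(x)
        vals.append(x @ A @ x)  # Rayleigh quotient (x has unit norm)
        vecs.append(x)
    return np.array(vals), np.column_stack(vecs)

# Usage: a 4x4 real symmetric matrix with known eigenvalues 10, -6, 3, 1.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = Q @ np.diag([10.0, -6.0, 3.0, 1.0]) @ Q.T
vals, V = power_method_deflated(A, 3)
print(vals)  # close to [10, -6, 3]
```

Each new eigenvector converges at a rate set by the ratio of the next two remaining eigenvalue magnitudes, so this is only practical when those ratios are well below 1.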
Unshifted QR method: proved that repeatedly factorizing A = QR and then replacing A with RQ (as in pset 3) is equivalent to QR factorizing Aⁿ: after n steps, Q₁Q₂⋯Qₙ Rₙ⋯R₂R₁ = Aⁿ, so the product of the unitary factors is the Q factor of Aⁿ. But since each step RQ = Q*AQ is a similarity transformation by a unitary matrix, the process is well conditioned and we get the eigenvalues accurately.
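The unshifted QR algorithm can be sketched in a few lines of NumPy (real input assumed; `unshifted_qr` is a hypothetical helper name). The sketch also checks the equivalence numerically: the accumulated products Q₁⋯Qₙ and Rₙ⋯R₁ multiply back to Aⁿ, while the iterate itself only ever undergoes orthogonal similarity transformations.

```python
import numpy as np

def unshifted_qr(A, n):
    """n steps of the unshifted QR algorithm on a real matrix A:
    factor A_k = Q_k R_k, then set A_{k+1} = R_k Q_k = Q_k^T A_k Q_k.
    Also returns Qbar = Q_1 Q_2 ... Q_n and Rbar = R_n ... R_2 R_1."""
    Ak = np.array(A, dtype=float)
    m = Ak.shape[0]
    Qbar, Rbar = np.eye(m), np.eye(m)
    for _ in range(n):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q          # orthogonal similarity: same eigenvalues as A
        Qbar = Qbar @ Q
        Rbar = R @ Rbar
    return Ak, Qbar, Rbar

rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = Q @ np.diag([10.0, -6.0, 3.0, 1.0]) @ Q.T

# Qbar @ Rbar reproduces A^n, i.e. Qbar is the Q factor of A^n.
Ak, Qbar, Rbar = unshifted_qr(A, 5)
print(np.allclose(Qbar @ Rbar, np.linalg.matrix_power(A, 5)))  # True

# After many steps the iterate is nearly diagonal, with the eigenvalues
# appearing in descending order of magnitude: about [10, -6, 3, 1].
Ak, _, _ = unshifted_qr(A, 200)
print(np.round(np.diag(Ak), 6))
```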
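To make the QR method faster, we first reduce A to upper Hessenberg form by a unitary similarity transformation. This costs O(m³) operations, but only once: the QR iterations preserve Hessenberg form, and a QR step on a Hessenberg matrix costs only O(m²) operations instead of O(m³). You will show in pset 4 that this is especially fast when A is Hermitian and the Hessenberg form is tridiagonal. Second, we use shifts.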
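A sketch of the Hessenberg reduction by Householder reflections, assuming a real matrix (`hessenberg` is a hypothetical name; SciPy provides a production routine, but plain NumPy keeps the example self-contained). The QR step at the end uses the dense `np.linalg.qr`, so the sketch demonstrates that Hessenberg and tridiagonal structure are preserved, not the O(m²) cost, which needs a specialized QR step (e.g. with Givens rotations).

```python
import numpy as np

def hessenberg(A):
    """Reduce a real square matrix to upper Hessenberg form H = P^T A P
    (P orthogonal) using Householder reflections. Costs O(m^3), but is done
    only once. If A is symmetric, H is symmetric too, hence tridiagonal."""
    H = np.array(A, dtype=float)
    m = H.shape[0]
    for k in range(m - 2):
        x = H[k + 1:, k]
        normx = np.linalg.norm(x)
        if normx == 0.0:
            continue  # nothing to eliminate in this column
        v = x.copy()
        v[0] += np.copysign(normx, x[0])  # sign choice avoids cancellation
        v /= np.linalg.norm(v)
        # Apply the reflector I - 2 v v^T on the left (rows k+1:) ...
        H[k + 1:, k:] -= 2.0 * np.outer(v, v @ H[k + 1:, k:])
        # ... and on the right (columns k+1:), so H stays similar to A.
        H[:, k + 1:] -= 2.0 * np.outer(H[:, k + 1:] @ v, v)
    return H

rng = np.random.default_rng(2)
B = rng.standard_normal((6, 6))
T = hessenberg(B + B.T)     # symmetric input -> tridiagonal output
H = hessenberg(B)           # general input -> upper Hessenberg output
print(np.allclose(np.tril(T, -2), 0) and np.allclose(np.triu(T, 2), 0))  # True

# One (dense) QR step H = QR -> RQ keeps the Hessenberg structure.
Q, R = np.linalg.qr(H)
print(np.allclose(np.tril(R @ Q, -2), 0))  # True
```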
In particular, the worst case for the QR method is when all of the eigenvalues are nearly equal, so that |λ₁/λₘ| is nearly 1. In this case, our previous analysis shows that a large number of iterations may be required to discriminate between the eigenvectors. However, if we instead do QR on A - μI, where μ is approximately λₘ, then |(λ₁-μ)/(λₘ-μ)| will be large and the algorithm will converge quickly. The shift μ, in turn, can be estimated from the Rayleigh quotient of the last column of the accumulated Q, since that column should approximate the eigenvector qₘ; this Rayleigh quotient is simply the bottom-right entry of the current iterate. As the algorithm progresses, μ approaches λₘ and the algorithm converges more and more quickly. This insight leads to the shifted QR algorithm: factor A - μI = QR, replace A with RQ + μI (again a unitary similarity transformation), and once the off-diagonal entries of the last row are negligible, record the eigenvalue and deflate to the leading (m-1)×(m-1) block.
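A sketch of QR iteration with the Rayleigh-quotient shift μ = (bottom-right entry of the current iterate) and deflation, assuming a real symmetric matrix; `qr_iteration`, the tolerance, and the test matrix are illustrative assumptions. Running it with and without the shift on a matrix with nearly equal eigenvalues shows the difference in iteration counts.

```python
import numpy as np

def qr_iteration(A, use_shift=True, tol=1e-12, maxiter=20000):
    """QR iteration with deflation for a real symmetric matrix A.
    use_shift=True: Rayleigh-quotient shift mu = Ak[-1, -1]; False: mu = 0.
    Returns (sorted eigenvalue estimates, number of QR steps taken)."""
    Ak = np.array(A, dtype=float)
    scale = np.linalg.norm(Ak)
    eigs, iters = [], 0
    while Ak.shape[0] > 1:
        m = Ak.shape[0]
        # Last row (off the diagonal) negligible: Ak[-1, -1] is an eigenvalue.
        if np.linalg.norm(Ak[-1, :-1]) <= tol * scale:
            eigs.append(Ak[-1, -1])
            Ak = Ak[:-1, :-1]   # deflate
            continue
        if iters == maxiter:
            raise RuntimeError("QR iteration did not converge")
        # Ak[-1, -1] = q_m^T A q_m: Rayleigh quotient of the last column of
        # the accumulated Q.
        mu = Ak[-1, -1] if use_shift else 0.0
        Q, R = np.linalg.qr(Ak - mu * np.eye(m))
        Ak = R @ Q + mu * np.eye(m)   # = Q^T Ak Q, an orthogonal similarity
        iters += 1
    eigs.append(Ak[0, 0])
    return np.sort(eigs), iters

rng = np.random.default_rng(4)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
lam = np.array([1.00, 1.01, 1.02, 1.03])   # nearly equal eigenvalues
A = Q @ np.diag(lam) @ Q.T

vals0, n0 = qr_iteration(A, use_shift=False)   # ratios |λ_{j+1}/λ_j| ≈ 0.99
vals1, n1 = qr_iteration(A, use_shift=True)    # Rayleigh-quotient shift
print(n0, n1)   # the shifted count is far smaller
```

The unshifted run needs on the order of thousands of steps here because each step only shrinks the last row by a factor of about 0.99, whereas the shifted run behaves like Rayleigh-quotient iteration on the last column.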
There are a number of additional tricks to further improve things, the most important of which is probably the Wilkinson shift: choosing μ as the eigenvalue of the little 2×2 submatrix in the bottom-right corner (the last two rows and columns) that is closer to the last diagonal entry. This avoids problems in cases where the Rayleigh-quotient shift fails, e.g. when there are two equal and opposite eigenvalues, so that the Rayleigh quotient can land exactly halfway between them and the iteration makes no progress. Some of these tricks (e.g. the Wilkinson shift) are described in the book, and some are only in specialized publications. If you want the eigenvectors as well as the eigenvalues, it turns out to be more efficient to use a more recent "divide and conquer" algorithm, summarized in the book, whose details are especially tricky and important. However, at this point I don't want to cover more gory details in 18.335. Although it is good to know the general structure of modern algorithms, especially the fact that they look nothing like the find-the-roots-of-the-characteristic-polynomial approach you learn as an undergraduate, as a practical matter you are always just going to call LAPACK if the problem is small enough to solve directly. Matters are different for much larger problems, where the algorithms are not so bulletproof and one might need to get into the guts of the algorithms; this will lead us into the next topic of the course, iterative algorithms for large systems, in subsequent lectures.
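A sketch of the Wilkinson shift, using the standard closed form for the eigenvalue of the bottom-right 2×2 block [[a, b], [b, c]] closest to c, and assuming a real symmetric (here tridiagonal) matrix; the helper names `wilkinson_shift`, `rayleigh_shift`, and `shifted_qr` are hypothetical. The 2×2 example has eigenvalues ±1: the Rayleigh shift stalls, while the Wilkinson shift converges in one step. The final comparison uses `np.linalg.eigvalsh`, which NumPy documents as calling LAPACK (the `_syevd`/`_heevd` routines), i.e. the "just call LAPACK" option in practice.

```python
import numpy as np

def wilkinson_shift(Ak):
    """Eigenvalue of the bottom-right 2x2 block [[a, b], [b, c]] of a real
    symmetric iterate Ak that is closer to c (ties broken toward c - |b|)."""
    a, b, c = Ak[-2, -2], Ak[-1, -2], Ak[-1, -1]
    delta = (a - c) / 2.0
    sgn = 1.0 if delta >= 0 else -1.0
    denom = abs(delta) + np.hypot(delta, b)
    return c if denom == 0.0 else c - sgn * b * b / denom

def rayleigh_shift(Ak):
    """Plain Rayleigh-quotient shift: the bottom-right entry of Ak."""
    return Ak[-1, -1]

def shifted_qr(A, shift, tol=1e-12, maxiter=500):
    """Shifted QR iteration with deflation for a real symmetric A.
    `shift` maps the current iterate to mu. Returns (sorted eigenvalues, steps)."""
    Ak = np.array(A, dtype=float)
    scale = np.linalg.norm(Ak)
    eigs, iters = [], 0
    while Ak.shape[0] > 1:
        m = Ak.shape[0]
        if np.linalg.norm(Ak[-1, :-1]) <= tol * scale:
            eigs.append(Ak[-1, -1])   # converged: deflate the last row/column
            Ak = Ak[:-1, :-1]
            continue
        if iters == maxiter:
            raise RuntimeError("shifted QR did not converge")
        mu = shift(Ak)
        Q, R = np.linalg.qr(Ak - mu * np.eye(m))
        Ak = R @ Q + mu * np.eye(m)
        iters += 1
    eigs.append(Ak[0, 0])
    return np.sort(eigs), iters

# Equal and opposite eigenvalues +-1: the Rayleigh shift is 0, exactly
# halfway between them, and the iteration makes no progress.
A = np.array([[0.0, 1.0], [1.0, 0.0]])
try:
    shifted_qr(A, rayleigh_shift, maxiter=50)
    rayleigh_stalled = False
except RuntimeError:
    rayleigh_stalled = True
vals_w, steps_w = shifted_qr(A, wilkinson_shift)   # shift -1 hits an eigenvalue

# Random symmetric tridiagonal matrix, checked against LAPACK via NumPy.
rng = np.random.default_rng(3)
d, e = rng.standard_normal(8), rng.standard_normal(7)
T = np.diag(d) + np.diag(e, 1) + np.diag(e, -1)
vals_t, steps_t = shifted_qr(T, wilkinson_shift)
print(rayleigh_stalled, vals_w, np.allclose(vals_t, np.linalg.eigvalsh(T)))
```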