PRIMES AND POWERS Hendrik Lenstra MUSA lecture, March 16, 1993, 4-5 pm. ------------------------------------------------------- The lecture deals with Fermat's little theorem---theory and applications. This belongs to number theory, and lots of numbers will appear on the blackboard. ------------------------------------------------------- 1. Fermat's little theorem. Fermat (1601-1665). He found this theorem in 1640. However, any kid can find it, by proceeding in the following way. How many sequences of five black-or-white dots are there? Count: 00000: 1 10000 + cyclic shifts: 5 11000 + cyclic shifts: 5 10100 + cyclic shifts: 5 00111 + cyclic shifts: 5 01011 + cyclic shifts: 5 01111 + cyclic shifts: 5 11111: 1 ------------------------- + Total: 32 Hey, that's 2^5. Ah, of course! Two choices for each position, that gives 2^5 in all. Still funny: 2 `special' ones, and the others come in groups of 5. In other words: 2^5 = 2 + (multiple of 5). NOTATION (Gauss, 1777-1855): a = b mod c MEANS: a - b = some multiple of c. Pronounce: `a is congruent to b modulo c'. (Equivalently: a and b leave the same remainder when divided by c.) So Fermat: 2^5 = 2 mod 5. (Sure enough, 32 - 2 = 30 = 6*5.) Can we generalize this? Well, let's first work on the 5. The proof looks pretty general, so you might expect that 2^6 = 2 mod 6. But no: 2^6 = 64, and 64 - 2 = 62 is NOT divisible by 6. So what goes wrong in the same proof? Well, you get groups that do not exist of 6 elements; for example, 100100 gives only THREE cyclic shifts, 101010 gives only TWO! So you have a problem when the sequence has a `period'. That period must be a divisor of the length of your sequence. This won't happen when that length is a prime number, like 5, but otherwise you may have a problem. And indeed: 2^3 = 2 mod 3, 2^7 = 2 mod 7, and even 2^2 = 2 mod 2. In general, 2^p = 2 mod p when p is prime. That deals with generalizing `5'. What about `2'? Suppose that instead of TWO symbols we had THREE; then the number of strings would be 3^p, of which 3 would be `constant', and the rest consists of groups of p each. So for the same reason 3^p = 3 mod p when p is prime. The argument is perfectly general. One gets: a^p = a mod p when p is prime, for ANY integer a; even NEGATIVE a are OK!! (What does that mean, sequences of pebbles that are STOLEN?)---Something to think about. This is Fermat's theorem. It's often called his "little" theorem, since he has another one, Fermat's "last" theorem, which is SO big that nobody has been able to prove it. Historical note. Fermat found this in 1640, so not really as a kid playing with pebbles. Sometimes one finds the statement that the Chinese knew 2^p = 2 mod p well before Christ; but if one traces this assertion down one finds that it is probably based on a misunderstanding [Needham, p. 54 note d]. The Chinese discovered Fermat's little theorem indeed independently from the West, but not v \ / until 1872 (Li Shanlan). Example. Suppose we want to check for some LARGE p whether 3^p = 3 mod p. For example, we may not feel too confident of the validity of the proof. Also, as we'll see, there are good reasons to want to be able to do this, even when p has (say) a HUNDRED digits. Then we can never calculate 2^p, since 2^(10^99) is TOO LARGE to write down, even in a computer memory---there not enough atoms in the universe. However, we can still do it. Take for example p = 43 (a baby example, but anyway). 43 = 101011 (in binary), 3^1 = 3 mod 43 3^2 = 9 mod 43 3^4 = 9^2 = 81 = -5 mod 43 (use negative numbers to keep numbers small!) 3^5 = 3*3^4 = 3*(-5) = -15 mod 43 3^10 = (3^5)^2 = (-15)^2 = 225 = 10 mod 43 (since 43 divides 215) 3^20 = (3^10)^2 = 10^2 = 100 = 14 mod 43 3^21 = 3*(3^20) = 3*14 = 42 = -1 mod 43 3^42 = (-1)^2 = 1 mod 43 3^43 = 3 mod 43 So, that is consistent with the fact that 43 is prime. What is important about this example is: even when p is very large, you can calculate a^p mod p using ONLY numbers < p^2, and doing a number of multiplications and divisions by p that is no more than twice the number of binary digits of p. Computers can deal with HUGE examples. 2. Primality testing One of the applications of Fermat's little theorem is to recognize prime numbers. Why? (a) it is fun, and nice mathematics, (c) it is useful, as we'll see. Suppose I have a large number n, of hundreds or even thousands of digits, and I want to know whether n is prime. What can I do? One way to proceed is to test whether 2^n = 2 mod n. This can by the method above really be done very quickly! If the answer is NO, then n is not prime. You can then ask yourself how you get a factor of n, but unfortunately there is no known proof of Fermat's little theorem that readily gives a factor when the congruence does not hold. In fact, the problem of factoring composite (= non-prime) numbers is MUCH harder than the problem of telling whether n is prime, which is the problem I`m now discussing. Now what if one finds that indeed 2^n = 2 mod n? Can one conclude that n is prime? Unfortunately NOT: n = 341 satisfies the condition, and 341 = 11*31. Such a number is called a "pseudoprime", and there are infinitely many of them. Just a year ago it was proved that this is EVEN the case if we don't fix the 2: there are infinitely many NON-prime numbers n such that for ALL integers a one has a^n = a mod n. Examples: n = 561 (= 3*11*17), 1105 (= 5*13*17), 1729 (= 7*13*19), ... . [Alford, Granville, Pomerance]. Such numbers are called Carmichael numbers; Carmichael was an American mathemati- cian who was active early this century. To exclude these bad numbers, people have changed the test a little. Factor a^n - a, assuming n odd; first 3 factors, next t+2 with t = ord_2(n-1). Find: if n is an odd prime then for each a at least one of these t+2 factors is divisible by n. And if n is a composite number, then for MOST a NONE of the factors is divisible by n. So if you try several a, and they all work, you may be convinced that n is prime; since if it isn't, how could you have chosen so many unlucky a's? This is not a proof, but it is a method to produce industrial-strength primes. You can sell them as such with a lifetime guarantee---if found non-prime, the customer gets another number, which is prime in the same sense of the word. 3. Euler (1707-1783) 2 = 2^5 = 2^9 = ... mod 5 a^(1 mod p-1) = a mod p a^13 = a mod (2.3.5.7.13 = 2730). Examples: 2^13 - 2 = 8190 = 3*2730, 3^13 - 3 = 1594320 = 584*2730. Another example: 561 = 3*11*17, 561 = 1 mod 2, 1 mod 10, 1 mod 16, and THEREFORE it is a Carmichael number. Explain. Generally, if n = p*q*r*... is a product of distinct primes (for simplicity) then a^(1 mod (p-1)(q-1)(r-1)...) = a mod p*q*r... . (Instead of the product (p-1)(q-1)(r-1)... we can take their lcm.) The number (p-1)(q-1)(r-1)... is called the "Euler function" of n = pqr..., notation phi(n). So a^(1 + phi(n)) = a mod n (a squarefree). This is Euler's theorem; he has a generalization to non-squarefree n, which I do not want to discuss now. 4. Rivest-Shamir-Adleman, 1977: these guys applied all this stuff to "public-key" cryptography. Suppose I want to enable someone to send secret messages to me, but we can only communicate via public channels. Then I can do the following. (a) Pick two large prime numbers p and q---to be safe, about 100 digits. Keep p and q SECRET! But publish n = p*q. (b) Pick an "encryption exponent" e, an integer coprime to (p-1)(q-1). Also e is public. (c) Instruct people to send messages as follows. First, using A = 01, B = 02, ..., Z = 26 convert your message into a sequence of numbers a_i between 0 and n; and instead of sending a_i tell him to send you (a_i^e mod n). These are PUBLIC instructions! Anybody can follow them! And exponentiation modulo n is easy. Now nobody can read this message, because nobody knows how to take the e-th root (mod n). However, YOU can do this, since with the SECRET information p, q the e-th root can be taken!! How and why? Example. p = 31, q = 47, n = 1457, e = 1001. (These are again baby numbers.) Suppose a = 100. Then one finds a^e = 1353 (mod 1457). How do you recover 100 from 1353 using p and q? Answer: (d) Find the "decryption exponent" d: a positive integer with d*e = 1 mod (p-1)(q-1); then modulo n one has a = a^(de) = (a^e)^d, so the e-th root is the same as the d-th power!! To find d, one proceeds as follows. Back to the example. Be sure to do this computation in Total Secrecy! (p-1)(q-1) = 30*46 = 1380. We need to solve x*1001 = 1 mod 1380. 0*1001 = 1380 mod 1380 1*1001 = 1001 mod 1380 subtract from previous -1*1001 = 379 mod 1380 subtract 3* from previous 4*1001 = -136 mod 1380 add 3* to previous 11*1001 = -29 mod 1380 subtract 5* from previous -51*1001 = 9 mod 1380 add 3* to previous -142*1001 = -2 mod 1380 add 4* to previous -619*1001 = 1 mod 1380 add 1380*1001 = 0 761*1001 = 1 mod 1380. So d = 761 does it. This method is perfectly general and very fast. It is due to Euclid, 250 BC. How safe is this method? And which information is really to be kept secret? Clearly, p and/or q are secret. Also (p-1)(q-1) is secret, since that is all we need. However, anybody who knows (p-1)(q-1) knows p and q too, since p and q can be solved from pq = n, (p-1)(q-1) = known: two equations in two unknowns. So knowing p, q is equivalent to knowing [n and] (p-1)(q-1)---"there is really no other way to compute (p-1)(q-1) than through p and q". Is it generally true that the prime factorization of a number n can be recovered from knowing n and phi(n)? YES! What about d? Clearly I want to keep it secret, so that nobody can decode my messages; but if someone finds out about d, is n still safe? Can I still use the same n with a different pair e', d'? NO! Since (a^e)^d = a mod n kills it. Both follow from a general result: if someone knows a number m>1 such that a^m = a mod n for all a, then that information can be used to factor n completely. [Apply this to m = 1 + phi(n) or to m = ed.] Namely, m will be odd (put a = -1), and for all a the number n will divide u*2^i a^m - a = a*(a^u-1)\prod_{i=0}^{t-1}(a + 1), but, for many a, none of the factors. That will factor n. 5 Example. n = 1457, m = ed - 1 = 761*1001 - 1 = 761760 = 2 *23805. Pick a=2. 2^23805 = 1 mod n bad luck 3^23805 = 1270 mod n 3^(2*23805) = 1270^2 = 1 mod n (= 1457). gcd(1271, 1457) = gcd(1271,186) = gcd(31,186) (7*186=1302) = 31. Home! So giving away phi(n) factors n, giving away a single encryption- decryption pair does the same. Is there another way to break the code? Nobody knows. Perhaps one of YOU will invent a way of factoring n and of making the whole scheme vulnerable. ----------------------------------------------------------------------