PRIMES AND POWERS
Hendrik Lenstra
MUSA lecture, March 16, 1993, 4-5 pm.

-------------------------------------------------------
The lecture deals with Fermat's little theorem---theory
and applications. This belongs to number theory, and
lots of numbers will appear on the blackboard.
-------------------------------------------------------

1. Fermat's little theorem.

Fermat (1601-1665). He found this theorem in 1640. However,
any kid can find it, by proceeding in the following way.
How many sequences of five black-or-white dots are there?
Count:

	00000:			1
	10000 + cyclic shifts:	5
	11000 + cyclic shifts:	5
	10100 + cyclic shifts:	5
	00111 + cyclic shifts:	5
	01011 + cyclic shifts:	5
	01111 + cyclic shifts:	5
	11111:			1
	------------------------- +
	Total:		       32

Hey, that's  2^5.  Ah, of course! Two choices for each position,
that gives  2^5  in all.
	Still funny:  2 `special' ones, and the others come in
groups of  5.  In other words:  2^5 = 2 + (multiple of 5).

NOTATION (Gauss, 1777-1855): a = b mod c  MEANS:  a - b = some
multiple of c.  Pronounce: `a is congruent to b modulo c'.
(Equivalently:  a  and  b  leave the same remainder when divided
by  c.)

So Fermat:  2^5 = 2 mod 5.  (Sure enough,  32 - 2 = 30 = 6*5.)

	Can we generalize this? Well, let's first work on the  5.
The proof looks pretty general, so you might expect that  2^6 = 2
mod 6.  But no:  2^6 = 64,  and  64 - 2 = 62  is NOT divisible by
6.  So what goes wrong in the same proof?
	Well, you get groups that do not exist of  6  elements;
for example,
	100100  gives only THREE cyclic shifts,
	101010  gives only TWO!
So you have a problem when the sequence has a `period'. That
period must be a divisor of the length of your sequence. This
won't happen when that length is a prime number, like  5,  but
otherwise you may have a problem.
	And indeed:
		2^3 = 2 mod 3,
		2^7 = 2 mod 7,
and even  2^2 = 2 mod 2.  In general,  2^p = 2 mod p  when  p
is prime.
	That deals with generalizing  `5'.  What about `2'?
Suppose that instead of TWO symbols we had THREE; then the
number of strings would be  3^p,  of which  3  would be `constant',
and the rest consists of groups of  p  each. So for the same
reason
	3^p = 3 mod p	when  p  is prime.
The argument is perfectly general. One gets:
	a^p = a mod p	when  p  is prime,
for ANY integer  a;  even NEGATIVE  a  are OK!!  (What does that
mean, sequences of pebbles that are STOLEN?)---Something to
think about. This is Fermat's theorem. It's often called his
"little" theorem, since he has another one, Fermat's "last"
theorem, which is SO big that nobody has been able to prove it.

Historical note. Fermat found this in 1640, so not really as a kid
playing with pebbles. Sometimes one finds the statement that the
Chinese knew  2^p = 2 mod p  well before Christ; but if one traces
this assertion down one finds that it is probably based on a
misunderstanding [Needham, p. 54 note d]. The Chinese discovered
Fermat's little theorem indeed independently from the West, but not
	     v   \  /
until 1872 (Li Shanlan).

Example.
Suppose we want to check for some LARGE  p  whether  3^p = 3 mod p.
For example, we may not feel too confident of the validity of the
proof. Also, as we'll see, there are good reasons to want to
be able to do this, even when  p  has (say) a HUNDRED digits. Then
we can never calculate  2^p,  since  2^(10^99)  is TOO LARGE to
write down, even in a computer memory---there not enough atoms in
the universe. However, we can still do it. Take for example  p = 43
(a baby example, but anyway).
	43 = 101011 (in binary),
	3^1 = 3 mod 43
	3^2 = 9 mod 43
	3^4 = 9^2 = 81 = -5 mod 43
		(use negative numbers to keep numbers small!)
	3^5 = 3*3^4 = 3*(-5) = -15 mod 43
	3^10 = (3^5)^2 = (-15)^2 = 225 = 10 mod 43
		(since  43  divides 215)
	3^20 = (3^10)^2 = 10^2 = 100 = 14 mod 43
	3^21 = 3*(3^20) = 3*14 = 42 = -1 mod 43
	3^42 = (-1)^2 = 1 mod 43
	3^43 = 3 mod 43
So, that is consistent with the fact that  43  is prime. What is
important about this example is: even when  p  is very large, you
can calculate  a^p mod p  using ONLY numbers < p^2,  and doing a
number of multiplications and divisions by  p  that is no more
than twice the number of binary digits of  p.  Computers can deal
with HUGE examples.

2. Primality testing
One of the applications of Fermat's little theorem is to recognize
prime numbers. Why? (a) it is fun, and nice mathematics, (c) it is
useful, as we'll see.
	Suppose I have a large number  n,  of hundreds or even
thousands of digits, and I want to know whether  n  is prime. What
can I do?
	One way to proceed is to test whether  2^n = 2 mod n.  This
can by the method above really be done very quickly! If the answer
is NO, then  n  is not prime. You can then ask yourself how you get
a factor of  n,  but unfortunately there is no known proof of Fermat's
little theorem that readily gives a factor when the congruence does
not hold. In fact, the problem of factoring composite (= non-prime)
numbers is MUCH harder than the problem of telling whether  n  is
prime, which is the problem I`m now discussing.
	Now what if one finds that indeed  2^n = 2 mod n?  Can one
conclude that  n  is prime?
	Unfortunately NOT:  n = 341  satisfies the condition, and
341 = 11*31.  Such a number is called a "pseudoprime", and there are
infinitely many of them. Just a year ago it was proved that this is
EVEN the case if we don't fix the  2:  there are infinitely many
NON-prime numbers  n  such that for ALL integers  a  one has
a^n = a mod n.  Examples:  n = 561 (= 3*11*17), 1105 (= 5*13*17),
1729 (= 7*13*19), ... . [Alford, Granville, Pomerance]. Such numbers
are called Carmichael numbers; Carmichael was an American mathemati-
cian who was active early this century.
	To exclude these bad numbers, people have changed the test a
little. Factor  a^n - a,  assuming  n  odd; first  3  factors, next
t+2  with  t = ord_2(n-1).  Find: if  n  is an odd prime then for each
a  at least one of these  t+2  factors is divisible by  n.  And if
n  is a composite number, then for MOST  a  NONE of the factors is
divisible by  n.  So if you try several  a,  and they all work, you
may be convinced that  n  is prime; since if it isn't, how could you
have chosen so many unlucky  a's?  This is not a proof, but it is a
method to produce industrial-strength primes. You can sell them as
such with a lifetime guarantee---if found non-prime, the customer
gets another number, which is prime in the same sense of the word.

3. Euler (1707-1783)
	2 = 2^5 = 2^9 = ... mod 5

	a^(1 mod p-1) = a mod p

	a^13 = a mod (2.3.5.7.13 = 2730).

Examples: 2^13 - 2 = 8190 = 3*2730,
	 3^13 - 3 = 1594320 = 584*2730.

Another example:
	561 = 3*11*17,  561 = 1 mod 2, 1 mod 10, 1 mod 16,
and THEREFORE it is a Carmichael number. Explain.

Generally, if  n = p*q*r*...  is a product of distinct primes (for
simplicity) then

	a^(1 mod (p-1)(q-1)(r-1)...) = a mod p*q*r... .

(Instead of the product  (p-1)(q-1)(r-1)...  we can take their lcm.)
The number  (p-1)(q-1)(r-1)...  is called the "Euler function" of
n = pqr..., notation  phi(n).  So

	a^(1 + phi(n)) = a mod n

(a  squarefree). This is Euler's theorem; he has a generalization
to non-squarefree  n,  which I do not want to discuss now.

4. Rivest-Shamir-Adleman, 1977: these guys applied all this stuff
to "public-key" cryptography.
	Suppose I want to enable someone to send secret messages
to me, but we can only communicate via public channels. Then I
can do the following.

(a) Pick two large prime numbers  p  and  q---to be safe, about
100  digits. Keep  p  and  q  SECRET! But publish  n = p*q.
(b) Pick an "encryption exponent"  e,  an integer coprime to
(p-1)(q-1).  Also  e  is public.
(c) Instruct people to send messages as follows. First, using
A = 01, B = 02, ..., Z = 26  convert your message into a
sequence of numbers  a_i  between  0  and  n;  and instead of
sending  a_i  tell him to send you  (a_i^e mod n). These are
PUBLIC instructions! Anybody can follow them! And exponentiation
modulo n  is easy.

Now nobody can read this message, because nobody knows how to
take the  e-th root (mod n). However, YOU can do this, since
with the SECRET information  p, q  the  e-th root can be taken!!
	How and why?
	Example.  p = 31, q = 47,  n = 1457,  e = 1001.  (These
are again baby numbers.) Suppose  a = 100.  Then one finds
a^e = 1353 (mod 1457).
	How do you recover  100  from  1353  using  p  and  q?
Answer:
(d) Find the "decryption exponent"  d:  a positive integer with
d*e = 1 mod (p-1)(q-1);  then modulo  n  one has
	a = a^(de) = (a^e)^d,
so the  e-th root is the same as the  d-th power!!
	To find  d,  one proceeds as follows. Back to the example.
Be sure to do this computation in Total Secrecy!

	(p-1)(q-1) = 30*46 = 1380.
	We need to solve  x*1001 = 1 mod 1380.

	   0*1001 = 1380 mod 1380
	   1*1001 = 1001 mod 1380	subtract from previous
	  -1*1001 =  379 mod 1380	subtract 3* from previous
	   4*1001 = -136 mod 1380	add 3* to previous
	  11*1001 =  -29 mod 1380	subtract 5* from previous
	 -51*1001 =    9 mod 1380	add 3* to previous
	-142*1001 =   -2 mod 1380	add 4* to previous
	-619*1001 =    1 mod 1380	add  1380*1001 = 0
	 761*1001 =    1 mod 1380.

So  d = 761  does it. This method is perfectly general and very
fast. It is due to Euclid, 250 BC.

How safe is this method? And which information is really to be kept
secret?
	Clearly,  p  and/or  q  are secret. Also  (p-1)(q-1)  is
secret, since that is all we need. However, anybody who knows
(p-1)(q-1)  knows  p  and  q  too, since  p  and  q  can be
solved from  pq = n,  (p-1)(q-1) = known:  two equations in two
unknowns. So knowing  p, q  is equivalent to knowing  [n  and]
(p-1)(q-1)---"there is really no other way to compute  (p-1)(q-1)
than through  p  and  q".
	Is it generally true that the prime factorization of a
number  n  can be recovered from knowing  n  and  phi(n)?  YES!
	What about  d?  Clearly I want to keep it secret, so that
nobody can decode my messages; but if someone finds out about  d,
is  n  still safe? Can I still use the same  n  with a different
pair  e', d'? NO! Since  (a^e)^d = a mod n  kills it.

Both follow from a general result: if someone knows a number  m>1
such that  a^m = a mod n  for all  a,  then that information can be
used to factor  n  completely.
[Apply this to  m = 1 + phi(n)  or to  m = ed.]
Namely,  m  will be odd (put  a = -1), and for all  a  the number
n  will divide
					      u*2^i
	a^m - a = a*(a^u-1)\prod_{i=0}^{t-1}(a     + 1),

but, for many  a,  none of the factors. That will factor  n.

							   5
Example.  n = 1457,  m = ed - 1 = 761*1001 - 1 = 761760 = 2 *23805.
Pick  a=2.
	2^23805 = 1 mod n	bad luck
	3^23805 = 1270 mod n
	3^(2*23805) = 1270^2 = 1 mod n (= 1457).
	gcd(1271, 1457) = gcd(1271,186) = gcd(31,186) (7*186=1302)
		= 31.  Home!

So giving away  phi(n)  factors  n,  giving away a single encryption-
decryption pair does the same. Is there another way to break the
code? Nobody knows. Perhaps one of YOU will invent a way of factoring
n  and of making the whole scheme vulnerable.
----------------------------------------------------------------------