Finding primes is a common exercise in programming, and for good reason: it's a good exercise that combines mathematical thinking with programmatic thinking. The exercise comes in various forms, most commonly "find the nth prime" or "find the first n primes". Also common is "find all primes less than n". It turns out that these are all basically the same problem.
For a new programmer, particularly a new programmer not used to doing advanced math, this can be a tricky problem, and it's one of the most common questions I see here at DIC. So in this post, I'll try to give an overview of the mathematical thinking which underlies the problem. There will be no actual code here - that's your job, as the programmer - but I hope that working through the math will make it very obvious how the code needs to work. The math here is pretty much low-level number theory. I assume no knowledge of the theorems of number theory, just some basic arithmetic. I leave two problems as exercises. Depending on where you are with your math, these may be difficult or not so very much so, but they're meant to be solvable, so don't give up if you find they're tricky.
In particular, I'm going to consider the workings of a function that we can call isPrime(n). This function should return true if n is prime, and false otherwise. Here's how I think of it: By definition, a number n is prime if it has no divisors d such that 1 < p < n. We wish to test this by testing all relevant potential divisors d to see if they are actually divisors of n. The simplest way to do this is the exhaustive search, as suggested in this pseudocode:
This turns out to be inefficient: we have to test all numbers from 2 through n-1, and we know that many of them can be rejected from consideration out of hand. For example, if n == 100, we know that no number in the range 50 .. 99 is a possible contender, since 2 * 50 = 100, and there is no integer less than 2. (so no divisor for 100 can exist in the range 50 < p < 100) More than this, we know that when we find a divisor p for n, we have also implicitly found a second divisor q = n/p. This is what it means to be a divisor. And we can pair these up in this way: we can generate a list [(p1, q1), (p2, q2)...] such that all pairs (p, q) where pq = n are represented. If we consider such a list, sorted such that the ps are in ascending order, we find there's a pattern to the qs: namely, they are in descending order.
If you look at this for a little while, you'll see that there's a "multiplicative midpoint" (m) for any number n, such that any pair of divisors (p,q) obey this inequality: p <=m, q >=m. Can you find that midpoint m? If so, you'll have found the limit for your exhaustive search: there is no need to test divisors greater than n, since any such divisor will be a q, and matched with another divisor p which you will have already found.
So all of this implies that your loop can be bounded as
I'll leave it to you to discover what m must be, because it's fun to find things like that and I don't want to spoil the fun.
Another inefficiency in the naive algorithm is that we test a lot of numbers we don't need to test, even in that range 1<p <=m
I claim that this theorem is true:
for any number n, if r divides n, and p divides r, then p divides n.
For example, if 6 divides 12 and 2 divides 6, then 2 divides 12.
This implies that there's a particular set of numbers (P) that we can select our candidate divisors p from, and this will be sufficient to establish primality of n. I'll let you consider what that set P is - the answer is surprisingly obvious!
For a new programmer, particularly a new programmer not used to doing advanced math, this can be a tricky problem, and it's one of the most common questions I see here at DIC. So in this post, I'll try to give an overview of the mathematical thinking which underlies the problem. There will be no actual code here - that's your job, as the programmer - but I hope that working through the math will make it very obvious how the code needs to work. The math here is pretty much low-level number theory. I assume no knowledge of the theorems of number theory, just some basic arithmetic. I leave two problems as exercises. Depending on where you are with your math, these may be difficult or not so very much so, but they're meant to be solvable, so don't give up if you find they're tricky.
In particular, I'm going to consider the workings of a function that we can call isPrime(n). This function should return true if n is prime, and false otherwise. Here's how I think of it: By definition, a number n is prime if it has no divisors d such that 1 < p < n. We wish to test this by testing all relevant potential divisors d to see if they are actually divisors of n. The simplest way to do this is the exhaustive search, as suggested in this pseudocode:
for each p in the range 2..n-1 (inclusive) if n % p == 0, return False, since we have found a divisor of n, namely p, such that 1< d<p. if no p causes us to return False, then return True, since we have tried all potential divisors and none were actually divisors
This turns out to be inefficient: we have to test all numbers from 2 through n-1, and we know that many of them can be rejected from consideration out of hand. For example, if n == 100, we know that no number in the range 50 .. 99 is a possible contender, since 2 * 50 = 100, and there is no integer less than 2. (so no divisor for 100 can exist in the range 50 < p < 100) More than this, we know that when we find a divisor p for n, we have also implicitly found a second divisor q = n/p. This is what it means to be a divisor. And we can pair these up in this way: we can generate a list [(p1, q1), (p2, q2)...] such that all pairs (p, q) where pq = n are represented. If we consider such a list, sorted such that the ps are in ascending order, we find there's a pattern to the qs: namely, they are in descending order.
If you look at this for a little while, you'll see that there's a "multiplicative midpoint" (m) for any number n, such that any pair of divisors (p,q) obey this inequality: p <=m, q >=m. Can you find that midpoint m? If so, you'll have found the limit for your exhaustive search: there is no need to test divisors greater than n, since any such divisor will be a q, and matched with another divisor p which you will have already found.
So all of this implies that your loop can be bounded as
while p <= m:
I'll leave it to you to discover what m must be, because it's fun to find things like that and I don't want to spoil the fun.
Another inefficiency in the naive algorithm is that we test a lot of numbers we don't need to test, even in that range 1<p <=m
I claim that this theorem is true:
for any number n, if r divides n, and p divides r, then p divides n.
For example, if 6 divides 12 and 2 divides 6, then 2 divides 12.
This implies that there's a particular set of numbers (P) that we can select our candidate divisors p from, and this will be sufficient to establish primality of n. I'll let you consider what that set P is - the answer is surprisingly obvious!
9 Comments On This Entry
Page 1 of 1
CTphpnwb
23 October 2014 - 08:50 AM
I'd add that if a number is not prime then it has prime factors. This means that if you've got a list of primes <= m then you only need check to see if any of those primes are factors.
CTphpnwb
23 October 2014 - 09:09 AM
Oh, sorry. I thought you were pointing out the usual first optimization people use: removing even integers other than 2.
DK3250
16 February 2018 - 11:04 AM
Following this thread in the forum: http://www.dreaminco...&#entry2354672; I landed in this blog.
I totally agree with jon.kiparsky and the first replyer that P are primes less than sqrt(prime_candidate). I have advocated this for years and implemented it in my code.
Just today (2 years after last blog entry) I did some test of the time used by Python code for different approaches. To my surprise it is much faster to loop through all uneven numbers in range(3, int(sqrt(candidate))+1, 2) than for prim in primes[1:]: where primes is a list of primes: [2, 3, 5, 7, 11, 13, 17,...]. The time for finding the first 10,000 primes was more than 5 times longer when using the primes list.
I guess this is partly due to the efficiency of the build-in range generator, and partly due the the skip'ed test for prim > sqrt(candidate).
So, even if in theory the use of numbers from P is beneficial over all odd numbers; the efficiency of the code may show otherwise in practice.
I totally agree with jon.kiparsky and the first replyer that P are primes less than sqrt(prime_candidate). I have advocated this for years and implemented it in my code.
Just today (2 years after last blog entry) I did some test of the time used by Python code for different approaches. To my surprise it is much faster to loop through all uneven numbers in range(3, int(sqrt(candidate))+1, 2) than for prim in primes[1:]: where primes is a list of primes: [2, 3, 5, 7, 11, 13, 17,...]. The time for finding the first 10,000 primes was more than 5 times longer when using the primes list.
I guess this is partly due to the efficiency of the build-in range generator, and partly due the the skip'ed test for prim > sqrt(candidate).
So, even if in theory the use of numbers from P is beneficial over all odd numbers; the efficiency of the code may show otherwise in practice.
DK3250
17 February 2018 - 07:31 AM
Ok, I've looked deeper into the execution speed of code variations.
My original base code:
Running this code 100 times took 40 sec.
Just a small change significantly reduced the time: In line 10 I did:
Further modification to:
Running all odd numbers in range (3, int(sqrt(test))+1, 2) was 16 sec.
Conclusion: I have to correct my input from yesterday, best result is obtained using divisors from the P group.
Naturally, many more efficient methods exists (sieves of various kind), but they are outside scope here...
My original base code:
def prime_list(): N = 10000 primes = [2, 3] test = 5 while len(primes) < N: for prim in primes: if test % prim == 0: break if prim ** 2 > test: primes.append(test) break test += 2 print(primes[-1])
Running this code 100 times took 40 sec.
Just a small change significantly reduced the time: In line 10 I did:
if prim * prim > test:and the time fell to 12.5 sec.
Further modification to:
def prime_list_2(): N = 10000 primes = [2, 3] test = 5 while len(primes) < N: sq = int(test**0.5) for prim in primes: if test % prim == 0: break if prim > sq: primes.append(test) break test += 2..gave a little better result:; 12.0 sec.
Running all odd numbers in range (3, int(sqrt(test))+1, 2) was 16 sec.
Conclusion: I have to correct my input from yesterday, best result is obtained using divisors from the P group.
Naturally, many more efficient methods exists (sieves of various kind), but they are outside scope here...
Page 1 of 1
Trackbacks for this entry [ Trackback URL ]
← October 2020 →
S | M | T | W | T | F | S |
---|---|---|---|---|---|---|
1 | 2 | 3 | ||||
4 | 5 | 6 | 7 | 8 | 9 | 10 |
11 | 12 | 13 | 14 | 15 | 16 | 17 |
18 | 19 | 20 | 21 | 22 | 23 | 24 |
25 | 26 | 27 | 28 | 29 | 30 | 31 |
Tags
My Blog Links
Recent Entries
Recent Comments
Search My Blog
1 user(s) viewing
1 Guests
0 member(s)
0 anonymous member(s)
0 member(s)
0 anonymous member(s)