Subscribe to ...considered harmful        RSS Feed
***** 1 Votes

How to think about finding primes

Icon 9 Comments
Finding primes is a common exercise in programming, and for good reason: it's a good exercise that combines mathematical thinking with programmatic thinking. The exercise comes in various forms, most commonly "find the nth prime" or "find the first n primes". Also common is "find all primes less than n". It turns out that these are all basically the same problem.

For a new programmer, particularly a new programmer not used to doing advanced math, this can be a tricky problem, and it's one of the most common questions I see here at DIC. So in this post, I'll try to give an overview of the mathematical thinking which underlies the problem. There will be no actual code here - that's your job, as the programmer - but I hope that working through the math will make it very obvious how the code needs to work. The math here is pretty much low-level number theory. I assume no knowledge of the theorems of number theory, just some basic arithmetic. I leave two problems as exercises. Depending on where you are with your math, these may be difficult or not so very much so, but they're meant to be solvable, so don't give up if you find they're tricky.

In particular, I'm going to consider the workings of a function that we can call isPrime(n). This function should return true if n is prime, and false otherwise. Here's how I think of it: By definition, a number n is prime if it has no divisors d such that 1 < p < n. We wish to test this by testing all relevant potential divisors d to see if they are actually divisors of n. The simplest way to do this is the exhaustive search, as suggested in this pseudocode:



for each p in the range 2..n-1 (inclusive)
  if n % p == 0, return False, since we have found a divisor of n, namely p, such that 1< d<p.
if no p causes us to return False, then return True, since we have tried all potential divisors and none were actually divisors



This turns out to be inefficient: we have to test all numbers from 2 through n-1, and we know that many of them can be rejected from consideration out of hand. For example, if n == 100, we know that no number in the range 50 .. 99 is a possible contender, since 2 * 50 = 100, and there is no integer less than 2. (so no divisor for 100 can exist in the range 50 < p < 100) More than this, we know that when we find a divisor p for n, we have also implicitly found a second divisor q = n/p. This is what it means to be a divisor. And we can pair these up in this way: we can generate a list [(p1, q1), (p2, q2)...] such that all pairs (p, q) where pq = n are represented. If we consider such a list, sorted such that the ps are in ascending order, we find there's a pattern to the qs: namely, they are in descending order.

If you look at this for a little while, you'll see that there's a "multiplicative midpoint" (m) for any number n, such that any pair of divisors (p,q) obey this inequality: p <=m, q >=m. Can you find that midpoint m? If so, you'll have found the limit for your exhaustive search: there is no need to test divisors greater than n, since any such divisor will be a q, and matched with another divisor p which you will have already found.

So all of this implies that your loop can be bounded as

while p <= m:



I'll leave it to you to discover what m must be, because it's fun to find things like that and I don't want to spoil the fun.

Another inefficiency in the naive algorithm is that we test a lot of numbers we don't need to test, even in that range 1<p <=m

I claim that this theorem is true:

for any number n, if r divides n, and p divides r, then p divides n.

For example, if 6 divides 12 and 2 divides 6, then 2 divides 12.

This implies that there's a particular set of numbers (P) that we can select our candidate divisors p from, and this will be sufficient to establish primality of n. I'll let you consider what that set P is - the answer is surprisingly obvious!

9 Comments On This Entry

Page 1 of 1

CTphpnwb 

23 October 2014 - 08:50 AM
I'd add that if a number is not prime then it has prime factors. This means that if you've got a list of primes <= m then you only need check to see if any of those primes are factors.
0

jon.kiparsky 

23 October 2014 - 08:57 AM

CTphpnwb, on 23 October 2014 - 10:50 AM, said:

I'd add that if a number is not prime then it has prime factors. This means that if you've got a list of primes <= m then you only need check to see if any of those primes are factors.



Indeed. That was the point I was trying to suggest with the last few lines... :)
0

CTphpnwb 

23 October 2014 - 09:09 AM
Oh, sorry. I thought you were pointing out the usual first optimization people use: removing even integers other than 2.
1

jon.kiparsky 

23 October 2014 - 09:26 AM

CTphpnwb, on 23 October 2014 - 11:09 AM, said:

Oh, sorry. I thought you were pointing out the usual first optimization people use: removing even integers other than 2.


I suppose I should make that more explicit, then.
0

BlackWaterFischer 

14 November 2016 - 10:19 AM
i lost you in the 'q' part..
1

jon.kiparsky 

14 November 2016 - 10:12 PM
Ah, that might be because I screwed up one of the terms. This:

Quote

we have also implicitly found a second divisor n = q/p.


should have read

Quote

we have also implicitly found a second divisor q = n/p.


(fixed in the entry)

What this means is that if I know that 2 divides 100, then I also know that 50 = 100/2, and I have a pair (p,q) = (2,100). If I then find 4 divides 100, I also have (p',q')=(4,25), and so forth.

Better?
0

DK3250 

16 February 2018 - 11:04 AM
Following this thread in the forum: http://www.dreaminco...&#entry2354672; I landed in this blog.

I totally agree with jon.kiparsky and the first replyer that P are primes less than sqrt(prime_candidate). I have advocated this for years and implemented it in my code.

Just today (2 years after last blog entry) I did some test of the time used by Python code for different approaches. To my surprise it is much faster to loop through all uneven numbers in range(3, int(sqrt(candidate))+1, 2) than for prim in primes[1:]: where primes is a list of primes: [2, 3, 5, 7, 11, 13, 17,...]. The time for finding the first 10,000 primes was more than 5 times longer when using the primes list.
I guess this is partly due to the efficiency of the build-in range generator, and partly due the the skip'ed test for prim > sqrt(candidate).

So, even if in theory the use of numbers from P is beneficial over all odd numbers; the efficiency of the code may show otherwise in practice.
1

jon.kiparsky 

16 February 2018 - 06:50 PM
Interesting. Did you try filtering the prime list? [il]for p in primes [1:sqrt(candidate)+1]: ...[il]? That might gain you back some of the time lost on the test, particularly if you were taking the square root each time through.
0

DK3250 

17 February 2018 - 07:31 AM
Ok, I've looked deeper into the execution speed of code variations.
My original base code:
def prime_list():
    N = 10000
    primes = [2, 3]
    test = 5

    while len(primes) < N:
        for prim in primes:
            if test % prim == 0:
                break
            if prim ** 2 > test:
                primes.append(test)
                break

        test += 2        

    print(primes[-1])

Running this code 100 times took 40 sec.

Just a small change significantly reduced the time: In line 10 I did:
if prim * prim > test:
and the time fell to 12.5 sec.

Further modification to:
def prime_list_2():
    N = 10000
    primes = [2, 3]
    test = 5

    while len(primes) < N:
        sq = int(test**0.5)
        for prim in primes:
            if test % prim == 0:
                break
            if prim > sq:
                primes.append(test)
                break

        test += 2
..gave a little better result:; 12.0 sec.

Running all odd numbers in range (3, int(sqrt(test))+1, 2) was 16 sec.

Conclusion: I have to correct my input from yesterday, best result is obtained using divisors from the P group.
Naturally, many more efficient methods exists (sieves of various kind), but they are outside scope here...
0
Page 1 of 1

Trackbacks for this entry [ Trackback URL ]

There are no Trackbacks for this entry

November 2019

S M T W T F S
     12
3456789
10111213141516
1718 19 20212223
24252627282930

Tags

    Recent Entries

    Recent Comments

    Search My Blog

    0 user(s) viewing

    0 Guests
    0 member(s)
    0 anonymous member(s)

    Categories