Wednesday, June 4, 2014

Single line prime numbers in Python

I’m working through some Project Euler problems with some co-workers. For some reason, I decided to see what it would take to create a single line python expression that calculates prime numbers to X. And do it in a reasonable amount of time. Yeah, its a pointless exercise, you would never use it in real life. There are faster methods of finding primes. But I like a challenge.

Here’s what I’ve come up with:

[2] + sorted(set(xrange(3,max,2)) - { x for step in xrange(3, int(max**0.5) + 1, 2) if step %3 or step==3 for x in xrange(step * 3, max, step * 2)} )

You pass in ‘max’ and let’er rip. I’ll explain how it works by breaking it down into its smaller components. First, lets break this into some higher level components:

[2] + sorted(set(SETA) - set(SETB) )

For performance reasons, I skip doing anything with the first prime, 2. This immediately cuts the set of data that I’m working with in half by eliminating anything thats even. Then because I’m working with sets, I have to sort them. Otherwise, its possible to get the primes out of order. If you don’t care about the order, you can eliminate the sorted() call.

The SETA is really just a set of all the even numbers from 3 to max. Using 20 as an example max, this part will give you {3, 5, 7, 9, 11, 13, 15, 17, 19}. Then SETB is all the non-prime odd numbers, in this case  {9, 15}.  When you subtract SETB from SETA, you end up with {3,5,7,11,13,17,19}. Then append that to the list of [2], and you end up with [2,3,5,7,11,13,17,19]. 

Thats the high level view. Now lets move to the craziness that calculates the non-primes. This is known as a prime number sieve, which you can read about on wikipedia and others. Lets start in the ‘outer most’ loop

for step inxrange(3, int(max**0.5) + 1, 2)

Right in the middle, you’ll find the above section. This iterates from three to the square root  of the max. Plus one to avoid of by one numbers. Why use the square root? Because it cuts down on the amount of work.  So if our max it 20, we’ll get a list of [3,5]. The square root of 20 is roughly 4.4, which will get truncated to 4. We then add one to avoid missing any values. Why only up to the square root, and not the max? Because we’re looking for composite numbers. Any numbers larger than the square root of the max multiplied together will be larger than the max. 

for x in xrange(step * 3, max, step * 2)

The next inner loop. Here, we’re creating a list using the value from the outer loop. We’re starting with step * 3. Why? Well, step by itself may be prime, so we don’t want it in this list. And if its not prime, it would have been added by a previous iteration of the loop. So thats the original step value. The next step value will be double that, and will be even. We’re ignoring even numbers here, so we don’t want them in our list. So we need to go one more, step * 3. Then the step on this loop is step * 2, as to skip the even values. Taking the max = 20 example, we have have the step values of [3,5]. Going through this loop with 3, we’ll end up with [9,15]. Then when  we go through with 5, we get [15]. Since these go into a set, we end up with the final product of {9, 15}.

 if step %3 or step==3

What is this all about? This was an optimization I added. It seems like this creates more work, but it actually eliminates 1/3 of the work that would happened without it.  The only times we hit in the inner loop is if the step is not divisible by 3, or the step is actually 3. I ordered the if statement that way, because 2/3 of the time, the first part of the if will be true, and only hits the second part 1/3 of the time. I tried added step%5 or step==5, but that slowed things down. Diminishing returns.

So there you have it. A reasonably fast prime number finder in a single line of python. I’d be interested in faster/better implementations. I do have a faster two line version of this, thats about 10% faster. But thats not one line.

No comments:

Post a Comment