A Lesson in Coding Efficiently

Be forewarned. This post looks a lot longer than it is. There are just a lot of code samples.

Hello again everyone. While answering some questions in the PHP forum, I came upon a post that I actually learned a lot from. A poster had a question about stripping a string of all characters except the numeric ones. After some revisions, the first function that was supplied was the following:

function getNumbersOnly($string)
{
    $numbers = '';
    for($i = 0; $i < strlen($string); $i++){
        $temp = ord(substr($string, $i, 1));
        if($temp > 47 && $temp < 58){
            $numbers .= chr($temp);
        }
    }
    return $numbers;
}



It worked, but looking at it, I thought it could be improved. Originally I suggested the use of is_numeric().

function using_is_numeric($string)
{
    $numbers = '';
    for($i = 0; $i < strlen($string); $i++){
        $temp = (substr($string, $i, 1));
        if ( is_numeric($temp))
        {
          $numbers .= $temp;
        }        
    }
    return $numbers;    
}



Finally, it was asked how you could accomplish this with regular expressions, so I created the following function:

function strip_non_numeric($string)
{
  $pattern = '/[^0-9]/';
  return preg_replace($pattern, '', $string);
}



All three of those functions accomplish the same thing. They take a string like 'asd2094832jkl23$2acj034#', and return the string with only the numbers left.
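A quick sanity check of that claim, as a sketch: the two functions below are the ASCII-range loop and the regex version from above, and the expected result in the comments was worked out by hand from the sample string.

```php
<?php
// Loop version from above: keep characters whose ASCII code is 48-57 ('0'-'9').
function getNumbersOnly($string)
{
    $numbers = '';
    for ($i = 0; $i < strlen($string); $i++) {
        $temp = ord(substr($string, $i, 1));
        if ($temp > 47 && $temp < 58) {
            $numbers .= chr($temp);
        }
    }
    return $numbers;
}

// Regex version from above: delete everything that is not a digit.
function strip_non_numeric($string)
{
    return preg_replace('/[^0-9]/', '', $string);
}

// Single quotes matter here: they keep "$2" literal instead of interpolating it.
$sample = 'asd2094832jkl23$2acj034#';

var_dump(getNumbersOnly($sample));     // string(13) "2094832232034"
var_dump(strip_non_numeric($sample));  // string(13) "2094832232034"
var_dump(getNumbersOnly($sample) === strip_non_numeric($sample)); // bool(true)
```

Note that the result is still a string, so leading zeros such as the "034" near the end survive.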

We started discussing which piece of code was more efficient, so I went ahead and wrote some benchmarks. I only tested the original function against the regular expression function. The results were as follows:


String length: 4
Strip took: 0.57837104797363 Seconds
Loop took: 1.3016378879547 Seconds

String length: 44
Strip took: 1.2594850063324 Seconds
Loop took: 5.3986718654633 Seconds


It quickly became obvious that the regular expression function was considerably faster: more than twice as fast on the 4-character string and more than four times as fast on the 44-character one. The longer the string, the bigger the regex version's relative advantage.

There were two major lessons to learn from this experience. The first is the importance of coding efficiently. In smaller apps, a difference like this may not matter much. Typically, you won't be stripping an 88-character string, so either method may be fine for you. However, once you begin developing larger systems with a lot of users, it becomes important to understand how to write code that performs as quickly as possible. Techniques such as benchmarking can help you find your bottlenecks and isolate where a program is running slowly.
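For anyone who hasn't benchmarked before, the basic idea is just "note the time, run the code many times, note the time again". Here is a minimal sketch of that; time_calls() is a hypothetical helper name, not part of the script at the end of this post, and the numbers you get will depend entirely on your machine and PHP version.

```php
<?php
// Hypothetical helper: call $fn($input) $iterations times and
// return the elapsed wall-clock time in seconds.
function time_calls(callable $fn, $input, $iterations = 100000)
{
    $start = microtime(true);   // current Unix time as a float, in seconds
    for ($i = 0; $i < $iterations; $i++) {
        $fn($input);
    }
    return microtime(true) - $start;
}

// Time the regex approach on one of the test strings used in this post.
$elapsed = time_calls(function ($s) {
    return preg_replace('/[^0-9]/', '', $s);
}, '348jfakl324928249majs');

echo "Regex took: " . $elapsed . " Seconds\n";
```

Running each function many times matters because a single call finishes too quickly to measure reliably.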

The second lesson is to remember that PHP has many pre-built functions to help you accomplish whatever task you're working on. In the first code sample, the function converts each character in a string to its ASCII code, and then checks whether it falls within the range of ASCII codes for the digits 0 through 9. While this method works, it would have been simpler to use the built-in is_numeric() function. Remember not to reinvent the wheel, and use the php.net manual as a reference. However, I did benchmark the is_numeric() method, and while it was a little faster than the original loop, it was nowhere near as fast as the regex method.
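One caveat when picking a built-in: is_numeric() answers "does this string look like a number?", not "is this a digit?". Called on one character at a time, as in this post, the two questions give the same answer, but on longer strings they do not. A small sketch of the difference follows. ctype_digit() comes from PHP's ctype extension and is not used anywhere in the original thread, and digits_only() is a hypothetical name for illustration.

```php
<?php
// is_numeric() accepts anything PHP can read as a number,
// including signs, decimal points and exponents.
var_dump(is_numeric('7'));    // bool(true)
var_dump(is_numeric('-3'));   // bool(true)
var_dump(is_numeric('1e5'));  // bool(true)
var_dump(is_numeric('.5'));   // bool(true)

// ctype_digit() is true only when every character is a decimal digit.
var_dump(ctype_digit('7'));   // bool(true)
var_dump(ctype_digit('1e5')); // bool(false)

// Hypothetical variant of the loop that tests each character with ctype_digit().
function digits_only($string)
{
    $numbers = '';
    $length = strlen($string);   // computed once rather than on every pass
    for ($i = 0; $i < $length; $i++) {
        if (ctype_digit($string[$i])) {
            $numbers .= $string[$i];
        }
    }
    return $numbers;
}

var_dump(digits_only('123k')); // string(3) "123"
```

I haven't benchmarked this variant, so treat it as a correctness illustration rather than a speed claim.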

Below you'll find my benchmarking script, as well as all of the functions in this post. Feel free to run your own tests and play with the script.

Challenge: Can anybody write a faster function than the regex one?

I hope this post has been helpful to some, and will at least help influence you to reconsider how you write your functions. Take care, and thanks for reading!

Benchmarking Script
<?php

function strip_non_numeric($string)
{
  $pattern = '/[^0-9]/';
  return preg_replace($pattern, '', $string);
}

function loop_non_numeric($string)
{
    $numbers = '';
    for($i = 0; $i < strlen($string); $i++){
        $temp = ord(substr($string, $i, 1));
        if($temp > 47 && $temp < 58){
            $numbers .= chr($temp);
        }
    }
    return $numbers;
}

function getting_there($string)
{
    $numbers = '';
    for($i = 0; $i < strlen($string); $i++){
        $temp = (substr($string, $i, 1));
        if ( is_numeric($temp))
        {
          $numbers .= $temp;
        }        
    }
    return $numbers;    
}

function even_closer($string)
{
    $numbers = '';
    
    for($i = 0; $i < strlen($string); $i++){
        if (is_numeric($string[$i]))
          $numbers .= $string[$i];         
    }
    return $numbers;
}

function microtime_float()
{
  list($usec, $sec) = explode(" ", microtime());
  return ((float)$usec + (float)$sec);
}

function benchmark($string)
{
  echo "String length: " . strlen($string) . "<br />";
  
  // Start original looping function
  $start = microtime_float();
  for ($i=0; $i<100000; $i++)
  {
    $temp = loop_non_numeric($string);
  }
  $end = microtime_float(); 
  echo "Loop took: " . ($end-$start) . " Seconds<br />";

  // Replaced ASCII value check with is_numeric()
  $start = microtime_float();
  for ($i=0; $i<100000; $i++)
  {
    $temp = getting_there($string);
  }
  $end = microtime_float();
  echo "Getting There took: " . ($end-$start) . " Seconds<br />";

  // Start function with only numeric, treating as array
  $start = microtime_float();
  for ($i=0; $i<100000; $i++)
  {
    $temp = even_closer($string);
  }
  $end = microtime_float();
  echo "Even Closer took: " . ($end-$start) . " Seconds<br />";

  // Final function using preg_replace()
  $start = microtime_float();
  for ($i=0; $i<100000; $i++)
  {
    $temp = strip_non_numeric($string);
  }
  $end = microtime_float();
  echo "Strip took: " . ($end-$start) . " Seconds<br /><br />";

}

$string = '123k';
benchmark($string);

$string = '348jfakl324928249majs';
benchmark($string);

?>

5 Comments On This Entry


girasquid

01 October 2008 - 04:04 PM
Here's an odd (but faster) way of doing it:
function assign_strip($string) {
 return 0 + $string;
}


It exploits loose typing, because in situations where PHP/Perl needs to do addition (Perl is where the behavior comes from originally as far as I'm aware), it will strip out all of the non-digit characters.

I originally tried it using abs(), but I believe that just using addition is faster. You could try both - but according to your benchmarking script, the addition method came in at about 1/4 of what the Strip did.
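A caveat worth checking before relying on this: PHP's string-to-number conversion reads only a leading numeric prefix and stops at the first character that doesn't fit, so it is not a general digit filter, and the result is a number rather than a string. The sketch below uses an explicit (int) cast as a stand-in for the addition. That substitution is an assumption made so the example behaves the same across PHP versions, since newer versions warn or throw on arithmetic with non-numeric strings. The regex function from the post is included for comparison.

```php
<?php
function strip_non_numeric($string)
{
    return preg_replace('/[^0-9]/', '', $string);
}

// (int) stands in for "0 + $string" here; for these inputs both rely on
// the same conversion, which reads only the leading digits.
var_dump((int) '123k');        // int(123)
var_dump((int) '12ab34');      // int(12)
var_dump((int) 'asd2094832');  // int(0)

// The regex version keeps every digit, wherever it appears.
var_dump(strip_non_numeric('12ab34'));      // string(4) "1234"
var_dump(strip_non_numeric('asd2094832'));  // string(7) "2094832"
```

The two test strings in the benchmarking script both start with digits, which may be why the difference is easy to miss there.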

akozlik

01 October 2008 - 07:57 PM
Ha, very nice. I'm wary of using an exploit in production code, but for this sort of demonstration that's awesome. I'm going to see the benchmark myself tomorrow. Thanks a lot girasquid, and I'm glad someone's reading this!

grimpirate

03 October 2008 - 03:32 PM
Gotta tell ya akozlik. I just tested the regular expressions against another function and it beat out the regex. Here's the code:
<?php
error_reporting(E_ALL);

$text = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. 
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.';

$start = microtime(true);
foobar($text);
echo microtime(true) - $start;

echo '<br />';

$start = microtime(true);
strip_non_numeric($text);
echo microtime(true) - $start;
die();

function strip_non_numeric($string)
{
  $pattern = '/[^0-9]/';
  return preg_replace($pattern, '', $string);
}

function foobar($string){
	$blah = array_fill(0, 256, 0);
	for($i = 48; $i < 58; $i++) unset($blah[$i]);
	$blah = array_keys($blah);
	$blah = array_map('chr', $blah);
	return str_replace($blah, '', $string);
}
?>
So I think there may be something amiss with the benchmarks.

akozlik

06 October 2008 - 06:48 AM
I had tried that function before and it ran for over 30 seconds. The benchmarking should be fine. I think the reason why that might have been quicker on your machine is because your test text has no numbers in it. That means it would have simply looped through the string without replacing anything. Try a string like

$text = '234jklfa349jaklj3409jlk;uja32$@';



You'll see a difference then.

silverblaze

14 January 2009 - 03:03 AM
hello,

Today I had some free time, so I thought I'd play around with your challenge a bit.

Here is what I got:

String length: 4
Loop took: 1.02861499786 Seconds
Getting There took: 1.00581598282 Seconds
Even Closer took: 0.754460096359 Seconds
Strip took: 0.445683956146 Seconds
Strip Ereg took: 0.333122014999 Seconds

String length: 21
Loop took: 4.17803192139 Seconds
Getting There took: 3.67785286903 Seconds
Even Closer took: 3.04734897614 Seconds
Strip took: 1.03313398361 Seconds
Strip Ereg took: 0.721390962601 Seconds

String length: 31
Loop took: 5.77383112907 Seconds
Getting There took: 5.1412229538 Seconds
Even Closer took: 4.14170503616 Seconds
Strip took: 1.41505217552 Seconds
Strip Ereg took: 1.16831302643 Seconds



and the code:

<?php

function strip_non_numeric($string)
{
  $pattern = '/[^0-9]/';
  return preg_replace($pattern, '', $string);
}

function strip_non_numeric_ereg($string)
{
  $pattern = '[^0-9]';
  return ereg_replace($pattern,"",$string);
}

function loop_non_numeric($string)
{
	$numbers = '';
	for($i = 0; $i < strlen($string); $i++){
		$temp = ord(substr($string, $i, 1));
		if($temp > 47 && $temp < 58){
			$numbers .= chr($temp);
		}
	}
	return $numbers;
}

function getting_there($string)
{
	$numbers = '';
	for($i = 0; $i < strlen($string); $i++){
		$temp = (substr($string, $i, 1));
		if ( is_numeric($temp))
		{
		  $numbers .= $temp;
		}        
	}
	return $numbers;    
}

function even_closer($string)
{
	$numbers = '';

	for($i = 0; $i < strlen($string); $i++){
		if (is_numeric($string[$i]))
		  $numbers .= $string[$i];         
	}
	return $numbers;
}

function microtime_float()
{
  list($usec, $sec) = explode(" ", microtime());
  return ((float)$usec + (float)$sec);
}

function benchmark($string)
{
  echo "String length: " . strlen($string) . "<br />";

  // Start original looping function
  $start = microtime_float();
  for ($i=0; $i<100000; $i++)
  {
	$temp = loop_non_numeric($string);
  }
  $end = microtime_float(); 
  echo "Loop took: " . ($end-$start) . " Seconds<br />";

  // Replaced ASCII value check with is_numeric()
  $start = microtime_float();
  for ($i=0; $i<100000; $i++)
  {
	$temp = getting_there($string);
  }
  $end = microtime_float();
  echo "Getting There took: " . ($end-$start) . " Seconds<br />";

  // Start function with only numeric, treating as array
  $start = microtime_float();
  for ($i=0; $i<100000; $i++)
  {
	$temp = even_closer($string);
  }
  $end = microtime_float();
  echo "Even Closer took: " . ($end-$start) . " Seconds<br />";

  // Final function using preg_replace()
  $start = microtime_float();
  for ($i=0; $i<100000; $i++)
  {
	$temp = strip_non_numeric($string);
  }
  $end = microtime_float();
  echo "Strip took: " . ($end-$start) . " Seconds<br />";
  
  $start = microtime_float();
  for ($i=0; $i<100000; $i++)
  {
	$temp = strip_non_numeric_ereg($string);
  }
  $end = microtime_float();
  echo "Strip Ereg took: " . ($end-$start) . " Seconds<br /><br />";

}

$string = '123k';
benchmark($string);

$string = '348jfakl324928249majs';
benchmark($string);

$string = '234jklfa349jaklj3409jlk;uja32$@';
benchmark($string);

?>



Well, I tried using ereg_replace() instead of preg_replace(). In this case it proves faster than preg, even though in most cases it won't. :)