2 Replies - 277 Views - Last Post: 24 June 2011 - 06:31 AM

#1 cyb1n

  • New D.I.C Head

Reputation: 3
  • Posts: 27
  • Joined: 08-May 09

Optimizing Directory Indexer

Posted 23 June 2011 - 01:56 PM

I'm new to programming in PHP and I've run into a snag with a bit of code. I've got a function that indexes a directory of photos; it's used in conjunction with a database query to get info about each photo based on its name. The function runs fine on a small scale, but during a stress test (8000+ files) it took 10 minutes to load the page (and the same again any time the page was refreshed). If anyone has suggestions for ways to optimize it, I'd be very grateful. My initial thought was to store the array in the session, just to prevent the function being called every time the page is loaded.

function dirIndex($dirpath) {
	$filelist = array();

	if (is_dir($dirpath)) {
		if ($dh = opendir($dirpath)) {
			// Make sure the path ends in a separator so the
			// concatenation below produces a valid file path.
			$dirpath = rtrim($dirpath, '/\\') . DIRECTORY_SEPARATOR;

			// Index files within the directory.
			while (($file = readdir($dh)) !== false) {
				// Skip the current- and parent-directory entries.
				if ($file != '.' && $file != '..') {
					$filepath = $dirpath . $file;
					$mimetype = fileMimeType($filepath);

					// Keep only files whose MIME type starts with "image/".
					if (strpos($mimetype, 'image/') === 0) {
						$filelist[] = $file;
					}
				}
			}
			closedir($dh);
		}
	}

	return $filelist;
}
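The session idea mentioned above could look something like this minimal sketch. It assumes session_start() has already been called before any output, and that the dirIndex() function from the post is defined; the wrapper name and cache key are illustrative:

```php
<?php
// Hypothetical sketch: cache the directory index in the session so
// dirIndex() runs once per session instead of on every page load.
// Assumes session_start() has already been called and that
// dirIndex() (from the post above) is defined.
function cachedDirIndex($dirpath) {
    $key = 'dirindex_' . md5($dirpath);

    // Reuse the cached list if this directory was already indexed.
    if (!isset($_SESSION[$key])) {
        $_SESSION[$key] = dirIndex($dirpath);
    }
    return $_SESSION[$key];
}
```

One caveat: the cached copy won't notice files added or renamed after it was built, so the key would need to be unset after any upload or rename operation.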



Replies To: Optimizing Directory Indexer

#2 AdaHacker

  • Resident Curmudgeon

Reputation: 452
  • Posts: 811
  • Joined: 17-June 08

Re: Optimizing Directory Indexer

Posted 23 June 2011 - 02:56 PM

Well, you could start by not running this on a directory with 8000 files. :)

Seriously, though, this function is not your problem. Directory traversal is fast. Even with 8000 files, just building up the list shouldn't take more than a second on any half-way decent hardware. And just to be sure, I tried it out. Using a simple regex for the fileMimeType() function, I was able to run this over a directory of 8000 images on my laptop in 0.098 seconds.

Of course, if you're doing something different in your fileMimeType() function, that could seriously slow things down. That's your only real optimization point, given just what you've posted. I don't know what your requirements are in terms of the accuracy of the MIME type check, but the fastest way would simply be to determine the file type based on the name, i.e. check for known image file extensions like ".jpg" and ".png". Granted, it's not as accurate, but anything that requires you to read the file content is going to slow things down. I tried implementing that function using a system call to the "file" command and the execution time skyrocketed from 0.098 seconds to about 40 seconds.
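The extension-based check described above could be as simple as this sketch (the function name and extension list are illustrative):

```php
<?php
// Hypothetical fast check: classify a file as an image by its
// extension alone, without ever opening the file.
function isImageByExtension($filename) {
    $imageExts = array('jpg', 'jpeg', 'png', 'gif', 'bmp');
    $ext = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
    return in_array($ext, $imageExts);
}
```

Since it never touches the disk beyond the directory listing, this stays fast no matter how many files there are.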

However, even with a slow version of fileMimeType(), this still shouldn't take anywhere near 10 minutes to complete on reasonable hardware. What exactly do you mean when you say "it took 10 minutes to load the page"? Are you talking about the time for the server to respond with the HTML, or the time to render the page? Because if you're trying to display thumbnails of all the images or something like that, pretty much your only choice is to not display everything at once. Downloading 8000 images is going to be slow no matter how you slice it.
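One way to avoid displaying everything at once, as suggested above, is to slice the index into pages. A minimal sketch, assuming a page number comes in from the query string (the helper name and page size are illustrative):

```php
<?php
// Hypothetical pagination: render only one page of the file list
// at a time instead of all 8000 entries.
function pageOfFiles($filelist, $page, $perPage = 50) {
    // Page numbers start at 1; compute the offset into the list.
    $offset = ($page - 1) * $perPage;
    return array_slice($filelist, $offset, $perPage);
}
```

The page itself would then only emit links or thumbnails for the current slice, plus previous/next navigation.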

#3 cyb1n

  • New D.I.C Head

Reputation: 3
  • Posts: 27
  • Joined: 08-May 09

Re: Optimizing Directory Indexer

Posted 24 June 2011 - 06:31 AM

AdaHacker, on 23 June 2011 - 05:56 PM, said:

What exactly do you mean when you say "it took 10 minutes to load the page"? Are you talking about the time for the server to respond with the HTML, or the time to render the page?

The 10-minute load time was for the page to load in the browser, and the page consisted of the first image in the list and two forms. The fileMimeType() function uses finfo to retrieve the MIME type of the file. The only reason it's included in dirIndex() is that the function is also used on a separate page to batch-rename files and, in some cases, convert .bmp files to .jpg, so MIME-type accuracy was important there, but not so much here. I'll try excluding it from this call to the function and see if the load time improves. Thank you for the suggestion.
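For reference, a finfo-based check like the one described above typically looks like this sketch. It's accurate because it inspects file contents, which is exactly why it gets slow over thousands of files (requires the fileinfo extension, bundled with PHP by default):

```php
<?php
// Content-based MIME detection via the fileinfo extension.
// Accurate, but it has to open and inspect each file, so it is
// far slower than an extension check when run over thousands of files.
function fileMimeTypeFinfo($filepath) {
    $finfo = finfo_open(FILEINFO_MIME_TYPE);
    $mimetype = finfo_file($finfo, $filepath);
    finfo_close($finfo);
    return $mimetype;
}
```

Keeping this version for the rename/convert page and the cheap extension check for the gallery page gives each page the accuracy it actually needs.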
