Hrm...

file system questions


18 Replies - 2218 Views - Last Post: 24 April 2003 - 07:45 PM

#1 klewlis

  • cur tu me vexas?

Reputation: 8
  • Posts: 1,723
  • Joined: 09-November 01

Hrm...

Posted 14 April 2003 - 08:52 AM

so, how many folders can i put in one directory before the system starts to get bogged down? i need to find a way to organize a large number of files (possibly VERY large) and i'm looking for the best way to break it down so that there aren't thousands of files in each directory...

i don't know the specs on the client's hardware yet, since they haven't bought it, but it will be good (they're budgeting 10,000 just for a web server...) and running windows, apache, mysql...

i could do it with the same hierarchy as the front end, but i'm not sure that's the most efficient way... ideas?


Replies To: Hrm...

#2 arniie

  • D.I.C Addict

Reputation: 0
  • Posts: 999
  • Joined: 08-October 02

Re: Hrm...

Posted 14 April 2003 - 08:57 AM

sorry klewlis, i'm trying to understand what you're asking?

could you say it a bit plainer?

you want to know the best way to organise folders and files??

#3 klewlis

  • cur tu me vexas?

Reputation: 8
  • Posts: 1,723
  • Joined: 09-November 01

Re: Hrm...

Posted 14 April 2003 - 09:24 AM

the system i'm building off of saves all files in the same directory. this means that if i have thousands of files, they're all going to be in the same directory, and that's bad for the system (bogs it down when it has to search through that many files in one directory). so:

1) how many files/subdirectories can i have before it's bad?
2) is there a good/standard way of breaking them up?

#4 arniie

  • D.I.C Addict

Reputation: 0
  • Posts: 999
  • Joined: 08-October 02

Re: Hrm...

Posted 14 April 2003 - 09:30 AM

what types of files are they? are they all the same, or are some images, some php files, some text files, etc.?

#5 klewlis

  • cur tu me vexas?

Reputation: 8
  • Posts: 1,723
  • Joined: 09-November 01

Re: Hrm...

Posted 14 April 2003 - 11:58 AM

arniie, on Apr 14 2003, 10:30 AM, said:

what are the types of files. are they all the same ? or are some images, some php files, some text files etc ?

it doesn't matter...

they can't be sorted by file type. :)

#6 gneato

  • <title>Untitled Document</title>

Reputation: 0
  • Posts: 1,311
  • Joined: 03-September 01

Re: Hrm...

Posted 14 April 2003 - 12:47 PM

You're not really being specific enough... and are you talking about an XP server?

if the filesystem is NTFS... you'll have to look up the limitations on it.

You *could* put files in a directory corresponding to the first letter of the filename. For example, all files starting with "a" go in a/ and all files starting with "s" go in s/...
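A minimal sketch of the first-letter scheme in PHP. The function name first_letter_path is hypothetical, and sending names that don't start with a letter or digit to a catch-all "_" folder is an assumption the post doesn't make.

```php
<?php
// Sketch of the first-letter scheme: "apple.doc" goes in a/, "Summary.txt"
// in s/. first_letter_path() is a hypothetical name; sending names that
// don't start with a letter or digit to "_" is an assumption.
function first_letter_path(string $filename): string
{
    $first = strtolower(substr($filename, 0, 1));
    if (!preg_match('/^[a-z0-9]$/', $first)) {
        $first = '_';                       // catch-all bucket
    }
    return $first . '/' . $filename;
}

echo first_letter_path('apple.doc'), "\n";   // a/apple.doc
echo first_letter_path('Summary.txt'), "\n"; // s/Summary.txt
echo first_letter_path('#notes.txt'), "\n";  // _/#notes.txt
```

For purely numbered files (00001, 00002, ...) this would put everything in 0/, so it only helps when names actually vary in their first character.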

What's it for? You should try to choose a system that will balance the files between the folders as much as possible, I think.
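One common way to get that balance, not proposed in this thread, is to bucket by a hash of the file name instead of its first letter. A sketch, assuming a two-hex-digit md5 prefix (256 folders) is enough; hashed_path is a hypothetical name.

```php
<?php
// Alternative technique (not from this thread): choose the folder from the
// first two hex digits of md5(name), giving 256 folders ("00" to "ff").
// Sequential names like "00001", "00002" still spread out roughly evenly.
// hashed_path() is a hypothetical helper name.
function hashed_path(string $filename): string
{
    $bucket = substr(md5($filename), 0, 2);
    return $bucket . '/' . $filename;
}

// Count how 10,000 sequentially numbered files spread over the folders.
$counts = [];
for ($i = 1; $i <= 10000; $i++) {
    $name   = sprintf('%05d', $i);          // "00001", "00002", ...
    $bucket = substr(hashed_path($name), 0, 2);
    $counts[$bucket] = ($counts[$bucket] ?? 0) + 1;
}
printf("%d folders used, fullest holds %d files\n", count($counts), max($counts));
```

The trade-off is that a folder name no longer tells a person anything about its contents, so the path has to be recomputed from the name (or looked up) every time.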

I don't know if your worries are really necessary though; you might be able to use this system and store everything in one directory.

You might consider an SQL database with the file attributes and stuff... it would make searching for a particular file much faster than querying the filesystem.

#7 klewlis

  • cur tu me vexas?

Reputation: 8
  • Posts: 1,723
  • Joined: 09-November 01

Re: Hrm...

Posted 14 April 2003 - 01:10 PM

the files are definitely recorded in the db for easier searching.

it's for a document management system. the system that i'm building off of just stores them by number (so 00001, 00002, etc) which is fine if you only have a few hundred, but i'm expecting thousands. i thought that i could break them up into folders of 500 files each, and then if i had 500 of those it would give me space for 250,000 files, which could work...

i'm not sure if it will be xp or nt... but i've been told that i shouldn't have more than 500 or so in each directory...
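The 500 x 500 layout described above can be sketched like this, assuming document numbers start at 0. The constant names and folder_for are hypothetical; the range check just makes the 250,000-file ceiling explicit.

```php
<?php
// Layout from the post, assuming document numbers start at 0:
// 500 files per folder, at most 500 folders => 500 * 500 = 250,000 slots.
// FILES_PER_FOLDER, MAX_FOLDERS and folder_for() are hypothetical names.
const FILES_PER_FOLDER = 500;
const MAX_FOLDERS      = 500;

function folder_for(int $doc_number): int
{
    $capacity = FILES_PER_FOLDER * MAX_FOLDERS; // 250,000
    if ($doc_number < 0 || $doc_number >= $capacity) {
        throw new RangeException(
            "document $doc_number is outside the range 0 to " . ($capacity - 1)
        );
    }
    return intdiv($doc_number, FILES_PER_FOLDER);
}

echo folder_for(0), "\n";      // 0
echo folder_for(499), "\n";    // 0
echo folder_for(500), "\n";    // 1
echo folder_for(249999), "\n"; // 499
```

Once document 250,000 arrives, this layout needs either more folders or another directory level.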

This post has been edited by klewlis: 14 April 2003 - 01:11 PM


#8 gneato

  • <title>Untitled Document</title>

Reputation: 0
  • Posts: 1,311
  • Joined: 03-September 01

Re: Hrm...

Posted 14 April 2003 - 01:17 PM

Well, sounds like you have it figured out then.

#9 klewlis

  • cur tu me vexas?

Reputation: 8
  • Posts: 1,723
  • Joined: 09-November 01

Re: Hrm...

Posted 14 April 2003 - 03:32 PM

well.... except that's only one way of doing it and i'm not sure it's the best way... especially if i end up with more than that many files....

so i was looking for other ideas :P

#10 cyberscribe

  • humble.genius

Reputation: 10
  • Posts: 1,062
  • Joined: 05-May 02

Re: Hrm...

Posted 14 April 2003 - 05:43 PM

Hi Klew,

I would handle this by creating the following two functions:

<?php

$root = "C:\\Program Files\\The Program\\Data";
$max = 500;

// takes a file number and returns a path made of a sub-directory number
// and a re-numbered file name, guaranteeing that no directory ever holds
// more than $max files
function file_out($numeric_file) {
    global $root;
    global $max;
    $n    = (int)$numeric_file;
    $dir  = (int)floor($n / $max);   // which sub-directory
    $file = $n % $max;               // position inside that sub-directory
    return $root . "\\" . $dir . "\\" . $file;
}

// takes the re-numbered file and its sub-directory and returns the
// original file number
function file_in($my_numeric_file, $my_directory) {
    global $max;
    return ((int)$my_directory * $max) + (int)$my_numeric_file;
}

?>



and then modifying the codebase you are using to run file_out on the filenames the program creates by default. Then, whenever the program needs to reference a file by its original name, run file_in to convert it back to the "000001", "000002", etc. numbering.

Basically, the file_out function will convert your numeric file names into:

Quote

C:\Program Files\The Program\Data\0\0 for 0 (or 000000 as the case may be)
C:\Program Files\The Program\Data\0\1 for 1
C:\Program Files\The Program\Data\0\2 for 2
...
C:\Program Files\The Program\Data\0\499 for 499
C:\Program Files\The Program\Data\1\0 for 500
C:\Program Files\The Program\Data\1\1 for 501
...
C:\Program Files\The Program\Data\49\499 for 24999


This ensures you never get more than $max (in this case 500) files in a directory, while the number of directories can scale indefinitely.

This approach is very similar to simple hash bucketing: integer division picks the directory and the % (modulo) operator gives each file's position inside it. You could create a sub-directory and a sub-sub-directory by applying the same split a second time, and so on. For your purposes, I think one level will do the trick.

Hashing is often used in search functions, since you only need simple division and truncation to find the directory and one mod (a very fast built-in operator) to find the file number. It creates a very nice tree structure with very predictable (that is, mathematical) organization.
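A sketch of the two-level variant (a sub-directory plus a sub-sub-directory), assuming numbering from 0 and $max = 500 at each level. The function names are hypothetical, and forward slashes are used because PHP on Windows accepts them as path separators too.

```php
<?php
// Two-level split, assuming numbering from 0. The middle and leaf
// directories each hold at most $max entries; the number of top-level
// directories grows with $n. Function names are hypothetical.
function file_out_two_level(int $n, int $max = 500): string
{
    $file = $n % $max;                 // position inside the leaf directory
    $leaf = intdiv($n, $max) % $max;   // sub-sub-directory
    $top  = intdiv($n, $max * $max);   // sub-directory
    return $top . '/' . $leaf . '/' . $file;
}

// Reverses file_out_two_level(): "top/leaf/file" back to the original number.
function file_in_two_level(string $path, int $max = 500): int
{
    [$top, $leaf, $file] = array_map('intval', explode('/', $path));
    return ($top * $max + $leaf) * $max + $file;
}

echo file_out_two_level(0), "\n";       // 0/0/0
echo file_out_two_level(500), "\n";     // 0/1/0
echo file_out_two_level(250001), "\n";  // 1/0/1
```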

Of course, the $root prefix is optional, and you may want to adjust the functions by one if, for example, the codebase you are using starts numbering from 1 instead of 0.
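For the off-by-one case, a sketch assuming the codebase numbers files from 1: subtracting 1 before splitting keeps files 1 to 500 in directory 0 and 501 to 1000 in directory 1. Without the offset, file 500 would start directory 1 and directory 0 would hold only 499 files. dir_and_slot and original_number are hypothetical names.

```php
<?php
// Off-by-one adjustment, assuming files are numbered from 1 (00001, 00002, ...).
// dir_and_slot() and original_number() are hypothetical names.
function dir_and_slot(int $n, int $max = 500): array
{
    $zero_based = $n - 1;                 // shift 1-based numbers to 0-based
    return [intdiv($zero_based, $max), $zero_based % $max];
}

function original_number(int $dir, int $slot, int $max = 500): int
{
    return $dir * $max + $slot + 1;       // undo the shift
}

vprintf("%d/%d\n", dir_and_slot(1));    // 0/0
vprintf("%d/%d\n", dir_and_slot(500));  // 0/499
vprintf("%d/%d\n", dir_and_slot(501));  // 1/0
```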

Hope that helps.

Cheers,
Robert

This post has been edited by cyberscribe: 14 April 2003 - 05:45 PM


#11 kf6yvd

  • New D.I.C Head

Reputation: 0
  • Posts: 30
  • Joined: 27-March 03

Re: Hrm...

Posted 14 April 2003 - 05:48 PM

I'm too lazy to log in to my account; this is skyhawk...

I think the best people to ask about this would be the folks who programmed Ikonboard (the old version, before it was database-driven). Everything was stored in flat-file databases, meaning EVERY post had a file and every THREAD had a file, and I know there were some pretty big sites out there running the old Ikonboard.

Just a thought :)

#12 cyberscribe

  • humble.genius

Reputation: 10
  • Posts: 1,062
  • Joined: 05-May 02

Re: Hrm...

Posted 14 April 2003 - 06:16 PM

kf6yvd, on Apr 14 2003, 04:48 PM, said:

everything was stored in flat-file databases... meaning EVERY post has a file and every THREAD had a file, and I know there were some pretty big sites out there running the old Ikonboard.

I'm guessing the structure for storing message threads would be a very specialized version of the simple hashing described above, because you're doing something that is ideally suited to a relational database (parent-child structures linked by keys) but in flat-file format. I know mail applications like UnityMail use a similar approach: they store messages as text files, but the relationships are kept in SQL (where, imho, they belong).

#13 klewlis

  • cur tu me vexas?

Reputation: 8
  • Posts: 1,723
  • Joined: 09-November 01

Re: Hrm...

Posted 14 April 2003 - 08:38 PM

yes, in this case the actual relationships are stored in a mysql db, which is nicer.

thanks for the suggestion, cyber, i will try it out this week...

#14 cyberscribe

  • humble.genius

Reputation: 10
  • Posts: 1,062
  • Joined: 05-May 02

Re: Hrm...

Posted 15 April 2003 - 10:52 AM

klewlis, on Apr 14 2003, 07:38 PM, said:

thanks for the suggestion, cyber, i will try it out this week...

cool ... lemme know how it goes :)

#15 klewlis

  • cur tu me vexas?

Reputation: 8
  • Posts: 1,723
  • Joined: 09-November 01

Re: Hrm...

Posted 15 April 2003 - 11:55 AM

k i'm getting a weird error:
Warning: mkdir(e:/www/data/0/0/) [function.mkdir]: No such file or directory in E:\www\dms\inc\inc.FileUtils.php on line 53

why would it say that when i'm trying to *create* the directory?

it's not a permissions problem... what else could it be?
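One likely cause, assuming a parent directory (e.g. e:/www/data/0) did not exist yet: mkdir() creates only the last component of the path and reports "No such file or directory" when a parent is missing. In PHP 5 and later, passing true as the third argument creates every missing level; on PHP 4 each level would have to be created in turn. A sketch with a hypothetical ensure_dir helper, using the system temp directory instead of the real data path:

```php
<?php
// Hypothetical helper: make sure the directory for a file exists before
// writing it. mkdir() alone only creates the last path component, so it fails
// with "No such file or directory" when a parent (e.g. .../data/0) is missing;
// passing true as the third argument creates every missing level.
function ensure_dir(string $dir, int $mode = 0777): bool
{
    if (is_dir($dir)) {
        return true;                       // already there, nothing to do
    }
    return mkdir($dir, $mode, true);       // recursive: creates parents too
}

// Demo path under the temp directory, two levels deep like data/0/0.
$base = sys_get_temp_dir() . '/dms_demo_' . getmypid();
$path = $base . '/0/0';
var_dump(ensure_dir($path));
var_dump(is_dir($path));
```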