Large number of files in main directory - any issues?

Discussion in 'Site & Server Administration' started by names2buy, Aug 24, 2006.

  1. #1
    I am beginning a new project and would appreciate any words of advice before I get too far into it.

    In general, the site will be an online dictionary. I use flat static HTML files, and my ideal format is http://xxxxxxxx.com/anyword.htm, corresponding to one file per entry.

    Producing the files will not be a problem, but in previous projects I have used a large number of directories. For ease of searching and linking, I would prefer a single directory, so all files will be in the public_html folder.

    The project may eventually reach 600,000 files, so here is my question:

    Will I experience any performance issues or server limitations by using such a large directory?

    The server setup is standard Apache/WHM, and it is dedicated, so I can adjust settings if needed (although I am new to server administration).

    Thanks in advance for your advice!
     
    names2buy, Aug 24, 2006 IP
  2. soniqhost.com

    #2
    No, there shouldn't be a problem placing a large number of files in the main directory.
     
    soniqhost.com, Aug 24, 2006 IP
  3. wheel

    #3
    You should consider breaking them up somehow. I took over a site that has 32,000 pages in one directory, and while there have been no server limitations, it's a pain in the butt to work with or find anything. Try finding the header file by scanning the directory listing... :)
     
    wheel, Aug 25, 2006 IP
  4. fatinfo guy

    #4
    Why go static when you can use a database to generate the same files?
     
    fatinfo guy, Aug 25, 2006 IP
  5. names2buy

    #5
    Thanks to all for the advice and questions to consider...

    Yes, I agree it has been a hassle sometimes to deal with directories of even 1k files...so I guess I will just have to deal with that. I may need to break them up for my own use before I dump everything onto the server.

    As for static vs. dynamic, I've had good experiences with search engines via the static format. Also (although I'm only guessing) people may be more likely to link to a static url.

    Perhaps the static files are also an issue of convenience, as my programming abilities lend more readily to creating files out of a text database...but I have very little knowledge in the sql area.

    ...of course the other issue is getting enough links to get you fully indexed, etc, so quality and usefulness are major issues, as well as relevant linking within your site.

    So to summarize (feel free to correct me here) it seems I can expect a directory of 600k files will function well enough, though the task of getting them there may present quite a challenge. But there is no "brick wall" I will hit, such as the 65k row limit in an excel file???

    Thanks for your time.
     
    names2buy, Aug 25, 2006 IP
  6. wheel

    #6
    This was discussed here a few months back, and no, there's no brick wall on Linux.

    Now might be a good time to pick up MySQL. It's not at all difficult. Heck, I could learn it myself.
     
    wheel, Aug 27, 2006 IP
  7. names2buy

    #7
    Thanks... I'm a little more confident going into it now. I hope to take you up on the recommendation to learn MySQL (in between doing these sites and my two "real" jobs).
     
    names2buy, Aug 29, 2006 IP
  8. design2host

    #8
    Why go static? Because people love Google. No better reason needed.
     
    design2host, Aug 31, 2006 IP
  9. Gnet

    #9
    Yeah, I've worked with huge numbers of pages before. It's a pain to maintain and find anything at all.
     
    Gnet, Sep 1, 2006 IP
  10. theblight

    #10
    Sometimes you will experience problems loading the directory listing in your FTP client.
     
    theblight, Sep 1, 2006 IP
  11. wheel

    #11
    These days, there's no difference between static and dynamic when viewed externally. There are a bunch of different ways to mask this; most folks use mod_rewrite to map static page names to a dynamic URL. Then you get the benefits of static URLs (which are pretty limited these days anyway; Google handles most dynamic URLs just fine) and the benefit of managing page addresses dynamically.

    I do it a bit differently. I've got a database mapping static page names to dynamic URLs, and I point my 404 page at a script that does a lookup in that database. If it finds the page, it serves a 200 and the page; if it doesn't, it serves a 404.
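    wheel's 404-lookup trick can be sketched roughly like this. The table layout, names, and the example mapping are illustrative assumptions, not details from his actual site, and sqlite stands in for whatever database he really used:

```python
import sqlite3

# Illustrative stand-in for wheel's lookup database of
# static page names -> dynamic URLs.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE pages (name TEXT PRIMARY KEY, target TEXT)")
db.execute("INSERT INTO pages VALUES ('anyword.htm', '/render.php?word=anyword')")

def handle_missing_page(requested_path):
    """Sketch of the script the 404 page points at.

    If the requested static name is in the table, answer 200 (the web
    server would then deliver the mapped page); otherwise answer a
    genuine 404. Returns (status, target) for illustration.
    """
    name = requested_path.lstrip("/")
    row = db.execute("SELECT target FROM pages WHERE name = ?",
                     (name,)).fetchone()
    if row:
        return 200, row[0]   # found: serve the mapped page
    return 404, None         # genuinely missing
```

    The nice property of this design is that adding a page is just a database insert; no rewrite rules or files on disk need to change.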
     
    wheel, Sep 1, 2006 IP
  12. names2buy

    #12
    Thanks all...

    Looks like I could make life easier with SQL.

    My model is reference-type sites that only need updating when I choose, so the static format has worked fairly well so far. I write Pascal code to convert my Excel files to HTML pages (it seems tedious, but once the format is set it is fairly efficient, except for uploading a large number of files).

    I have also run into FTP problems with large directories on shared hosting accounts... I'm assuming this should not be a problem on a dedicated account with WHM control?

    While I'm on that subject, I'd appreciate any advice on speeding up those large uploads... the best I've come up with so far is FileZilla.
     
    names2buy, Sep 2, 2006 IP
  13. alemcherry

    #13
    Even though there are no limitations on the number of files, this is going to make FTP a real pain in the a**. Your idea of generating static files dynamically sounds good; it will give a great performance boost when you have a lot of traffic.

    My advice: generate the files dynamically and place them in different directories. A directory structure from "aa" to "zz" will let you split up the files logically (e.g. foo.html goes into directory 'fo'). You then have the option of using Apache mod_rewrite to keep the URLs looking normal (foo.html resolving to fo/foo.html).

    If you are not familiar with mod_rewrite, spend some time learning it. It's not difficult and can be a great help in creating clean URLs. Optionally you can have a dynamic script and still make it look static. But if static page generation is easy for you, just go for it.
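    The two-letter bucketing scheme described above is easy to script. A minimal sketch in Python (the directory names and the extension filter are assumptions for illustration):

```python
import os
import shutil

def bucket_for(filename):
    """Two-letter bucket, per the 'aa'..'zz' scheme:
    foo.html lands in directory 'fo'."""
    return filename[:2].lower()

def bucket_files(src_dir, dest_dir):
    """Move every .htm/.html file in src_dir into dest_dir/<bucket>/."""
    for name in os.listdir(src_dir):
        if not name.endswith((".htm", ".html")):
            continue  # leave non-page files where they are
        bucket = os.path.join(dest_dir, bucket_for(name))
        os.makedirs(bucket, exist_ok=True)
        shutil.move(os.path.join(src_dir, name), os.path.join(bucket, name))
```

    A matching mod_rewrite rule would then map the flat URL back onto the bucket, something along the lines of `RewriteRule ^(([a-z]{2})[^/]*\.html?)$ $2/$1 [L]` (an untested sketch; it assumes lowercase alphabetic filenames).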
     
    alemcherry, Sep 3, 2006 IP
  14. alemcherry

    #14
    BTW, on dedicated servers you have the option of uploading the files zipped and unzipping them on the server. But I would still recommend a logical directory structure. Also, you may not actually need a dedicated server unless and until you have enough traffic.
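    The local half of that zip-and-upload workflow can be sketched in Python (function and path names here are illustrative; the server half is just running `unzip` in public_html over SSH, and an unzip utility is generally available on standard Linux boxes):

```python
import zipfile
from pathlib import Path

def bundle_pages(src_dir, archive_path):
    """Pack every .htm file in src_dir into one compressed archive,
    so a single FTP transfer replaces hundreds of thousands of small ones."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for page in sorted(Path(src_dir).glob("*.htm")):
            zf.write(page, arcname=page.name)  # store flat, no local paths
    return archive_path
```

    Transferring one large archive avoids the per-file overhead that makes FTP so slow on huge directories.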
     
    alemcherry, Sep 3, 2006 IP
  15. names2buy

    #15
    Alemcherry,

    Thanks for the advice. Unzipping the files on the server will save a lot of time. Is there an unzip utility already installed with WHM? (Or please recommend software.)

    The dedicated server was needed for other sites...hopefully this one will eventually need its own (but probably 2 years to get that kind of traffic from SE).

    Thanks to all for your help.
     
    names2buy, Sep 4, 2006 IP
  16. names2buy

    #16
    Again, thanks to everyone for the advice. The beta version is finally online and here are my experiences thus far:

    1. Server performance is excellent with 170,000 files in the main directory.
    2. Static files are working nicely. I'm using a simple jump-to-URL script for search (since the site is a dictionary, the searches are single words). It seems a good bit faster than dictionary.com, as there is no real CPU time needed.
    3. Working with the folders on my own computer is a mess. I have split them up several times; the main problem is that Windows can't handle the large directories.

    The beta site is at http://wordlist.com
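    A "jump to URL" search over one-file-per-word pages needs no server-side lookup at all, which is why it is so fast. A minimal sketch of the idea (the normalization rules here are guesses, not names2buy's actual script):

```python
def search_redirect(query, base="http://wordlist.com"):
    """Map a single-word query straight to its static page URL.
    No database and no CPU-heavy search: the filename IS the index."""
    word = query.strip().lower()
    if not word.isalpha():
        return None          # multi-word or junk queries: no redirect
    return f"{base}/{word}.htm"
```

    In practice this would sit behind a plain HTML search form that redirects the browser to the returned URL.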
     
    names2buy, Feb 7, 2007 IP