I am beginning a new project and would appreciate any words of advice before I get too far into it. In general, the site will be an online dictionary. I'm using all flat, static HTML files, and my ideal format is http://xxxxxxxx.com/anyword.htm, corresponding to one file per entry. Producing the files will not be a problem, but in previous projects I have used a large number of directories. For ease of searching and linking, I would prefer a single directory, so all files will be in the public_html folder. The project may eventually reach 600,000 files, so here is my question: will I experience any performance issues or server limitations by using such a large directory? The server setup is a standard Apache/WHM box, and it is dedicated, so I can control settings if needed (although I am new to server admin). Thanks in advance for your advice!
You should consider breaking them up somehow. I've got a site I took over that has 32,000 pages in one directory, and while there have been no server limitations, it's a pain in the butt to work with or find anything. Try finding the header file by looking at the directory listing...
Thanks to all for the advice and questions to consider. Yes, I agree it has been a hassle at times to deal with directories of even 1k files, so I guess I will just have to deal with that. I may need to break them up for my own use before I dump everything onto the server.

As for static vs. dynamic, I've had good experiences with search engines via the static format. Also (although I'm only guessing) people may be more likely to link to a static URL. Perhaps the static files are also a matter of convenience, as my programming abilities lend themselves more readily to creating files out of a text database, but I have very little knowledge in the SQL area. Of course, the other issue is getting enough links to get fully indexed, so quality and usefulness are major issues, as well as relevant linking within the site.

So to summarize (feel free to correct me here): it seems I can expect a directory of 600k files to function well enough, though the task of getting them there may present quite a challenge. But there is no "brick wall" I will hit, such as the 65k row limit in an Excel file? Thanks for your time.
This was discussed here a few months back, and no, there's no brick wall on Linux. Now might be a good time to pick up MySQL. It's not at all difficult. Heck, I could learn it myself.
Thanks... I'm a little more confident going into it now. I hope to take you up on the recommendation of learning MySQL (in between doing these sites and my two "real" jobs).
Sometimes you will experience problems with your FTP client loading the directory listing when accessing a directory that large.
These days, there's no difference between static and dynamic when viewed externally. There are a bunch of different ways to mask this; most folks use mod_rewrite to map static page names to a dynamic URL. Then you get the benefits of static URLs (which are pretty limited these days anyway, Google handles most dynamic URLs just fine) and the benefit of managing page addresses dynamically. I do it a bit differently. I've got a database mapping static page names to dynamic URLs, and I point my 404 page to a script that does a lookup in that database. If it finds the page, it serves a 200 and the page; if it doesn't, it serves a 404.
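For example, the common mod_rewrite approach is only a couple of lines in .htaccess. This is just a sketch; the script name (entry.php) and the "word" parameter are placeholders for illustration, not anything specific to your site:

RewriteEngine On
# Leave real files alone, send everything else to the dynamic script
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([a-zA-Z-]+)\.htm$ /entry.php?word=$1 [L,QSA]

The 404 variant I described is just the standard ErrorDocument directive pointing at the lookup script (again, the script name here is made up):

ErrorDocument 404 /lookup.php

The lookup script then has to send the right status code itself: 200 if it finds the page, 404 if it doesn't.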
Thanks all... Looks like I could make life easier with SQL. My model is reference-type sites that only need updating when I want them to, so the static format has worked fairly well so far. I write Pascal code to convert my Excel files to HTML pages (seems tedious, but once the format is set it is fairly efficient, except for uploading a large number of files). I have also run into FTP problems with large directories on shared hosting accounts... I'm assuming this should not be a problem on a dedicated account with WHM control? While I'm on that subject, I'd appreciate any advice on speeding up those large uploads... the best I've come up with so far is FileZilla.
Even though there are no limitations on the number of files, this is going to make FTP a real pain in the a**. Your idea of generating static files dynamically sounds good; that will give a great performance boost when you have a lot of traffic. My advice: generate the files that way, but place them in different directories. A directory structure like "aa" to "zz" will let you split up the files logically (e.g., foo.html goes in directory 'fo'). Then you have the option of using Apache mod_rewrite to keep the URLs looking normal (foo.html resolving to fo/foo.html), as in the sketch below. If you are not familiar with mod_rewrite, spend some time learning it. It's not difficult and can be a great help in creating clean URLs. Optionally you can have a dynamic script and still make it look static, but if static page generation is easy for you, just go for it.
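To give an idea, the rewrite for that layout could look something like this in an .htaccess file at the document root (a sketch only, assuming two-letter prefix directories; words shorter than two letters would need a separate rule):

RewriteEngine On
# Serve /foo.html from /fo/foo.html without changing the visible URL
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(([a-z]{2})[a-z]+\.html)$ /$2/$1 [L]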
BTW, on a dedicated server you have the option of uploading the files zipped and unzipping them on the server. But I would still recommend a logical directory structure. Also, you may not actually need a dedicated server unless and until you have enough traffic.
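Assuming you have SSH access and the standard zip/unzip utilities are available (the paths and file names below are just placeholders), it's roughly:

# On your own machine, zip up the folder of generated pages
zip -r pages.zip generated_pages/

# Upload pages.zip, then on the server:
cd /home/youruser/public_html
unzip -q -j pages.zip

(The -j option drops the folder paths inside the archive so the pages land directly in public_html.)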
Alemcherry, thanks for the advice. Unzipping the files on the server will save a lot of time. Is there an unzip utility already installed with WHM? (Or please recommend software.) The dedicated server was needed for other sites... hopefully this one will eventually need its own (but it will probably take two years to get that kind of traffic from the search engines). Thanks to all for your help.
Again, thanks to everyone for the advice. The beta version is finally online, and here are my experiences thus far:

1. Server performance is excellent with 170,000 files in the main directory.
2. The static files are working nicely. I'm using a simple jump-to-URL script for search (since the site is a dictionary, the searches are single words). It seems to be a good bit faster than dictionary.com, as there is no real CPU time needed.
3. Working with the folders on my own computer is a mess. I have split them up several times; the main problem is that Windows can't handle the large directories.

The beta site is at http://wordlist.com
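For anyone curious, a jump-to-URL search for this kind of site can be as simple as a form that builds the page address from the typed word, along these lines (a simplified sketch, not the exact code on the site):

<form onsubmit="location.href = '/' + this.q.value.toLowerCase() + '.htm'; return false;">
  <input type="text" name="q">
  <input type="submit" value="Look up">
</form>

All the work happens in the visitor's browser, which is why there's essentially no CPU cost on the server.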