Hi y'all, I'm trying to build a search for a site with approximately one million products. I have a PHP search script that directs MySQL to search a text field of every product for an exact match of the queried words (not any combination). The products fall into several categories, so I can easily break them up into tables. The search, however, I don't know what to do about: if I break it up into multiple search tables, it has the potential to take just as long (if not longer), because it's still the same number of products.

Unfortunately, there is no clean line between products (DVDs may be in more than one category), so I can't make a list of where to search for certain keywords. I've thought about adding a hidden input to the form (to be parsed by the search, so it can query the most likely table), but then I would lose the other products. I could put up a note saying something like, "Didn't find what you were looking for? Try here," making it a link to another category, but that doesn't make me a happy camper.

Anyway, any advice? Keep in mind this is a growing company that will eventually need some kind of table separation no matter what (even if you consider one million products small). =)

Thanks for any help.
Rob
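For reference, the kind of query described above presumably looks something like the sketch below; the table and column names are my guesses, not the actual schema. Each leading-wildcard LIKE defeats normal indexes, so MySQL ends up scanning all ~1M rows no matter how the tables are split:

    -- Guess at the current search: one LIKE per queried word, all required.
    SELECT product_id, name
    FROM products
    WHERE description LIKE '%blue%'
      AND description LIKE '%widget%';

(MySQL's FULLTEXT index, queried with MATCH ... AGAINST('+blue +widget' IN BOOLEAN MODE), is the usual way to avoid that scan, though on MySQL of this vintage it requires a MyISAM table.)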
The category is information related to the product, so ideally you would add a category column to your existing table. Splitting the table based on information about the product would be a strange design choice and something of a premature optimisation.
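A minimal sketch of that idea (all names here are assumptions). Since DVDs can sit in more than one category, a join table handles the many-to-many case while the products themselves stay in one table:

    -- One row per category.
    CREATE TABLE categories (
        category_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        name        VARCHAR(100) NOT NULL
    );

    -- One row per (product, category) pair, so a DVD can be in several categories.
    CREATE TABLE product_categories (
        product_id  INT UNSIGNED NOT NULL,
        category_id INT UNSIGNED NOT NULL,
        PRIMARY KEY (product_id, category_id)
    );

    -- Restrict a search to one category with a join, not a separate table.
    SELECT p.product_id, p.name
    FROM products p
    JOIN product_categories pc ON pc.product_id = p.product_id
    WHERE pc.category_id = 3
      AND p.description LIKE '%widget%';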
So, it's not a good idea to break up a table that has over one million rows? Maybe a little normalization?
Depending on your application, the usage of queries might break down as well. Just try running queries on the Netflix dataset: it's 100M records in size, so 1M is rather small, tiny actually. Running through that index takes some time as well, not much, but it is not something that scales out/up very well.

Splitting your table doesn't really help, unless you let the DB do it for you using a table partitioning system: you get one table to use, but in the background the DB splits it into multiple tables. When used with constraint exclusion (the PostgreSQL term), this can be used to narrow down the set of data the server needs to parse, scan, or fetch the index for. It can also be used to send your query to multiple DB servers; I think MySQL can be set up to use this technique (a sketch follows below).

The question remains: what are your business needs?
- How many products are realistically needed? (I am still wondering how someone can offer 1M products on an e-commerce site.)
- How many visitors, on average and at peak, for each function of your site (find product, order product, contact form, create account)?
- What is the required response time for each function?

Perform a performance benchmark to find out what you can handle now and how much breathing space you have left, and run the benchmark on your expected 10M (???) products database as well. If the system performs according to your current and anticipated needs, why bother with premature optimisations?
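If partitioning does turn out to be worth it, here is a rough sketch of the MySQL side (MySQL 5.1+ syntax; for simplicity it assumes a single category_id column on products rather than the many-to-many layout sketched earlier):

    -- One logical table; MySQL spreads the rows across 8 physical partitions.
    -- The partition key must be part of every unique key, hence the
    -- composite primary key.
    CREATE TABLE products (
        product_id  INT UNSIGNED NOT NULL,
        category_id INT UNSIGNED NOT NULL,
        name        VARCHAR(100) NOT NULL,
        description TEXT,
        PRIMARY KEY (product_id, category_id)
    )
    PARTITION BY HASH (category_id)
    PARTITIONS 8;

    -- Filtering on the partition key lets MySQL prune to a single partition,
    -- the counterpart of PostgreSQL's constraint exclusion.
    SELECT product_id, name FROM products WHERE category_id = 3;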
I would second (or third) the suggestion to keep all of the data in one table if you possibly can. Part of the point of a database is that all of your data is in one place. Once you move away from this, you are (as you have already found) creating problems for yourself. As long as the products table is properly indexed, a 100 million row table is just as manageable as a 1 million row one.

This is different from normalisation, and yes, you should look at that if you think it might help. Anything that isn't an attribute of the product itself should be in another, related table. For instance, if you have a supplier's address for each product, this shouldn't be in the products table itself, but in a separate suppliers table, with a supplier_id as the foreign key in the products table. (Probably a bad example, but obviously I don't know what your exact table structure is.)
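A sketch of that supplier split (the same caveat applies: these names are illustrative, not the real schema):

    -- Supplier details live once, in their own table.
    CREATE TABLE suppliers (
        supplier_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        name        VARCHAR(100) NOT NULL,
        address     VARCHAR(255)
    );

    -- Products keep only a foreign key pointing at the supplier,
    -- plus indexes on whatever columns the search actually filters on.
    CREATE TABLE products (
        product_id  INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        name        VARCHAR(100) NOT NULL,
        description TEXT,
        supplier_id INT UNSIGNED,
        INDEX idx_supplier (supplier_id),
        FOREIGN KEY (supplier_id) REFERENCES suppliers (supplier_id)
    );

Queries then join the two tables back together only when supplier details are actually needed.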