I am going to build a search engine, and I want to know whether storing too much data in the database will slow it down. Also, how many rows can a SQL database hold?
It's natural that search performance will depend on the number of records, but database engines are smart enough to handle huge numbers of records. As far as I know, it's not the number of rows that matters; the available disk space determines how much data the database can hold.
I've got a database with over 10,000 tables of data (I can't pull it up in phpMyAdmin without it crashing the browser). Performance is just fine, and it gets loads of queries.
A database can handle millions of records as long as you have a fast server. If your database outgrows what a single server can handle, you might have to set up a cluster of servers. I have a 2 GB database currently running on a dual-Xeon server, and queries are still very fast. So it depends on the hardware the database is sitting on.
In addition to sizing your hardware appropriately, don't overlook the importance of design.

- Make sure your database is properly indexed. That doesn't mean you should index every conceivable field/table; it does mean you should have some understanding of how your queries/procs will access the data.
- Make sure your queries/procs are well written (so that, for example, the database can optimize the query using the indexes you have defined).

Poorly designed databases and/or SQL can kill performance. In some cases, order-of-magnitude performance improvements can be realized simply by improving your indexing and/or query implementation; see the sketch below.
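A minimal sketch of the kind of indexing this answer describes (the table and column names are hypothetical, invented for illustration):

```sql
-- Hypothetical pages table for a search engine crawler.
CREATE TABLE pages (
    id      INT AUTO_INCREMENT PRIMARY KEY,
    url     VARCHAR(255) NOT NULL,
    domain  VARCHAR(100) NOT NULL,
    crawled DATETIME NOT NULL
);

-- Without an index, this query scans every row in the table.
SELECT url FROM pages WHERE domain = 'example.com';

-- With an index on the filtered column, the engine can seek
-- directly to the matching rows instead of scanning everything.
CREATE INDEX idx_pages_domain ON pages (domain);
```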
Agreed. It comes down more to query design and table structure, as well as hardware. With no indexes and poorly designed queries, you can slow a fast server down with only a few thousand rows. Generally you'd want to look into indexing the fields that appear in your query's WHERE clause, but work with EXPLAIN to see how your query will actually perform; see the sketch below.
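A quick sketch of using EXPLAIN in MySQL, reusing the hypothetical `pages` table from the previous answer:

```sql
-- EXPLAIN shows the execution plan without running the query.
EXPLAIN SELECT url FROM pages WHERE domain = 'example.com';

-- In the output, type = ALL means a full table scan;
-- type = ref with key = idx_pages_domain means the index is used.
```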
Another thing to take into account is the database platform. Some DBs (particularly some free ones) have a hard-coded limit on either the number of records or the size of the DB itself. In some cases this is done to protect the database from the limits of the SQL engine; in other cases it's done just to force you to upgrade to something that makes the software company more money.

The type of queries will also make a difference. This is related to query optimization, but not always. Say you were always going to pull from one table only: those queries would be fast. But start doing joins or nested SELECT statements across three tables of more than a million rows each, and that can be taxing for small databases. Whether this is a database, optimization, or hardware issue depends first on the SQL engine. Some small DBs (Access) have definite engine limits, while others (MSSQL Express, which is really just a stripped-down version of MSSQL) are limited more by query optimization and hardware. In your case, if you use MySQL or MSSQL Express, the engine should not be your issue. Depending on how you plan to execute the search, optimization will be your big issue. Will you assign keywords to each site, or will you store the text from each site's pages?

Another issue that will matter more the larger the DB gets: if you delete rows regularly, you will need to defragment the DB. An alternative is to have a bit field (isDeleted) and never actually delete rows, as sketched below. You'll still need to defrag large tables once in a while.
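A minimal sketch of the soft-delete pattern this answer mentions, again using the hypothetical `pages` table (the `isDeleted` column name comes from the answer; everything else is illustrative):

```sql
-- Add a flag instead of physically deleting rows.
ALTER TABLE pages ADD COLUMN isDeleted TINYINT(1) NOT NULL DEFAULT 0;

-- "Delete" a row by flipping the flag; no fragmentation occurs.
UPDATE pages SET isDeleted = 1 WHERE id = 42;

-- Queries must now filter out the flagged rows.
SELECT url FROM pages WHERE domain = 'example.com' AND isDeleted = 0;

-- After real deletes, defragment occasionally (MySQL).
OPTIMIZE TABLE pages;
```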
I'd be looking at Oracle for this. However, is it going to be a small niche search engine or something to rival Google? A niche engine may survive on MySQL. I'd recommend you get contractors in to build the indexer and back-end systems; the algorithm work will be intense and not a job for a newbie or intermediate-level programmer.
You can have millions or even billions of rows, but searching through that many would take serious hardware power.