Recommendation on efficient database design

Discussion in 'MySQL' started by nickjag, Sep 5, 2007.

  1. #1
    I'm designing a website that has several (30 or so) categories for layouts. In each category there could be upwards of 1000 layouts that are displayed 10 at a time. Each layout entry will have the following information: title, datestamp, and possibly category... which brings me to my question. In designing for optimal memory usage and efficiency, which database design should I choose?

    1) Use a separate database table for each category of layouts (30 or so); each table would have about 1,000 rows consisting of only title and datestamp.

    2) Use a master database table that includes all the layouts (30,000 or more) and consists of title, datestamp, and category. (and then possibly even a join for the category names)

    3) Something else?

    I'm using PHP and MySQL and this database should be able to serve over 100,000 unique visits a day... which is why I'm trying to design for efficiency and speed. Also, does anyone have any idea on indexing for this design for maximum efficiency? Your help and input is greatly appreciated! :)
     
    nickjag, Sep 5, 2007 IP
  2. ssanders82 (Peon) #2
    I would go for option #2. You would have a table called "Layout" and a second table called "Category" with 30 entries in it, with an auto-increment integer primary key and the category name (plus any other per-category data).

    Make a categoryID integer field in your "Layout" table, which is a foreign key to your Category table. Put an index on it. This should make searching for all layouts in a particular category relatively fast.
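    A minimal sketch of that schema, using SQLite's in-memory engine as a self-contained stand-in for MySQL (the DDL carries over almost verbatim; names like idx_layout_category are illustrative):

```python
import sqlite3

# Option #2 as a two-table schema: Category, and Layout with a
# categoryID foreign key plus an index on it for fast per-category lookups.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Category (
    id   INTEGER PRIMARY KEY AUTOINCREMENT,  -- AUTO_INCREMENT in MySQL
    name TEXT NOT NULL
);
CREATE TABLE Layout (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    title      TEXT NOT NULL,
    datestamp  TEXT NOT NULL,
    categoryID INTEGER NOT NULL REFERENCES Category(id)
);
-- the index that makes "all layouts in category X" fast
CREATE INDEX idx_layout_category ON Layout (categoryID);
""")
conn.execute("INSERT INTO Category (name) VALUES ('Dark'), ('Light')")
conn.execute("INSERT INTO Layout (title, datestamp, categoryID) "
             "VALUES ('Midnight', '2007-09-01', 1)")
rows = conn.execute(
    "SELECT title FROM Layout WHERE categoryID = 1").fetchall()
print(rows)  # [('Midnight',)]
```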
     
    ssanders82, Sep 5, 2007 IP
  3. nickjag (Active Member) #3
    Excellent, thanks... that's about what I was thinking. I want to order by date so the most recent layouts come first. Should I also index the date column, then?
     
    nickjag, Sep 5, 2007 IP
  4. flippers.be (Peon) #4
    If you follow regular db design guidelines you'll end up at option 2.

    However, if your database becomes really big and you are limited in resources (not a very fast CPU, little memory, ...) you may run into problems; then option 1 may be the more optimal solution. Smaller tables mean smaller indexes, fewer page reads, and so on, so it will be faster.

    It may not be normalized, and not 'logical' for someone else who needs to work on this application, but it can be better suited to the application you want to make.

    The problem is that you run into trouble if you want something like the top 10 most downloaded templates across all categories; that's only fast if you store them in one table.

    If you will only ever query one category at a time, then you can split the tables per category.

    And yes, you have to put an index on every column you want to filter or sort on. Choose your clustered index wisely! (Whatever you will search/order on most.)
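    The "choose your index wisely" point can be sketched: a single composite index on (categoryID, datestamp) serves both the category filter and the date ordering, so the "newest 10 in category X" query needs no separate sort step. SQLite stands in for MySQL here, and the names are illustrative:

```python
import sqlite3

# A composite index covering the filter column first, then the sort column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Layout (
    id         INTEGER PRIMARY KEY,
    title      TEXT,
    datestamp  TEXT,
    categoryID INTEGER
);
CREATE INDEX idx_cat_date ON Layout (categoryID, datestamp);
""")
# Ask the engine how it would run the typical page query.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT title FROM Layout
    WHERE categoryID = ? ORDER BY datestamp DESC LIMIT 10
""", (1,)).fetchall()
# The plan should mention idx_cat_date: the index delivers rows
# already ordered by datestamp within the chosen category.
print(plan)
```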
     
    flippers.be, Sep 6, 2007 IP
  5. ssanders82 (Peon) #5
    flippers, you may have a valid argument at the extreme limits of data storage. However, I strongly feel that this type of premature optimization should be avoided until it becomes a problem. I host a reviews site with over 500,000 product reviews in one table, on a shared server (probably dozens or hundreds of domains on it). I haven't had a problem yet. Nickjag, once you get Facebook- or Google-style traffic you will need someone a lot smarter than me to optimize it, but for now, do it the standard way (#2).
     
    ssanders82, Sep 6, 2007 IP
  6. ssanders82 (Peon) #6
    That being said, optimization for very large sites is extremely interesting and it's always good to know the alternatives to traditional "normalized" database architecture.
     
    ssanders82, Sep 6, 2007 IP
  7. flippers.be (Peon) #7
    ssanders: yes, one should try to follow regular normalisation rules (and de-normalise if it's required for performance reasons). With today's performant servers it should be no problem.

    But in some cases one needs to take a totally different approach from what seems logical.

    A very long time ago (in 1999, on SQL Server 6.5) I even implemented a totally different solution. Same type of problem: a website displayed a list out of a database and had a lot of daily visitors, but the data itself only changed every few days or even weeks.

    The solution was to use triggers on the table that contained the data: when a row was inserted, the data was exported into an HTML template. So while the data itself lived in a database, each visitor did not select the same data over and over but just included the output file. When a new row was inserted, the include file was regenerated.

    That way you could have hundreds of thousands of visitors without putting any load on the database :)

    Again, with today's performant servers this usually isn't an issue, but sometimes you need to look at it from another perspective.
     
    flippers.be, Sep 6, 2007 IP
  8. nickjag (Active Member) #8
    Thanks for the help guys! My primary concern with #2 was that the ORDER BY date would start getting expensive once visitors were on... say... page 99 or so, but after a lot of searching I've come up with a new solution that eliminates the dates: plain incremented numbers.

    I will use #2, except replace the date with numbers that are incremented per layout category. To make data entry easy, I will count the number of rows for a given category and, when I enter the next layout, just add one for the new entry.

    This should allow faster processing of the ordering, with indexes on the layout category and the incremented numbers.
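    The per-category counting scheme above can be sketched like this (SQLite in-memory standing in for MySQL; table and column names are illustrative). Note that on a live site the MAX+1 step needs a transaction, or at least the unique (categoryID, number) key shown here, so two simultaneous inserts can't claim the same number:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Layout (
    categoryID INTEGER,
    number     INTEGER,
    title      TEXT,
    PRIMARY KEY (categoryID, number)  -- the category+number key
)""")

def add_layout(cat_id, title):
    # Next number = highest existing number in this category, plus one
    # (COALESCE handles the first layout in a brand-new category).
    conn.execute("""
        INSERT INTO Layout (categoryID, number, title)
        SELECT ?, COALESCE(MAX(number), 0) + 1, ?
        FROM Layout WHERE categoryID = ?""", (cat_id, title, cat_id))

add_layout(1, 'Midnight')
add_layout(1, 'Noon')
add_layout(2, 'Sunset')
numbers = conn.execute(
    "SELECT categoryID, number FROM Layout ORDER BY categoryID, number"
).fetchall()
print(numbers)  # [(1, 1), (1, 2), (2, 1)]
```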

    Thanks again
     
    nickjag, Sep 11, 2007 IP
  9. flippers.be (Peon) #9
    A date or a number will not make a lot of difference; what matters is whether there is an index on that column and whether the index is clustered or nonclustered.

    If you use a number, then do not forget to put a (clustered) primary key on the columns category+number.

    Another possibility is to use an identity/auto-increment column (it will increase the value itself, you don't have to count rows yourself), but then you have one number that increases across all categories.
     
    flippers.be, Sep 12, 2007 IP
  10. nickjag (Active Member) #10
    Hey flippers, thanks for the great info! You're obviously very knowledgeable about databases. I should have clarified: I was planning on specifying a range on the category numbers instead of doing an ORDER BY with a large offset. I read that specifying a range would be much faster. Your thoughts?

    I wasn't aware of clustered indexes or identity/auto-increment columns, but then again, I've only built very simple databases before. I'll have to look into these, thanks for the ideas.
     
    nickjag, Sep 13, 2007 IP
  11. mariush (Peon) #11
    Second option. 30,000 records is extremely small for a MySQL database; it can handle that without problems. If you're not updating the rows a lot, the whole table will probably be cached in memory.

    And as the others above me said, think a lot about how you build your indexes. For time, my choice is to use an INT column in the database, where I store the time as a Unix timestamp (using the function called time() in PHP).

    For me it's easier and more straightforward, and it's also easy to write the queries.

    One more piece of advice: if you intend to update those rows a lot, it might be better to use the InnoDB engine instead of the default MyISAM. InnoDB is slightly slower at reading (probably not noticeable in your case; 30,000 records is very little) but is far better at updating and inserting records (it has row locking and won't lock the whole table while a row gets updated, as happens with MyISAM).
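    The timestamp-in-an-INT-column idea, sketched with SQLite in place of MySQL (int(time.time()) plays the role of PHP's time(); names are illustrative). Range queries and ordering become plain integer comparisons:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Layout (title TEXT, created INTEGER)")

now = int(time.time())  # same value PHP's time() would give
conn.execute("INSERT INTO Layout VALUES (?, ?)", ('Midnight', now))

# "Layouts from the last week, newest first" is just integer arithmetic:
recent = conn.execute(
    "SELECT title FROM Layout WHERE created > ? ORDER BY created DESC",
    (now - 7 * 86400,)).fetchall()
print(recent)  # [('Midnight',)]
```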
     
    mariush, Sep 14, 2007 IP
  12. Aron Schatz (Peon) #12
    You can also cache MySQL results if you use a database abstraction layer. That works extremely well under high load when the data stays the same.
     
    Aron Schatz, Sep 17, 2007 IP
  13. omgitsfletch (Well-Known Member) #13
    Someone feel free to correct me if I'm wrong, but I believe searching by a certain field will take the same speed regardless of the offset, right?

    In other words:

    ORDER BY date LIMIT 0,10
    ORDER BY date LIMIT 10000,10

    are going to execute at the same speed, because MySQL needs to order EVERY row regardless of the offset to be sure it's ordered properly, right? So using a date wouldn't matter much compared to a number.
     
    omgitsfletch, Sep 18, 2007 IP
  14. flippers.be (Peon) #14
    True, the ORDER BY needs to be executed in both statements and will use a lot of resources.

    Therefore, if you're always going to select on this specific column, make the index/primary key on it clustered so the ordering is already done.
     
    flippers.be, Sep 19, 2007 IP
  15. nickjag (Active Member) #15
    Hey Fletch, I would have thought so too until I read the "Beware of Large Limit" section on the MySQL Performance Blog:

    http://www.mysqlperformanceblog.com/2006/09/01/order-by-limit-performance-optimization/

    Sorry I can't enter "live links" yet. I suppose the idea is that MySQL still has to scan past all the skipped rows even when the column is indexed, whereas with precomputed numbers it knows exactly where on the indexed column to start and stop. At least that's what I got out of it :)
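    The difference can be sketched: offset pagination (LIMIT 990,10) has to walk past every skipped row, while a range on a precomputed number lets the index seek straight to the start of the page. Both return the same rows; the data here is illustrative and SQLite stands in for MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Layout (number INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO Layout VALUES (?, ?)",
                 [(n, 'layout %d' % n) for n in range(1, 1001)])

# Offset pagination: the engine scans and discards the first 990 rows.
page_offset = conn.execute(
    "SELECT number FROM Layout ORDER BY number LIMIT 990, 10").fetchall()

# Range pagination: the primary-key index seeks directly to number 991.
page_range = conn.execute(
    "SELECT number FROM Layout WHERE number BETWEEN 991 AND 1000 "
    "ORDER BY number").fetchall()

assert page_offset == page_range  # same page, cheaper plan for the range
print(page_range[0], page_range[-1])  # (991,) (1000,)
```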
     
    nickjag, Sep 19, 2007 IP
  16. Forrest (Peon) #16
    How often will the data change?
     
    Forrest, Sep 21, 2007 IP
  17. nickjag (Active Member) #17
    New data would be added about once a week - only about 30 extra records a week. The bulk of the data will exist before the site opens.
     
    nickjag, Sep 23, 2007 IP