1. Advertising
    y u no do it?

    Advertising (learn more)

    Advertise virtually anything here, with CPM banner ads, CPM email ads and CPC contextual links. You can target relevant areas of the site and show ads based on geographical location of the user if you wish.

    Starts at just $1 per CPM or $0.10 per CPC.

MySQL: Storing Sessions in DB vs File Based

Discussion in 'MySQL' started by turiel, May 21, 2007.

  1. #1
    So, I'm in the middle of designing my application and am trying to decide between two approaches for managing sessions:

    a) Store them in a NFS directory accessible to all webservers.
    b) Store them in the DB.

    Obviously, its a load balanced configuration, with multiple webservers and multiple database servers.

    Storing them in the DB seems to be the 'standard' way to go but I've googled and I just can't find any performance benchmarks comparing the two methods.

    The session table is going to be hit with an INSERT/UPDATE every single time a page is loaded. If I have tens or hundreds of thousands of people browsing the site... thats an enormous strain on the DB server and thats before it even starts doing queries for the actual site.

    Anyone got any real-world experience to share on this?
     
    turiel, May 21, 2007 IP
  2. WillDee

    WillDee Peon

    Messages:
    68
    Likes Received:
    2
    Best Answers:
    0
    Trophy Points:
    0
    #2
    I guess it depends on how you have your web server load balancing configured. If it's set to single affinity, using file based session storage will be the better option, in my opinion, as users will (should) always hit the same web server. However, if you have no affinity set, then users will hit a different web server each time, and in this case database storage is most likely the best option.

    In our case, we had to go to no affinity, so developed the application to use db session storage. You're right, the database does take a hit, but it was far easier (and less expensive) to simply beef up the db server hardware to allow for the extra hits and processing required. We traded off additional cost of RAM and a second processor against significant redevelopment to refactor the app, and decided that adding memory and cpu were cheaper.

    In terms of actual performance, we found that adding two stored procedures to the db (get and set) didn't degrade performance too much. It did affect the web app in that it added a couple of milliseconds to each pages generation time, but we could live with it, as could our customers.

    Will.
     
    WillDee, May 26, 2007 IP
  3. Kalyse

    Kalyse Peon

    Messages:
    1,221
    Likes Received:
    24
    Best Answers:
    0
    Trophy Points:
    0
    #3
    Can you give me some idea of your projects.
    I am looking to do some large scale systems shortly. Well Im in the middle of designs, although I havent ever done multi server coder.

    Its going to be something large, multiple mySQL servers, multiple web servers.

    I havent decided how the user will be directed to a server? DNS Roundrobin? Or direct to a master server then forked off.

    How do you manage it for ULTIMATE effeciency. I dont care about time and money, they arent an issue at the moment. Good and scalable code is.
     
    Kalyse, May 26, 2007 IP
  4. turiel

    turiel Peon

    Messages:
    148
    Likes Received:
    11
    Best Answers:
    0
    Trophy Points:
    0
    #4
    Not really ;)

    DNS Roundrobin tends to be quite a bad idea for most scenarios because of DNS caching on the client side. Take for example these scenarios:
    * A webserver fails, you need to remove it from the pool. The client still has the IP address cached, and it continues to hit an non existant server.
    * Your server load is high, you need to add more webservers. You add another one in, but as before, clients continue to hit the same servers until their DNS cache expires.
    * Their DNS cache expires while they're in the middle of some data post process. Suddenly, they're hitting a different webserver.

    You need to consider something like "HAProxy" or "pound", which are open source load balancing applications. HAProxy in particular is what I'm using, it automatically manages server availability, has server affinity (sends requests back to the same webserver that the client originally connected from), and some nice reporting and graphing tools.

    For MySQL, you might need to consider a different approach. I haven't fully decided on one myself at the moment but one i'm looking in to is SQLRelay (http://sqlrelay.sourceforge.net). Thats a system that takes care of connection pooling but I'm not sure its right for our needs yet.

    There's a presentation that I found really really useful. Its how Flickr do their Linux/Apache/MySQL/PHP load balancing:
    www.ludicorp.com/flickr/zend-talk.ppt

    WillDee, thanks for your input. We do have server affinity through HAProxy as described above. So because 99.9% of the time the user will be hitting the same machine (the exception is if we take the machine out of the pool), do we even need an NFS directory shared between all webservers to store the sessions? What about that 0.1% of the time where the user doesn't hit the same webserver... I wonder if it will have any major effects.
     
    turiel, May 28, 2007 IP
  5. Kalyse

    Kalyse Peon

    Messages:
    1,221
    Likes Received:
    24
    Best Answers:
    0
    Trophy Points:
    0
    #5
    Nice post turiel. Oneof the most useful posts Ive read in a long time.

    This is what Im trying and i think you may have answered part of my questions.

    Take a look.
    http://forums.digitalpoint.com/showthread.php?t=345555

    Howmany minimum servers do you require for your distribution, and is HProxy distribution or balancing? Nevermind, I realised Google exists...
     
    Kalyse, May 28, 2007 IP
  6. Kalyse

    Kalyse Peon

    Messages:
    1,221
    Likes Received:
    24
    Best Answers:
    0
    Trophy Points:
    0
    #6
    Kalyse, May 28, 2007 IP
  7. turiel

    turiel Peon

    Messages:
    148
    Likes Received:
    11
    Best Answers:
    0
    Trophy Points:
    0
    #7
    No its just saying it has support for SSL as well as plaintext HTTP. HAProxy does not have native SSL support although there are workarounds to enable it.

    There's no minimum servers, in fact my test box has HAProxy on one IP and the webserver on the other IP. It can be configured with a number of algorithms, so either distribution or balancing.

    As to your other thread, basically your load balancing needs are completely dependant on how you're using the machines. The simple answer is 'when the load on your machine gets high'. In reality, you try optimise your server and your application to bring down the load before you start load balancing. But we'll say thats all done and you can't optimise your code any more. The first thing to do is seperate the DB and Web servers. The processor switching between Apache and MySQL will result in quite an overhead that will slow your machine down. Seperating the two into different machines will improve performance, not only because there's a new machine to work with. After that... you need to identify where your bottleneck is at. If the load is still high on the mysql server, you need to loadbalance mysql. Same for the apache server.
     
    turiel, May 28, 2007 IP
  8. Kalyse

    Kalyse Peon

    Messages:
    1,221
    Likes Received:
    24
    Best Answers:
    0
    Trophy Points:
    0
    #8
    What did you decide to do by the way, for storing your sessions? DB or on each server?

    I think you have your answer anyway, you dont really need to store it on the databse, because HAProxy can work with distribution but also make sure the user is at the same server by using cookie control? I think thats right?
    I havent read the documentation from top to bottom and I may have goten one mixed up.

    It sounds you know what you are doing anyway which is good, I think just to design the best way for my application.
     
    Kalyse, May 28, 2007 IP
  9. turiel

    turiel Peon

    Messages:
    148
    Likes Received:
    11
    Best Answers:
    0
    Trophy Points:
    0
    #9
    I'm going to try the keeping it on the server implementation, I won't be able to test properly until we're further online in the development process.
     
    turiel, May 28, 2007 IP
  10. Kalyse

    Kalyse Peon

    Messages:
    1,221
    Likes Received:
    24
    Best Answers:
    0
    Trophy Points:
    0
    #10
    I think POUND sounds quite good.. there are different types of session control with that, all seem to work well. Or was that another system, I have been reading too much and may be getting confused.

    Have you decided how you are doing your mysql database management? ie. Reads to one database? Writes to another? Or.. some kind of balancing and high availability?
     
    Kalyse, May 28, 2007 IP
  11. turiel

    turiel Peon

    Messages:
    148
    Likes Received:
    11
    Best Answers:
    0
    Trophy Points:
    0
    #11
    Pound has no server affinity as far as I'm aware, which was the reason I'm not going to use it.

    Writes are going to go to the Master database, reads are going to go to a relay IP which will distribute the load among a number of replicated slaves.
     
    turiel, May 29, 2007 IP
  12. Kalyse

    Kalyse Peon

    Messages:
    1,221
    Likes Received:
    24
    Best Answers:
    0
    Trophy Points:
    0
    #12
    I dont quite understand the whole mysql balancing. I mean isnt the write database required to replicate its data right away to all the slaves? Isnt that going to be one huge CPU drain?

    How do you manage it to you dont lose consistency.
     
    Kalyse, May 29, 2007 IP
  13. turiel

    turiel Peon

    Messages:
    148
    Likes Received:
    11
    Best Answers:
    0
    Trophy Points:
    0
    #13
    Yes it replicates it straight away to the slaves, but it does so in a much more efficient manner than queries - binary logs. And basically, read queries are more intensive and much more numerous than write queries (usually). If your Master server is struggling with CPU load from writes, then there's no way your slaves are going to manage. Consistency is kept because the replication happens very quickly.
     
    turiel, May 30, 2007 IP