1. Advertising
    y u no do it?

    Advertising (learn more)

    Advertise virtually anything here, with CPM banner ads, CPM email ads and CPC contextual links. You can target relevant areas of the site and show ads based on geographical location of the user if you wish.

    Starts at just $1 per CPM or $0.10 per CPC.

fix a broken mysql replication

Discussion in 'MySQL' started by siddharth007, Nov 27, 2012.

  1. #1
    Hi all.I had set up a mysql master-slave replication and somehow it broke.The master is still up,its the slave side where i need to fix the things up.After some research on the internet i found out that stopping the slave and issuing the command SET GLOBAL sql_slave_skip_counter=N (where N is some number of events) can fix the replication.I have the following questions:

    1 What is the value should i put in place of N. Is it the number of queries which executed on the master since replication broke?? If so,how can i find out how many queries executed on the master since the replication broke?

    2. Will using SET GLOBAL.... and then starting the slave will re-sync the master-slave??


    Note that i am issuing the SET GLOBAL..... command on the slave only after stopping it(stop slave;)
     
    siddharth007, Nov 27, 2012 IP
  2. jestep

    jestep Prominent Member

    Messages:
    3,659
    Likes Received:
    215
    Best Answers:
    19
    Trophy Points:
    330
    #2
    I'll be honest, you shouldn't be doing replication on MySQL if you need to ask in a general MySQL forum. There's too many factors to consider to keep it running if you don't have significant understand on the way the databases connect, the hardware they're on, and specifically how to administer replication or clustering. It's really not going to be worth the headache.
     
    jestep, Nov 27, 2012 IP
  3. siddharth007

    siddharth007 Greenhorn

    Messages:
    23
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    11
    #3
    Dude, i have come here to seek some help regarding replication and not for some kind of a "free advice". I very well understand the requirements of replication.
     
    siddharth007, Nov 27, 2012 IP
  4. jestep

    jestep Prominent Member

    Messages:
    3,659
    Likes Received:
    215
    Best Answers:
    19
    Trophy Points:
    330
    #4
    Sorry. It's nothing to do with free advice or anything. It's that there are many factors in addressing this or any replication problem, and you may be getting a solution but not fixing what caused the replication to break or preventing it from happening again down the road.

    Do you know why the replication broke? How large is the dataset being replicated? When you restarted the slave, did it not automatically catch up to the master binlog position?

    Without knowing exactly what is going on, and size not prohibiting, I would stop both the slave and master. Flush out all the data on the slave. Reset the binlog and then re-import and restart the replication from scratch. This way you are sure that you have data integrity across both data sets. It's a bit more resource consuming since you need to transfer the whole thing, but you don't need to deal with trying to troubleshoot binlog positions or potentially broken data on the slave.
     
    jestep, Nov 28, 2012 IP
  5. siddharth007

    siddharth007 Greenhorn

    Messages:
    23
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    11
    #5
    I agree with your point that restarting replication from scratch is the best available solution.However we have an OLTP system(which will go live in Jan 2013) and considering this even 5 min of downtime can cause us substantial loss in our business.
     
    siddharth007, Nov 28, 2012 IP
  6. codemaster263

    codemaster263 Member

    Messages:
    36
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    41
    #6
    The N stands for the number of errors that you would like the slave to skip. You can find this out by checking your master's error logs. So if there are 5 errors, you can tell the slave to
    SET GLOBAL sql_slave_skip_counter=5, then you can start the slave again. This would ideally resolve your issues and get the replication running again. I advise that where there are multiple errors you should stop the slave, set N to 1 in the above command, and start the slave again. Do this multiple times instead of setting N to the number of events identified. This should be ok if there are 10 or less errors. If there are more I would recommend simply setting up the replication afresh.

    Hope this will help someone.

     
    codemaster263, Jan 22, 2013 IP