I am exhausted... Sequence of events... The cluster lost one of the blades today for the first time ever (this blade is the 2nd master DB server). The primary master writes everything to the 2nd master, then the 2nd master replicates out to all the slaves (read only access). Something happened with mysqld before, but this time it was a hardware issue. I screw with it remotely as best I can. Once I figure out it's a hardware problem, I go to the data center. The blade is on, but not able to be pinged. Okay.... let's just power cycle it. Weird, the power button doesn't work on the blade. Let's try to power cycle a backup (currently unused) blade to see if it's the blade or I'm an idiot and forgot how to use the power button on the computer. Hmmm.. this blade won't turn off either. Okay, let's just try to power cycle the whole bloody array of blades (master power button on the chassis). Wow... the chassis won't even turn off. Neat. Okay, let's just unplug the God damn chassis, reset whatever logic controls the power buttons... {unplug} ......wait..... {plug back in} Power button is amber (meaning we have power, but we haven't turned on the chassis)... good. Let's turn it on. Uh yeah... this power button still doesn't work. Except now all 10 blades are turned off... nifty. {call Dell tech support} Not that I thought it would do much good (it's 11:00 pm now), and it's been over a year, so I'm sure this stuff is out of warranty. What the fuck? Someone knowledgeable is on the phone in about 30 seconds. Another 30 seconds and he concludes I'm not an idiot and the chassis itself has hardware problems (not responding to power buttons on it). Lovely, I'm thinking, I'm gonna go to Hawaii for the weekend, so I don't have to deal with the website being down for a few days while they try to find parts. So then the dude on the phone tells me he can have a tech there with every part possible for the chassis within 2 hours to get it up and running. Which they actually did. 
He gutted the chassis, replaced a few circuit boards in it and also the main internal housing for good measure. Weeee.... it works! Had to reconfigure some stuff with the chassis (remote control and stuff since it was new hardware). I have figured out that blade8 (the one that went missing earlier) has some sort of issue and is hanging halfway through its boot sequence... But you know what? I'm tired, it's 4:00 am... we are gonna run on a single DB server for today and I'll come back tomorrow with serial cables so I can watch the boot sequence spew crap on that blade to see what's up. But I do have to say... to have Dell show up at 1:00 am with parts and all that was pretty cool. Can I go home and go to bed (I'm sitting in the data center now) and deal with blade8 tomorrow? k, thanks.
Ah good DP woohoo. I wake up in the morning and think of some new threads then this happens. Now I forgot them =(
That's awesome for sure. This could be a Dell commercial. I am sure they will be pleased to hear such a nice comment from a huge forum owner. The fact that the servers were $85k might have helped as well.
Glad to hear all is sorted, but I have always preferred normal rackmount servers over blades... too much hype... heating... price to performance, etc.
Thanks Shawn! It is good to know that you and Dell are taking such good care of us.. We cannot function without this place... LOL I am sure glad you got it pinned down before you scoot off over to Hawaii.. Have a fun time and vacation over on the Islands.. Thanks again.. Boulder