Bug Error 502 Ray ID

Discussion in 'Support & Feedback' started by Karen May Jones, May 27, 2015.

  1. #1
    Not sure if you need to know about this error I received a few minutes ago, but here it is. It said my browser was OK, someplace in Dallas was OK, and then some kind of Cloudflare thing/cache was not OK. Alright, that's all for now.

    Error 502 Ray ID: 1ed68d07b7c2115f • 2015-05-28 02:32:51 UTC
    Bad gateway
     
    Karen May Jones, May 27, 2015
  2. Nigel Lew

    Nigel Lew Notable Member

    Messages:
    4,642
    Likes Received:
    405
    Best Answers:
    21
    Trophy Points:
    295
    #2
    I can't leave people a message. It's a problem. Literally.
    The following error occurred:
    The server responded with an error. The error message is in the JavaScript console. It took an hour to drop this one.
     
    Nigel Lew, May 27, 2015
  3. digitalpoint

    digitalpoint Overlord of no one Staff

    Messages:
    38,333
    Likes Received:
    2,613
    Best Answers:
    462
    Trophy Points:
    710
    Digital Goods:
    29
    #3
    Yeah, there's something funky going on with PHP-FPM segfaulting for some reason once in a while. Still trying to debug it, but I think it might be something in the new version of PHP, so I might just roll back to an older version.
     
    digitalpoint, May 27, 2015
  4. digitalpoint

    #4
    I think it was related to the issue last week where PHP processes were trying to allocate close to 2GB of memory for nonsensical tasks and just wrecking the servers (you get 1,000 concurrent processes each trying to allocate 2GB of memory and you have a problem... lol)
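    For a sense of scale, here is a quick back-of-the-envelope sketch of that arithmetic. The process count and per-process size come from the post; the function name is just for illustration, and whether those 1,000 processes were spread over several servers isn't stated, so treat the total as an upper bound on what was being requested overall.

```python
def total_requested_gb(processes: int, gb_per_process: float) -> float:
    """Total memory requested if every process allocates the same amount."""
    return processes * gb_per_process


# Figures from the post: 1,000 concurrent processes, roughly 2 GB each.
requested_gb = total_requested_gb(1_000, 2)
requested_tb = requested_gb / 1_000  # decimal terabytes

print(f"Requested: {requested_gb:.0f} GB (about {requested_tb:.0f} TB)")
```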

    Anyway, it seemed to start with PHP 5.6.9, and I've rolled back all servers to 5.6.6 now. So hopefully whatever it was will stop.
     
    digitalpoint, May 28, 2015
  5. Nigel Lew

    #5
    Geez. What a chore. Thanks for getting it back up.
    Nigel
     
    Nigel Lew, May 28, 2015
  6. #6
    That wasn't even (directly) the biggest chore, lol. PHP-FPM going crazy with memory allocation requests caused some memory corruption on the servers (in theory that shouldn't even be possible, but somehow it happened).

    That caused one of the database cluster data nodes to fail, which wasn't an issue for end users since the nodes are redundant. But when bringing that data node back online, 2 more data nodes failed with the same sort of memory corruption issues. Also not a problem, since there's enough redundancy to handle 3 failed servers at the same time. But then when bringing those 3 data nodes online, 2 more failed... meaning 5 database cluster servers went offline concurrently with the same sort of memory corruption issues. And 5 servers going down fails the entire database cluster, because there aren't enough servers online for a complete set of data.
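    To illustrate the "complete set of data" point: in MySQL NDB Cluster, data is partitioned across node groups, and the cluster stays up only while every node group has at least one live data node. A rough sketch of that rule follows. The layout (8 nodes, 4 node groups of 2 replicas each) and the node names are purely hypothetical, not the site's real configuration, so the exact counts would differ on the actual cluster.

```python
from itertools import combinations

# HYPOTHETICAL layout: 4 node groups, 2 replicas each (not the real config).
NODE_GROUPS = {
    0: {"n1", "n2"},
    1: {"n3", "n4"},
    2: {"n5", "n6"},
    3: {"n7", "n8"},
}
ALL_NODES = sorted(set().union(*NODE_GROUPS.values()))


def has_complete_data(failed):
    """True if every node group still has at least one live node."""
    failed = set(failed)
    return all(members - failed for members in NODE_GROUPS.values())


def survivable_combinations(n_failed):
    """Count the failure sets of size n_failed that keep the cluster complete."""
    combos = list(combinations(ALL_NODES, n_failed))
    surviving = sum(has_complete_data(c) for c in combos)
    return surviving, len(combos)


for n in (1, 3, 5):
    ok, total = survivable_combinations(n)
    print(f"{n} failed: {ok} of {total} combinations keep a complete data set")
```

    In this made-up layout, losing 5 of the 8 nodes always empties at least one node group, which is the same kind of outcome as the full cluster failure described above.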

    Brought entire cluster back online fairly quickly, but when all servers came back online concurrently, they got internally confused I think about who was the nominated "president" (1 server is always nominated the one that makes the decisions about stuff). For really no good reason it seemed like multiple data nodes were trying to be authoritative, but there wasn't a way to tell WHICH servers were the problem ones. So then started doing rolling restarts of the data node processes, which takes about 90 minutes per data node because we have such a massive amount of data in our databases.
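    A small sketch of why rolling restarts are so slow here. The 90 minutes per data node is from the post; restarting strictly one node at a time is an assumption made for the estimate, and the function name is made up.

```python
MINUTES_PER_NODE = 90  # figure quoted in the post


def rolling_restart_minutes(node_count: int,
                            minutes_per_node: int = MINUTES_PER_NODE) -> int:
    """Estimated wall-clock time when nodes are restarted one after another."""
    return node_count * minutes_per_node


for nodes in (1, 5, 8):
    minutes = rolling_restart_minutes(nodes)
    print(f"{nodes} node(s): {minutes} minutes ({minutes / 60:.1f} hours)")
```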

    Normally you can do rolling restarts without end users noticing, but with multiple "presidents" it was causing funky issues like certain tables (such as threads) being unavailable depending on the record IDs you were getting.

    Got enough of the data nodes restarted (90 minutes per) to form at least 1 whole set of data. So now we are back online. There are still more redundant data nodes in the process of coming online, but at least we have a whole (working) set of data online now... so site is back online.

    So yeah... super fun morning. lol
     
    digitalpoint, May 28, 2015
    Karen May Jones likes this.
  7. digitalpoint

    #7
    The 3 remaining data nodes (for redundancy) should be coming online fully shortly. I'm going to be super pissed off if they don't come online properly. haha

    ndb_mgm> show
    Cluster Configuration
    ---------------------
    [ndbd(NDB)]    8 node(s)
    id=11    @192.168.10.20  (mysql-5.6.24 ndb-7.3.9, Nodegroup: 0, *)
    id=12    @192.168.10.21  (mysql-5.6.24 ndb-7.3.9, Nodegroup: 1)
    id=13    @192.168.10.22  (mysql-5.6.24 ndb-7.3.9, Nodegroup: 2)
    id=14    @192.168.10.23  (mysql-5.6.24 ndb-7.3.9, Nodegroup: 3)
    id=15    @192.168.10.24  (mysql-5.6.24 ndb-7.3.9, Nodegroup: 0)
    id=16    @192.168.10.25  (mysql-5.6.24 ndb-7.3.9, starting, Nodegroup: 0)
    id=17    @192.168.10.26  (mysql-5.6.24 ndb-7.3.9, starting, Nodegroup: 0)
    id=18    @192.168.10.27  (mysql-5.6.24 ndb-7.3.9, starting, Nodegroup: 0)
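    If you wanted to watch for the remaining nodes without eyeballing the output, something like the sketch below could pull out the ones still marked "starting". It only parses the text pasted above; the regex and the field handling are assumptions based on that one sample, not a complete parser for `ndb_mgm` output.

```python
import re

# The "show" output quoted in the post above.
SAMPLE = """\
[ndbd(NDB)]    8 node(s)
id=11    @192.168.10.20  (mysql-5.6.24 ndb-7.3.9, Nodegroup: 0, *)
id=12    @192.168.10.21  (mysql-5.6.24 ndb-7.3.9, Nodegroup: 1)
id=13    @192.168.10.22  (mysql-5.6.24 ndb-7.3.9, Nodegroup: 2)
id=14    @192.168.10.23  (mysql-5.6.24 ndb-7.3.9, Nodegroup: 3)
id=15    @192.168.10.24  (mysql-5.6.24 ndb-7.3.9, Nodegroup: 0)
id=16    @192.168.10.25  (mysql-5.6.24 ndb-7.3.9, starting, Nodegroup: 0)
id=17    @192.168.10.26  (mysql-5.6.24 ndb-7.3.9, starting, Nodegroup: 0)
id=18    @192.168.10.27  (mysql-5.6.24 ndb-7.3.9, starting, Nodegroup: 0)
"""

# Matches lines such as: id=11    @192.168.10.20  (details, more details)
NODE_LINE = re.compile(r"id=(\d+)\s+@(\S+)\s+\(([^)]*)\)")


def parse_nodes(text):
    """Return one dict per data node line found in the text."""
    nodes = []
    for match in NODE_LINE.finditer(text):
        node_id, address, details = match.groups()
        fields = [field.strip() for field in details.split(",")]
        nodes.append({
            "id": int(node_id),
            "address": address,
            "starting": "starting" in fields,
            "starred": "*" in fields,  # the node flagged with "*" in the output
        })
    return nodes


nodes = parse_nodes(SAMPLE)
starting_ids = [node["id"] for node in nodes if node["starting"]]
print("Still starting:", starting_ids)
```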
     
    digitalpoint, May 28, 2015