Not sure if you need to know about this error I received a few minutes ago, but here it is. It said my browser was OK, someplace in Dallas was OK, then some kind of Cloudflare thing/cache was not OK. Alright, that's all for now.

Error 502
Ray ID: 1ed68d07b7c2115f • 2015-05-28 02:32:51 UTC
Bad gateway
I can't leave people a message. It's a problem. Literally. The following error occurred: "The server responded with an error. The error message is in the JavaScript console." It took an hour to post this one.
Yeah, there's something funky going on with PHP-FPM segfaulting for some reason once in a while. Still trying to debug it, but I think it might be something in the new version of PHP, so I might just roll back to an older version.
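For anyone debugging something similar: a quick sketch of where PHP-FPM segfaults usually show up. The log paths here are assumptions (they vary by distro and pool config), not the exact setup on these servers.

```shell
# Segfaults from worker processes land in the kernel ring buffer
dmesg | grep -i 'php-fpm.*segfault'

# The FPM master also logs reaped children that died on a signal
# (path is an assumption; check error_log in your php-fpm.conf)
grep -i 'SIGSEGV\|signal 11' /var/log/php-fpm/error.log
```

Correlating the timestamps from those two sources against the access log is usually the fastest way to find which request pattern triggers the crash.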
I think it was related to the issue last week where PHP processes were trying to allocate close to 2GB of memory for nonsensical tasks and just wrecking the servers (you get 1,000 concurrent processes each trying to allocate 2GB of memory and you have a problem... lol). Anyway, it seemed to start with PHP 5.6.9, and I've rolled back all servers to 5.6.6 now. So hopefully whatever it was will stop.
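As a belt-and-suspenders measure against that failure mode, you can cap both the worker count and per-request memory in the FPM pool config. This is a hedged sketch, not the actual config from these servers; the path and values are assumptions to illustrate the arithmetic (64 workers × 256M caps the pool at ~16GB instead of 1,000 × 2GB).

```ini
; /etc/php-fpm.d/www.conf (path varies by distro)
pm = dynamic
pm.max_children = 64                   ; cap concurrent workers per box

; hard per-request memory ceiling; cannot be overridden by ini_set()
php_admin_value[memory_limit] = 256M
```

With a ceiling like that, a buggy allocation kills the one request with a fatal "Allowed memory size exhausted" error instead of letting the workers collectively swap the whole server to death.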
That wasn't even (directly) the biggest chore. lol PHP-FPM going crazy with memory allocation requests caused some memory corruption on the servers (in theory it shouldn't even be possible, but somehow it did). That caused one of the database cluster data nodes to fail, which wasn't an issue for end users since they're redundant.

But when bringing that data node back online, 2 more data nodes failed with the same sort of memory corruption issues. Also not a problem, since there's enough redundancy to handle 3 failed servers at the same time. But then when bringing those 3 data nodes online, 2 more failed... meaning 5 database cluster servers went offline concurrently with the same sort of memory corruption issues. And 5 servers going down fails the entire database cluster, because there aren't enough online servers left for a complete set of data.

Brought the entire cluster back online fairly quickly, but when all the servers came back online concurrently, I think they got internally confused about who was the nominated "president" (1 server is always nominated as the one that makes the decisions about stuff). For really no good reason it seemed like multiple data nodes were trying to be authoritative, but there wasn't a way to tell WHICH servers were the problem ones.

So then I started doing rolling restarts of the data node processes, which takes about 90 minutes per data node because we have such a massive amount of data in our databases. Normally you can do rolling restarts without end users noticing, but with multiple "presidents" it was causing funky issues like certain tables being unavailable (like threads) depending on the record IDs you were getting.

Got enough of the data nodes restarted (90 minutes per) to form at least 1 whole set of data, so now we are back online. There are still more redundant data nodes in the process of coming online, but at least we have a whole (working) set of data online now... so the site is back up. So yeah... super fun morning. lol
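The "3 failures fine, 5 failures fatal" behavior falls out of how the data is partitioned. A minimal sketch, assuming a 4-node-group × 2-replica layout like the 8-node cluster here (the node IDs and group assignments are illustrative): the cluster has a complete data set as long as every node group keeps at least one live node, and with 2 replicas per group, any 5 failures out of 8 must wipe out some group entirely.

```python
# Whether an NDB-style cluster still has a complete data set.
# Data is partitioned across node groups; each group stores replicas
# of its partition, so one survivor per group is enough.
def cluster_has_full_dataset(node_groups, failed):
    return all(any(n not in failed for n in group) for group in node_groups)

# Hypothetical layout: 4 node groups, 2 replicas each
groups = [[11, 15], [12, 16], [13, 17], [14, 18]]

# 3 failures spread across groups: every group keeps a survivor
print(cluster_has_full_dataset(groups, failed={15, 16, 17}))  # -> True

# 5 failures: by pigeonhole, at least one group loses both replicas
print(cluster_has_full_dataset(groups, failed={12, 13, 15, 16, 17}))  # -> False
```

So redundancy here isn't "any N failures survivable" but "one failure per node group survivable", which is why the fourth and fifth failures were the fatal ones.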
The 3 remaining data nodes (for redundancy) should be coming online fully shortly. I'm going to be super pissed off if they don't come online properly. haha

```
ndb_mgm> show
Cluster Configuration
---------------------
[ndbd(NDB)]     8 node(s)
id=11   @192.168.10.20  (mysql-5.6.24 ndb-7.3.9, Nodegroup: 0, *)
id=12   @192.168.10.21  (mysql-5.6.24 ndb-7.3.9, Nodegroup: 1)
id=13   @192.168.10.22  (mysql-5.6.24 ndb-7.3.9, Nodegroup: 2)
id=14   @192.168.10.23  (mysql-5.6.24 ndb-7.3.9, Nodegroup: 3)
id=15   @192.168.10.24  (mysql-5.6.24 ndb-7.3.9, Nodegroup: 0)
id=16   @192.168.10.25  (mysql-5.6.24 ndb-7.3.9, starting, Nodegroup: 0)
id=17   @192.168.10.26  (mysql-5.6.24 ndb-7.3.9, starting, Nodegroup: 0)
id=18   @192.168.10.27  (mysql-5.6.24 ndb-7.3.9, starting, Nodegroup: 0)
```