URGENT: 502 Bad Gateway in Opal6 (again)

stefan

Opal6 seems to be throwing 502 errors again. This keeps happening every few months. Are the other servers behaving differently? I might consider moving away from Opal6.

Also, what is being done by the team to catch such errors? I've never seen any status update on 502 errors at: https://status.opalstack.com/

Stefan

DavidUrban

@sean I know you guys are probably asleep right now, and I hope the new (and fixed) monitoring will notify you. But we had this discussion a few months ago, and I am having a very hard time explaining to my clients why their service is unavailable so soon after the last problem. I don't see it on any other opal server we host on. What is the difference? Can I migrate out of opal6?

mattez

YES, YES, URGENT, very URGENT. We have meetings with clients planned for today. Other clients are calling us asking what the trouble is. And the Opalstack STATUS is ALL GREEN? What a joke.

Same issue: https://community.opalstack.com/d/602-502-bad-gateway-in-opal6-again
Related older issue: https://community.opalstack.com/d/266-502-bad-gateway-in-opal6

mattez

I would be willing to pay extra for, say, a Managed Virtual Private Server. But is that a solution? Is there any guarantee that it will not happen there too? Do any of you have experience with Opalstack Managed Virtual Private Servers?

madine

DavidUrban Completely agree. It would be one thing if we were hosting hobby sites and personal side projects on a super-cheap shared server but when we're paying hundreds per year, we should be able to run commercial sites for ourselves and clients and have an expectation of uptime and support.

mattez We started on the lowest-tier VPS but 'upgraded' to the shared hosting on opal6 as the CPU performance wasn't enough to host anything much more than static HTML. The lowest comparable VPS in terms of performance would be the $40/month

wweb

That's really really bad, we have already received several reports from clients and users of websites being down. I just hope this won't take as long as last time to be fixed 😨

OrangyTang

Yes, seeing the same thing here - Opal6 hosting, PHP and SVN apps are down, only static nginx apps are working. Same 502 Bad Gateway errors.

Jorgeoa

Same problem here with php sites.
I have a rails application in opal6 served with a "custom" nginx server and that works well, so the problem seems to be related to the frontend nginx.
I can't find any response from support... 🙁

DavidUrban

Well, it is 07:00 in Wyoming, and from the lack of reaction on any channel I assume they are indeed still sleeping, with the monitoring system experiencing a similar failure as before.

I do believe there are great people at OpalStack. And I believe they do their level best to make sure the servers are running smoothly. Problems, however, happen and cannot be avoided, and after this experience I think we should demand 24/7 support or at least live monitoring. If this were only an isolated incident I would never consider such harsh action. But the problem has not been resolved despite numerous assurances. And we will be losing clients, and therefore OpalStack will lose clients too. Nobody wants to do a night shift. But we all do it when it is needed, and this is one such case. If only to have someone monitor support channels and wake up the engineers once a month when a crisis hits.

I hope someone wakes up soon, clients are not happy and it is so frustrating not to be able to help them.

wweb

Looks like it's back online. 😃
Still I strongly agree with DavidUrban and I think we should really discuss the 24/7 support issue.

DavidUrban

Yep, we are back online. Thanks OpalCrew! I’m sure someone will touch base with us soon. What I value the most about Opalstack is how responsive they are here on the forum and how open they are to discussions!

sean

Hi all, apologies for the trouble. Opal6 Apache is back online now.

The downtime was caused by Apache running out of available semaphores. We'll be adjusting our monitoring system to check for this condition and correct it automatically when it occurs.

We're hoping to have EU staff on board within the next 2-3 months which will eliminate the largest gap in our support coverage.
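
For readers wondering what a check for the semaphore condition could look like, here is a rough sketch. It is not Opalstack's actual monitoring, which is not described in this thread. It assumes a Linux host with the `ipcs` tool from util-linux and the usual four-field `/proc/sys/kernel/sem` file (SEMMSL, SEMMNS, SEMOPM, SEMMNI). The function names, the `apache` owner name, and the 90% threshold are all made up for illustration.

```python
import subprocess
from collections import Counter


def count_semaphore_arrays(ipcs_output: str) -> Counter:
    """Count semaphore arrays per owner from the text printed by `ipcs -s`."""
    counts = Counter()
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Data rows have five columns: key semid owner perms nsems.
        # Header and separator lines do not start with a hex key.
        if len(fields) == 5 and fields[0].startswith("0x"):
            counts[fields[2]] += 1
    return counts


def parse_sem_limits(proc_sem: str) -> dict:
    """Parse the contents of /proc/sys/kernel/sem into named limits."""
    semmsl, semmns, semopm, semmni = (int(x) for x in proc_sem.split())
    return {"semmsl": semmsl, "semmns": semmns, "semopm": semopm, "semmni": semmni}


def near_limit(used: int, semmni: int, threshold: float = 0.9) -> bool:
    """True when the number of arrays in use is close to the SEMMNI limit."""
    return used >= threshold * semmni


def check_host(owner: str = "apache") -> dict:
    """Hypothetical one-shot check; `owner` is an assumed account name.

    `ipcs` only lists arrays the caller may read, so this should run
    with enough privileges to see every array on the system.
    """
    out = subprocess.run(
        ["ipcs", "-s"], capture_output=True, text=True, check=True
    ).stdout
    with open("/proc/sys/kernel/sem") as fh:
        limits = parse_sem_limits(fh.read())
    counts = count_semaphore_arrays(out)
    total = sum(counts.values())
    return {
        "total_arrays": total,
        "owner_arrays": counts[owner],
        "semmni": limits["semmni"],
        "alert": near_limit(total, limits["semmni"]),
    }
```

Calling `check_host()` from a cron job and alerting when `alert` is true would cover the detection half. The automatic correction is left out here because it depends on which arrays are safe to remove on a given host.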

DavidUrban

sean thank you for letting us know. That is very good news indeed. If you need help with recruitment here in Czechia, please feel free to reach out.

wweb

sean How about creating a simple application on the server and monitoring its status code with something like uptime robot? (In another topic someone mentioned a similar service.) I know it's not a failproof solution, but in both this and the November incidents it would have alerted you in real time. At least as a temporary patch to mitigate the issue on shared stacks, it would be easy to implement and would give some peace of mind to many users!
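
The kind of status-code check suggested above can be sketched in a few lines of Python using only the standard library. This is an illustration of the idea, not how any particular monitoring service works. The function names are invented for the example, and treating any 2xx or 3xx response as healthy is an assumption.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code for `url`, or None if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses (including 502) arrive as HTTPError.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return None


def describe(status):
    """Turn a status code into a short up/down message."""
    if status is None:
        return "DOWN: no response"
    if 200 <= status < 400:
        return f"OK: HTTP {status}"
    return f"DOWN: HTTP {status}"


def check(url, fetch=fetch_status):
    """Check one URL; `fetch` can be swapped out for testing."""
    return describe(fetch(url))
```

Running `check()` against a small app on each shared server every minute or so, and sending a notification whenever the result starts with `DOWN`, would flag a 502 within that interval.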

sean

sean We'll be adjusting our monitoring system to check for this condition and correct it automatically when it occurs.

done.

sean

wweb we're already using a monitoring service as you've described, the larger issue is support coverage to ensure that people are available to handle the monitoring alerts when they happen.