• Support · PHP
  • "Too many connections" errors after switch to PHP-CGI

Hi! Since the migration to PHP-CGI a couple of months ago I've been seeing a lot of errors, especially "Too many connections". Is there anything I can do to mitigate this? I've cached the application as much as I can, and a particularly overloaded part was rewritten to be static, but the errors remain. I never saw them on PHP-FPM.
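
For context, the caching is mostly along these lines (a simplified sketch; the key name, TTL and model here are made up):

    use Illuminate\Support\Facades\Cache;
    use App\Models\Post;

    // Serve the expensive query from the cache for an hour so that most
    // page views never touch the database at all.
    $posts = Cache::remember('home.recent_posts', 3600, function () {
        return Post::latest()->take(20)->get();
    });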

For reference, this is a Laravel application with roughly 1000-2000 page views per day. I'm seeing traffic drop, and warnings are cropping up in Search Console as well. Is it time for me to consider moving to a VPS instead of a shared instance? PHP-FPM on Nginx might be an option, but since I rely on .htaccess I would need to figure out how to replace that logic first.
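
From what I've read, the stock Laravel .htaccess rewrite boils down to roughly the following in Nginx; this is untested on my side, the paths and socket are placeholders, and my custom rules would still need porting by hand:

    # Roughly what Laravel's default .htaccess does: send every request
    # that isn't an existing file or directory to index.php.
    server {
        listen 80;
        server_name example.com;               # placeholder
        root /var/www/app/public;              # placeholder
        index index.php;

        location / {
            try_files $uri $uri/ /index.php?$query_string;
        }

        location ~ \.php$ {
            include fastcgi_params;
            fastcgi_pass unix:/run/php/php-fpm.sock;   # placeholder socket
            fastcgi_param SCRIPT_FILENAME $realpath_root$fastcgi_script_name;
        }
    }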

Any input greatly appreciated.

Cheers,
Linus

    bohman it's very possible that PHP-CGI is contributing to this. Nginx+FPM usually does help, but your concern about .htaccess is a valid one. If you can make it work without .htaccess then I really recommend FPM.

    It's also possible that it's not actually your own site causing the problem. If you would like us to look into it, please email support and let us know which site and the date/time (with timezone) when you saw the errors.

    FWIW I'm also seeing this on a Django site. My monitoring is picking up occasional outages due to too many MySQL connections. The site gets very little traffic and, judging by the access logs, it doesn't seem to be the one causing the issue. I'll drop support an email with further details. @bohman It's perhaps worth checking the same before you do any re-engineering.

    Yeah, I'll drop an e-mail to support too. Thanks!

    Please do update this thread if you find any details; I'm seeing the same issue with a Django app.

    4 days later

    Same error here quite often with a Django app since I migrated from Webfaction (so it's not a PHP issue like the OP's).

    • sean replied to this.

      josearr The database service is shared by all customers regardless of the type of application they're running.

      What we think might be happening is:

      • A botnet or some other bad actor starts hitting common target URLs (like WordPress's xmlrpc.php) on a shared server.
      • Many PHP-CGI processes are spawned, each with its own database connection.
      • System load rises, so processes run slower, which keeps individual DB connections open longer and exacerbates the problem.
      • Eventually the system-wide DB connection limit is hit, and the problem then affects other apps.

      We're working to resolve this as soon as possible. In the interim, some possible ways to mitigate the effects are:

      • Minimize your database hits by using whatever caching capabilities your application provides.
      • Use a private database instance (MariaDB or PostgreSQL) so that you're not subject to the shared system-wide connection limit; switching over is just a connection-settings change, as sketched below.
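
      For illustration, a Laravel app would point at a private instance with something like this (host, port and credentials are placeholders; other frameworks have equivalent settings):

          // config/database.php (excerpt): connect to a private MariaDB
          // instance instead of the shared server.
          'connections' => [
              'mysql' => [
                  'driver'   => 'mysql',
                  'host'     => env('DB_HOST', '127.0.0.1'),
                  'port'     => env('DB_PORT', '3307'),   // private instance's port
                  'database' => env('DB_DATABASE', 'myapp'),
                  'username' => env('DB_USERNAME', 'myapp'),
                  'password' => env('DB_PASSWORD', ''),
              ],
          ],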

      Maybe it would help if wait_timeout on the DB server were lowered from 28800 seconds (8 hours) to something more reasonable? As I understand it, right now an app can keep a connection open for 8 hours without even using it.
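
      In the meantime, I believe an app can lower the timeout for its own connections at the session level; a quick PDO sketch (the DSN and credentials are placeholders):

          // Ask MySQL to close *this* session's connection after 60s of
          // idling, regardless of the server-wide 8-hour default.
          $pdo = new PDO('mysql:host=shared-db.example;dbname=mydb', $user, $pass);
          $pdo->exec('SET SESSION wait_timeout = 60');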

      • sean replied to this.

        If this is a system-wide issue, maybe some temporary fixes could be:

        • provide a quick installer for a private instance
        • provide a migration script from/to a private instance?

        I'm only suggesting this because it's a frustrating experience to have your application limited like this; I realize any effort here is likely temporary, and of course running a custom DB counts against the application/memory limits.

        ... and doing this for multiple apps will have quite the overhead ;P

          igor hmm, will pass that along to the sysadmin.

          etienneh great suggestions, thanks! and I hear you on the frustration.

          @sean Are you maybe running backups or doing some other DB maintenance at about 22:00-22:20 UTC? 🙂

          I just watched active connections skyrocket from 180 to 250 (and "Too many connections") within two seconds; after that they plummeted to around 20 and stayed there. I assume they started at around 20-40 and shot up to 250 really fast. It seems to happen at approximately the same time every day.
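
          In case anyone else wants to watch for the spike, I'm essentially polling the server status once a second; a rough PDO sketch of the idea (host and credentials are placeholders, and my actual monitor looks different):

              // Print the number of open connections once per second.
              $pdo = new PDO('mysql:host=shared-db.example', $user, $pass);
              while (true) {
                  $row = $pdo->query("SHOW STATUS LIKE 'Threads_connected'")->fetch();
                  printf("%s %d\n", date('c'), $row['Value']);
                  sleep(1);
              }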

            igor can corroborate that my monitoring alert fires at 2 PM PST (UTC-8), i.e. 22:00 UTC, consistently.

              igor etienneh yep, earlier today we determined that the backups seem to line up with most of these problems. We're looking into ways to reduce the impact that the backups have.

                7 days later

                sean any news? :-)
                Still getting daily crash reports and downtime alerts

                  etienneh still working on it. It's likely we'll have to stop backing up the largest DBs; we'll notify the affected customers if it comes to that.

                    sean would it be possible to back up with some jitter? Each DB gets a random time slot in which its backup runs?

                    • sean replied to this.

                      etienneh I don't think the current backup setup can do that, but I'll pass the suggestion along.

                      5 days later

                      Any thoughts on providing a quick installer for a private instance, or a migration script to/from one?

                      Still getting daily crashes and downtime alerts 😉

                      Actually @sean I tried to do the migration myself and I'm still getting errors on the private instance, so something isn't adding up?

                      • sean replied to this.