
SolidQueue crashes if database connection is lost, and takes Puma with it. #512

Open
darinwilson opened this issue Feb 7, 2025 · 2 comments

Comments


darinwilson commented Feb 7, 2025

We're running SolidQueue as a Puma plugin on a Rails 8 app, as our job processing load is currently quite small.
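
For reference, the plugin is enabled in config/puma.rb roughly like this (the env-var guard matches the Rails 8 default generator output; your setup may differ):

```ruby
# config/puma.rb (excerpt) - runs Solid Queue's supervisor inside the Puma process.
plugin :solid_queue if ENV["SOLID_QUEUE_IN_PUMA"]
```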

We recently had an incident where the server running Puma temporarily lost the connection to Postgres. This caused SolidQueue to crash with this message:

PQconsumeInput() FATAL:  terminating connection due to administrator command (PG::ConnectionBad)
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

and this in turn took down Puma:

Detected Solid Queue has gone away, stopping Puma...
- Gracefully stopping, waiting for requests to finish

I was able to reproduce this locally by shutting down Postgres after starting Rails.

When running Rails without the SolidQueue Puma plugin, if the database goes away, Rails throws an error whenever it tries to touch the database, but Puma stays up and the connections recover once the database comes back online.

If I run SolidQueue separately, via bin/jobs, it also crashes if the database goes away.

Obviously SolidQueue can't be expected to do much without a database, but would it be reasonable for it to behave as Rails does when the db goes offline, i.e. pause its activity and reconnect when the db is available again?
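
To illustrate what I mean, something along these lines (purely a hypothetical sketch, not Solid Queue's API - names and intervals are made up):

```ruby
# Hypothetical "pause and reconnect" wrapper around a block of database work:
# instead of crashing on a lost connection, sleep and retry until the DB is back
# or a maximum wait is exceeded.
def with_database_retry(check_interval: 5, max_wait: 300)
  waited = 0
  begin
    yield
  rescue ActiveRecord::ConnectionNotEstablished, PG::ConnectionBad
    raise if waited >= max_wait
    sleep check_interval
    waited += check_interval
    retry
  end
end

# Illustrative usage:
# with_database_retry { SolidQueue::Process.count }
```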

Thanks for all your work on this - SolidQueue has been a fantastic addition to Rails!


rosa commented Feb 10, 2025

Oh, interesting. This happens only for the supervisor; if any of the supervised processes crashes, the supervisor makes sure a new one is started 🤔 I think the supervisor would need some kind of recovery mechanism for when the DB fails, but it could also crash for other reasons. I think it makes sense to do this, but I won't have time in the next couple of months at least, so if someone wants to submit a PR for it, I'll be happy to review.
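
In case it helps frame a PR, a very rough sketch of that kind of supervisor-level recovery might look like the following (module and method names are hypothetical, not Solid Queue's actual internals):

```ruby
# Hypothetical module prepended into the supervisor - NOT Solid Queue's real code.
# On a lost DB connection, pause until the database is reachable again and retry,
# instead of letting the supervisor (and the Puma plugin with it) crash.
module SupervisorDatabaseRecovery
  def supervise
    super
  rescue ActiveRecord::ConnectionNotEstablished, PG::ConnectionBad
    SolidQueue.logger.warn "Database connection lost, pausing supervisor..."
    sleep 10 until database_reachable?
    retry
  end

  private
    def database_reachable?
      ActiveRecord::Base.connection.verify!
      true
    rescue ActiveRecord::ConnectionNotEstablished, PG::ConnectionBad
      false
    end
end
```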

@darinwilson
Author

Thanks for the feedback - good to know that it must be something at the supervisor level.

I'll dig into the code a bit and see if I can find a solution that might work.
