-
Notifications
You must be signed in to change notification settings - Fork 533
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RUBY-3496] Fix legacy read pool retry error #2878
[RUBY-3496] Fix legacy read pool retry error #2878
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @joelim41 , thank you very much for your contribution! I have just one remark, everything else looks good!
lib/mongo/retryable/base_worker.rb
Outdated
@@ -58,7 +58,8 @@ def retryable_exceptions | |||
Error::ConnectionPerished, | |||
Error::ServerNotUsable, | |||
Error::SocketError, | |||
Error::SocketTimeoutError | |||
Error::SocketTimeoutError, | |||
Error::PoolError, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error::PoolError, | |
Error::PoolClearedError, |
According to the specification only the PoolClearedError
should be retried, not all pool errors. So, having PoolError
here seems to be too broad.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@comandeo-mongo Error::ServerNotUsable
and Error::ConnectionPerished
are listed here as retryable but not in the specification, could you clarify if they are retryable? Asking because we started seeing these errors after upgrading to the latest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@comandeo-mongo I've addressed your comment, please look at it when you have the time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @joelim41 , I am finally looking into it, sorry for the delay. Meanwhile, I would like to clarify - did you start seeing Error::ServerNotUsable
and Error::ConnectionPerished
only after the upgrade? It might be an indication of some other issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@joelim41 one more "side" question - is there something preventing you from using modern retries? Legacy retry mechanism is deprecates since 2.9.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@comandeo-mongo this happened after we upgraded from v2.16.0
to v2.20.0
, when we experienced some scaling issues in one of our mongoS clusters. I see Error::ServerNotUsable
was only introduced in v2.19.0
and Error::ConnectionPerished
was made retryable also in v2.19.0
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We just started exploring what it takes to move to modern retries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I missed that you are upgrading from 2.16. So yes, you weren't able to see those errors. I think it is OK if you see them when a mongos has issues.
As to this PR - I think we should separate the list of the retryable errors for modern and legacy retries. PoolError
has a special treatment in modern retries, so I would like to avoid just adding it to retryable_exceptions
. I'll make a commit to your PR with the proposed solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thank you!
Co-authored-by: Dmitry Rybakov <[email protected]>
Error::SocketTimeoutError | ||
Error::SocketTimeoutError, | ||
Error::PoolClearedError, | ||
Error::PoolPausedError, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PoolPausedError looks retryable too
RUBY-3496