Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RUBY-3496] Fix legacy read pool retry error #2878

Merged

Conversation

joelim41
Copy link
Contributor

@joelim41 joelim41 marked this pull request as ready for review June 20, 2024 18:00
Copy link
Contributor

@comandeo-mongo comandeo-mongo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @joelim41 , thank you very much for your contribution! I have just one remark, everything else looks good!

@@ -58,7 +58,8 @@ def retryable_exceptions
Error::ConnectionPerished,
Error::ServerNotUsable,
Error::SocketError,
Error::SocketTimeoutError
Error::SocketTimeoutError,
Error::PoolError,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Error::PoolError,
Error::PoolClearedError,

According to the specification only the PoolClearedError should be retried, not all pool errors. So, having PoolError here seems to be too broad.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@comandeo-mongo Error::ServerNotUsable and Error::ConnectionPerished are listed here as retryable but not in the specification, could you clarify if they are retryable? Asking because we started seeing these errors after upgrading to the latest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@comandeo-mongo I've addressed your comment, please look at it when you have the time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @joelim41 , I am finally looking into it, sorry for the delay. Meanwhile, I would like to clarify - did you start seeing Error::ServerNotUsable and Error::ConnectionPerished only after the upgrade? It might be an indication of some other issues.

Copy link
Contributor

@comandeo-mongo comandeo-mongo Jun 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joelim41 one more "side" question - is there something preventing you from using modern retries? Legacy retry mechanism is deprecates since 2.9.0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@comandeo-mongo this happened after we upgraded from v2.16.0 to v2.20.0, when we experienced some scaling issues in one of our mongoS clusters. I see Error::ServerNotUsable was only introduced in v2.19.0 and Error::ConnectionPerished was made retryable also in v2.19.0 here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We just started exploring what it takes to move to modern retries

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I missed that you are upgrading from 2.16. So yes, you weren't able to see those errors. I think it is OK if you see them when a mongos has issues.

As to this PR - I think we should separate the list of the retryable errors for modern and legacy retries. PoolError has a special treatment in modern retries, so I would like to avoid just adding it to retryable_exceptions. I'll make a commit to your PR with the proposed solution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thank you!

@joelim41 joelim41 requested a review from comandeo-mongo June 21, 2024 14:53
Error::SocketTimeoutError
Error::SocketTimeoutError,
Error::PoolClearedError,
Error::PoolPausedError,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PoolPausedError looks retryable too

@comandeo-mongo comandeo-mongo self-assigned this Jun 28, 2024
@comandeo-mongo comandeo-mongo merged commit 3c5dc93 into mongodb:master Jul 4, 2024
304 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants