Raft: Force kill a node during snapshot retrieval goes into limbo #10225
Some more debugging: it looks like the leader stores in memory that the last action for the follower was to apply the snapshot, and it pauses any probing until that action has succeeded. I think this could be made more robust by having the leader retry on a node restart.
How is the leader supposed to know that the follower has restarted? In CockroachDB, snapshots are pushed from the leader over TCP, and this allows the leader to detect failure and call ReportSnapshot appropriately. If you're sending snapshots via a pull process by the follower, I think your leader should have a timeout and call ReportSnapshot if the pull is not started in time (or slows/stops once it is running). Either way, this is mainly a job for the application-provided transport code; I don't see a great way for the raft state machine to handle this on its own. (Resetting from ProgressStateSnapshot to ProgressStateProbe after some number of ticks would help keep the node from getting stuck forever, but the ideal value would be application-dependent and would be best integrated with the transport.)
Thanks for the explanation, @bdarnell . The purpose of ReportSnapshot wasn't clear before, due to lack of documentation around the call. I've created a PR to that effect. For the fix, I made the snapshot pull bi-directional, so the follower can send an ACK back to the leader. If the ACK fails, the leader will report snapshot failure and resume the log probing. That fixed the issue. P.S. Congrats on launching CockroachDB as a service. Great work!
@manishrjain As an idea: what about implementing the snapshot logic in a bit different way? The Raft snapshot could contain only meta information about your backend storage. So the leader sends the follower a very tiny snapshot and runs [...]. Also, it could be interesting for you to check out issue #10219
That's close to what we're already doing. Re: the report-immediately idea: reporting success doesn't do anything. The leader would still block sending any updates until the follower has advanced its state. And the follower should really only advance its raft state once it has gotten the snapshot -- we don't want a lagging follower to artificially advance its state (without having the actual data) and end up becoming a leader.
@manishrjain Regarding the bi-directional communication: I don't know why, but [...]. Didn't have time to understand this part. With [...]. P.S. Thank you for Badger ;)
I'm seeing this strange issue, where a Raft node replica which has fallen behind gets a snapshot from the leader (`!raft.IsEmptySnapshot(rd.Snapshot)`). In the case of Dgraph, this retrieval can take some time. During this time, if the node gets force-killed, then when it restarts, irrespective of the state of the write-ahead log, the leader never tries to bring it back to the latest state. It doesn't get the snapshot again or get any following updates.

When a node gets force-killed, there's no chance to call `ReportSnapshot(SnapshotFailure)`. But I expect that on a restart, the leader would compare the node's state against its own, and then start streaming the (at that time) latest snapshot and the proposals since then. That doesn't seem to happen at all. See attached logs:

Now bank-dg3 is in a limbo. No updates are happening; it's just stuck waiting for updates, and `raft.Ready` isn't called. Any ideas what is going on?

P.S. This is related to: hypermodeinc/dgraph#2698

P.P.S. The sleeping for 15s is artificially induced to help narrow this down. I can reproduce the limbo reliably.