VAULT-28477 Bootstrap and persist autopilot versions #28186

Merged
merged 9 commits into main from miagilepner/VAULT-28477-bootstrap-ap-version
Aug 30, 2024

Conversation

miagilepner
Contributor

@miagilepner miagilepner commented Aug 26, 2024

Description

When setting up Raft and autopilot, the leader node reads a storage entry (core/raft/autopilot/states) that maps server IDs to their upgrade and SDK versions. Autopilot keeps this map in memory in a structure called persistedStates. If the storage entry does not exist, an error is logged but operation continues.
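A minimal sketch of the load step, for illustration only. The JSON encoding, the field names, the `serverVersions` type, the `loadPersistedStates` helper, and the plain map standing in for Vault's storage backend are all assumptions, not the PR's actual code. It only shows the "missing entry is not fatal" behavior described above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statesPath is the storage path named in the PR description.
const statesPath = "core/raft/autopilot/states"

// serverVersions is a hypothetical per-server record; the real entry
// format is not shown in this PR description.
type serverVersions struct {
	UpgradeVersion string `json:"upgrade_version"`
	SDKVersion     string `json:"sdk_version"`
}

// loadPersistedStates reads the states entry from a map that stands in for
// the storage backend. A missing entry yields an empty map and found=false
// so the caller can log it and carry on.
func loadPersistedStates(storage map[string][]byte) (map[string]serverVersions, bool, error) {
	states := map[string]serverVersions{}
	raw, ok := storage[statesPath]
	if !ok {
		return states, false, nil
	}
	if err := json.Unmarshal(raw, &states); err != nil {
		return nil, true, fmt.Errorf("decoding %s: %w", statesPath, err)
	}
	return states, true, nil
}

func main() {
	// Fresh cluster: nothing has been written yet, so setup continues
	// with an empty in-memory map.
	states, found, err := loadPersistedStates(map[string][]byte{})
	fmt.Println(len(states), found, err)

	// Existing cluster: the entry maps server IDs to versions.
	storage := map[string][]byte{
		statesPath: []byte(`{"node-1":{"upgrade_version":"1.18.0","sdk_version":"1.18.0"}}`),
	}
	states, found, err = loadPersistedStates(storage)
	fmt.Println(states["node-1"].UpgradeVersion, found, err)
}
```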

Whenever the autopilot library calls NotifyStates to inform the vault delegate that a state has changed, the leader will check to see if the cluster membership differs from what is in the persistedStates map, or if the upgrade version or sdk version differs from the persistedStates map. If either of these conditions is true, persistedStates is updated and written to storage at the path core/raft/autopilot/states.
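The change check can be sketched roughly as follows. The `persistedState` type and the `statesChanged` helper are hypothetical names used only for illustration, and the real delegate works with autopilot's own state types rather than plain maps.

```go
package main

import "fmt"

// persistedState is a hypothetical stand-in for the per-server record
// kept in the persistedStates map.
type persistedState struct {
	UpgradeVersion string
	SDKVersion     string
}

// statesChanged reports whether the current view of the cluster differs
// from the persisted one, either in membership or in any server's
// upgrade or SDK version.
func statesChanged(persisted, current map[string]persistedState) bool {
	// A different number of servers means membership changed.
	if len(persisted) != len(current) {
		return true
	}
	for id, cur := range current {
		old, ok := persisted[id]
		// A server that was not persisted, or whose versions differ,
		// counts as a change.
		if !ok || old != cur {
			return true
		}
	}
	return false
}

func main() {
	persisted := map[string]persistedState{
		"node-1": {UpgradeVersion: "1.17.3", SDKVersion: "1.17.3"},
	}
	current := map[string]persistedState{
		"node-1": {UpgradeVersion: "1.18.0", SDKVersion: "1.18.0"},
	}
	if statesChanged(persisted, current) {
		// In the flow described above, this is the point where the
		// leader would update persistedStates and write it to storage.
		fmt.Println("states changed, persisting")
	}
}
```

Because the lengths are compared first, checking that every current server exists in the persisted map with equal versions is enough to establish that the two maps are identical.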

core/raft/autopilot/states is not replicated to performance or DR secondaries.

New nodes joining a cluster will include their sdk version and upgrade version when they answer the Raft bootstrap challenge. When the active node receives this answer, the versions will be stored (along with the other server state) in the follower states map.

When autopilot routinely calls KnownServers to get information about the nodes in the cluster, the leader will:

  • Check for follower versions in the follower states map. Use these versions if they exist. Otherwise,
  • Check for versions in the persisted states map. Use these versions if they exist. Otherwise,
  • Check if the persisted states map is empty, indicating a new upgrade to 1.18. Use the leader versions if this is true. Otherwise,
  • Return empty versions for these servers.
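The lookup order above can be sketched as a single function. This is a simplified illustration under stated assumptions: the `versions` type and `resolveVersions` helper are hypothetical, and "exist" is modeled as presence in a map, whereas the actual code may check the upgrade and SDK versions individually.

```go
package main

import "fmt"

// versions is a hypothetical pair of upgrade and SDK versions for one server.
type versions struct {
	Upgrade string
	SDK     string
}

// resolveVersions applies the fallback order from the PR description:
// follower states, then persisted states, then the leader's own versions
// when nothing has been persisted yet, and finally empty versions.
func resolveVersions(id string, follower, persisted map[string]versions, leader versions) versions {
	if v, ok := follower[id]; ok {
		return v
	}
	if v, ok := persisted[id]; ok {
		return v
	}
	// An empty persisted map indicates a new upgrade to 1.18, so the
	// leader's versions are assumed for servers not heard from yet.
	if len(persisted) == 0 {
		return leader
	}
	return versions{}
}

func main() {
	leader := versions{Upgrade: "1.18.0", SDK: "1.18.0"}
	follower := map[string]versions{"node-2": {Upgrade: "1.18.0", SDK: "1.18.0"}}
	persisted := map[string]versions{"node-3": {Upgrade: "1.17.3", SDK: "1.17.3"}}

	for _, id := range []string{"node-2", "node-3", "node-4"} {
		fmt.Printf("%s -> %+v\n", id, resolveVersions(id, follower, persisted, leader))
	}
}
```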

The persisted states need to exist to ensure that a new leader doesn't demote existing voters if their heartbeats are late. If the persisted states weren't available, a new leader wouldn't have any knowledge of the other nodes' versions until their heartbeats arrived.

Ent PR: https://github.com/hashicorp/vault-enterprise/pull/6351
Doc: https://docs.google.com/document/d/10MY9U-r8dH46-ICIdrObEjVfHKwQxLzqzh33YWWnYrQ/edit

TODO only if you're a HashiCorp employee

  • Backport Labels: If this PR is in the ENT repo and needs to be backported, backport
    to N, N-1, and N-2, using the backport/ent/x.x.x+ent labels. If this PR is in the CE repo, you should only backport to N, using the backport/x.x.x label, not the enterprise labels.
    • If this fixes a critical security vulnerability or severity 1 bug, it will also need to be backported to the current LTS versions of Vault. To ensure this, use all available enterprise labels.
  • ENT Breakage: If this PR either 1) removes a public function OR 2) changes the signature
    of a public function, even if that change is in a CE file, double check that
    applying the patch for this PR to the ENT repo and running tests doesn't
    break any tests. Sometimes ENT only tests rely on public functions in CE
    files.
  • Jira: If this change has an associated Jira, it's referenced either
    in the PR description, commit message, or branch name.
  • RFC: If this change has an associated RFC, please link it in the description.
  • ENT PR: If this change has an associated ENT PR, please link it in the
    description. Also, make sure the changelog is in this PR, not in your ENT PR.

@github-actions github-actions bot added the hashicorp-contributed-pr label (If the PR is HashiCorp (i.e. not-community) contributed) Aug 26, 2024
@miagilepner miagilepner added this to the 1.18.0-rc milestone Aug 26, 2024

github-actions bot commented Aug 26, 2024

CI Results:
All Go tests succeeded! ✅


github-actions bot commented Aug 26, 2024

Build Results:
All builds succeeded! ✅

Collaborator

@raskchanky raskchanky left a comment


Awesome job! I had a few questions and comments but I don't think anything showstopping.

}
if upgradeVersion == "" {
upgradeVersion = d.upgradeVersion
d.logger.Debug("no persisted state, using leader upgrade version version", "id", id, "upgrade_version", d.effectiveSDKVersion)
Collaborator


Suggested change
- d.logger.Debug("no persisted state, using leader upgrade version version", "id", id, "upgrade_version", d.effectiveSDKVersion)
+ d.logger.Debug("no persisted state, using leader upgrade version version", "id", id, "upgrade_version", d.upgradeVersion)

@miagilepner miagilepner enabled auto-merge (squash) August 30, 2024 08:15
@miagilepner miagilepner merged commit b5621aa into main Aug 30, 2024
83 checks passed
@miagilepner miagilepner deleted the miagilepner/VAULT-28477-bootstrap-ap-version branch August 30, 2024 08:32
Labels
hashicorp-contributed-pr If the PR is HashiCorp (i.e. not-community) contributed
2 participants