Vault agent can DOS vault server if template suffers I/O errors #24657

Closed
pieter-lautus opened this issue Jan 3, 2024 · 1 comment · Fixed by #25497

Comments

@pieter-lautus

Describe the bug

We recently suffered a self-inflicted DoS attack on our Vault server. Investigation revealed that our own Vault agents, embedded in our own system container images, hammered the server after a permissions refactoring went wrong on a file that the agent templates read.

It seems that when an I/O error occurs while rendering a template, the Vault agent does not pause or back off before retrying. If rendering the template requires any Vault API requests, those requests are retried in a tight loop with no back-off, which leads to a "thundering herd" DoS scenario as soon as many agents hit the same error.
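As a rough illustration of why the missing back-off matters, the following Python sketch counts how many attempts one agent makes in a minute with and without back-off. Every number in it is an assumption chosen for illustration (5 ms per failed render attempt, a fleet of 200 agents, a 250 ms initial pause that doubles up to a 60 s cap); none of them are measurements from the incident or values taken from the Vault agent source.

```python
def attempts_in_window(window_ms, attempt_cost_ms, delays):
    """Count the attempts that start within window_ms milliseconds.

    attempt_cost_ms: assumed duration of one failed render, API call included.
    delays: iterator yielding the pause (in ms) taken after each failure.
    """
    elapsed = 0
    attempts = 0
    while elapsed < window_ms:
        attempts += 1
        elapsed += attempt_cost_ms + next(delays)
    return attempts


def no_backoff():
    # Tight loop: retry immediately after every failure.
    while True:
        yield 0


def exponential_backoff(initial_ms=250, cap_ms=60_000):
    # Double the pause after each failure, up to a cap.
    delay = initial_ms
    while True:
        yield delay
        delay = min(delay * 2, cap_ms)


AGENTS = 200          # hypothetical fleet size
WINDOW_MS = 60_000    # one minute
ATTEMPT_COST_MS = 5   # hypothetical cost of one failed attempt

tight = attempts_in_window(WINDOW_MS, ATTEMPT_COST_MS, no_backoff())
backed_off = attempts_in_window(WINDOW_MS, ATTEMPT_COST_MS, exponential_backoff())

print(f"per agent, tight loop:    {tight} attempts/minute")
print(f"per agent, with back-off: {backed_off} attempts/minute")
print(f"fleet, tight loop:        {tight * AGENTS} attempts/minute")
print(f"fleet, with back-off:     {backed_off * AGENTS} attempts/minute")
```

Under these assumed numbers the tight loop makes 12000 attempts per agent per minute, against 8 with back-off. The exact figures depend entirely on the assumptions, but the gap of several orders of magnitude is the point.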

To Reproduce

On Linux the steps are:

In the first terminal, start a vault dev server: vault server -dev

In a second terminal, launch a tcpdump to watch for traffic to the dev server: sudo tcpdump -i lo dst host 127.0.0.1 and dst port 8200

Write the dev server's root token to a file: echo -n TOKEN > /tmp/.vault-token

Write /tmp/agent-config.hcl as follows:

auto_auth {
    method "token_file" {
        config = {
            token_file_path = "/tmp/.vault-token"
        }
    }
}

template {
    source      = "/tmp/template.ctmpl"
    destination = "/tmp/dest"
}

Write /tmp/template.ctmpl as follows:

{{ with secret "pki/issue/foo" (printf "common_name=%s" (file "/nonexistent")) -}}
{{ .Data.private_key }}
{{ end }}

In a third terminal, launch the vault-agent with export VAULT_ADDR=http://127.0.0.1:8200; vault agent -config /tmp/agent-config.hcl.

The vault agent will spew reams and reams of output as the template fails over and over due to the nonexistent file it is trying to read.

Switch back to the tcpdump terminal. You will see reams and reams of packets flying to the dev server.

As an aside, it is curious that the Vault agent communicated with the Vault server at all while evaluating this template. If it could not read the nonexistent file, it could not know what "common_name" should be, and so could not know which API request to make.

Expected behavior

I would expect the agent to retry with a back-off similar to how the agent retries failing requests to the Vault server.
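The kind of behavior meant here can be sketched in Python as follows. This is a generic illustration of retrying with exponential back-off and jitter, not the Vault agent's actual retry code; the function name retry_with_backoff, its parameters and its default values are all hypothetical.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, initial_s=0.25, cap_s=60.0,
                       sleep=time.sleep, rng=random.random):
    """Call operation() until it succeeds or max_attempts is reached.

    After each I/O failure, pause for a randomized ("jittered") delay whose
    upper bound doubles each time, up to cap_s seconds.
    """
    bound = initial_s
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except OSError:
            # I/O errors (e.g. a missing or unreadable file) are retried,
            # but only after a pause. The last failure is re-raised.
            if attempt == max_attempts:
                raise
            sleep(bound * rng())  # jitter: somewhere in [0, bound)
            bound = min(bound * 2, cap_s)


# Demo: an operation that always fails like the template in this report.
# Pauses are recorded instead of slept, and jitter is pinned to its maximum,
# so the run is instant and deterministic.
calls = []
pauses = []


def failing_render():
    calls.append(1)
    raise FileNotFoundError("/nonexistent")


try:
    retry_with_backoff(failing_render, max_attempts=4,
                       sleep=pauses.append, rng=lambda: 1.0)
except FileNotFoundError:
    print(f"gave up after {len(calls)} attempts, pauses: {pauses}")
```

The jitter is there because of the thundering herd: if many agents hit the same error at the same moment, randomizing each pause keeps their retries from arriving at the server in lockstep.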

Environment:

  • Vault Server Version (retrieve with vault status): 1.15.4
  • Vault CLI Version (retrieve with vault version): 1.15.4
  • Server Operating System/Architecture: Linux x86_64
@pieter-lautus
Author

To spell it out more clearly: this bug causes the vault-agent to sit in a tight loop, consuming 100% CPU, making repeated network connections to the vault server inside that loop.

When we suffered this self-inflicted DoS attack, our Vault server was close to unusable. Setting an API limit inside the Vault server itself did not really help; we had to do the rate limiting at the firewall level.
