Vault agent can DOS vault server if template suffers I/O errors #24657

pieter-lautus · 2024-01-03T12:34:40Z

Describe the bug

We recently suffered a self-inflicted DoS attack on our Vault server. Investigation revealed that our own Vault agents, embedded in our own system container images, hammered the server after a permission refactoring went wrong on a file that the agent templates read.

It seems that when template rendering hits an I/O error, the Vault agent retries without pausing or backing off. If rendering the template requires any Vault API requests, those requests are retried in a tight loop with no back-off, leading to a "thundering herd" DoS scenario.

To Reproduce

On Linux the steps are:

In the first terminal, start a vault dev server: vault server -dev

In a second terminal, launch a tcpdump to watch for traffic to the dev server: sudo tcpdump -i lo dst host 127.0.0.1 and dst port 8200

Write the dev server's root token to a file: echo -n TOKEN > /tmp/.vault-token

Write /tmp/agent-config.hcl as follows:

auto_auth {
    method "token_file" {
        config = {
            token_file_path = "/tmp/.vault-token"
        }
    }
}

template {
    source      = "/tmp/template.ctmpl"
    destination = "/tmp/dest"
}

Write /tmp/template.ctmpl as follows:

{{ with secret "pki/issue/foo" (printf "common_name=%s" (file "/nonexistent")) -}}
{{ .Data.private_key }}
{{ end }}

In a third terminal, launch the vault-agent with export VAULT_ADDR=http://127.0.0.1:8200; vault agent -config /tmp/agent-config.hcl.

The vault agent will spew reams and reams of output as the template fails over and over due to the nonexistent file it is trying to read.

Switch back to the tcpdump terminal. You will see reams and reams of packets flying to the dev server.

As an aside, it is curious that the vault agent communicated with the vault server at all while evaluating the given template. If it was unable to read the nonexistent file, it wouldn't know what "common_name" should be, and wouldn't be in a position to even know what API request to make.

Expected behavior

I would expect the agent to retry with a back-off similar to how the agent retries failing requests to the Vault server.
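To make the expectation concrete, here is a minimal sketch of the kind of capped exponential back-off I have in mind. It is not Vault's actual retry code: the names backoff_delays and retry_with_backoff, the parameter defaults, and the jitter option are all hypothetical and chosen only for illustration.

```python
import random
import time


def backoff_delays(base=0.25, cap=60.0, attempts=8, jitter=0.0, rng=None):
    """Yield the delay in seconds before each retry: base * 2**n, capped at cap.

    All defaults are illustrative assumptions, not values taken from Vault.
    """
    rng = rng or random.Random()
    for n in range(attempts):
        delay = min(cap, base * (2 ** n))
        if jitter:
            # Randomize each delay by +/- jitter so that many agents failing
            # at the same moment do not all retry in lockstep.
            delay *= 1 + rng.uniform(-jitter, jitter)
        yield delay


def retry_with_backoff(render, sleep=time.sleep, **schedule):
    """Call render() until it succeeds, sleeping between failed attempts.

    Only OSError (which includes FileNotFoundError and PermissionError) is
    retried here. After the schedule is exhausted, one final attempt is made
    and its error, if any, propagates to the caller.
    """
    for delay in backoff_delays(**schedule):
        try:
            return render()
        except OSError:
            sleep(delay)
    return render()
```

With base=0.25 and cap=2.0 the waits are 0.25 s, 0.5 s, 1 s, 2 s, 2 s, and so on, so an agent whose template cannot read its input file makes a handful of attempts per minute instead of spinning in a tight loop.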

Environment:

Vault Server Version (retrieve with vault status): 1.15.4
Vault CLI Version (retrieve with vault version): 1.15.4
Server Operating System/Architecture: Linux x86_64


pieter-lautus · 2024-01-04T03:47:34Z

To spell it out more clearly: this bug causes the vault-agent to sit in a tight loop, consuming 100% CPU, making repeated network connections to the vault server inside that loop.

When we suffered this self-inflicted DoS attack, our Vault server was close to unusable. Setting a rate limit on the Vault API within the Vault server itself did not really help; we had to do the rate limiting at the firewall level.

pieter-lautus mentioned this issue Jan 4, 2024

Vault agent telemetry should provide metrics on template errors #24670

Open

VioletHynes mentioned this issue Feb 20, 2024

Added exponential backoff #25497

Merged

divyaac closed this as completed in #25497 Feb 20, 2024
