Lot of sockets left in CLOSE_WAIT state #5872
Comments
Does this make a difference? |
Also, can you comment on the IP addresses on the other end? Are they random clients, do they correspond to other Vault servers, etc.? |
Can you take a look at Vault server logs/client logs and see if you're seeing errors? e.g. are you seeing any panics? |
Sorry, I misled you, my mistake. The problem is not related to a Vault parameter in the configuration file; I just described what I did when I hit the error. As you can see, at the Vault configuration level we are using the DynamoDB storage backend with the following DynamoDB stanza parameters:
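A representative stanza, assuming the per-node capacities described below (the table name and region are illustrative, not the actual values):
storage "dynamodb" {
  ha_enabled     = "true"
  region         = "eu-west-1"   # illustrative
  table          = "vault-data"  # illustrative
  read_capacity  = 5             # per-node value described below
  write_capacity = 5             # per-node value described below
}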
On the AWS DynamoDB table configuration side, read/write capacity was set as follows:
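ReadCapacityUnits = 5, WriteCapacityUnits = 5 (assumed from the description below: the table's provisioned throughput matched the per-node setting)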
Basically, Vault's read/write capacity was set equal to the DynamoDB table's provisioned read/write capacity. But since we have 3 nodes, each configured with a read/write capacity of 5, we actually exceeded the DynamoDB table capacity. Here is what was visible in the /var/log/messages system log file, where Vault logs as a systemd unit:
So to solve the issue I simply tripled the DynamoDB table's configured read/write capacity to make it consistent with the number of Vault nodes.
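For example, with the AWS CLI (the table name is illustrative):
aws dynamodb update-table \
    --table-name vault-data \
    --provisioned-throughput ReadCapacityUnits=15,WriteCapacityUnits=15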
Basically, the DynamoDB table's provisioned read/write capacity must satisfy the following formulas:
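Assuming every node is configured identically:
table ReadCapacityUnits  >= per-node read_capacity  x number of Vault nodes
table WriteCapacityUnits >= per-node write_capacity x number of Vault nodes
For example, with 3 nodes each configured with read_capacity = write_capacity = 5, the table needs at least 15 read and 15 write capacity units.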
This is the solution that solved the issue I faced, but the underlying problem remains. Vault's DynamoDB storage backend should handle this differently and correctly release its resources when such an exception occurs. It looks like Vault does not release the socket because it is not handling this throttling error correctly. So I would propose to:
|
The Dynamo code uses the AWS SDK, so this is probably a bug at that layer, though I'm unsure. |
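For illustration only, a minimal sketch, assuming the AWS SDK for Go v1, of how a caller can recognize the DynamoDB throttling error that seems to trigger the leak; the helper name is hypothetical and this is not Vault's actual backend code:
package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws/awserr"
	"github.com/aws/aws-sdk-go/service/dynamodb"
)

// isDynamoThrottle reports whether err is a DynamoDB throttling error
// (hypothetical helper, not part of Vault).
func isDynamoThrottle(err error) bool {
	if aerr, ok := err.(awserr.Error); ok {
		return aerr.Code() == dynamodb.ErrCodeProvisionedThroughputExceededException
	}
	return false
}

func main() {
	// Fabricated error value, for demonstration only.
	err := awserr.New(dynamodb.ErrCodeProvisionedThroughputExceededException,
		"provisioned throughput for the table was exceeded", nil)
	if isDynamoThrottle(err) {
		fmt.Println("throttled by DynamoDB: back off and retry instead of leaking the connection")
	}
}
A backend that detects this case can back off and retry (or surface the error) while still returning the underlying HTTP connection to the pool, instead of leaving it half-closed in CLOSE_WAIT.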
I was not able to reproduce this, and I believe that in any event it may be resolved with OS-specific tuning (like below) if it is not addressed in the DynamoDB storage backend portion of Vault. Other suspects that differed from my own testing, which was outside AWS, are the load balancer and the OS (I did not use RHEL).
# /etc/sysctl.conf
net.ipv4.tcp_tw_recycle = 1   # fast recycling of sockets in TIME_WAIT
net.ipv4.tcp_tw_reuse = 1     # allow reuse of TIME_WAIT sockets for new connections
# apply the settings
sysctl -p
Is this issue still applicable? |
Issues that are not reproducible and/or have not had any interaction for a long time are stale issues. Sometimes even the valid issues remain stale lacking traction either by the maintainers or the community. In order to provide faster responses and better engagement with the community, we strive to keep the issue tracker clean and the issue count low. In this regard, our current policy is to close stale issues after 30 days. If a feature request is being closed, it means that it is not on the product roadmap. Closed issues will still be indexed and available for future viewers. If users feel that the issue is still relevant but is wrongly closed, we encourage reopening them. Please refer to our contributing guidelines for details on issue lifecycle. |
Describe the bug
Observed that the Vault process left a lot of sockets in the CLOSE_WAIT state. Sockets in CLOSE_WAIT are supposed to be released by the process owning them by issuing a close() system call. For some reason, Vault is not releasing these sockets and eventually reaches the max open files limit, leading to a service failure.
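A quick way to observe the symptom (commands and paths are illustrative; adjust for your environment):
# count TCP sockets in CLOSE_WAIT belonging to the vault process
ss -tnp state close-wait | grep vault | wc -l
# check the open-file limit applied to the running process
grep 'open files' /proc/$(pidof vault)/limits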
To Reproduce
Steps to reproduce the behavior:
systemctl start vault
Expected behavior
Vault process should release any socket in CLOSE_WAIT state.
Environment:
Vault Server Version (retrieve with vault status): Version 0.11.3
Vault CLI Version (retrieve with vault version): Vault v0.11.3 ('fb601237bfbe4bc16ff679f642248ee8a86e627b')
Red Hat Enterprise Linux Server release 7.6 (Maipo)
Linux foo.bar.foobar.io 3.10.0-957.el7.x86_64 #1 SMP Thu Oct 4 20:48:51 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Vault server configuration file(s):
Additional context
[root@foo ~]# systemctl status vault -l
[root@foo ~]# netstat -antplu | grep 8200