Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leak on long lived RpcChannel #15

Closed
jsdanielh opened this issue Aug 18, 2022 · 2 comments · Fixed by #16
Closed

Memory leak on long lived RpcChannel #15

jsdanielh opened this issue Aug 18, 2022 · 2 comments · Fixed by #16

Comments

@jsdanielh
Copy link
Contributor

jsdanielh commented Aug 18, 2022

Multiple calls to the async call method in the RpcChannel class for the same instance seems to be leaking memory.
I have been tracking a client instance that lives forever in a machine and the client seems to be leaking memory.

Tracking down the memory leaks with pympler I noticed that each call to the method mentioned leaks at least these objects:

- set
- collections.deque
- _asyncio.Task
- _asyncio.Future
- coroutine
- asyncio.locks.Event
- function (_on_completion)

Digging a little more I think there are two explanations for these leaks:

  1. The default argument value of None to timeout in the WebSocketRpcClient's call method would cause a leak (and in general when passing None to the RpcChannel's wait_for_response method) since asyncio.as_completed wouldn't have a timeout and since it waits for the self._close Event, that future may not end up marked as completed (since the close event rarely happens in a long lived RpcChannel instance) or timed-out (since there is no timeout) leaking the internal structures used in the method implementation such as the set, collections.deque. So, my personal conclusion is that None as a timeout leaks memory and should be avoided and documented somewhere.
  2. wait_for_reponse method in RpcChannel when calling asyncio.as_completed actually creates a task to wait for the self._close event. In a long lived RpcChannel that event never happens and thus that future continues to run until the close method is called. And since the task is created with each RpcChannel's call method, this could end up leaking objects such as _asyncio.Task, _asyncio.Future and coroutine. Here I think the best would be to use a wait_for future with timeout instead of the current Event'swait.
jsdanielh added a commit to jsdanielh/fastapi_websocket_rpc that referenced this issue Aug 18, 2022
Fix memory leak in the method `wait_for_response` of `RpcChannel`.
As documented in permitio#15, when calling `asyncio.as_completed` it
actually creates a task to wait for the `self._close` event. In a
long lived `RpcChannel` that event never happens and thus that
future continues to run until the `close` method is called.
Since the task is created with each `RpcChannel`'s `call` method,
this could end up leaking objects such as `_asyncio.Task`,
`_asyncio.Future` and `coroutine`.

This change introduces a `wait_until_closed` to use
`asyncio.wait_for` with a `timeout` such that this future can be
used when `wait_for_response` calls `as_completed` to check if the
channel is closed before an arrival of the response. Using a proper
`timeout` warrants that the future will be terminated and then no
future/task/couroutine is leaked.

Also add some documentation on using `None` as a `timeout` and add
some styling changes.

This fixes permitio#15.
jsdanielh added a commit to jsdanielh/fastapi_websocket_rpc that referenced this issue Aug 18, 2022
Fix memory leak in the method `wait_for_response` of `RpcChannel`.
As documented in permitio#15, when calling `asyncio.as_completed` it
actually creates a task to wait for the `self._close` event. In a
long lived `RpcChannel` that event never happens and thus that
future continues to run until the `close` method is called.
Since the task is created with each `RpcChannel`'s `call` method,
this could end up leaking objects such as `_asyncio.Task`,
`_asyncio.Future` and `coroutine`.

This change introduces a `wait_until_closed` to use
`asyncio.wait_for` with a `timeout` such that this future can be
used when `wait_for_response` calls `as_completed` to check if the
channel is closed before an arrival of the response. Using a proper
`timeout` warrants that the future will be terminated and then no
future/task/couroutine is leaked.

Also add some documentation on using `None` as a `timeout` and add
some styling changes.

This fixes permitio#15.
@orweis
Copy link
Contributor

orweis commented Aug 19, 2022

Hi @jsdanielh this seems like a very important find! Thank you for reporting and for the matching PR.

As it's late here, I'll do my best to find time tomorrow to thoroughly review this.

CC: @asafc @roekatz

jsdanielh added a commit to jsdanielh/python-client that referenced this issue Aug 19, 2022
Use a timeout for calling the `WebSocketRpcClient`'s `call` method
since as documented in [this
issue](permitio/fastapi_websocket_rpc#15) a
memory leak may result since `asyncio.as_completed` call in the
`wait_for_reponse` method may not mark the wait for the close event
of the `RpcChannel` as timed out nor completed since the close event
in long lived `RpcChannel` instances rarely happens.
jsdanielh added a commit to jsdanielh/fastapi_websocket_rpc that referenced this issue Aug 19, 2022
Fix memory leak in the method `wait_for_response` of `RpcChannel`.
As documented in permitio#15, when calling `asyncio.as_completed` it
actually creates a task to wait for the `self._close` event. In a
long lived `RpcChannel` that event never happens and thus that
future continues to run until the `close` method is called.
Since the task is created with each `RpcChannel`'s `call` method,
this could end up leaking objects such as `_asyncio.Task`,
`_asyncio.Future` and `coroutine`.

This change introduces a `wait_until_closed` to use
`asyncio.wait_for` with a `timeout` such that this future can be
used when `wait_for_response` calls `as_completed` to check if the
channel is closed before an arrival of the response. Using a proper
`timeout` warrants that the future will be terminated and then no
future/task/couroutine is leaked.

Also add some documentation on using `None` as a `timeout` and add
some styling changes.

This fixes permitio#15.
@orweis
Copy link
Contributor

orweis commented Aug 19, 2022

This is indeed an important find; kudos for finding and reporting this - @jsdanielh
It comes from a rather unexpected angle - RPC call promises frequently failing, and significant resource allocations within asyncio core functions.

In the as_completed I think the main expectation is for the promise to resolve a lot sooner than anything else;
An RPC channel for which the other side is not sending or failing to send replies frequently is a pretty useless RPC channel. That being said, I would agree that we should not expect every RPC implementation on top of the package to behave correctly.

We should probably change the default - to something long enough but not forever - say 300 seconds (5 minutes)
As a default that comes with a memory leak, is a bad default.

We can also look at using asyncio.wait with FIRST_COMPLETED instead of asyncio.as_completed- which is probably more correct, and reduces the potential for a leak a little, though objects are still created on the stack.

I've set some of these points as change requests in your PR, I hope you can apply them as well :)

jsdanielh added a commit to jsdanielh/fastapi_websocket_rpc that referenced this issue Aug 19, 2022
Fix memory leak in the method `wait_for_response` of `RpcChannel`.
As documented in permitio#15, when calling `asyncio.as_completed` it
actually creates a task to wait for the `self._close` event. In a
long lived `RpcChannel` that event never happens and thus that
future continues to run until the `close` method is called.
Since the task is created with each `RpcChannel`'s `call` method,
this could end up leaking objects such as `_asyncio.Task`,
`_asyncio.Future` and `coroutine`.

This change changes a the `wait_for_response` method to use
`asyncio.wait` with a `timeout` and `FIRST_COMPLETED` for the
`returned_when` argument such that we can obtain all pending futures
after the first completed and cancel them to avoid the leak. This
plus the use of a proper `timeout` warrants that the `wait` for the
`self._close` event future will be terminated and then no
future/task/couroutine is leaked.

Also add some documentation on using `None` as a `timeout` and add
some styling changes.

This fixes permitio#15.
jsdanielh added a commit to jsdanielh/python-client that referenced this issue Aug 20, 2022
Use forked `fastapi_websocket_rpc` to get the proposed fix for the
memory leak reported in [this
issue](permitio/fastapi_websocket_rpc#15).
jsdanielh added a commit to jsdanielh/fastapi_websocket_rpc that referenced this issue Aug 23, 2022
Fix memory leak in the method `wait_for_response` of `RpcChannel`.
As documented in permitio#15, when calling `asyncio.as_completed` it
actually creates a task to wait for the `self._close` event. In a
long lived `RpcChannel` that event never happens and thus that
future continues to run until the `close` method is called.
Since the task is created with each `RpcChannel`'s `call` method,
this could end up leaking objects such as `_asyncio.Task`,
`_asyncio.Future` and `coroutine`.

This change changes a the `wait_for_response` method to use
`asyncio.wait` with a `timeout` and `FIRST_COMPLETED` for the
`returned_when` argument such that we can obtain all pending futures
after the first completed and cancel them to avoid the leak. This
warrants that the `wait` for the `self._close` event future will be
cancelled and then no future/task/couroutine is leaked.

Also add some styling changes and fix some typos.

This fixes permitio#15.
jsdanielh added a commit to jsdanielh/fastapi_websocket_rpc that referenced this issue Aug 23, 2022
Fix memory leak in the method `wait_for_response` of `RpcChannel`.
As documented in permitio#15, when calling `asyncio.as_completed` it
actually creates a task to wait for the `self._close` event. In a
long lived `RpcChannel` that event never happens and thus that
future continues to run until the `close` method is called.
Since the task is created with each `RpcChannel`'s `call` method,
this could end up leaking objects such as `_asyncio.Task`,
`_asyncio.Future` and `coroutine`.

This change changes a the `wait_for_response` method to use
`asyncio.wait` with a `timeout` and `FIRST_COMPLETED` for the
`returned_when` argument such that we can obtain all pending futures
after the first completed and cancel them to avoid the leak. This
warrants that the `wait` for the `self._close` event future will be
cancelled and then no future/task/coroutine is leaked.

Also add some styling changes and fix some typos.

This fixes permitio#15.
jsdanielh added a commit to jsdanielh/python-client that referenced this issue Aug 24, 2022
Go back to mainstream `fastapi_websocket_rpc` now that the fix for
the memory leak reported in [this
issue](permitio/fastapi_websocket_rpc#15)
has been merged.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants