Configure abnormal exit reasons for DynamicSupervisor #131
Comments
Btw, are there any good ways to avoid this issue other than spawning a limited pool of workers and reusing them?
Sorry, but why is the supervisor reaching max_restart_intensity? The value is not increased if the GenServer exits with a normal reason.
In our case it happens because of bugs in our application (for example, when we receive a job from RabbitMQ and fail to process it due to some unpredicted temporary problem, e.g. an external service is unavailable). The supervisor then retries the job (because RabbitMQ re-delivers unacknowledged messages) and reaches the max restart intensity. The supervisor dies silently, because the :shutdown reason is not abnormal, and the whole application node keeps living silently without processing any jobs. We use Kubernetes, which will restart the whole container, so in these cases we actually want the application supervisor to die: the cluster can then heal itself in some time (and we will notice the container restarts). For these cases it would be nice to have something like a "do not restart only when the reason is :normal" option.
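For reference, a minimal sketch (not code from the thread; JobsSupervisor is a hypothetical name, current Elixir DynamicSupervisor API) of where the restart intensity mentioned above is configured:

```elixir
# A DynamicSupervisor for the per-job workers. If more than max_restarts
# worker restarts happen within max_seconds (here: 3 within 5 seconds),
# the supervisor gives up, terminates its remaining children, and exits
# with reason :shutdown -- the silent death described above.
{:ok, _sup} =
  DynamicSupervisor.start_link(
    strategy: :one_for_one,
    max_restarts: 3,
    max_seconds: 5,
    name: JobsSupervisor
  )
```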
@AndrewDryga Why is the supervisor child spec being specified with restart :transient then? If you want it to be restarted, then it should be defined as :permanent.
@josevalim because we want to be able to kill it with a :normal reason when the job is completed. Sample usage: …
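The code sample itself is not shown above; a minimal sketch of the pattern being described, assuming a DynamicSupervisor such as the hypothetical JobsSupervisor and current Elixir APIs:

```elixir
defmodule JobWorker do
  # Declares a :transient child spec: a :normal exit is not restarted,
  # but a crash is.
  use GenServer, restart: :transient

  def start_link(job), do: GenServer.start_link(__MODULE__, job)

  @impl true
  def init(job) do
    # Kick off the work after init returns so the supervisor is not blocked.
    send(self(), :run)
    {:ok, job}
  end

  @impl true
  def handle_info(:run, job) do
    run_job(job)
    # Job finished: stop with :normal so the supervisor does not restart us.
    {:stop, :normal, job}
  end

  # Placeholder for the actual job processing.
  defp run_job(_job), do: :ok
end

# One worker is started per incoming job, e.g.:
# DynamicSupervisor.start_child(JobsSupervisor, {JobWorker, job})
```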
@AndrewDryga but that's not related to the exit the supervisor uses when reaching the max restart intensity.
We already have such a restart strategy. It is called :permanent.
@AndrewDryga there is some confusion here because you are referring to two different processes at the same time. In the issue, you say you want to change the exit value of a supervisor because …
You are right. I don't know where the best place to put this option is, which is why this misunderstanding appeared. And due to the language barrier it's hard to tell what I want :). Here are logs from our test environment that describe the issue: …
How does this happen? …
What is the desired behaviour? …
I will try to write sample code for this case, but it's hard to reproduce.
@AndrewDryga in the above log …
I am closing this as I believe there is no bug or feature request per se, but we will be glad to continue the discussion. :)
@fishcakez In our case Trader.Workers.Supervisor is not restarting :(. Here you can see the same situation in a totally different application: bitwalker/distillery#118 (comment). I can give access to the source code if you're willing to look into it.
Can you provide a sample app that reproduces the error?
Keep in mind that, if a dynamic supervisor restarts, all of its previous processes will be lost. You can also do "elixir --logger-sasl-reports true -S mix run" to get progress reports.
I've built a sample app but wasn't able to reproduce this issue. I guess it's not the supervisor's fault; it looks like this happens because we have a limited prefetch count (the limit on unacknowledged messages sent to a node) that is equal to the number of processes that get spawned. Once all of them are killed, RabbitMQ neither receives acknowledgements nor reschedules the messages, nor sends new ones, because from its perspective the node is processing jobs at max capacity. Are there any good practices for doing some work whenever a supervisor's children die? We need to store the RabbitMQ delivery tags and send a negative acknowledgement when this situation occurs. Maybe a separate GenServer that monitors the workers?
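One way that last idea could look, as a minimal sketch and not code from the thread (the WorkerMonitor name and the nack function are hypothetical; in a real app the function might wrap the AMQP library's basic.nack):

```elixir
defmodule WorkerMonitor do
  @moduledoc "Monitors job workers and nacks their delivery tags if they die abnormally."
  use GenServer

  def start_link(nack_fun) when is_function(nack_fun, 1) do
    GenServer.start_link(__MODULE__, nack_fun, name: __MODULE__)
  end

  # Register a worker pid together with the delivery tag it is responsible for.
  def watch(pid, delivery_tag) do
    GenServer.cast(__MODULE__, {:watch, pid, delivery_tag})
  end

  @impl true
  def init(nack_fun), do: {:ok, %{nack_fun: nack_fun, tags: %{}}}

  @impl true
  def handle_cast({:watch, pid, tag}, state) do
    ref = Process.monitor(pid)
    {:noreply, %{state | tags: Map.put(state.tags, ref, tag)}}
  end

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, reason}, state) do
    {tag, tags} = Map.pop(state.tags, ref)
    # Only nack when the worker did not finish its job normally.
    if tag != nil and reason != :normal, do: state.nack_fun.(tag)
    {:noreply, %{state | tags: tags}}
  end
end
```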
I would have each consumer execute each job inside a task and use functions such as Task.yield to find out whether the job terminated or not. I.e. the best way to do it is to decouple the ack/nack system from the processing.
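A minimal sketch of that suggestion (the process_job/ack/nack callbacks and MyApp.TaskSupervisor, a Task.Supervisor assumed to be started elsewhere in the tree, are not from the thread):

```elixir
defmodule Consumer do
  @job_timeout 30_000

  def handle_delivery(payload, process_job, ack, nack) do
    # Run the job in an unlinked task so a crash does not take the consumer down.
    task =
      Task.Supervisor.async_nolink(MyApp.TaskSupervisor, fn ->
        process_job.(payload)
      end)

    # Wait for the result; shut the task down if it is still running after the timeout.
    case Task.yield(task, @job_timeout) || Task.shutdown(task) do
      {:ok, _result} -> ack.(payload)     # job succeeded
      {:exit, _reason} -> nack.(payload)  # job crashed
      nil -> nack.(payload)               # job timed out
    end
  end
end
```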
Here is our solution for this problem: https://github.com/Nebo15/gen_task |
Original issue description:
Motivation: provide a supervisor for temporary GenServers. They should be spawned when a new job is started, and exit with the :normal reason after the job is finished.
From the Elixir docs it looks like the :transient restart type is most suitable for this use case, but actually it's not, because whenever the supervisor reaches max_restart_intensity it will exit with reason :shutdown, which is treated as normal, and the supervisor will die silently.
I guess it would be a good solution to provide a different restart strategy that restarts the process even when it exits with the :shutdown reason, or to provide a way to configure abnormal exit reasons for the supervisor.
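As a minimal sketch of the existing answer to this request (not code from the issue; module names are hypothetical): making the job supervisor a :permanent child of the application supervisor already gives restart-on-:shutdown behaviour.

```elixir
# In the application's top-level supervisor. A :permanent child is
# restarted regardless of its exit reason, so JobsSupervisor is brought
# back even after it exits with :shutdown from exceeding its own restart
# intensity. If this supervisor's intensity is exceeded as well, the
# failure propagates up and (for a :permanent application) takes the node
# down, letting Kubernetes restart the container.
children = [
  Supervisor.child_spec({JobsSupervisor, []}, restart: :permanent)
]

Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
```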