Currently, Iroha spawns many tokio::tasks in a detached way, without checking whether they panic at some point during execution.
A few examples are linked below (only cases where the handle is ignored completely; there are many others where the handle is used, but not to check whether it resolves to an error).
This causes a few issues:
When a vital task panics, we don't see that it happened and get no error message.
The system then "crumbles" non-gracefully, because other tasks rely on the failed ones.
For example, when the NetworkBase task panics, all we see is: NetworkBase must accept messages until there is at least one handle to it: SendError { .. } (which doesn't reveal the cause).
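The difference between dropping a handle and checking it can be sketched as follows. This is only an illustration under stated assumptions: it uses std::thread as a std-only stand-in for tokio tasks (awaiting a tokio JoinHandle likewise yields an error when the task panicked), and the names run_checked and panic_message are hypothetical, not part of Iroha.

```rust
use std::any::Any;
use std::thread;

/// Extracts a human-readable message from a panic payload.
/// Panics raised with `panic!` normally carry a `&str` or a `String`.
fn panic_message(payload: &(dyn Any + Send)) -> String {
    if let Some(s) = payload.downcast_ref::<&str>() {
        (*s).to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        "non-string panic payload".to_string()
    }
}

/// Hypothetical helper: runs `task` on its own thread and, instead of
/// dropping the handle, joins it and reports a panic as an error that
/// names the task and carries the panic message.
fn run_checked<F>(name: &str, task: F) -> Result<(), String>
where
    F: FnOnce() + Send + 'static,
{
    let handle = thread::spawn(task);
    handle
        .join()
        .map_err(|payload| format!("task `{}` panicked: {}", name, panic_message(&*payload)))
}
```

With such a helper, a panic inside the spawned closure comes back to the caller as an Err value containing the task name and the original message, rather than surfacing later as an unrelated SendError in another task.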
Instead, it would be good:
To see the panic messages of most tasks (ideally, of all tasks)
When a vital task panics, gracefully shut down all other tasks
Ideally (overkill for now), adopt the Supervision Tree design principle: monitor task execution in a centralised way, with restart/shutdown strategies. This should contribute significantly to the overall fault tolerance of Iroha. For more info, see Erlang/OTP or Supervisor in Elixir
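The first two wishes (see the panic, then shut the rest down gracefully) can be sketched with the standard library alone. This is a rough illustration, not a proposal for the actual implementation: threads stand in for tokio tasks, the Shutdown flag stands in for a cancellation token, the exit channel stands in for a join set, every task is treated as vital, and all names (Shutdown, supervise, faulty_worker, healthy_worker) are hypothetical.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{mpsc, Arc};
use std::thread;
use std::time::Duration;

/// Cooperative shutdown flag shared by the supervisor and all tasks.
#[derive(Clone, Default)]
struct Shutdown(Arc<AtomicBool>);

impl Shutdown {
    fn trigger(&self) {
        self.0.store(true, Ordering::SeqCst);
    }
    fn is_triggered(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

type Task = Box<dyn FnOnce(Shutdown) + Send + 'static>;

/// Runs every task on its own thread. When any task panics, the
/// supervisor logs it and asks the remaining tasks to stop, then waits
/// for all of them. Returns the names of the tasks that panicked.
fn supervise(tasks: Vec<(&'static str, Task)>) -> Vec<&'static str> {
    let shutdown = Shutdown::default();
    let (exit_tx, exit_rx) = mpsc::channel();
    let mut handles = Vec::new();
    for (name, task) in tasks {
        let (tx, token) = (exit_tx.clone(), shutdown.clone());
        handles.push(thread::spawn(move || {
            // Catch the panic so the exit report is always sent.
            let outcome = panic::catch_unwind(AssertUnwindSafe(|| task(token)));
            let _ = tx.send((name, outcome.is_err()));
        }));
    }
    drop(exit_tx); // the channel closes once every task has reported
    let mut panicked = Vec::new();
    for (name, did_panic) in exit_rx {
        if did_panic {
            eprintln!("task `{}` panicked; shutting down the others", name);
            shutdown.trigger();
            panicked.push(name);
        }
    }
    for handle in handles {
        let _ = handle.join();
    }
    panicked
}

/// Hypothetical vital task that fails immediately.
fn faulty_worker(_token: Shutdown) {
    panic!("socket closed");
}

/// Hypothetical task that runs until shutdown is requested.
fn healthy_worker(token: Shutdown) {
    while !token.is_triggered() {
        thread::sleep(Duration::from_millis(5));
    }
}
```

Calling supervise with faulty_worker and healthy_worker boxed as Task values returns once the healthy worker has observed the flag and exited. A restart strategy, as in a full supervision tree, would replace the unconditional shutdown.trigger() with a per-task policy.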
Examples of detached tasks whose handle is ignored completely:
https://github.com/hyperledger/iroha/blob/5085ff2435bf2e412dec76649b19d3a7bf091239/p2p/src/network.rs#L111
https://github.com/hyperledger/iroha/blob/f5e3c493a6f2f336da09d75c8d00562aac8168a5/cli/src/lib.rs#L387
https://github.com/hyperledger/iroha/blob/f5e3c493a6f2f336da09d75c8d00562aac8168a5/cli/src/lib.rs#L137
https://github.com/hyperledger/iroha/blob/f5e3c493a6f2f336da09d75c8d00562aac8168a5/cli/src/lib.rs#L494
Tools to help: TaskTracker, JoinSet, CancellationToken.