Replies: 5 comments 27 replies
-
As noted by @sergefdrv, this is related to #8 |
Beta Was this translation helpful? Give feedback.
-
I remember to have discussed this with @matejpavlovic once. I would suggest a solution where we introduce a notion of message queues (buffers) with specified properties like reliable delivery, FIFO order etc. Those message queues would ensure the promised properties until explicitly garbage-collected. |
Beta Was this translation helpful? Give feedback.
-
Adding my take on the matter, if my understanding is off please let me know (or feel free to completely disregard this message). My feeling is that we are mixing two things here: the network abstraction that the protocol will use (where the discussion about FIFO queues, events notifications, etc. make sense), and the underlying transport substrate used by the protocol, and one may impact the other. First of all, libp2p uses a If, on the other hand, we need 1-to-1 message exchange between peers, I would implement an ad-hoc libp2p protocol for the transport layer. In this case, we may need to handle retransmissions. For this case, I would keep a stream open for each peer and I'd use a "fire-and-forget" approach to improve the latency in the happy path. The retransmission policy really depends on the protocol (i.e. Mir). If we have the concept of "protocol progress" I would send the message, wait a bit for progress, and then either remove the message from the queue or retransmit, and minimize the number of retransmissions. Disconnections and other network events can easily be detected through the opened peer stream. I would really avoid retransmiting every second, we can probably use better heuristics from Mir in the network abstraction (if you think it is useful I can read more in depth Mir's operation to try an elaborate a bit more on the policy). |
Beta Was this translation helpful? Give feedback.
-
Yes, reliable message delivery is a general concept common to many distributed protocols, and having it encapsulated in an abstraction with a separate implementation would definitely be useful. The naive approach currently implemented in ISS is definitely sub-optimal, with all the disadvantages mentioned by @xosmig here and @sergefdrv here and in other places earlier. I think "Solution 2" is a good start and I could imagine something like that to form the basis of an advanced communication abstraction. (I think it's better than "Solution 2.5", which seems to put more load on the application than necessary in this context.) When it comes to acknowledging messages (and assuming an acknowledgment allows the sender to garbage-collect the corresponding message), I think there is a fundamental constraint to have in mind:
Note that this doesn't mean that each message needs to be acknowledged individually. Sometimes that might be appropriate, sometimes acknowledgments could be batched, and sometimes acknowledgments could even be implicit through garbage collection. (The latter is probably equivalent to deleting message queues when no longer needed.) For the garbage collection itself, I think the tags shared by multiple messages are going in a good direction. I can easily imagine an extension to that, where there would be a total order on the tags and a greater tag would garbage-collect everything associated with lower tags (especially with FIFO channels this makes sense). This concept is already used in Mir and the "tags" are called |
Beta Was this translation helpful? Give feedback.
-
I am no longer sure whether trying to take advantage of transport-layer retransmission was a good idea.
Hence, it seems reasonable to actually build our own retransmission mechanism on top of a transport-layer protocol that does not have built-in retransmission (the obvious candidate is UDP). However, the issue is that we would need to also implement some flow-control mechanisms to avoid DOSing the network with retransmissions when the throughput is lower than expected. |
Beta Was this translation helpful? Give feedback.
-
Context
Most distributed protocols assume reliable message delivery between non-Byzantine participants.
In practice, we also need to be able to garbage-collect messages that are no longer relevant, even if they haven't yet reached the destination (e.g., once a quorum of responses is collected, many protocols do not care if the message reaches the rest of the nodes).
We should also take into account that transport layer connections can be spontaneously lost (due to some issues in the network) and that nodes may crash and recover.
Naive approach
A naive solution would be simply repeatedly invoking
SendMessage
every second or so until the message is no longer relevant.This is what, I believe, is currently implemented in ISS.
A slightly less naive solution would also add acknowledgments to avoid wasting resources.
Issues with the naive approach
Retransmission is non-trivial: a static retransmission delay would be either too small (in which case we would simply overload our own networking stack) or too large (in which case we would waste time).
There is already a retransmission mechanism in the transport layer, which we should take advantage of. Indeed, if the networking module uses TCP or QUIC, unless the connection is lost, resending messages would make zero sense as the repeated messages would simply queue up in the buffer and would all be delivered.
In the current implementations of the networking module, the connections are established once and, if lost, are never recovered. Hence, in the current implementation, retransmissions do not really do anything useful (if the connection is not lost, reliable delivery is already guaranteed by the transport layer).
Some possible solutions
Possible solution 1
We could probably try to use some existing solutions (would libp2p pub-sub be suitable?).
Possible solution 2
First of all, make the net module responsible for automatically recovering lost connections.
It should also send notifications up-stack when connections are lost and recovered in order to let other modules know that some messages could have been lost. More precisely, the networking module can export the following interface:
SendMessage(msg Message, destNodes []NodeID, msgID int)
MessageRecieved(from NodeID, msg Message)
ConnectionLost(peer NodeID)
ConnectionRestored(peer NodeID)
This way, we will be able to implement a
reliablenet
module on top of thenet
module that will take advantage of the built-in retransmission mechanisms of the Transport layer whenever possible.The
reliablenet
module can guarantee FIFO at-least-once delivery and expose the following API:SendMessage(msg Message, destNodes []NodeID, tag string)
CancelMessage(tag string)
MessageRecieved(from NodeID, msg Message, msgID int)
AcknowledgeMessage(msgID int)
-- used to notify thereliablenet
module that the message was processed by the module up-stack and does not need to be retransmitted in case of recovery.The
reliablenet
module will send each message once and, when the connection is lost and recovered, will exchange messages to determine which messages need to be retransmitted. It will also need to use persistent storage to keep track of the prefix of acknowledged message ids.The
reliablenet
module will guarantee at-least-once delivery in the case when the recipient node fails and recovers, but the message can be lost if the sender node fails.Possible solution 2.5
The
net
module may also not recover lost connections automatically, but simply notify the module up-stack that the connection is lost and export an API for reconnecting.Beta Was this translation helpful? Give feedback.
All reactions