Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Draft] New Block Reconstruction Protocol Proposal #4051

Draft
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

walldiss
Copy link
Member

@walldiss walldiss commented Jan 20, 2025

This PR introduces a draft proposal for a new block reconstruction protocol aimed at improving network efficiency and scalability.

Key improvements:

  • Reduces duplicate requests and bandwidth usage
  • Structured bitmap sharing using Roaring Bitmaps
  • Clear separation of client/server responsibilities
  • Robust failure handling with peer penalties
  • Optimized proof packaging and sharing

The document serves as a starting point for discussion and will be later split into CIP and ADR.

TODO:

  • Add testing plan
  • Add false positive triggered reconstruction impact scenario. Potential harm and mitigation

@walldiss walldiss self-assigned this Jan 20, 2025
@codecov-commenter
Copy link

codecov-commenter commented Jan 20, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 45.01%. Comparing base (2469e7a) to head (e072552).
Report is 440 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4051      +/-   ##
==========================================
+ Coverage   44.83%   45.01%   +0.17%     
==========================================
  Files         265      308      +43     
  Lines       14620    22494    +7874     
==========================================
+ Hits         6555    10125    +3570     
- Misses       7313    11285    +3972     
- Partials      752     1084     +332     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@musalbas
Copy link
Member

Looks generally fine to me directionally. Networking protocol should probably be in a CIP as it's relevant to other clients like lumina etc.

Copy link
Member

@renaynay renaynay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this proposal also require some modifications in discovery + peer management as it currently stands? If so, can you please outline what that would look like?

@renaynay
Copy link
Member

Also generally speaking, would be nice to have a diagram at the end sketching out a reconstruction process e2e.

@walldiss
Copy link
Member Author

Would this proposal also require some modifications in discovery + peer management as it currently stands? If so, can you please outline what that would look like?

First implementation is simplified and does need to have any additional peer management. We would still need to have at least some(5) full nodes connected, which is ensured by current implementation of discovery. Requests will be send to random peers, so the only thing that needs to be tracked is all connected peers, which is available from libp2p. It would be easier to see once I add diagrams for FN reconstruction steps.

Copy link
Member Author

@walldiss walldiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feedback notes from sync call

- return handshake based on user agent.
- Do not subscribe for samples from LN
- Request from LN in batches
- return handshake based on user agent.
- Do not subscribe for samples from LN
- Request from LN in batches
@walldiss walldiss force-pushed the reconstruction-adr branch 2 times, most recently from 90840d5 to 5cdcc6c Compare February 11, 2025 22:22

### No Bitmap Subscription From Light Nodes: Why?

Subscribing to bitmaps from Light Nodes would allow more precise deduplication of requested samples, but it also introduces additional complexity and round-trip overhead. Since each Light Node holds relatively few samples, the probability of overlap is low. Instead, the Full Node can send an inverse “have” bitmap in its request to indicate which samples are still needed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that the FN needs to regularly send the bitmap (or a diff of the bitmap) again to LNs as the FN collects more samples? Or is it not necessary as most LNs only have a few samples anyway?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question! The short answer is that updates are not needed.

Because each Light Node typically holds only a few samples, a single request round-trip to the LN is usually sufficient to retrieve everything it has. If the LN sampled only some of 16 coordinates it can provide none. So the FN can retry later with an increasing backoff. This approach allows the FN to determine whether a particular LN has fully sampled the block and no additional bitmap exchanges are required. Essentially, once the LN reports having all samples (or delivers all it has), the FN can safely move on.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does the FN know if the LN has all the samples though? SamplesRequest struct doesn't provide for this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the new sampling approach, a LN either holds all samples for a given block or none at all. It will not serve any samples until it has fully sampled the block. Once sampling is complete and the LN has all shares, it can serve them to the FN. This “all-or-none” model means that, in practice, there’s no partial state—either the LN has zero samples, or it has the entire set.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will result cases where the block cannot be reconstructed even if LNs have enough shares to do so -- for example if 1/4 of the block is withheld, most LN sampling will fail, til reconstruction happens. It's a chicken and egg situation...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that’s correct: if 1/4 of the block is withheld, LNs can’t finish sampling and therefore won’t serve any shares. 99% of LNs will treat the block as unavailable. Although it may seem like a “chicken and egg” issue, the protocol is actually protecting LNs from finalizing incomplete blocks. The block could be recovered if the missing shares eventually surface, but until then, it remains withheld. The assumption here is that protocol’s main goal is to prevent LNs from accepting blocks that aren’t fully available.

Do you think this scenario is a viable attack vector? In given case random 1% of LN will get into thinking that block is available, which is the same as the chance of false positive sampling

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is that in such a scenario, LNs should be sharing their partial samples with FNs, in case FNs can reconstruct the missing parts of the block, and complete the rest of their samples.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants