get code refactor for stateless #8298

matkt · 2025-02-13T14:14:50Z

PR description

This PR is necessary for the future transition to stateless. The idea behind this PR is to wrap the code in a specific Bytecode class. Why is this necessary? Because today, when we access the code, we retrieve the entire bytecode, which won’t be possible in a stateless environment where the code is fragmented in different places in the tree and only partially available in the witnesses. To address this, we need to try to retrieve only the portions needed during the execution of the EVM.

So, there will be a generic Bytecode class, which will be subdivided into two classes: FullBytecode when we can have the entire bytecode, and ChunkedBytecode when we won’t necessarily have it all. ChunkedBytecode will have the particularity of loading only the necessary parts depending on the get calls we make. It will manage slices, etc., in a way that it won’t load anything unless there’s an explicit get on a specific bytecode. This retrieval won’t involve disk access because the code will come from memory (since the witness will be the data source) and there will be no need for disk access during EVM execution.

ChunkedBytecode is not present in this PR but will be part of the "verkle" branch. However, some things had to be changed in the code, such as calculateJumpDest, which will now focus only on the necessary part. The EVM code will also load only the necessary portions, along with other minor changes.

To make this work, the call to toArrayUnsafe() has been replaced by a call to getRawByteArray, which will be a class in two versions: chunked and full (this limits inheritance to avoid performance issues). RawByteArray will simply be a wrapper for the byte array, as it is now, and should not affect performance, while ChunkedBytecode will have its own logic.

So, we have two levels:

Bytecode as the replacement for Bytes
RawByteArray as the replacement for the byte array obtained with toArrayUnsafe()

This will allow for different logic for these two levels depending on the mode used.

Why is RawByteArray important? Because Bytecode is a class that inherits from Bytes, it is difficult to remove this, so I made the choice for Bytecode to extend Bytes. We observed performance impacts when calling methods of classes that extend Bytes due to the multitude of classes inheriting from it, which slows down the JVM when calling methods of Bytes classes.By using RawByteArray for EVM execution, this class does not inherit from Bytes, so it will not have any significant performance impact when called during the execution of the EVM.

Fixed Issue(s)

Thanks for sending a pull request! Have you done the following?

Checked out our contribution guidelines?
Considered documentation and added the doc-change-required label to this PR if updates are required.
Considered the changelog and included an update if required.
For database changes (e.g. KeyValueSegmentIdentifier) considered compatibility and performed forwards and backwards compatibility tests

Locally, you can run these tests to catch failures early:

spotless: ./gradlew spotlessApply
unit tests: ./gradlew build
acceptance tests: ./gradlew acceptanceTest
integration tests: ./gradlew integrationTest
reference tests: ./gradlew ethereum:referenceTests:referenceTests

Signed-off-by: Karim Taam <[email protected]>

matkt · 2025-02-21T08:28:57Z

We just tested the performance @shemnon :
4 nodes were launched on mainnet (2 based on the main branch and 2 on this branch). The result is that with this branch, no issues were detected with snapsync and no issues occurred while following the head for several days without any performance problems.

A benchmark was also performed on several opcodes, comparing main vs branch, and no performance issues were noticed. I'll let @ahamlat add more information if necessary.

Signed-off-by: Karim Taam <[email protected]>

ahamlat · 2025-02-21T10:19:33Z

I can confirm that benchmarking some of the opcodes, and doing Wall Clock profiling haven't shown any regressions related to this byte code refactoring.
We can see part of the results below, Besu 25.2-develop-5826567 represents this PR

matkt added 2 commits February 13, 2025 15:10

byte code refactor

d95c009

Signed-off-by: Karim Taam <[email protected]>

fix build

ce557c7

Signed-off-by: Karim Taam <[email protected]>

matkt added the verkle label Feb 13, 2025

matkt added 2 commits February 13, 2025 15:49

fix check api changes

73f3223

Signed-off-by: Karim Taam <[email protected]>

merge main

3cb42fb

Signed-off-by: Karim Taam <[email protected]>

matkt force-pushed the bytecode-refactor branch from f29d313 to adf9301 Compare February 13, 2025 15:27

fix build

c31a579

Signed-off-by: Karim Taam <[email protected]>

matkt force-pushed the bytecode-refactor branch from 979051c to c31a579 Compare February 13, 2025 16:32

matkt added 4 commits February 17, 2025 10:42

fix jump dest issue

276a8a4

Signed-off-by: Karim Taam <[email protected]>

fix build

0066ca5

Signed-off-by: Karim Taam <[email protected]>

add javadoc

4f58f1b

Signed-off-by: Karim Taam <[email protected]>

Merge branch 'main' into bytecode-refactor

5826567

matkt marked this pull request as ready for review February 21, 2025 08:19

merge main

4bceeea

Signed-off-by: Karim Taam <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

get code refactor for stateless #8298

get code refactor for stateless #8298

matkt commented Feb 13, 2025 •

edited

Loading

matkt commented Feb 21, 2025

ahamlat commented Feb 21, 2025

get code refactor for stateless #8298

Are you sure you want to change the base?

get code refactor for stateless #8298

Conversation

matkt commented Feb 13, 2025 • edited Loading

PR description

Fixed Issue(s)

Thanks for sending a pull request! Have you done the following?

Locally, you can run these tests to catch failures early:

matkt commented Feb 21, 2025

ahamlat commented Feb 21, 2025

matkt commented Feb 13, 2025 •

edited

Loading