-
Notifications
You must be signed in to change notification settings - Fork 895
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add tracing support for internals and JSON-RPC #1557
Conversation
9303e60
to
be459ee
Compare
dc50467
to
00db98d
Compare
Please can you do a full sync against Goerli and post the sync time here ? Just want to make sure this PR does not introduce performance issues. |
Our 20.10.0-RC2 full sync was 12h9m. That was prior to the native libs improvements, so I'll run another full sync against 20.10.1. |
A 20.10.1 full sync clocked in at 11h6m, so the overhead of this change is under 4%. (20.10.1 had some significant crypto upgrades that significantly impact Goerli specifically) I understand this is "fully loaded" with OLTP spans being fully reported. What's the speed with the original prometheus metrics? |
There a number of things happening on the instance, as it also runs Splunk, the OpenTelemetry Collector and ethlogger. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A full sync with only Prometheus turned on and no open telemetry items resulted in a sync of 11h7h, only one minute longer than our 10.20.1 sync, well within noise levels of the network.
- change
io.hyperledger
toorg.hyperledger
- remove usage of the shaded jar
ethereum/core/src/main/java/org/hyperledger/besu/ethereum/mainnet/AbstractBlockProcessor.java
Outdated
Show resolved
Hide resolved
@@ -169,6 +198,7 @@ private JsonRpcHttpService( | |||
this.authenticationService = authenticationService; | |||
this.livenessService = livenessService; | |||
this.readinessService = readinessService; | |||
this.tracer = OpenTelemetry.getGlobalTracer("io.hyperledger.besu.jsonrpc", "1.0.0"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
org.hyperledger
instead of io.hyperledger
?
@@ -279,6 +310,32 @@ private Router buildRouter() { | |||
return router; | |||
} | |||
|
|||
private void createSpan(final RoutingContext routingContext) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is OpenTelemetry "always on"? Should there be a way for the end user to "opt out" like we do for other metrics collection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Asking for the best way on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it was just the one create span call it would be easy, but it looks like every request has to possibly add a span, so plumbing the metrics configs through seems a bit much... unless performance suffers. We have a benchmark where rpcs is the bottleneck, so if it's a problem I'd expect those benchmarks to fail. If you are willing to risk a call for a revert if performance takes a hit it can stay as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I'm asking, and so far if we don't use a metrics backend, the spans are created, but in the words of OpenTelemetry devs, they then "fall to the floor" as there is no export enabled.
If we want to even avoid the object allocation of creating span objects, we can add a check in the Besu code that disables creating span objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Object allocation is my worry, but I'm thinking if it matters it will only show up under load, which we have a test set up to detect (but only for committed code).
So if the test shows a regression the worst case scenario of a commit is a rollback and then a fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am willing to try this out and see how the performance is impacted.
Initially I had a span collecting trie performance, but I reverted it, as it was too verbose. I believe the AbstractBlockProcessor might be too verbose as well.
We'll see - it would be great to understand better if there are specific use cases (block imports, mining) where tracing helps specifically.
services/pipeline/src/main/java/org/hyperledger/besu/services/pipeline/Pipeline.java
Outdated
Show resolved
Hide resolved
Any progress on the requirement for non-shaded jars? |
Still working on it, slowly. I did a search to make sure I had all the libraries lined up, but since it tends to go wrong at runtime, I wanted to add an integration test to test this out. |
Signed-off-by: Antoine Toulme <[email protected]>
Signed-off-by: Antoine Toulme <[email protected]>
Signed-off-by: Antoine Toulme <[email protected]>
Signed-off-by: Antoine Toulme <[email protected]>
Signed-off-by: Antoine Toulme <[email protected]>
Signed-off-by: Antoine Toulme <[email protected]>
Signed-off-by: Antoine Toulme <[email protected]>
Signed-off-by: Antoine Toulme <[email protected]>
6ccdefa
to
38a2fe3
Compare
Signed-off-by: Antoine Toulme <[email protected]>
Signed-off-by: Antoine Toulme <[email protected]>
Our next release is 13 Jan. Reviews will be infrequent until the new year. Ping me on 1 Jan if this isn't approved. |
@shemnon I appreciate you chiming in. Happy holidays to you, your family and the contributors to the Besu project! |
c891724
to
3fccb91
Compare
Signed-off-by: Antoine Toulme <[email protected]>
3fccb91
to
c6f2ccd
Compare
.../dsl/src/main/java/org/hyperledger/besu/tests/acceptance/dsl/node/ProcessBesuNodeRunner.java
Outdated
Show resolved
Hide resolved
// OpenTelemetry Wiring for windows scripts | ||
def windowsRegex = $/"%JAVA_EXE%" %DEFAULT_JVM_OPTS%/$ | ||
def windowsReplacement = $/if Defined TRACING (TRACING_AGENT="-javaagent:" "%APP_HOME%\agent\/$ + agentFileName + '")\r\n"%JAVA_EXE%" %DEFAULT_JVM_OPTS% %TRACING_AGENT%' | ||
createScriptTask.windowsScript.text = createScriptTask.windowsScript.text.replace(windowsRegex, windowsReplacement) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What changed so that we no longer need the java agent for OTEL? I'm not seeing anything code-wise so I'm concerned this is an accidental deletion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The agent inspects the code and injects changes at runtime to create traces from well-known libraries such as SQL querying or HTTP. We now do this manually and in a much more targeted fashion. We no longer need to use automatic instrumentation. Automatic instrumentation could also create extra traces that don't provide as much value.
metrics/core/src/main/java/org/hyperledger/besu/metrics/opentelemetry/OpenTelemetrySystem.java
Outdated
Show resolved
Hide resolved
...src/test/java/org/hyperledger/besu/metrics/opentelemetry/OpenTelemetryMetricsSystemTest.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Antoine Toulme <[email protected]>
* Add tracing support for internals and JSON-RPC Signed-off-by: Antoine Toulme <[email protected]> * Remove rocksdb tracing as it slows down execution too much Signed-off-by: Antoine Toulme <[email protected]> * Add B3 headers extraction on JSON-RPC requests Signed-off-by: Antoine Toulme <[email protected]> * Remove traces around trie tree as they slow down syncing significantly Signed-off-by: Antoine Toulme <[email protected]> * Add tracing to fast sync pipeline Signed-off-by: Antoine Toulme <[email protected]> * Add tracing for all pipelines Signed-off-by: Antoine Toulme <[email protected]> * Address code review Signed-off-by: Antoine Toulme <[email protected]> * Add acceptance tests and break out the shaded dependency Signed-off-by: Antoine Toulme <[email protected]> * Fix tracer id Signed-off-by: Antoine Toulme <[email protected]> * Revert changes to trie Signed-off-by: Antoine Toulme <[email protected]> * Upgrade otel to latest, remove old tracing Signed-off-by: Antoine Toulme <[email protected]> * Code review comments Signed-off-by: Antoine Toulme <[email protected]>
Signed-off-by: Antoine Toulme [email protected]
PR description
Adds tracing support for internals of Besu and the JSON-RPC HTTP service.
This was tested with soaking a Besu node for 48 hours against mainnet, with no sync issues oberved.
Changelog