move Batch deserialization out of the lock #17050

Merged: 1 commit into elastic:main on Feb 17, 2025

Conversation

@jsvd (Member) commented Feb 11, 2025

Release Notes

  • Improves performance of the Persistent Queue, especially in the case of large events, by moving deserialization out of the exclusive access lock.

Discussion

Currently the deserialization is behind the readBatch's lock, so any large batch takes time deserializing, causing any other Queue writer (e.g. netty executor threads) and any other Queue reader (pipeline worker) to block.

This commit moves the deserialization out of the lock, allowing multiple pipeline workers to deserialize batches concurrently.
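In outline, the change looks roughly like this (a minimal Java sketch, not Logstash's actual `Queue` code; the method names, the `Deque` backing store, and the `String` stand-in for deserialized events are all illustrative):

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Minimal sketch: hold the lock only while dequeuing raw serialized bytes,
// then deserialize outside the lock so pipeline workers can do the expensive
// decoding concurrently without blocking writers or other readers.
class SketchQueue {
    private final ReentrantLock lock = new ReentrantLock();
    private final Deque<byte[]> elements = new ArrayDeque<>();

    void write(byte[] serialized) {
        lock.lock();
        try {
            elements.addLast(serialized);
        } finally {
            lock.unlock();
        }
    }

    // Under the lock: only a cheap transfer of references.
    private List<byte[]> readSerializedBatch(int limit) {
        lock.lock();
        try {
            List<byte[]> batch = new ArrayList<>();
            while (batch.size() < limit && !elements.isEmpty()) {
                batch.add(elements.pollFirst());
            }
            return batch;
        } finally {
            lock.unlock();
        }
    }

    // Outside the lock: the expensive part. Before this change, the
    // equivalent decoding ran inside the critical section above.
    List<String> readBatch(int limit) {
        List<String> events = new ArrayList<>();
        for (byte[] raw : readSerializedBatch(limit)) {
            events.add(new String(raw, StandardCharsets.UTF_8));
        }
        return events;
    }
}
```

With this shape, the time each worker spends decoding a large batch no longer extends the critical section, so lock hold time stays proportional to batch size in references, not in bytes.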

To test, edit the generator input to produce 100KB messages, and with PQ enabled run:

`bin/logstash -e "input { generator { threads => 5 } } output { null {} }"`

This PR shows about a 4-5x performance improvement for a 10-worker pipeline:

| Metric | v8.15.5 | v8.15.5+pr17050 |
| --- | --- | --- |
| Events per Second | 1031 | 4971 |
| CPU Usage (for 10 cores) | 111.2% | 593.1% |
| Concurrent Deserializations | 1 | 2-5 |
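One way to set up the large-message reproduction (my own sketch, not the exact steps used for the numbers above: it builds a ~100KB payload in the shell and passes it via the generator input's `message` option instead of editing the plugin):

```shell
# Build a ~100KB payload for the generator input's `message` option.
MSG=$(head -c 102400 /dev/zero | tr '\0' 'x')

# Run from a Logstash checkout, with PQ enabled (queue.type: persisted
# in logstash.yml); skipped automatically if bin/logstash is not present.
if [ -x bin/logstash ]; then
  bin/logstash -e "input { generator { threads => 5 message => '$MSG' } } output { null {} }"
fi
```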

@yaauie (Member) left a comment

Mechanically, this makes sense, and I see no problem with it. I can't think of any way that deferring the deserialization until outside of the lock would cause problems.

I would prefer for the intermediate object to be an internal detail of the Queue class, and to have a name that more-accurately describes what it is.

I have opened jsvd#9 to show what I mean by that.
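For context, the shape such an intermediate object can take is roughly the following (my own sketch under assumed details, not the code in jsvd#9; in the final change it is a private inner class of `Queue`, and `String` stands in for real queue events):

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the intermediate holder: the locked read returns
// the still-serialized elements wrapped in this object, and the caller
// invokes deserialize() with no lock held.
final class SerializedBatchHolder {
    private final List<byte[]> serialized;

    SerializedBatchHolder(List<byte[]> serialized) {
        this.serialized = serialized;
    }

    // Deferred, lock-free: each pipeline worker pays the decoding cost for
    // its own batch, concurrently with other workers.
    List<String> deserialize() {
        List<String> events = new ArrayList<>(serialized.size());
        for (byte[] bytes : serialized) {
            events.add(new String(bytes, StandardCharsets.UTF_8));
        }
        return events;
    }
}
```

Keeping the holder internal means callers of `Queue` never see half-deserialized state: they either hold raw bytes behind the API boundary or a fully decoded batch.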

@jsvd jsvd changed the title first attempt at moving Batch deserialization out of the lock move Batch deserialization out of the lock Feb 17, 2025
@jsvd jsvd marked this pull request as ready for review February 17, 2025 09:53
Currently the deserialization is behind the readBatch's lock, so any large batch will take time deserializing, causing any other Queue writer (e.g. netty executor threads) and any other Queue reader (pipeline worker) to block.

This commit moves the deserialization out of the lock, allowing multiple pipeline workers to deserialize batches concurrently.

- return an intermediate batch-holder from `Queue` methods
- make the intermediate batch-holder a private inner class of `Queue` with a descriptive name, `SerializedBatchHolder`

Co-authored-by: Ry Biesemeyer <[email protected]>
@jsvd jsvd force-pushed the unlock_queue_deserialization branch from b4a6109 to a2e7667 on February 17, 2025 10:24
Quality Gate passed: 0 new issues, 0 fixed issues, 0 accepted issues, 0 security hotspots; no data about coverage or duplication. See analysis details on SonarQube.

@elasticmachine (Collaborator): 💚 Build Succeeded

@jsvd jsvd requested a review from yaauie February 17, 2025 15:27
@mashhurs (Contributor) left a comment:

LGTM!

@jlind23 commented Feb 17, 2025:

Thanks @mashhurs for the review and Joao for the work :)

@jsvd jsvd merged commit 637f447 into elastic:main Feb 17, 2025
7 checks passed
@jsvd (Member Author) commented Feb 17, 2025:

@logstashmachine backport 8.18

@jsvd (Member Author) commented Feb 17, 2025:

@logstashmachine backport 8.17

@jsvd (Member Author) commented Feb 17, 2025:

@logstashmachine backport 8.16

github-actions bot pushed 3 commits that referenced this pull request on Feb 17, 2025, each cherry-picked from 637f447 with the commit message above.
@jsvd (Member Author) commented Feb 17, 2025:

@logstashmachine backport 9.0

@jsvd jsvd deleted the unlock_queue_deserialization branch February 17, 2025 19:20
github-actions bot pushed a commit that referenced this pull request on Feb 17, 2025, cherry-picked from 637f447 with the commit message above.
jsvd added 4 commits that referenced this pull request on Feb 17, 2025, each cherry-picked from 637f447 with the commit message above and an added trailer: Co-authored-by: João Duarte <[email protected]>
@jsvd (Member Author) commented Feb 18, 2025:

@logstashmachine backport 8.x

github-actions bot pushed a commit that referenced this pull request on Feb 18, 2025, cherry-picked from 637f447 with the commit message above.
jsvd added a commit that referenced this pull request on Feb 18, 2025, cherry-picked from 637f447 with the commit message above and an added trailer: Co-authored-by: João Duarte <[email protected]>
5 participants