move Batch deserialization out of the lock #17050
Conversation
Mechanically, this makes sense, and I see no problem with it. I can't think of any way that deferring the deserialization until outside of the lock would cause problems.
I would prefer for the intermediate object to be an internal detail of the `Queue` class, and to have a name that more accurately describes what it is.
I have opened jsvd#9 to show what I mean by that.
Currently the deserialization is behind `readBatch`'s lock, so any large batch will take time deserializing, causing any other Queue writer (e.g. netty executor threads) and any other Queue reader (pipeline worker) to block. This commit moves the deserialization out of the lock, allowing multiple pipeline workers to deserialize batches concurrently.
- return an intermediate batch-holder from `Queue` methods
- make the intermediate batch-holder a private inner class of `Queue` with a descriptive name, `SerializedBatchHolder`
Co-authored-by: Ry Biesemeyer <[email protected]>
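The pattern behind the change can be illustrated with a minimal, hypothetical sketch. The name `SerializedBatchHolder` mirrors the PR; everything else (the deque of byte arrays, `String` elements) is a simplification, not the actual Logstash `Queue` implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

class Queue {
    private final ReentrantLock lock = new ReentrantLock();
    private final java.util.ArrayDeque<byte[]> pages = new java.util.ArrayDeque<>();

    // Holds the raw serialized elements copied out under the lock;
    // the expensive deserialization happens later, outside the critical section.
    private static final class SerializedBatchHolder {
        private final List<byte[]> elements;
        SerializedBatchHolder(List<byte[]> elements) { this.elements = elements; }

        // Runs on the calling pipeline-worker thread without holding the
        // queue lock, so multiple workers can deserialize concurrently.
        List<String> deserialize() {
            List<String> out = new ArrayList<>(elements.size());
            for (byte[] e : elements) out.add(new String(e));
            return out;
        }
    }

    void write(byte[] element) {
        lock.lock();
        try { pages.add(element); } finally { lock.unlock(); }
    }

    // Before the change, deserialization ran inside this lock; now only the
    // cheap handoff of raw bytes happens in the critical section.
    List<String> readBatch(int limit) {
        final SerializedBatchHolder holder;
        lock.lock();
        try {
            List<byte[]> raw = new ArrayList<>();
            while (!pages.isEmpty() && raw.size() < limit) raw.add(pages.poll());
            holder = new SerializedBatchHolder(raw);
        } finally {
            lock.unlock();
        }
        return holder.deserialize(); // expensive work, now lock-free
    }
}
```

Because the holder is a private inner class, callers of `readBatch` still see the same fully deserialized batch as before; only the timing of the deserialization relative to the lock changes.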
Force-pushed from b4a6109 to a2e7667 · 💚 Build Succeeded
LGTM!
Thanks @mashhurs for the review and Joao for the work :)
@logstashmachine backport 8.18
@logstashmachine backport 8.17
@logstashmachine backport 8.16
(cherry picked from commit 637f447)
@logstashmachine backport 9.0
@logstashmachine backport 8.x
Release Notes
Discussion
Currently the deserialization is behind `readBatch`'s lock, so any large batch will take time deserializing, causing any other Queue writer (e.g. netty executor threads) and any other Queue reader (pipeline worker) to block.
This commit moves the deserialization out of the lock, allowing multiple pipeline workers to deserialize batches concurrently.
To test, edit the generator input to produce 100kb messages, and run the following with PQ enabled:
bin/logstash -e "input { generator { threads => 5 } } output { null {} }"
This PR shows roughly a 4-5x performance improvement for a 10-worker pipeline:
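One way to reproduce a setup like the one described above. This is a sketch under assumptions: the generator input's `message` option, Logstash's `${VAR}` config substitution, and the `queue.type` setting are standard features, but the PR does not show the exact payload or settings the author used:

```
# logstash.yml: enable the persisted queue (PQ)
queue.type: persisted

# shell: export a ~100 KB payload, then run the test pipeline with 10 workers
export LARGE_PAYLOAD=$(head -c 100000 /dev/zero | tr '\0' 'x')
bin/logstash -w 10 -e 'input { generator { threads => 5 message => "${LARGE_PAYLOAD}" } } output { null {} }'
```

With the `null` output, throughput is dominated by queue reads and event deserialization, which is the path this PR parallelizes.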