feat: str_attr and num_attr materialized views for new eap_items table #6907
base: master
This PR has a migration; here is the generated SQL:
-- start migrations
-- forward migration events_analytics_platform : 0029_items_attribute_table_v1
Local op: CREATE TABLE IF NOT EXISTS items_str_attrs_1_local (organization_id UInt64, project_id UInt64, item_type UInt8, attr_key String CODEC (ZSTD(1)), attr_value String CODEC (ZSTD(1)), timestamp DateTime CODEC (DoubleDelta, ZSTD(1)), retention_days UInt16, count SimpleAggregateFunction(sum, UInt64)) ENGINE ReplicatedAggregatingMergeTree('/clickhouse/tables/events_analytics_platform/{shard}/default/items_str_attrs_1_local', '{replica}') PRIMARY KEY (organization_id, attr_key) ORDER BY (organization_id, attr_key, item_type, attr_value, project_id, timestamp, retention_days) PARTITION BY (retention_days, toMonday(timestamp)) TTL timestamp + toIntervalDay(retention_days) SETTINGS index_granularity=8192;
Distributed op: CREATE TABLE IF NOT EXISTS items_str_attrs_1_dist (organization_id UInt64, project_id UInt64, item_type UInt8, attr_key String CODEC (ZSTD(1)), attr_value String CODEC (ZSTD(1)), timestamp DateTime CODEC (DoubleDelta, ZSTD(1)), retention_days UInt16, count SimpleAggregateFunction(sum, UInt64)) ENGINE Distributed(`cluster_one_sh`, default, items_str_attrs_1_local);
Local op: CREATE MATERIALIZED VIEW IF NOT EXISTS items_str_attrs_1_mv TO items_str_attrs_1_local (organization_id UInt64, project_id UInt64, item_type UInt8, attr_key String CODEC (ZSTD(1)), attr_value String CODEC (ZSTD(1)), timestamp DateTime CODEC (DoubleDelta, ZSTD(1)), retention_days UInt16, count SimpleAggregateFunction(sum, UInt64)) AS
SELECT
organization_id,
project_id,
item_type,
attrs.1 as attr_key,
attrs.2 as attr_value,
toStartOfDay(timestamp) AS timestamp,
retention_days,
1 AS count
FROM eap_items_1_local
LEFT ARRAY JOIN
arrayConcat(
CAST(attributes_string_0, 'Array(Tuple(String, String))'), CAST(attributes_string_1, 'Array(Tuple(String, String))'), CAST(attributes_string_2, 'Array(Tuple(String, String))'), CAST(attributes_string_3, 'Array(Tuple(String, String))'), CAST(attributes_string_4, 'Array(Tuple(String, String))'), CAST(attributes_string_5, 'Array(Tuple(String, String))'), CAST(attributes_string_6, 'Array(Tuple(String, String))'), CAST(attributes_string_7, 'Array(Tuple(String, String))'), CAST(attributes_string_8, 'Array(Tuple(String, String))'), CAST(attributes_string_9, 'Array(Tuple(String, String))'), CAST(attributes_string_10, 'Array(Tuple(String, String))'), CAST(attributes_string_11, 'Array(Tuple(String, String))'), CAST(attributes_string_12, 'Array(Tuple(String, String))'), CAST(attributes_string_13, 'Array(Tuple(String, String))'), CAST(attributes_string_14, 'Array(Tuple(String, String))'), CAST(attributes_string_15, 'Array(Tuple(String, String))'), CAST(attributes_string_16, 'Array(Tuple(String, String))'), CAST(attributes_string_17, 'Array(Tuple(String, String))'), CAST(attributes_string_18, 'Array(Tuple(String, String))'), CAST(attributes_string_19, 'Array(Tuple(String, String))'), CAST(attributes_string_20, 'Array(Tuple(String, String))'), CAST(attributes_string_21, 'Array(Tuple(String, String))'), CAST(attributes_string_22, 'Array(Tuple(String, String))'), CAST(attributes_string_23, 'Array(Tuple(String, String))'), CAST(attributes_string_24, 'Array(Tuple(String, String))'), CAST(attributes_string_25, 'Array(Tuple(String, String))'), CAST(attributes_string_26, 'Array(Tuple(String, String))'), CAST(attributes_string_27, 'Array(Tuple(String, String))'), CAST(attributes_string_28, 'Array(Tuple(String, String))'), CAST(attributes_string_29, 'Array(Tuple(String, String))'), CAST(attributes_string_30, 'Array(Tuple(String, String))'), CAST(attributes_string_31, 'Array(Tuple(String, String))'), CAST(attributes_string_32, 'Array(Tuple(String, String))'), CAST(attributes_string_33, 'Array(Tuple(String, String))'), CAST(attributes_string_34, 'Array(Tuple(String, String))'), CAST(attributes_string_35, 'Array(Tuple(String, String))'), CAST(attributes_string_36, 'Array(Tuple(String, String))'), CAST(attributes_string_37, 'Array(Tuple(String, String))'), CAST(attributes_string_38, 'Array(Tuple(String, String))'), CAST(attributes_string_39, 'Array(Tuple(String, String))')
) AS attrs
GROUP BY
organization_id,
project_id,
item_type,
attr_key,
attr_value,
timestamp,
retention_days
;
Local op: CREATE TABLE IF NOT EXISTS items_num_attrs_1_local (organization_id UInt64, project_id UInt64, item_type UInt8, attr_key String CODEC (ZSTD(1)), attr_min_value SimpleAggregateFunction(min, Float64), attr_max_value SimpleAggregateFunction(max, Float64), timestamp DateTime CODEC (DoubleDelta, ZSTD(1)), retention_days UInt16, count SimpleAggregateFunction(sum, UInt64)) ENGINE ReplicatedAggregatingMergeTree('/clickhouse/tables/events_analytics_platform/{shard}/default/items_num_attrs_1_local', '{replica}') PRIMARY KEY (organization_id, attr_key) ORDER BY (organization_id, attr_key, item_type, timestamp, project_id, retention_days) PARTITION BY (retention_days, toMonday(timestamp)) TTL timestamp + toIntervalDay(retention_days) SETTINGS index_granularity=8192;
Distributed op: CREATE TABLE IF NOT EXISTS items_num_attrs_1_dist (organization_id UInt64, project_id UInt64, item_type UInt8, attr_key String CODEC (ZSTD(1)), attr_min_value SimpleAggregateFunction(min, Float64), attr_max_value SimpleAggregateFunction(max, Float64), timestamp DateTime CODEC (DoubleDelta, ZSTD(1)), retention_days UInt16, count SimpleAggregateFunction(sum, UInt64)) ENGINE Distributed(`cluster_one_sh`, default, items_num_attrs_1_local);
Local op: CREATE MATERIALIZED VIEW IF NOT EXISTS items_num_attrs_1_mv TO items_num_attrs_1_local (organization_id UInt64, project_id UInt64, item_type UInt8, attr_key String CODEC (ZSTD(1)), attr_min_value SimpleAggregateFunction(min, Float64), attr_max_value SimpleAggregateFunction(max, Float64), timestamp DateTime CODEC (DoubleDelta, ZSTD(1)), retention_days UInt16, count SimpleAggregateFunction(sum, UInt64)) AS
SELECT
organization_id,
project_id,
item_type,
attrs.1 as attr_key,
attrs.2 as attr_min_value,
attrs.2 as attr_max_value,
toStartOfDay(timestamp) AS timestamp,
retention_days,
1 AS count
FROM eap_items_1_local
LEFT ARRAY JOIN
arrayConcat(
CAST(attributes_float_0, 'Array(Tuple(String, Float64))'),CAST(attributes_float_1, 'Array(Tuple(String, Float64))'),CAST(attributes_float_2, 'Array(Tuple(String, Float64))'),CAST(attributes_float_3, 'Array(Tuple(String, Float64))'),CAST(attributes_float_4, 'Array(Tuple(String, Float64))'),CAST(attributes_float_5, 'Array(Tuple(String, Float64))'),CAST(attributes_float_6, 'Array(Tuple(String, Float64))'),CAST(attributes_float_7, 'Array(Tuple(String, Float64))'),CAST(attributes_float_8, 'Array(Tuple(String, Float64))'),CAST(attributes_float_9, 'Array(Tuple(String, Float64))'),CAST(attributes_float_10, 'Array(Tuple(String, Float64))'),CAST(attributes_float_11, 'Array(Tuple(String, Float64))'),CAST(attributes_float_12, 'Array(Tuple(String, Float64))'),CAST(attributes_float_13, 'Array(Tuple(String, Float64))'),CAST(attributes_float_14, 'Array(Tuple(String, Float64))'),CAST(attributes_float_15, 'Array(Tuple(String, Float64))'),CAST(attributes_float_16, 'Array(Tuple(String, Float64))'),CAST(attributes_float_17, 'Array(Tuple(String, Float64))'),CAST(attributes_float_18, 'Array(Tuple(String, Float64))'),CAST(attributes_float_19, 'Array(Tuple(String, Float64))'),CAST(attributes_float_20, 'Array(Tuple(String, Float64))'),CAST(attributes_float_21, 'Array(Tuple(String, Float64))'),CAST(attributes_float_22, 'Array(Tuple(String, Float64))'),CAST(attributes_float_23, 'Array(Tuple(String, Float64))'),CAST(attributes_float_24, 'Array(Tuple(String, Float64))'),CAST(attributes_float_25, 'Array(Tuple(String, Float64))'),CAST(attributes_float_26, 'Array(Tuple(String, Float64))'),CAST(attributes_float_27, 'Array(Tuple(String, Float64))'),CAST(attributes_float_28, 'Array(Tuple(String, Float64))'),CAST(attributes_float_29, 'Array(Tuple(String, Float64))'),CAST(attributes_float_30, 'Array(Tuple(String, Float64))'),CAST(attributes_float_31, 'Array(Tuple(String, Float64))'),CAST(attributes_float_32, 'Array(Tuple(String, Float64))'),CAST(attributes_float_33, 'Array(Tuple(String, Float64))'),CAST(attributes_float_34, 'Array(Tuple(String, Float64))'),CAST(attributes_float_35, 'Array(Tuple(String, Float64))'),CAST(attributes_float_36, 'Array(Tuple(String, Float64))'),CAST(attributes_float_37, 'Array(Tuple(String, Float64))'),CAST(attributes_float_38, 'Array(Tuple(String, Float64))'),CAST(attributes_float_39, 'Array(Tuple(String, Float64))')
) AS attrs
GROUP BY
organization_id,
project_id,
item_type,
attrs.1,
attrs.2,
timestamp,
retention_days
;
-- end forward migration events_analytics_platform : 0029_items_attribute_table_v1
-- backward migration events_analytics_platform : 0029_items_attribute_table_v1
Local op: DROP TABLE IF EXISTS items_str_attrs_1_mv;
Local op: DROP TABLE IF EXISTS items_str_attrs_1_local;
Distributed op: DROP TABLE IF EXISTS items_str_attrs_1_dist;
Local op: DROP TABLE IF EXISTS items_num_attrs_1_mv;
Local op: DROP TABLE IF EXISTS items_num_attrs_1_local;
Distributed op: DROP TABLE IF EXISTS items_num_attrs_1_dist;
-- end backward migration events_analytics_platform : 0029_items_attribute_table_v1
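To make the numeric view's behaviour easier to reason about, here is a rough Python simulation of the generated SQL above. It is only a sketch of the semantics, not how ClickHouse executes the view: `mv_rows` and `merge` are hypothetical helper names, the items are made-up sample data, and each item's float buckets are assumed to be already flattened into one dict. It mimics the `LEFT ARRAY JOIN` plus `GROUP BY` of `items_num_attrs_1_mv` for a single insert block, then the `min`/`max`/`sum` merge on the sorting key of `items_num_attrs_1_local`.

```python
from datetime import datetime


def mv_rows(items):
    """Mimic the MV SELECT for one insert block: unpack attributes, then GROUP BY."""
    groups = set()
    for item in items:
        # toStartOfDay(timestamp)
        day = item["timestamp"].replace(hour=0, minute=0, second=0, microsecond=0)
        # LEFT ARRAY JOIN keeps items with no attributes, using default values.
        attrs = list(item["attributes_float"].items()) or [("", 0.0)]
        for key, value in attrs:
            groups.add(
                (
                    item["organization_id"],
                    item["project_id"],
                    item["item_type"],
                    key,
                    value,
                    day,
                    item["retention_days"],
                )
            )
    # One row per group: attr_min_value = attr_max_value = value, count = 1.
    return [
        (org, proj, typ, key, value, value, day, ret, 1)
        for org, proj, typ, key, value, day, ret in groups
    ]


def merge(rows):
    """Mimic AggregatingMergeTree collapsing rows that share the sorting key."""
    merged = {}
    for org, proj, typ, key, lo, hi, day, ret, count in rows:
        # ORDER BY (organization_id, attr_key, item_type, timestamp, project_id, retention_days)
        sort_key = (org, key, typ, day, proj, ret)
        if sort_key in merged:
            cur_lo, cur_hi, cur_count = merged[sort_key]
            merged[sort_key] = (min(cur_lo, lo), max(cur_hi, hi), cur_count + count)
        else:
            merged[sort_key] = (lo, hi, count)
    return merged


base = {"organization_id": 1, "project_id": 10, "item_type": 1, "retention_days": 90}
items = [
    {**base, "timestamp": datetime(2025, 3, 4, 10, 30), "attributes_float": {"latency": 3.0}},
    {**base, "timestamp": datetime(2025, 3, 4, 23, 59), "attributes_float": {"latency": 7.5}},
    {**base, "timestamp": datetime(2025, 3, 4, 11, 0), "attributes_float": {"latency": 3.0}},
]

merged = merge(mv_rows(items))
result = merged[(1, "latency", 1, datetime(2025, 3, 4), 10, 90)]
print(result)  # (3.0, 7.5, 2)
```

In this simulation the two items with `latency = 3.0` fall into the same group within the block, so they contribute a single row with `count = 1`. If the SQL behaves the same way, `count` reflects distinct (key, value) groups per insert block rather than the number of items.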
❌ 1 test failed.
str_columns: Sequence[Column[Modifiers]] = [
    Column("organization_id", UInt(64)),
    Column("project_id", UInt(64)),
    Column("item_type", UInt(8)),
new column for item_type
engine=table_engines.AggregatingMergeTree(
    storage_set=self.storage_set_key,
    primary_key="(organization_id, attr_key)",
    order_by="(organization_id, attr_key, item_type, attr_value, project_id, timestamp, retention_days)",
Added item_type to the sort key.
Is this the right position in the sort key, or should it be changed?
Should it also be added to primary_key?
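One constraint bears on the question above: in MergeTree-family engines the primary key must be a prefix of the sorting key. The sketch below checks candidate primary keys against the `order_by` from the diff. `parse_key` and `is_prefix` are hypothetical helpers written for illustration, not part of snuba, and the naive comma split only handles plain column lists.

```python
def parse_key(expr: str) -> list[str]:
    """Split a key expression such as "(a, b, c)" into column names.

    Naive on purpose: it does not handle function calls containing commas.
    """
    return [part.strip() for part in expr.strip().strip("()").split(",") if part.strip()]


def is_prefix(primary_key: str, order_by: str) -> bool:
    """Return True when primary_key is a leading prefix of order_by."""
    pk, ob = parse_key(primary_key), parse_key(order_by)
    return ob[: len(pk)] == pk


ORDER_BY = "(organization_id, attr_key, item_type, attr_value, project_id, timestamp, retention_days)"

# The primary key as written in the diff.
current_ok = is_prefix("(organization_id, attr_key)", ORDER_BY)
# Extending the primary key with item_type keeps it a prefix of the sort key.
extended_ok = is_prefix("(organization_id, attr_key, item_type)", ORDER_BY)
# Placing item_type ahead of attr_key would require reordering order_by too.
misplaced_ok = is_prefix("(organization_id, item_type)", ORDER_BY)

print(current_ok, extended_ok, misplaced_ok)  # True True False
```

With item_type in its current third position, adding it to primary_key would be structurally valid. Whether it is worthwhile depends on how often queries filter on item_type, which this check does not answer.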
LEFT ARRAY JOIN
    arrayConcat(
        {", ".join(f"CAST(attributes_string_{n}, 'Array(Tuple(String, String))')" for n in range(ITEM_ATTRIBUTE_BUCKETS))}
    ) AS attrs
At this point the old MVs also had:
array(
tuple('sentry.service', `service`),
tuple('sentry.segment_name', `segment_name`),
tuple('sentry.name', `name`)
)
I removed that since those columns don't exist in eap_items.
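For readers following the diff, this sketch shows how the f-string in the snippet above expands into the `arrayConcat` argument list. `array_concat_args` is a hypothetical helper, and the value 40 for `ITEM_ATTRIBUTE_BUCKETS` is an assumption inferred from the generated SQL, which lists buckets 0 through 39.

```python
# Assumption: inferred from the generated SQL (attributes_string_0 .. attributes_string_39).
ITEM_ATTRIBUTE_BUCKETS = 40


def array_concat_args(prefix: str, value_type: str, buckets: int = ITEM_ATTRIBUTE_BUCKETS) -> str:
    """Build the comma-separated CAST list passed to arrayConcat."""
    return ", ".join(
        f"CAST({prefix}_{n}, 'Array(Tuple(String, {value_type}))')" for n in range(buckets)
    )


str_args = array_concat_args("attributes_string", "String")

# Each bucket map is cast to an array of (key, value) tuples; arrayConcat then
# joins all buckets into a single array for LEFT ARRAY JOIN to unpack.
join_clause = f"LEFT ARRAY JOIN\n    arrayConcat(\n        {str_args}\n    ) AS attrs"

print(array_concat_args("attributes_float", "Float64", buckets=2))
```

The same helper covers the float view by swapping the prefix and value type, which is the only difference between the two snippets in the diff apart from the separator spacing.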
LEFT ARRAY JOIN
    arrayConcat(
        {",".join(f"CAST(attributes_float_{n}, 'Array(Tuple(String, Float64))')" for n in range(ITEM_ATTRIBUTE_BUCKETS))}
    ) AS attrs
At this point the old MVs also had:
array(
tuple('sentry.duration_ms', duration_micro / 1000)
)
I removed that since the column doesn't exist anymore.
The purpose of this PR is to recreate the existing spans_num_attrs_3_mv and spans_str_attrs_3_mv materialized views and the underlying spans_str_attrs_3_local and spans_num_attrs_3_local tables so they read from eap_items_1_local instead of eap_spans_2_local.
It addresses this ticket: https://github.com/getsentry/eap-planning/issues/194
Changes:
items_str_attrs_1 and items_num_attrs_1 reading from eap_items via migration