-
-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ref(reader): Improve abstractions around column transformation #812
Conversation
snuba/reader.py
Outdated
transformer = next( | ||
( | ||
transformer | ||
for pattern, transformer in column_transformations | ||
if pattern.match(column["type"]) | ||
), | ||
None, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same caveat still applies here: #800 (comment)
# XXX: If the original value had a valid nonzero UTC offset, this will | ||
# result in an incorrect value being returned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is one of those "shouldn't happen" (but probably will eventually) comments. Mark my words.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you figure out why some dates had timezone from clickhouse and others did not ? I am curious.
Also should we log warning in case the result we are returning is incorrect because there was a valid non zero UTC offset in the parameter ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you figure out why some dates had timezone from clickhouse and others did not ? I am curious.
It seems like certain functions that manipulate the time value result in the time having the timezone attached, as in these examples:
SELECT now() LIMIT 1 FORMAT JSON
{
"meta":
[
{
"name": "now()",
"type": "DateTime"
}
],
"data":
[
{
"now()": "2020-03-04 22:57:42"
}
]
}
vs.
SELECT toStartOfHour(now()) LIMIT 1 FORMAT JSON
{
"meta":
[
{
"name": "toStartOfHour(now())",
"type": "DateTime('Etc\/UTC')"
}
],
"data":
[
{
"toStartOfHour(now())": "2020-03-04 22:00:00"
}
]
}
In our case, this means there is going to be a difference between the values returned by the time
synthetic column added by the time series dataset code, and the timestamp
returned from a single row.
Also should we log warning in case the result we are returning is incorrect because there was a valid non zero UTC offset in the parameter ?
I thought about this too. I don't like that this will be silently adjusting (read: corrupting) data, but also the idea of logging within the value transformer seems like a bad idea since it'll put the log in a tight loop. This also seems to be inconsistently named between environments (in production it is Universal
whereas I see Etc/UTC
in my local environment.)
I think that the real answer is to just treat tz-naive and tz-aware datetimes differently. If there's no tzinfo, assume UTC and if there is tzinfo, either ensure (via asserting) that it's UTC, or always localize it to UTC regardless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the real answer is to just treat tz-naive and tz-aware datetimes differently. If there's no tzinfo, assume UTC and if there is tzinfo, either ensure (via asserting) that it's UTC, or always localize it to UTC regardless.
I'm going to do this as a follow up just to do some semi-dangerous thing at a time.
# XXX: If the original value had a valid nonzero UTC offset, this will | ||
# result in an incorrect value being returned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you figure out why some dates had timezone from clickhouse and others did not ? I am curious.
Also should we log warning in case the result we are returning is incorrect because there was a valid non zero UTC offset in the parameter ?
Second take at #800, now with timezone support. From the original PR description: