But at the moment the spec doesn't say anything about how the datetime is stored (which resolution, or whether it supports multiple resolutions with some parametrization).
Updating the spec to mention it should be nanoseconds might be the obvious solution (since that's the only resolution pandas currently supports), but I think we should make this more flexible and allow different units (hopefully pandas will support non-nanosecond resolutions in the future, and other systems might use other resolutions by default).
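To illustrate why the resolution has to be stated somewhere, here is a minimal sketch (not part of the protocol) showing that the same 64-bit integer in a buffer denotes very different instants depending on the assumed unit. The helper `to_datetime` and the `UNIT_TO_NANOS` table are hypothetical names made up for this example, and it assumes values are counted from the Unix epoch in UTC.

```python
import struct
from datetime import datetime, timedelta, timezone

# Assumption for this sketch: values count from the Unix epoch, in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Hypothetical table: number of nanoseconds per tick for each unit.
UNIT_TO_NANOS = {"s": 10**9, "ms": 10**6, "us": 10**3, "ns": 1}


def to_datetime(value: int, unit: str) -> datetime:
    """Interpret an integer tick count under the given resolution.

    Python's datetime only has microsecond precision, so nanosecond
    values are truncated to microseconds here.
    """
    nanos = value * UNIT_TO_NANOS[unit]
    return EPOCH + timedelta(microseconds=nanos // 1000)


# One int64 value as it might sit in a data buffer (little-endian).
buffer = struct.pack("<q", 1_600_000_000)
(value,) = struct.unpack("<q", buffer)

as_seconds = to_datetime(value, "s")   # a date in September 2020
as_nanos = to_datetime(value, "ns")    # 1.6 seconds after the epoch
```

Without the unit, a consumer cannot tell which of these two readings the producer intended.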
The spec mentions that the format string is used for datetime specification and that it uses the Arrow C Data Interface format string specification, so I'd argue this is well defined.
> The spec mentions that the format string is used for datetime specification
OK, doing a second search, I found "Format strings are mostly useful for datetime specification, and for categoricals." in the notes of the dtypes docstring. That can probably be made a bit more explicit :)
But IMO there is still the question of whether we find this sufficient, as it would mean that you need to parse a string (to extract the resolution) to know how to interpret the buffer (though it certainly avoids adding more parametrization to the length-4 tuple that is already returned for `.dtype`).
BTW, @rgommers, on the other hand this would also already solve the question of how to support timezones, as the Arrow C Data Interface format strings include a timezone.
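As a rough sketch of what "parsing the string" would involve: in the Arrow C Data Interface, timestamp format strings have the form `tss:`, `tsm:`, `tsu:` or `tsn:` (seconds, milliseconds, microseconds, nanoseconds), followed by an optional timezone after the colon. The `TimestampFormat` type and `parse_timestamp_format` function below are hypothetical names for illustration only; nothing like them is defined in the protocol.

```python
from typing import NamedTuple, Optional


class TimestampFormat(NamedTuple):
    """Hypothetical container for the parsed pieces of a format string."""
    unit: str                 # one of "s", "ms", "us", "ns"
    timezone: Optional[str]   # None when the string carries no timezone


# Third character of an Arrow timestamp format string -> resolution.
_UNIT_CODES = {"s": "s", "m": "ms", "u": "us", "n": "ns"}


def parse_timestamp_format(fmt: str) -> TimestampFormat:
    """Extract resolution and timezone from an Arrow timestamp format string."""
    if len(fmt) < 4 or not fmt.startswith("ts") or fmt[3] != ":":
        raise ValueError(f"not a timestamp format string: {fmt!r}")
    try:
        unit = _UNIT_CODES[fmt[2]]
    except KeyError:
        raise ValueError(f"unknown timestamp unit in {fmt!r}") from None
    # Everything after the first colon is the timezone; empty means none.
    timezone = fmt[4:] or None
    return TimestampFormat(unit, timezone)


naive_ns = parse_timestamp_format("tsn:")
aware_us = parse_timestamp_format("tsu:Europe/Brussels")
```

The parse is only needed when the dtype kind is datetime, and it yields both the resolution and the timezone in one step.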
> But IMO there is still the question if we find this sufficient, as it would mean that you need to parse a string (to extract the resolution) to know how to interpret the buffer
Only if the dtype itself is datetime, right? That seems fine, because how else are we going to support timezones if not via format strings?
> BTW, @rgommers, on the other hand this would also already solve the question about how to support timezones, as the Arrow C Data interface format strings include a timezone.
Yes good point. Maybe that's fine and the rest is "just" implementation (and I'm just scarred by the NumPy history).
We currently list "datetime support" in the design document, and it is also listed in the dtype docstring:
dataframe-api/protocol/dataframe_protocol.py
Line 142 in 27b8e1c