DataFrame interchange protocol: datetime units #65

Open
jorisvandenbossche opened this issue Sep 13, 2021 · 3 comments

Comments

@jorisvandenbossche
Member

We currently list "datetime support" in the design document, and it is also listed in the dtype docstring:

But at the moment the spec doesn't say anything about how the datetime is stored (which resolution, or whether it supports multiple resolutions with some parametrization).

Updating the spec to mention it should be nanoseconds might be the obvious solution (since that's the only resolution pandas currently supports), but I think we should make this more flexible and allow different units (hopefully pandas will support non-nanosecond resolutions in the future, and other systems might use other resolutions by default).
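
To illustrate why the resolution has to be stated somewhere, here is a minimal sketch (plain Python, not part of the protocol) that interprets the same 64-bit integer under different units. The `TICKS_PER_SECOND` table and the `interpret` helper are hypothetical names used only for this illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Hypothetical lookup table: ticks per second for each resolution.
TICKS_PER_SECOND = {"s": 1, "ms": 10**3, "us": 10**6, "ns": 10**9}


def interpret(value: int, unit: str) -> datetime:
    """Interpret an integer count since the Unix epoch in the given unit."""
    ticks = TICKS_PER_SECOND[unit]
    seconds, remainder = divmod(value, ticks)
    # datetime only has microsecond precision, so finer ticks are truncated.
    microseconds = remainder * 10**6 // ticks
    return EPOCH + timedelta(seconds=seconds, microseconds=microseconds)


raw = 1_631_491_200  # the same buffer value in every case
as_seconds = interpret(raw, "s")   # a date in September 2021
as_millis = interpret(raw, "ms")   # a date in January 1970
as_nanos = interpret(raw, "ns")    # about 1.6 seconds after the epoch
```

Without the unit, a consumer cannot tell which of these three readings the producer intended.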

@kkraus14
Collaborator

The spec mentions that the format string is used for datetime specification and that it uses the Arrow C Data Interface format string specification, so I'd argue this is well defined.
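
For reference, Arrow C Data Interface timestamp format strings look like `ts` + a one-letter unit (`s`, `m`, `u`, `n`) + `:` + an optional timezone, for example `tsn:` or `tsu:UTC`. A rough sketch of what a consumer would have to do with such a string follows; `TimestampFormat` and `parse_timestamp_format` are hypothetical names, not part of the spec or of any library.

```python
from typing import NamedTuple, Optional


class TimestampFormat(NamedTuple):
    unit: str                 # "s", "ms", "us" or "ns"
    timezone: Optional[str]   # None when the timezone part is empty


# Arrow unit letters mapped to more familiar unit names.
_UNIT_CODES = {"s": "s", "m": "ms", "u": "us", "n": "ns"}


def parse_timestamp_format(fmt: str) -> TimestampFormat:
    """Parse an Arrow-style timestamp format string ("ts<unit>:<tz>")."""
    if len(fmt) < 4 or not fmt.startswith("ts") or fmt[3] != ":":
        raise ValueError(f"not a timestamp format string: {fmt!r}")
    if fmt[2] not in _UNIT_CODES:
        raise ValueError(f"unknown timestamp unit in {fmt!r}")
    # Everything after the first colon is the timezone; it may itself
    # contain colons (e.g. "+05:30"), so slice instead of splitting.
    tz = fmt[4:] or None
    return TimestampFormat(_UNIT_CODES[fmt[2]], tz)


naive_ns = parse_timestamp_format("tsn:")
utc_us = parse_timestamp_format("tsu:UTC")
offset_s = parse_timestamp_format("tss:+05:30")
```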

@jorisvandenbossche
Member Author

> The spec mentions that the format string is used for datetime specification

OK, doing a second search, I found "Format strings are mostly useful for datetime specification, and for categoricals." in the notes of the dtypes docstring. That can probably be made a bit more explicit :)

But IMO there is still the question if we find this sufficient, as it would mean that you need to parse a string (to extract the resolution) to know how to interpret the buffer (but it certainly avoids needing to add more parametrization to the len-4 tuple that is currently already returned for .dtype).
BTW, @rgommers, on the other hand this would also already solve the question about how to support timezones, as the Arrow C Data interface format strings include a timezone.

@rgommers
Member

> But IMO there is still the question if we find this sufficient, as it would mean that you need to parse a string (to extract the resolution) to know how to interpret the buffer

Only if the dtype itself is datetime, right? That seems fine, because how else are we going to support timezones if not via format strings?
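
A small sketch of what that looks like on the consumer side, assuming the `.dtype` tuple is laid out as (kind, bit width, format string, endianness). The kind codes and the `datetime_params` helper below are assumptions for illustration only; the actual values should be taken from the protocol's dtype kind enum.

```python
from typing import Optional, Tuple

# Assumed kind codes, for illustration only.
KIND_INT = 0
KIND_DATETIME = 22

# (kind, bit width, format string, endianness)
Dtype = Tuple[int, int, str, str]


def datetime_params(dtype: Dtype) -> Optional[Tuple[str, str]]:
    """Return (unit letter, timezone) for datetime columns, else None."""
    kind, _bitwidth, fmt, _endianness = dtype
    if kind != KIND_DATETIME:
        # Other kinds are fully described by kind and bit width,
        # so the format string is never parsed for them.
        return None
    # Arrow-style timestamp format: "ts" + unit letter + ":" + timezone.
    return fmt[2], fmt[4:]


int_col = (KIND_INT, 64, "l", "=")
ts_col = (KIND_DATETIME, 64, "tsn:UTC", "=")

int_params = datetime_params(int_col)
ts_params = datetime_params(ts_col)
```

Only the datetime branch touches the string, and the same string carries both the resolution and the timezone.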

> BTW, @rgommers, on the other hand this would also already solve the question about how to support timezones, as the Arrow C Data interface format strings include a timezone.

Yes, good point. Maybe that's fine and the rest is "just" implementation (and I'm just scarred by the NumPy history).
