BUG: df.astype converts to datetime64[ns] inconsistently with respect to dayfirst #60964

Open
3 tasks done
cgflex opened this issue Feb 19, 2025 · 10 comments
Labels: Bug, datetime.date (stdlib datetime.date support), Deprecate (functionality to remove in pandas)

Comments

@cgflex

cgflex commented Feb 19, 2025

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
df = pd.DataFrame({"some_dates": ["1/1/2025","12/1/2025","13/1/2025","1/12/2025","11/12/2025","13/12/2025",]})
df["converted_dates"] = df.astype({"some_dates": "datetime64[ns]"})
print(df)

# output:
#    some_dates converted_dates
# 0    1/1/2025      2025-01-01
# 1   12/1/2025      2025-12-01 )
# 2   13/1/2025      2025-01-13 ) <- converted_date reverses day and month
# 3   1/12/2025      2025-01-12
# 4  11/12/2025      2025-11-12
# 5  13/12/2025      2025-12-13

Issue Description

When converting dates using astype, strings that are valid month-first dates (e.g. 12/1/2025, read as 1 Dec 2025) are interpreted as such. If a string is not a valid month-first date (e.g. 13/1/2025) but is a valid day-first date, that individual row is interpreted as day-first (13 Jan 2025).
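
To make the row-level behaviour concrete, here is a minimal standard-library sketch that reproduces the output above by trying month-first and falling back to day-first one string at a time. It only mimics the observed result and is not pandas's actual parsing code; `parse_per_row` is a hypothetical name.

```python
from datetime import datetime


def parse_per_row(text: str) -> datetime:
    """Try month-first; fall back to day-first for this one string only."""
    try:
        return datetime.strptime(text, "%m/%d/%Y")
    except ValueError:
        # e.g. "13/1/2025": there is no month 13, so retry as day/month/year
        return datetime.strptime(text, "%d/%m/%Y")


some_dates = ["1/1/2025", "12/1/2025", "13/1/2025",
              "1/12/2025", "11/12/2025", "13/12/2025"]
parsed = [parse_per_row(s) for s in some_dates]

for raw, value in zip(some_dates, parsed):
    print(f"{raw:>10} -> {value.date()}")
```

Because the decision is made per string, 12/1/2025 and 13/1/2025 end up eleven months apart even though they sit next to each other in the same column.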

There was a comment by @MarcoGorelli here: #53127 (comment) that disallowing converting string dates with astype('datetime64[ns]') might be a good idea and after a morning debugging this I'm inclined to agree!

Expected Behavior

In general, I would expect a column of data to have a consistent interpretation. It should be an error or at least a warning for different rows to be interpreted differently without an explicit user request.
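
A column-level check along these lines would have flagged the problem. This is only a rough sketch, assuming the column is meant to be month-first; `month_first` and `bad_rows` are illustrative names, and it relies on `pd.to_datetime` with an explicit `format` and `errors="coerce"`.

```python
import pandas as pd

some_dates = pd.Series(["1/1/2025", "12/1/2025", "13/1/2025",
                        "1/12/2025", "11/12/2025", "13/12/2025"])

# Parse every row with ONE explicit format; rows that do not fit become NaT
# instead of being silently reinterpreted.
month_first = pd.to_datetime(some_dates, format="%m/%d/%Y", errors="coerce")

# Rows that cannot be month-first are the ones astype flipped to day-first.
bad_rows = some_dates[month_first.isna()]
if not bad_rows.empty:
    print("rows that are not valid month-first dates:")
    print(bad_rows)
```

Dropping `errors="coerce"` turns the same call into the hard error described above, since a single non-matching row then raises a `ValueError`.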

Installed Versions

INSTALLED VERSIONS

commit : 0691c5c
python : 3.11.9
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.26100
machine : AMD64
processor : Intel64 Family 6 Model 186 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : English_United Kingdom.1252

pandas : 2.2.3
numpy : 2.1.3
pytz : 2024.2
dateutil : 2.9.0.post0
pip : 24.3.1
Cython : None
sphinx : None
IPython : 8.12.3
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : 3.1.4
lxml.etree : 5.3.0
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.5
pandas_gbq : None
psycopg2 : 2.9.10
pymysql : None
pyarrow : None
pyreadstat : None
pytest : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : 2.0.36
tables : None
tabulate : None
xarray : None
xlrd : None
xlsxwriter : 3.2.0
zstandard : None
tzdata : 2024.2
qtpy : None
pyqt5 : None
None

@cgflex added the Bug and Needs Triage labels Feb 19, 2025
@palbha

palbha commented Feb 19, 2025

cc @rhshadrach
I am using the latest version & the above code throws an error

pip install pandas==2.2.0
import pandas as pd
df = pd.DataFrame({"some_dates": ["1/1/2025","12/1/2025","13/1/2025","1/12/2025","11/12/2025","13/12/2025",]})
df["converted_dates"] = df.astype({"some_dates": "datetime64[ns]"})
print(df)

Image

@rhshadrach
Member

@palbha - the latest version of pandas is 2.2.3; can you try with that?

@rhshadrach
Member

@MarcoGorelli - per PDEP-4, this should raise, is that right?

@rhshadrach added the datetime.date label and removed the Needs Triage label Feb 19, 2025
@MarcoGorelli
Member

yeah probably...pdep4 didn't specifically say anything about datetimeindex, but my inclination is that the DatetimeIndex constructor, as well as astype('datetime64[ns]') should either:

  • only parse iso8601-like formats
  • just not be supported, so people use to_datetime which accepts format / dayfirst / etc. arguments
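
For reference, a minimal sketch of the `to_datetime` route applied to the data from the original report, assuming the column is meant to be day-first. The `format` argument is applied to the whole column, so every row gets the same interpretation.

```python
import pandas as pd

df = pd.DataFrame({"some_dates": ["1/1/2025", "12/1/2025", "13/1/2025",
                                  "1/12/2025", "11/12/2025", "13/12/2025"]})

# An explicit format means one interpretation (day/month/year) for all rows.
df["converted_dates"] = pd.to_datetime(df["some_dates"], format="%d/%m/%Y")
print(df)
```

With this call 12/1/2025 becomes 12 Jan 2025, in line with 13/1/2025 on the next row. Passing `dayfirst=True` instead of `format` is the other option mentioned above.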

@cgflex
Author

cgflex commented Feb 20, 2025

The main thing that surprised me (and I realise I could have just read the documentation...) was that the decision about how to parse dates was made at row level rather than column level. That's a bit of a hand-wavey point, I know, but not realising that was what really caused me problems with this particular issue (and I have now switched to to_datetime!)

@rhshadrach
Member

I would be for restricting to iso8601-like formats rather than removing astype behavior entirely.

@Anurag-Varma
Contributor

@palbha @rhshadrach

The issue is present in the latest pandas-dev version.
(Attaching the picture below)

Image

@Anurag-Varma
Contributor

Anurag-Varma commented Feb 21, 2025

@MarcoGorelli @rhshadrach

Should the fix for this be a code change to only accept ISO 8601-style strings?

And should the documentation change to reflect the new behaviour of astype('datetime64[ns]')?

@Anurag-Varma
Contributor

take

@rhshadrach
Member

rhshadrach commented Feb 22, 2025

Should the fix for this be a code change to only accept ISO 8601-style strings?

Yes, I think so. At least eventually.

It seems to me restricting to iso8601 can impact valid uses and has a clear deprecation path. I think we should deprecate the current behavior rather than making a breaking change.
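
Until any such change lands, the ISO-only behaviour can be approximated from user code. The sketch below assumes pandas 2.0 or later, where `pd.to_datetime` accepts `format="ISO8601"`; `parse_iso_only` is a hypothetical helper, not part of pandas and not the proposed implementation.

```python
import pandas as pd


def parse_iso_only(values: pd.Series) -> pd.Series:
    """Parse strings as datetimes, rejecting anything that is not ISO 8601.

    Hypothetical helper: it delegates to pd.to_datetime(format="ISO8601"),
    which raises ValueError for strings such as "13/1/2025".
    """
    return pd.to_datetime(values, format="ISO8601")


iso_dates = pd.Series(["2025-01-13", "2025-12-01"])
print(parse_iso_only(iso_dates))

try:
    parse_iso_only(pd.Series(["13/1/2025"]))
except ValueError as exc:
    print("rejected:", exc)
```

The ambiguous strings from the original report are rejected outright here rather than guessed at row by row.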

@rhshadrach added the Deprecate label Feb 22, 2025