-
Notifications
You must be signed in to change notification settings - Fork 7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add pathlib.Path support for download utils #8196
Conversation
This reverts commit 9a69752.
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/8196
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (1 Unrelated Failure)As of commit e27abcd with merge base 2afb7fa ( FLAKY - The following job failed but was likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR @ahmadsharif1 !
This is a good start, I left a few comments below. Some other things we'll have to address before merging:
- We should change the title to something more descriptive, e.g. "Add pathlib.Path support for download utils". We typically link to the related issue in the comment, instead of the title. We need descriptive PR title, because this is what eventually ends up being shown on our release notes - and simply linking to an issue wouldn't provide enough information.
- We should try to test these changes by passing
pathlib.Path
objects to these 2 utils and make sure it works. Those utils are tested here: https://github.com/pytorch/vision/blob/main/test/test_internet.py#L16. I think we should be able to usepytest.mark.parmetrize("use_pathlib", (True, False))
or something similar.
@@ -118,7 +122,7 @@ def download_url( | |||
root = os.path.expanduser(root) | |||
if not filename: | |||
filename = os.path.basename(url) | |||
fpath = os.path.join(root, filename) | |||
fpath = os.fspath(os.path.join(root, filename)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, is os.fspath
what we need here? I think it should suffice to do something like
fpath = os.fspath(os.path.join(root, filename)) | |
fpath = os.path.join(str(root), filename) |
which would make sure to convert root
to a str
object, in case it was a pathlib.Path
object. But we'll confirm once we write tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did other patches using str(path) but I believe os.fspath(path) is the right way to do this:
https://discuss.python.org/t/make-pathlib-extensible/3428/79
Also the issue number was incorrect. This PR is supposed to partially address #8120
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is more discussion about str() vs. fspath here:
https://peps.python.org/pep-0519/
That proposal appears to be final and the relevant quote from there is this:
One issue in converting path objects to strings comes from the fact that the only generic way to get a string representation of the path was to pass the object to str(). This can pose a problem when done blindly as nearly all Python objects have some string representation whether they are a path or not, e.g. str(None) will give a result that builtins.open() [5] will happily use to create a new file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, I hadn't thought about the str(None)
issue. Let's use fspath
then, thank you for the investigation!
test/test_internet.py
Outdated
@pytest.mark.parametrize("temp_dir", ("tmpdir", "tmp_path")) | ||
def test_download_url(self, temp_dir, request): | ||
temp_dir = request.getfixturevalue(temp_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing it this way works, but the main drawback is that it doesn't make the code very explicit about what we're trying to do. That is, our intent here is to test support for pathlib.Path()
, but it's not obvious from the code that i) this is the actual intent behind using tmp_path
and ii) that pathlib.Path()
is being used at-all: one needs to know that the tmp_path
fixture returns a Path()
, and quite frankly... we don't know that from the top of our head. And even if we know it now, we'll forget very soon, and we'll look at these tests 3 months from now and wonder why we bothered testing these 2 fixtures.
So I'd recommend a more obvious alternative that clarifies the intent, e.g. something that you already suggested in our chat:
@pytest.mark.parametrize("use_pathlib", (True, False))
# ...
if use_pathlib:
tmpdir = Path(tmpdir)
We have a strong bias towards simplicity in torchvision and in the domain libraries in general, for the sake of long-term maintainability. So we try hard to keep things dumb-simply and very obvious, even if sometimes that means a little bit of repetition/boilerplate (in this case, it does not).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ahmadsharif1 , LGTM if green! (The linux cpu unittest job is already green!) The bc lint job is safe to ignore here
Co-authored-by: Ahmad Sharif <[email protected]>
…pytorch#8197) Co-authored-by: Nicolas Hug <[email protected]>
Hey @ahmadsharif1! You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py |
Reviewed By: vmoens Differential Revision: D55062761 fbshipit-source-id: 938cac9790c2d388990f58a33aa1c961113f3fa9 Co-authored-by: Ahmad Sharif <[email protected]> Co-authored-by: Brizar <[email protected]> Co-authored-by: Nicolas Hug <[email protected]>
Add os.PathLike types to 2 functions.