Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pathlib.Path support for download utils #8196

Merged
merged 10 commits into from
Jan 9, 2024

Conversation

ahmadsharif1
Copy link
Contributor

Add os.PathLike types to 2 functions.

Copy link

pytorch-bot bot commented Jan 3, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/8196

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit e27abcd with merge base 2afb7fa (image):

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR @ahmadsharif1 !

This is a good start, I left a few comments below. Some other things we'll have to address before merging:

  • We should change the title to something more descriptive, e.g. "Add pathlib.Path support for download utils". We typically link to the related issue in the comment, instead of the title. We need descriptive PR title, because this is what eventually ends up being shown on our release notes - and simply linking to an issue wouldn't provide enough information.
  • We should try to test these changes by passing pathlib.Path objects to these 2 utils and make sure it works. Those utils are tested here: https://github.com/pytorch/vision/blob/main/test/test_internet.py#L16. I think we should be able to use pytest.mark.parmetrize("use_pathlib", (True, False)) or something similar.

@@ -118,7 +122,7 @@ def download_url(
root = os.path.expanduser(root)
if not filename:
filename = os.path.basename(url)
fpath = os.path.join(root, filename)
fpath = os.fspath(os.path.join(root, filename))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, is os.fspath what we need here? I think it should suffice to do something like

Suggested change
fpath = os.fspath(os.path.join(root, filename))
fpath = os.path.join(str(root), filename)

which would make sure to convert root to a str object, in case it was a pathlib.Path object. But we'll confirm once we write tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did other patches using str(path) but I believe os.fspath(path) is the right way to do this:

https://discuss.python.org/t/make-pathlib-extensible/3428/79

Also the issue number was incorrect. This PR is supposed to partially address #8120

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is more discussion about str() vs. fspath here:

https://peps.python.org/pep-0519/

That proposal appears to be final and the relevant quote from there is this:

One issue in converting path objects to strings comes from the fact that the only generic way to get a string representation of the path was to pass the object to str(). This can pose a problem when done blindly as nearly all Python objects have some string representation whether they are a path or not, e.g. str(None) will give a result that builtins.open() [5] will happily use to create a new file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, I hadn't thought about the str(None) issue. Let's use fspath then, thank you for the investigation!

@ahmadsharif1 ahmadsharif1 changed the title Partially address #1616 Add pathlib.Path support for download utils Jan 4, 2024
Comment on lines 16 to 18
@pytest.mark.parametrize("temp_dir", ("tmpdir", "tmp_path"))
def test_download_url(self, temp_dir, request):
temp_dir = request.getfixturevalue(temp_dir)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing it this way works, but the main drawback is that it doesn't make the code very explicit about what we're trying to do. That is, our intent here is to test support for pathlib.Path(), but it's not obvious from the code that i) this is the actual intent behind using tmp_path and ii) that pathlib.Path() is being used at-all: one needs to know that the tmp_path fixture returns a Path(), and quite frankly... we don't know that from the top of our head. And even if we know it now, we'll forget very soon, and we'll look at these tests 3 months from now and wonder why we bothered testing these 2 fixtures.

So I'd recommend a more obvious alternative that clarifies the intent, e.g. something that you already suggested in our chat:

@pytest.mark.parametrize("use_pathlib", (True, False))
# ...
    if use_pathlib:
        tmpdir = Path(tmpdir)

We have a strong bias towards simplicity in torchvision and in the domain libraries in general, for the sake of long-term maintainability. So we try hard to keep things dumb-simply and very obvious, even if sometimes that means a little bit of repetition/boilerplate (in this case, it does not).

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ahmadsharif1 , LGTM if green! (The linux cpu unittest job is already green!) The bc lint job is safe to ignore here

@ahmadsharif1 ahmadsharif1 merged commit d234307 into pytorch:main Jan 9, 2024
63 of 64 checks passed
@ahmadsharif1 ahmadsharif1 deleted the issue1616 branch January 9, 2024 00:07
Copy link

github-actions bot commented Jan 9, 2024

Hey @ahmadsharif1!

You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py

@adamjstewart adamjstewart mentioned this pull request Feb 27, 2024
facebook-github-bot pushed a commit that referenced this pull request Mar 19, 2024
Reviewed By: vmoens

Differential Revision: D55062761

fbshipit-source-id: 938cac9790c2d388990f58a33aa1c961113f3fa9

Co-authored-by: Ahmad Sharif <[email protected]>
Co-authored-by: Brizar <[email protected]>
Co-authored-by: Nicolas Hug <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants