
[SPARK-37137][PYTHON] Inline type hints for python/pyspark/conf.py #34411

Closed
wants to merge 9 commits into from

Conversation

ByronHsu
Member

What changes were proposed in this pull request?

Inline type hints for python/pyspark/conf.py

Why are the changes needed?

Currently, the stub file python/pyspark/conf.pyi does not support type checking within function bodies, so we inline the type hints to enable that.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests.

@ByronHsu
Member Author

ByronHsu commented Oct 28, 2021

@xinrong-databricks @ueshin Can you help me take a look? I referred to conf.pyi, but found that if I copy the types from the pyi file entirely, some of them cause errors. Therefore, I manually modified some types to remove the errors, but I am not sure whether they are correct.

@xinrong-meng
Member

Thanks, @ByronHsu! It is expected that type hints in pyi files cannot always be inlined directly. In particular, after inlining, function bodies are type-checked as well. Let's either adjust the type hints or ignore the respective mypy errors.

@github-actions github-actions bot added the SQL label Oct 29, 2021
@ByronHsu
Member Author

@zero323 @xinrong-databricks Thanks for your advice. I have revised my code.

@ByronHsu
Member Author

Also, I am curious about the advantage of inline-type rather than stub files.

@zero323
Member

zero323 commented Oct 29, 2021

Also, I am curious about the advantage of inline-type rather than stub files.

Both have pros and cons.

The biggest advantage here is that it enables checking the actual implementation (though other type checkers might check the code even when stubs are present), so it makes it easier to detect discrepancies, and you get additional coverage for your code as a bonus.
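A minimal sketch of that advantage (a hypothetical function, not from this PR): with inline annotations, mypy type-checks the body itself, so an implementation that contradicts its own signature is flagged; with a stub file, only callers are checked against the declared signature.

```python
from typing import Dict, Optional

def get_setting(conf: Dict[str, str], key: str, default: str) -> str:
    # With a .pyi stub, mypy would only check callers against this
    # signature. With inline hints, the body is checked too: if this
    # function returned `value` while it could still be None, mypy
    # would report an error against the declared `-> str`.
    value: Optional[str] = conf.get(key)
    if value is None:
        return default
    return value
```

Running mypy against a stub-only version of such a function would not catch a stray `return None` in the body; the inline version would.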

@@ -124,48 +130,57 @@ def __init__(self, loadDefaults=True, _jvm=None, _jconf=None):
self._jconf = None
self._conf = {}

def set(self, key, value):
def set(self, key: str, value: str) -> "SparkConf":
Member

I guess we could be less restrictive here, but I am not sure if that's a good idea, since it would depend on __str__ or __repr__ implementation for the value. This has some weird consequences, like:

>>> key, value = "foo", None
>>> conf = sc.getConf()
>>> conf.set(key, value)
<pyspark.conf.SparkConf object at 0x7f4870aa5ee0>
>>> conf.get(key) == value
False
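The surprise above comes from `str(None)` being stored as the literal string `"None"`. A toy sketch (a hypothetical `TinyConf` class, not the real SparkConf) of why the `value: str` annotation helps: mypy would reject `set(key, None)` at check time, whereas at runtime the `None` is silently stringified:

```python
from typing import Dict, Optional

class TinyConf:
    """Toy stand-in for SparkConf (illustration only)."""

    def __init__(self) -> None:
        self._conf: Dict[str, str] = {}

    def set(self, key: str, value: str) -> "TinyConf":
        # str(value) silently turns None into the string "None";
        # the `value: str` annotation lets mypy reject such calls.
        self._conf[key] = str(value)
        return self

    def get(self, key: str) -> Optional[str]:
        return self._conf.get(key)

conf = TinyConf().set("foo", None)  # type: ignore[arg-type]
# get("foo") now returns the string "None", not None itself, which is
# the `conf.get(key) == value -> False` surprise shown above.
```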

Member Author

@ByronHsu ByronHsu Oct 29, 2021

So do we need to change the type of value from str to Optional[str]? Or could we open another ticket for this issue?

Member Author

@ueshin @HyukjinKwon Could you help me review this patch? Thanks!

Member

tough call .. I would keep it as just str for now though .. for None it should be mapped to null on JVM ideally.

@zero323
Member

zero323 commented Oct 29, 2021

ok to test

@HyukjinKwon
Member

add to whitelist

@SparkQA

SparkQA commented Nov 2, 2021

Test build #144842 has finished for PR 34411 at commit 82898e0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 2, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49312/

@SparkQA

SparkQA commented Nov 2, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49312/

Comment on lines 204 to 206
if key not in cast(Dict[str, str], self._conf):
return None
return self._conf[key]
return cast(Dict[str, str], self._conf)[key]
Member

Unrelated note ‒ shouldn't we use get here?
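The suggestion would collapse the membership test and lookup into a single call; roughly (a sketch against a plain dict, not the PR's final code):

```python
from typing import Dict, Optional

def lookup(conf: Dict[str, str], key: str) -> Optional[str]:
    # dict.get already returns None for a missing key, replacing the
    # explicit `if key not in conf: return None` followed by conf[key].
    return conf.get(key)
```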

Member

Also, same as above ‒ single assert might be a better option.

"""Set a configuration property."""
# Try to set self._jconf first if JVM is created, set self._conf if JVM is not created yet.
if self._jconf is not None:
self._jconf.set(key, str(value))
else:
self._conf[key] = str(value)
cast(Dict[str, str], self._conf)[key] = str(value)
Member

I'd probably go with assert here

diff --git a/python/pyspark/conf.py b/python/pyspark/conf.py
index 09c8e63d09..a8538b06e4 100644
--- a/python/pyspark/conf.py
+++ b/python/pyspark/conf.py
@@ -136,7 +136,8 @@ class SparkConf(object):
         if self._jconf is not None:
             self._jconf.set(key, str(value))
         else:
-            cast(Dict[str, str], self._conf)[key] = str(value)
+            assert self._conf is not None
+            self._conf[key] = str(value)
         return self
 
     def setIfMissing(self, key: str, value: str) -> "SparkConf":

but I guess it is fine for now.
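The reason the assert suffices: mypy narrows `Optional[T]` to `T` after `assert x is not None`, so no `cast` is needed, and unlike `cast()` (a runtime no-op) the assert also fails fast if the invariant is ever broken. A minimal sketch (a standalone function, not the PR's final code):

```python
from typing import Dict, Optional

def set_entry(conf: Optional[Dict[str, str]], key: str, value: str) -> None:
    # mypy narrows `conf` from Optional[Dict[str, str]] to Dict[str, str]
    # after this assert; cast() would do nothing at runtime, while the
    # assert actually raises if conf is unexpectedly None.
    assert conf is not None
    conf[key] = str(value)

d: Dict[str, str] = {}
set_entry(d, "spark.master", "local[*]")
```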

Member

@zero323 zero323 left a comment

In general, LGTM.

@ByronHsu
Member Author

ByronHsu commented Nov 4, 2021

@zero323 I have updated the code. Thank you for your precious advice!

@SparkQA

SparkQA commented Nov 4, 2021

Test build #144894 has finished for PR 34411 at commit b812d4e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 4, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49364/

@SparkQA

SparkQA commented Nov 4, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49364/

@HyukjinKwon
Member

Merged to master.
