[SPARK-37137][PYTHON] Inline type hints for python/pyspark/conf.py #34411
Conversation
@xinrong-databricks @ueshin Can you help me take a look? I refer to
Thanks, @ByronHsu! It is expected that type hints in
@zero323 @xinrong-databricks Thanks for your advice. I have revised my code.
Also, I am curious about the advantage of inline type hints rather than stub files.
Both have pros and cons. The biggest advantage here is that it enables checking the actual implementation (other type checkers might check the code even with stubs), so it is easier to detect discrepancies, and you get additional coverage for your code as a bonus.
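A tiny illustration of that extra coverage (the function below is hypothetical, not PySpark code): with inline annotations, mypy checks the function body, so the deliberately buggy return is flagged; with a `.pyi` stub, the unannotated implementation body would be skipped entirely.

```python
from typing import Dict, Optional


# With inline annotations, mypy type-checks this body and reports:
#   error: Incompatible return value type (got "None", expected "str")
# If the same signature lived only in a .pyi stub, the implementation
# body would not be checked and the bug would go unnoticed.
def get_or_default(conf: Dict[str, str], key: str, default: str) -> str:
    if key in conf:
        return conf[key]
    return None  # deliberate bug: violates the declared return type
```

At runtime the bug still lurks (`get_or_default({}, "x", "d")` returns `None`, not `"d"`); only body-level type checking surfaces it before the code ships.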
```diff
@@ -124,48 +130,57 @@ def __init__(self, loadDefaults=True, _jvm=None, _jconf=None):
             self._jconf = None
             self._conf = {}
 
-    def set(self, key, value):
+    def set(self, key: str, value: str) -> "SparkConf":
```
I guess we could be less restrictive here, but I am not sure if that's a good idea, since it would depend on the `__str__` or `__repr__` implementation for the `value`. This has some weird consequences, like:

```python
>>> key, value = "foo", None
>>> conf = sc.getConf()
>>> conf.set(key, value)
<pyspark.conf.SparkConf object at 0x7f4870aa5ee0>
>>> conf.get(key) == value
False
```
So do we need to change the type of `value` from `str` to `Optional[str]`? Or could we open another ticket for this issue?
@ueshin @HyukjinKwon WDYT?
@ueshin @HyukjinKwon Could you help me review this patch? Thanks!
Tough call .. I would keep it as just `str` for now though .. for `None`, it should ideally be mapped to `null` on the JVM.
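The round trip shown earlier can be reproduced without a JVM: `SparkConf.set` passes its argument through `str()` before storing it, so a `None` value becomes the four-character string `"None"` and never compares equal to `None` again. A minimal sketch of that coercion (plain Python, no Spark required):

```python
# SparkConf.set calls str(value) before storing it, so None is
# stored as the literal string "None", not as a null.
stored = str(None)

print(stored)          # the string "None"
print(stored is None)  # False
print(stored == None)  # False
```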
ok to test
add to whitelist
Test build #144842 has finished for PR 34411 at commit
Kubernetes integration test starting
Kubernetes integration test status failure
python/pyspark/conf.py (Outdated)

```diff
         if key not in cast(Dict[str, str], self._conf):
             return None
-        return self._conf[key]
+        return cast(Dict[str, str], self._conf)[key]
```
Unrelated note ‒ shouldn't we use `get` here?
Also, same as above ‒ a single `assert` might be a better option.
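A sketch of what the reviewer is suggesting (the class and names below are illustrative stand-ins, not PySpark's actual implementation): a single `assert` narrows `Optional[Dict[str, str]]` to `Dict[str, str]` for the type checker, and `dict.get` replaces the explicit membership test.

```python
from typing import Dict, Optional


class SparkConfSketch:
    """Simplified stand-in for pyspark.conf.SparkConf (driver-side dict only)."""

    def __init__(self) -> None:
        self._conf: Optional[Dict[str, str]] = {}

    def set(self, key: str, value: str) -> "SparkConfSketch":
        # One assert narrows the Optional for mypy; no cast needed.
        assert self._conf is not None
        self._conf[key] = str(value)
        return self

    def get(self, key: str, defaultValue: Optional[str] = None) -> Optional[str]:
        # dict.get covers both the missing-key and default-value cases.
        assert self._conf is not None
        return self._conf.get(key, defaultValue)


conf = SparkConfSketch().set("spark.app.name", "demo")
print(conf.get("spark.app.name"))  # demo
print(conf.get("missing", "n/a"))  # n/a
```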
python/pyspark/conf.py (Outdated)

```diff
         """Set a configuration property."""
         # Try to set self._jconf first if JVM is created, set self._conf if JVM is not created yet.
         if self._jconf is not None:
             self._jconf.set(key, str(value))
         else:
-            self._conf[key] = str(value)
+            cast(Dict[str, str], self._conf)[key] = str(value)
```
I'd probably go with `assert` here

```diff
diff --git a/python/pyspark/conf.py b/python/pyspark/conf.py
index 09c8e63d09..a8538b06e4 100644
--- a/python/pyspark/conf.py
+++ b/python/pyspark/conf.py
@@ -136,7 +136,8 @@ class SparkConf(object):
         if self._jconf is not None:
             self._jconf.set(key, str(value))
         else:
-            cast(Dict[str, str], self._conf)[key] = str(value)
+            assert self._conf is not None
+            self._conf[key] = str(value)
         return self
 
     def setIfMissing(self, key: str, value: str) -> "SparkConf":
```

but I guess it is fine for now.
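The practical difference between the two approaches, in a nutshell: `typing.cast` is a no-op at runtime and is never verified, while `assert` both narrows the type for mypy and fails fast if the invariant (here, that `_conf` is populated whenever `_jconf` is not) is broken. A standalone sketch (the variable names are illustrative, not PySpark's):

```python
from typing import Dict, Optional, cast

_conf: Optional[Dict[str, str]] = None  # invariant deliberately broken

# cast() only silences the type checker; the failure surfaces late,
# as a TypeError at the point of the bad item assignment.
try:
    cast(Dict[str, str], _conf)["k"] = "v"
except TypeError as exc:
    print(f"cast hid the bug until runtime: {exc}")

# assert narrows Optional[Dict[str, str]] for mypy AND fails fast
# with a clear AssertionError when the invariant does not hold.
try:
    assert _conf is not None
    _conf["k"] = "v"
except AssertionError:
    print("assert caught the broken invariant immediately")
```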
In general, LGTM.
@zero323 I have updated the code. Thank you for your valuable advice!
Test build #144894 has finished for PR 34411 at commit
Kubernetes integration test starting
Kubernetes integration test status failure
Merged to master. |
What changes were proposed in this pull request?
Inline type hints for python/pyspark/conf.py
Why are the changes needed?
Currently, the type hints for `python/pyspark/conf.py` live in the stub file `python/pyspark/conf.pyi`, which doesn't support type checking within function bodies. So we inline the type hints to support that.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing tests.