[SPARK-51302][CONNECT] Spark Connect supports JDBC should use the DataFrameReader API #50059

Open: beliefer wants to merge 1 commit into master


@beliefer beliefer commented Feb 24, 2025

What changes were proposed in this pull request?

This PR proposes that Spark Connect consistently call the DataFrameReader API wherever it supports the jdbc API.

Why are the changes needed?

The original code has a slight performance advantage, but it becomes a maintenance burden if the logic of the jdbc API changes.
I think we should unify the code path here.

Does this PR introduce any user-facing change?

'No'.

How was this patch tested?

GA (GitHub Actions).

Was this patch authored or co-authored using generative AI tooling?

'No'.

@beliefer beliefer changed the title [WIP][SPARK-51302][CONNECT] Spark Connect supports JDBC should use the DataFrameReader API [SPARK-51302][CONNECT] Spark Connect supports JDBC should use the DataFrameReader API Feb 24, 2025

@dongjoon-hyun dongjoon-hyun left a comment

Do you think you can add some test cases, @beliefer , to be clear what was the problem and to prevent a future regression?

@beliefer

> Do you think you can add some test cases, @beliefer , to be clear what was the problem and to prevent a future regression?

Spark Connect already has test cases for this. This change only unifies the code path and improves maintainability.

LogicalRelation(relation)
val properties = new Properties()
properties.putAll(rel.getDataSource.getOptionsMap)
reader.jdbc(url, table, predicates, properties).queryExecution.analyzed

When rel.getDataSource.getFormat == "jdbc" && rel.getDataSource.getPredicatesCount == 0 is true, isn't it unnecessary to use reader.jdbc(url, table, predicates, properties).queryExecution.analyzed?
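For readers following the question above, here is a minimal sketch of the options-to-`Properties` step from the quoted snippet, written without a Spark dependency so it can run on its own. The object name, helper names, and option values are hypothetical and not taken from the PR. The Spark calls appear only in comments and assume a `SparkSession` named `spark` is available.

```scala
import java.util.Properties

object JdbcOptionsSketch {
  // Hypothetical helper, not part of the PR: copies an options map into
  // java.util.Properties, the type that the predicates overload of
  // DataFrameReader.jdbc takes for connection properties.
  def toProperties(options: Map[String, String]): Properties = {
    val props = new Properties()
    options.foreach { case (key, value) => props.setProperty(key, value) }
    props
  }

  // Hypothetical helper: true only when explicit predicates were supplied,
  // which is the case the reviewer distinguishes from predicatesCount == 0.
  def needsPredicatesOverload(predicates: Seq[String]): Boolean =
    predicates.nonEmpty

  def main(args: Array[String]): Unit = {
    // Made-up option values for illustration only.
    val options = Map("user" -> "test", "fetchsize" -> "100")
    val props = toProperties(options)
    println(props.getProperty("fetchsize"))

    // Assuming `spark`, `url`, `table`, and `predicates` are in scope:
    //   with predicates:    spark.read.jdbc(url, table, predicates.toArray, props)
    //   without predicates: spark.read.format("jdbc").options(options).load()
    //     (here `options` would also need to carry "url" and "dbtable")
  }
}
```

Whether the no-predicates case should also go through `reader.jdbc` is the open question in this thread; the sketch only shows the two call shapes being compared.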
