[SPARK-51302][CONNECT] Spark Connect supports JDBC should use the DataFrameReader API #50059

beliefer · 2025-02-24T08:11:53Z

What changes were proposed in this pull request?

This PR proposes to unify the JDBC code path in Spark Connect so that it calls the DataFrameReader API wherever the jdbc API is supported.

Why are the changes needed?

The original code gains a slight performance advantage, but it becomes a liability whenever the logic of the jdbc API changes.
I think we should unify the code path here.

Does this PR introduce any user-facing change?

'No'.

How was this patch tested?

GitHub Actions (GA).

Was this patch authored or co-authored using generative AI tooling?

'No'.


beliefer · 2025-02-24T11:18:27Z

ping @HyukjinKwon @zhengruifeng @LuciferYang

dongjoon-hyun

Do you think you can add some test cases, @beliefer , to be clear what was the problem and to prevent a future regression?

beliefer · 2025-02-25T03:25:08Z

> Do you think you can add some test cases, @beliefer , to be clear what was the problem and to prevent a future regression?

Spark Connect already has test cases for this. This improvement only unifies the code path and makes it easier to maintain.

LuciferYang · 2025-02-25T14:16:21Z

...connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala

-          LogicalRelation(relation)
+          val properties = new Properties()
+          properties.putAll(rel.getDataSource.getOptionsMap)
+          reader.jdbc(url, table, predicates, properties).queryExecution.analyzed


When rel.getDataSource.getFormat == "jdbc" && rel.getDataSource.getPredicatesCount == 0 is true, isn't it unnecessary to use reader.jdbc(url, table, predicates, properties).queryExecution.analyzed?

github-actions bot added SQL CONNECT labels Feb 24, 2025

[SPARK-51302][CONNECT] Spark Connect supports JDBC should use the Dat…

5d89a74

…aFrameReader API

beliefer force-pushed the SPARK-51302 branch from 93a73e2 to 5d89a74 Compare February 24, 2025 09:34

beliefer changed the title ~~[WIP][SPARK-51302][CONNECT] Spark Connect supports JDBC should use the DataFrameReader API~~ [SPARK-51302][CONNECT] Spark Connect supports JDBC should use the DataFrameReader API Feb 24, 2025

dongjoon-hyun reviewed Feb 24, 2025


LuciferYang reviewed Feb 25, 2025
