fix(Views): multiple joins #1611

scaliseraoul · 2025-02-13T16:53:05Z

Tests added and passed if fixing a bug or adding a new feature.
All code checks passed.

Important

Fixes handling of multiple join conditions in ViewQueryBuilder by grouping and combining them, with new tests added for verification.

Behavior:
- Fixes handling of multiple join conditions for the same target dataset in ViewQueryBuilder by grouping and combining conditions with 'AND'.
Tests:
- Adds test_multiple_joins_same_table to verify multiple join conditions are combined correctly.
- Adds test_three_table_join to verify joining three different tables with correct conditions.

^{This description was created by}^{for 72b0584. It will automatically update as commits are pushed.}

…355-query * commit '61ba3530794fbe2b5739a04c26dc35e704ce69c9': fix(dataset): slug format validation on load (sinaptik-ai#1609) fix(views): transformation using raw sql (sinaptik-ai#1608) Release v3.0.0-beta.10

ellipsis-dev

👍 Looks good to me! Reviewed everything up to 72b0584 in 2 minutes and 44 seconds

More details

Looked at 176 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 8 drafted comments based on config settings.

1. pandasai/query_builders/view_query_builder.py:127

Draft comment:
Ensure that the target dataset key exists in schema_dependencies_dict before using it. Consider adding error handling for missing keys to prevent KeyError.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
The comment raises a valid concern about potential KeyErrors. However, schema_dependencies_dict is a core dependency passed in constructor, and to_datasets comes from schema.relations which should be validated at schema creation time. This feels like it should be handled at a higher architectural level, not in this specific method. The comment is asking for defensive programming in what appears to be an internal API.
I might be wrong about schema validation - perhaps invalid relations could slip through. Also, defensive programming can be good practice even for internal APIs.
While defensive programming is good, this particular case feels like it would be masking a more fundamental data integrity issue that should be caught earlier. Adding error handling here would be treating a symptom rather than the cause.
Delete the comment as it suggests error handling at the wrong level of abstraction. Data integrity should be enforced at schema validation time.

2. pandasai/query_builders/view_query_builder.py:131

Draft comment:
Using ' AND '.join(conditions) is fine; ensure that sanitize_view_column_name fully prevents SQL injection in join conditions.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50%
None

3. tests/unit_tests/query_builders/test_view_query_builder.py:266

Draft comment:
Hardcoding expected multi-line SQL strings in tests can be brittle due to formatting. Consider comparing parsed SQL expressions or using a normalization function in tests.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
The comment raises a valid point about test maintainability - string comparisons of SQL can be fragile if formatting changes. However, the current approach is consistent throughout the test file and seems to be working well. The suggestion would require significant test refactoring and adding new dependencies/utilities for SQL parsing. The current approach, while not perfect, is simple and readable.
The comment identifies a real testing best practice issue. SQL parsing would make tests more robust against formatting changes.
While technically correct, implementing the suggestion would add complexity for minimal benefit since the current string-based tests are working well and are consistent throughout the codebase.
Delete the comment since it suggests a major testing approach change that would add complexity without providing enough benefit to justify the refactoring effort.

4. pandasai/query_builders/view_query_builder.py:118

Draft comment:
Consider renaming the variable 'to_datasets' to 'target_dataset' for clarity, since it represents a single target dataset extracted from the join relation.
Reason this comment was not posted:
Marked as duplicate.

5. pandasai/query_builders/view_query_builder.py:121

Draft comment:
In the join condition strings, sanitize_view_column_name is used directly. Consider if using self.normalize_view_column_name would be more consistent with column aliasing. If the intent is to preserve table.column format for join predicates, document this choice.
Reason this comment was not posted:
Marked as duplicate.

6. pandasai/query_builders/view_query_builder.py:126

Draft comment:
The logic grouping join conditions by target dataset and combining them with 'AND' is clear and DRY. Ensure that tests sufficiently cover scenarios where multiple join conditions exist for the same target table.
Reason this comment was not posted:
Marked as duplicate.

7. tests/unit_tests/query_builders/test_view_query_builder.py:243

Draft comment:
New tests (test_multiple_joins_same_table and test_three_table_join) provide comprehensive coverage for multi-join scenarios. Consider adding brief inline comments to clarify their intent, and monitor that the pretty-printed SQL output remains stable with sqlglot updates.
Reason this comment was not posted:
Marked as duplicate.

8. tests/unit_tests/query_builders/test_view_query_builder.py:16

Draft comment:
The helper method _create_mock_loader is effective for creating mock loaders. If it becomes widely used across tests, consider refactoring it as a pytest fixture to promote reuse.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
The suggestion is about code organization and test maintainability. The method is currently used in two new tests, but it's not clear if it will be widely used elsewhere. The comment is speculative ("if it becomes widely used") rather than addressing a current issue. Converting to a fixture would require additional work without clear immediate benefit.
The comment makes a reasonable suggestion for potential future refactoring, but it's speculative and not based on current widespread usage. The current implementation as a helper method is perfectly valid for its current usage.
While fixtures are good for test reuse, making premature abstractions based on potential future use cases can lead to unnecessary complexity. The current helper method approach is appropriate for its current limited usage.
The comment should be deleted as it's speculative and suggests a refactoring that isn't clearly needed given the current usage pattern.

Workflow ID: wflow_KGNpdW0aG8wfJk34

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

scaliseraoul added 3 commits February 13, 2025 15:37

fix(views): transformation using raw sql

92a4891

fix(Views): multiple joins

911cf35

Merge commit '61ba3530794fbe2b5739a04c26dc35e704ce69c9' into fix/SIN-…

72b0584

…355-query * commit '61ba3530794fbe2b5739a04c26dc35e704ce69c9': fix(dataset): slug format validation on load (sinaptik-ai#1609) fix(views): transformation using raw sql (sinaptik-ai#1608) Release v3.0.0-beta.10

ellipsis-dev bot reviewed Feb 13, 2025

View reviewed changes

gventuri merged commit a1b36f4 into sinaptik-ai:main Feb 13, 2025
12 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(Views): multiple joins #1611

fix(Views): multiple joins #1611

scaliseraoul commented Feb 13, 2025 •

edited by ellipsis-dev bot

Loading

ellipsis-dev bot left a comment

fix(Views): multiple joins #1611

fix(Views): multiple joins #1611

Conversation

scaliseraoul commented Feb 13, 2025 • edited by ellipsis-dev bot Loading

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

scaliseraoul commented Feb 13, 2025 •

edited by ellipsis-dev bot

Loading