Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(Views): multiple joins #1611

Merged
merged 3 commits into from
Feb 13, 2025
Merged

Conversation

scaliseraoul
Copy link
Contributor

@scaliseraoul scaliseraoul commented Feb 13, 2025


Important

Fixes handling of multiple join conditions in ViewQueryBuilder by grouping and combining them, with new tests added for verification.

  • Behavior:
    • Fixes handling of multiple join conditions for the same target dataset in ViewQueryBuilder by grouping and combining conditions with 'AND'.
  • Tests:
    • Adds test_multiple_joins_same_table to verify multiple join conditions are combined correctly.
    • Adds test_three_table_join to verify joining three different tables with correct conditions.

This description was created by Ellipsis for 72b0584. It will automatically update as commits are pushed.

…355-query

* commit '61ba3530794fbe2b5739a04c26dc35e704ce69c9':
  fix(dataset): slug format validation on load (sinaptik-ai#1609)
  fix(views): transformation using raw sql (sinaptik-ai#1608)
  Release v3.0.0-beta.10
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Reviewed everything up to 72b0584 in 2 minutes and 44 seconds

More details
  • Looked at 176 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 8 drafted comments based on config settings.
1. pandasai/query_builders/view_query_builder.py:127
  • Draft comment:
    Ensure that the target dataset key exists in schema_dependencies_dict before using it. Consider adding error handling for missing keys to prevent KeyError.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
    The comment raises a valid concern about potential KeyErrors. However, schema_dependencies_dict is a core dependency passed in constructor, and to_datasets comes from schema.relations which should be validated at schema creation time. This feels like it should be handled at a higher architectural level, not in this specific method. The comment is asking for defensive programming in what appears to be an internal API.
    I might be wrong about schema validation - perhaps invalid relations could slip through. Also, defensive programming can be good practice even for internal APIs.
    While defensive programming is good, this particular case feels like it would be masking a more fundamental data integrity issue that should be caught earlier. Adding error handling here would be treating a symptom rather than the cause.
    Delete the comment as it suggests error handling at the wrong level of abstraction. Data integrity should be enforced at schema validation time.
2. pandasai/query_builders/view_query_builder.py:131
  • Draft comment:
    Using ' AND '.join(conditions) is fine; ensure that sanitize_view_column_name fully prevents SQL injection in join conditions.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50%
    None
3. tests/unit_tests/query_builders/test_view_query_builder.py:266
  • Draft comment:
    Hardcoding expected multi-line SQL strings in tests can be brittle due to formatting. Consider comparing parsed SQL expressions or using a normalization function in tests.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
    The comment raises a valid point about test maintainability - string comparisons of SQL can be fragile if formatting changes. However, the current approach is consistent throughout the test file and seems to be working well. The suggestion would require significant test refactoring and adding new dependencies/utilities for SQL parsing. The current approach, while not perfect, is simple and readable.
    The comment identifies a real testing best practice issue. SQL parsing would make tests more robust against formatting changes.
    While technically correct, implementing the suggestion would add complexity for minimal benefit since the current string-based tests are working well and are consistent throughout the codebase.
    Delete the comment since it suggests a major testing approach change that would add complexity without providing enough benefit to justify the refactoring effort.
4. pandasai/query_builders/view_query_builder.py:118
  • Draft comment:
    Consider renaming the variable 'to_datasets' to 'target_dataset' for clarity, since it represents a single target dataset extracted from the join relation.
  • Reason this comment was not posted:
    Marked as duplicate.
5. pandasai/query_builders/view_query_builder.py:121
  • Draft comment:
    In the join condition strings, sanitize_view_column_name is used directly. Consider if using self.normalize_view_column_name would be more consistent with column aliasing. If the intent is to preserve table.column format for join predicates, document this choice.
  • Reason this comment was not posted:
    Marked as duplicate.
6. pandasai/query_builders/view_query_builder.py:126
  • Draft comment:
    The logic grouping join conditions by target dataset and combining them with 'AND' is clear and DRY. Ensure that tests sufficiently cover scenarios where multiple join conditions exist for the same target table.
  • Reason this comment was not posted:
    Marked as duplicate.
7. tests/unit_tests/query_builders/test_view_query_builder.py:243
  • Draft comment:
    New tests (test_multiple_joins_same_table and test_three_table_join) provide comprehensive coverage for multi-join scenarios. Consider adding brief inline comments to clarify their intent, and monitor that the pretty-printed SQL output remains stable with sqlglot updates.
  • Reason this comment was not posted:
    Marked as duplicate.
8. tests/unit_tests/query_builders/test_view_query_builder.py:16
  • Draft comment:
    The helper method _create_mock_loader is effective for creating mock loaders. If it becomes widely used across tests, consider refactoring it as a pytest fixture to promote reuse.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
    The suggestion is about code organization and test maintainability. The method is currently used in two new tests, but it's not clear if it will be widely used elsewhere. The comment is speculative ("if it becomes widely used") rather than addressing a current issue. Converting to a fixture would require additional work without clear immediate benefit.
    The comment makes a reasonable suggestion for potential future refactoring, but it's speculative and not based on current widespread usage. The current implementation as a helper method is perfectly valid for its current usage.
    While fixtures are good for test reuse, making premature abstractions based on potential future use cases can lead to unnecessary complexity. The current helper method approach is appropriate for its current limited usage.
    The comment should be deleted as it's speculative and suggests a refactoring that isn't clearly needed given the current usage pattern.

Workflow ID: wflow_KGNpdW0aG8wfJk34


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@gventuri gventuri merged commit a1b36f4 into sinaptik-ai:main Feb 13, 2025
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants