Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: fixing datasets naming convention #1623

Merged
merged 1 commit into from
Feb 18, 2025

Conversation

scaliseraoul
Copy link
Contributor

@scaliseraoul scaliseraoul commented Feb 18, 2025


Important

Standardizes dataset naming conventions to use underscores instead of dashes and ensures lowercase usage, with updates to code and tests for validation and transformation.

  • Behavior:
    • Enforces dataset names to be lowercase and use underscores instead of dashes in create() in pandasai/__init__.py.
    • Updates push() in base.py to use transformed dataset names.
    • Adds validation for underscore format in SemanticLayerSchema in semantic_layer_schema.py.
  • Helpers:
    • Adds transform_dash_to_underscore() and transform_underscore_to_dash() in path.py.
    • Adds validate_underscore_name_format() in path.py.
  • Tests:
    • Updates tests in test_loader.py, test_semantic_layer_schema.py, and test_pandasai_init.py to reflect new naming conventions.
    • Adds tests for invalid dataset names in test_loader.py and test_semantic_layer_schema.py.

This description was created by Ellipsis for 614a9ca. It will automatically update as commits are pushed.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Reviewed everything up to 614a9ca in 2 minutes and 8 seconds

More details
  • Looked at 327 lines of code in 10 files
  • Skipped 0 files when reviewing.
  • Skipped posting 12 drafted comments based on config settings.
1. pandasai/__init__.py:148
  • Draft comment:
    Good use of transform_dash_to_underscore; the dataset name is now consistently formatted. Consider documenting this change in the changelog.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
2. pandasai/helpers/path.py:59
  • Draft comment:
    New transform functions look correct. Ensure these utility functions have unit tests in case additional edge cases arise.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
3. pandasai/data_loader/loader.py:66
  • Draft comment:
    Conversion of dataset_path using transform_underscore_to_dash is clear; verify this behavior aligns with user expectations when naming datasets.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
4. pandasai/data_loader/semantic_layer_schema.py:334
  • Draft comment:
    Updated regex for view column names is precise. Consider adding a comment explaining the regex choice for clarity.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
5. tests/unit_tests/query_builders/test_query_builder.py:152
  • Draft comment:
    Injection tests for table and column names are robust. Ensure these tests remain updated if query formatting logic changes.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
6. tests/unit_tests/test_pandasai_init.py:233
  • Draft comment:
    Error messages in tests (e.g., organization name validation) are clear. Consider using regex patterns in tests for flexibility against minor wording changes.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
7. pandasai/__init__.py:125
  • Draft comment:
    Clarify in the docstring that while the dataset path (organization/dataset) uses dashes, the schema name is internally converted to use underscores for consistency.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50%
    None
8. pandasai/data_loader/loader.py:66
  • Draft comment:
    Ensure the transformation (transform_underscore_to_dash) applied to the dataset path is documented so that the naming convention (dashes for directories vs underscores for schema names) is clear.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50%
    None
9. pandasai/data_loader/semantic_layer_schema.py:334
  • Draft comment:
    The updated regex now enforces underscores instead of dashes in table (dataset) names for views. Consider compiling this regex as a constant if it’s used frequently.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50%
    None
10. pandasai/helpers/path.py:59
  • Draft comment:
    Update the docstrings for transform_dash_to_underscore and transform_underscore_to_dash to clearly explain that path names should use dashes but schema names internally use underscores.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50%
    None
11. tests/unit_tests/data_loader/test_loader.py:35
  • Draft comment:
    Unit tests have been updated to check for the new naming conventions. Good job ensuring the error messages match the updated validation rules.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
12. tests/unit_tests/dataframe/test_semantic_layer_schema.py:12
  • Draft comment:
    Schema tests now reflect the updated convention (e.g. 'users' instead of 'Users'). Ensure that the documentation and any user-facing messages are updated accordingly.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None

Workflow ID: wflow_nrK3NKLUN46PSJlU


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@gventuri gventuri merged commit 0306d01 into sinaptik-ai:main Feb 18, 2025
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants