Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(dataset): slug format validation on load #1609

Merged
merged 9 commits into from
Feb 13, 2025
Merged

Conversation

ArslanSaleem
Copy link
Collaborator

@ArslanSaleem ArslanSaleem commented Feb 13, 2025

Important

Enhances dataset path validation in load() by enforcing 'organization/dataset-name' format and updates tests accordingly.

  • Behavior:
    • load() in pandasai/__init__.py now uses get_validated_dataset_path() for dataset path validation.
    • Validates that dataset paths are in the format 'organization/dataset-name'.
  • Helpers:
    • Adds validate_name_format() in helpers/path.py to check name format.
    • Updates get_validated_dataset_path() to use validate_name_format() for organization and dataset names.
  • Tests:
    • Adds tests in test_cli.py for invalid dataset names starting or ending with hyphens.
    • Updates tests in test_pandasai_init.py to use hyphenated dataset names.

This description was created by Ellipsis for a5f0926. It will automatically update as commits are pushed.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Reviewed everything up to e911abc in 2 minutes and 28 seconds

More details
  • Looked at 89 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 4 drafted comments based on config settings.
1. tests/test_cli.py:110
  • Draft comment:
    The test for dataset name starting with a hyphen is identical to the test for ending with a hyphen. Update one of these to use a dataset name ending with a hyphen (e.g. 'my-org/INVALID-DATASET-') to ensure both edge cases are covered.
  • Reason this comment was not posted:
    Marked as duplicate.
2. pandasai/helpers/path.py:45
  • Draft comment:
    Ensure the regex in validate_name_format properly checks formatting constraints as intended. The current regex disallows names starting or ending with a hyphen, which is correct, but the error message might be more specific about not allowing leading or trailing hyphens.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
    The comment violates several rules: 1) It starts with "Ensure" which is asking for verification 2) It's not clearly requiring a code change 3) It's more of a suggestion/observation than a clear issue. The error messages are already clear enough about using hyphens instead of spaces.
    The error messages could be more specific, and having clearer error messages is generally good practice. Maybe I'm being too strict about the "Ensure" rule.
    While better error messages would be nice, the comment is phrased as a verification request rather than a clear change requirement. The existing messages are adequate for users to understand the format.
    Delete this comment as it's phrased as a verification request and doesn't clearly indicate required changes. The existing error messages are sufficient.
3. pandasai/__init__.py:250
  • Draft comment:
    The check for empty org_name or dataset_name is redundant, as get_validated_dataset_path already validates and raises errors. Consider removing this extra check for cleaner code.
  • Reason this comment was not posted:
    Marked as duplicate.
4. tests/test_cli.py:119
  • Draft comment:
    The test for a dataset name ending with a hyphen is identical to the one for starting with a hyphen. Update it to use an input like 'my-org/my-dataset-' to properly test a trailing hyphen.
  • Reason this comment was not posted:
    Marked as duplicate.

Workflow ID: wflow_aGEeFJXC2egSsXjR


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on cc40b33 in 2 minutes and 21 seconds

More details
  • Looked at 63 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 13 drafted comments based on config settings.
1. tests/unit_tests/test_pandasai_init.py:123
  • Draft comment:
    Dataset slug updated from underscore to hyphen. Ensure error messages and documentation reflect this new format.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
2. tests/unit_tests/test_pandasai_init.py:141
  • Draft comment:
    Ensure the error message string matches the new slug format with hyphen.
  • Reason this comment was not posted:
    Marked as duplicate.
3. tests/unit_tests/test_pandasai_init.py:157
  • Draft comment:
    Consistently update dataset path in error message. Verify both local and remote messages reflect 'org/dataset-name'.
  • Reason this comment was not posted:
    Marked as duplicate.
4. tests/unit_tests/test_pandasai_init.py:170
  • Draft comment:
    Dataset path updated in test_load_missing_api_url. Ensure consistency of slug format.
  • Reason this comment was not posted:
    Marked as duplicate.
5. tests/unit_tests/test_pandasai_init.py:184
  • Draft comment:
    Dataset slug updated for error check in test_load_missing_not_found. Confirm uniform usage of hyphen over underscore.
  • Reason this comment was not posted:
    Marked as duplicate.
6. tests/unit_tests/test_pandasai_init.py:211
  • Draft comment:
    Ensure dataset slug in test_load_successful_zip_extraction uses the new hyphen format consistently.
  • Reason this comment was not posted:
    Marked as duplicate.
7. tests/unit_tests/test_pandasai_init.py:160
  • Draft comment:
    Consider updating 'valid permits' to 'valid permissions' in error messages for clearer English.
  • Reason this comment was not posted:
    Marked as duplicate.
8. tests/unit_tests/test_pandasai_init.py:123
  • Draft comment:
    Excellent consistency: the dataset path is now updated to use a hyphen ('-') instead of an underscore ('_'). This change aligns with slug format conventions.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
9. tests/unit_tests/test_pandasai_init.py:141
  • Draft comment:
    Dataset path string updated to 'org/dataset-name' for the not-found case. This ensures consistent slug format in the test.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
10. tests/unit_tests/test_pandasai_init.py:157
  • Draft comment:
    Error message matched string now reflects the updated slug. Consider rephrasing the error text for clarity, e.g., 'Please set a valid API key to fetch the dataset from the remote server.'
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50%
    The current error message is actually more informative than the suggested one. It explains that: 1) The dataset wasn't found locally 2) No API key was provided 3) What action to take. The suggested message only covers point FIX README.md #3 and loses important context. The current message follows good error message practices by being specific and actionable.
    I could be overvaluing verbosity - sometimes shorter, more direct messages are better. The current message might be too long.
    While conciseness has value, error messages should prioritize clarity and actionability. The current message achieves this by providing complete context and clear next steps.
    The comment should be deleted. The current error message is well-constructed, providing necessary context and clear guidance. The suggested change would remove valuable information without improving clarity.
11. tests/unit_tests/test_pandasai_init.py:170
  • Draft comment:
    Consistently update dataset path to 'org/dataset-name' in test_load_missing_api_url.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
12. tests/unit_tests/test_pandasai_init.py:184
  • Draft comment:
    Dataset path in test_load_missing_not_found updated correctly; ensuring consistency with slug format.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
13. tests/unit_tests/test_pandasai_init.py:211
  • Draft comment:
    Updated dataset path in the successful zip extraction test to match new slug format.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None

Workflow ID: wflow_emveKHs5SDafuHqK


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link

codecov bot commented Feb 13, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.02%. Comparing base (b746056) to head (a5f0926).
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1609      +/-   ##
==========================================
+ Coverage   88.98%   89.02%   +0.03%     
==========================================
  Files          71       71              
  Lines        2687     2687              
==========================================
+ Hits         2391     2392       +1     
+ Misses        296      295       -1     
Flag Coverage Δ
unittests 89.02% <100.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Incremental review on a5f0926 in 1 minute and 29 seconds

More details
  • Looked at 17 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 drafted comments based on config settings.
1. pandasai/__init__.py:248
  • Draft comment:
    Removed tuple unpacking from get_validated_dataset_path. Ensure that this function now internally validates the slug format and raises an error if the format is incorrect.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
    The comment is asking the author to verify/ensure something about the internal workings of another function that we can't see. This violates the rule about not asking authors to verify things. Additionally, we can't see the implementation of get_validated_dataset_path() so we can't verify if this change is correct or not. The comment is speculative about what might need to be checked.
    The function name implies validation happens inside it, so maybe this is a valid concern about error handling being removed. The old code did explicitly check the return values.
    Even if there's a potential issue, asking the author to "ensure" something is not actionable feedback. If there's a specific problem, it should be stated directly with evidence.
    This comment should be deleted as it asks the author to verify/ensure something rather than making a specific, actionable suggestion, and we lack context about the validation function's implementation.
2. pandasai/__init__.py:247
  • Draft comment:
    Ensure that get_validated_dataset_path now raises an error for invalid paths. Previously, the tuple values were checked to enforce the 'organization/dataset' format. With the new change, confirm that the helper performs this validation; otherwise, invalid slugs may pass silently.
  • Reason this comment was not posted:
    Marked as duplicate.
3. pandasai/__init__.py:247
  • Draft comment:
    Add or update unit tests to cover invalid dataset slug formats. This ensures robust validation when loading datasets.
  • Reason this comment was not posted:
    Marked as duplicate.

Workflow ID: wflow_6WyTFnNbXF873sfU


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@gventuri gventuri merged commit 61ba353 into main Feb 13, 2025
15 checks passed
scaliseraoul added a commit to scaliseraoul/pandas-ai that referenced this pull request Feb 13, 2025
…355-query

* commit '61ba3530794fbe2b5739a04c26dc35e704ce69c9':
  fix(dataset): slug format validation on load (sinaptik-ai#1609)
  fix(views): transformation using raw sql (sinaptik-ai#1608)
  Release v3.0.0-beta.10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants