fix(dataset): slug format validation on load #1609

ArslanSaleem · 2025-02-13T15:18:59Z

Important

Enhances dataset path validation in load() by enforcing 'organization/dataset-name' format and updates tests accordingly.

Behavior:
- load() in pandasai/__init__.py now uses get_validated_dataset_path() for dataset path validation.
- Validates that dataset paths are in the format 'organization/dataset-name'.
Helpers:
- Adds validate_name_format() in helpers/path.py to check name format.
- Updates get_validated_dataset_path() to use validate_name_format() for organization and dataset names.
Tests:
- Adds tests in test_cli.py for invalid dataset names starting or ending with hyphens.
- Updates tests in test_pandasai_init.py to use hyphenated dataset names.

^{This description was created by}^{for a5f0926. It will automatically update as commits are pushed.}

…ing request

ellipsis-dev

❌ Changes requested. Reviewed everything up to e911abc in 2 minutes and 28 seconds

More details

Looked at 89 lines of code in 3 files
Skipped 0 files when reviewing.
Skipped posting 4 drafted comments based on config settings.

1. tests/test_cli.py:110

Draft comment:
The test for dataset name starting with a hyphen is identical to the test for ending with a hyphen. Update one of these to use a dataset name ending with a hyphen (e.g. 'my-org/INVALID-DATASET-') to ensure both edge cases are covered.
Reason this comment was not posted:
Marked as duplicate.

2. pandasai/helpers/path.py:45

Draft comment:
Ensure the regex in validate_name_format properly checks formatting constraints as intended. The current regex disallows names starting or ending with a hyphen, which is correct, but the error message might be more specific about not allowing leading or trailing hyphens.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
The comment violates several rules: 1) It starts with "Ensure" which is asking for verification 2) It's not clearly requiring a code change 3) It's more of a suggestion/observation than a clear issue. The error messages are already clear enough about using hyphens instead of spaces.
The error messages could be more specific, and having clearer error messages is generally good practice. Maybe I'm being too strict about the "Ensure" rule.
While better error messages would be nice, the comment is phrased as a verification request rather than a clear change requirement. The existing messages are adequate for users to understand the format.
Delete this comment as it's phrased as a verification request and doesn't clearly indicate required changes. The existing error messages are sufficient.

3. pandasai/__init__.py:250

Draft comment:
The check for empty org_name or dataset_name is redundant, as get_validated_dataset_path already validates and raises errors. Consider removing this extra check for cleaner code.
Reason this comment was not posted:
Marked as duplicate.

4. tests/test_cli.py:119

Draft comment:
The test for a dataset name ending with a hyphen is identical to the one for starting with a hyphen. Update it to use an input like 'my-org/my-dataset-' to properly test a trailing hyphen.
Reason this comment was not posted:
Marked as duplicate.

Workflow ID: wflow_aGEeFJXC2egSsXjR

Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

pandasai/__init__.py

ellipsis-dev

👍 Looks good to me! Incremental review on cc40b33 in 2 minutes and 21 seconds

More details

Looked at 63 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 13 drafted comments based on config settings.

1. tests/unit_tests/test_pandasai_init.py:123

Draft comment:
Dataset slug updated from underscore to hyphen. Ensure error messages and documentation reflect this new format.
Reason this comment was not posted:
Comment looked like it was already resolved.

2. tests/unit_tests/test_pandasai_init.py:141

Draft comment:
Ensure the error message string matches the new slug format with hyphen.
Reason this comment was not posted:
Marked as duplicate.

3. tests/unit_tests/test_pandasai_init.py:157

Draft comment:
Consistently update dataset path in error message. Verify both local and remote messages reflect 'org/dataset-name'.
Reason this comment was not posted:
Marked as duplicate.

4. tests/unit_tests/test_pandasai_init.py:170

Draft comment:
Dataset path updated in test_load_missing_api_url. Ensure consistency of slug format.
Reason this comment was not posted:
Marked as duplicate.

5. tests/unit_tests/test_pandasai_init.py:184

Draft comment:
Dataset slug updated for error check in test_load_missing_not_found. Confirm uniform usage of hyphen over underscore.
Reason this comment was not posted:
Marked as duplicate.

6. tests/unit_tests/test_pandasai_init.py:211

Draft comment:
Ensure dataset slug in test_load_successful_zip_extraction uses the new hyphen format consistently.
Reason this comment was not posted:
Marked as duplicate.

7. tests/unit_tests/test_pandasai_init.py:160

Draft comment:
Consider updating 'valid permits' to 'valid permissions' in error messages for clearer English.
Reason this comment was not posted:
Marked as duplicate.

8. tests/unit_tests/test_pandasai_init.py:123

Draft comment:
Excellent consistency: the dataset path is now updated to use a hyphen ('-') instead of an underscore ('_'). This change aligns with slug format conventions.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%
None

9. tests/unit_tests/test_pandasai_init.py:141

Draft comment:
Dataset path string updated to 'org/dataset-name' for the not-found case. This ensures consistent slug format in the test.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%
None

10. tests/unit_tests/test_pandasai_init.py:157

Draft comment:
Error message matched string now reflects the updated slug. Consider rephrasing the error text for clarity, e.g., 'Please set a valid API key to fetch the dataset from the remote server.'
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50%
The current error message is actually more informative than the suggested one. It explains that: 1) The dataset wasn't found locally 2) No API key was provided 3) What action to take. The suggested message only covers point FIX README.md #3 and loses important context. The current message follows good error message practices by being specific and actionable.
I could be overvaluing verbosity - sometimes shorter, more direct messages are better. The current message might be too long.
While conciseness has value, error messages should prioritize clarity and actionability. The current message achieves this by providing complete context and clear next steps.
The comment should be deleted. The current error message is well-constructed, providing necessary context and clear guidance. The suggested change would remove valuable information without improving clarity.

11. tests/unit_tests/test_pandasai_init.py:170

Draft comment:
Consistently update dataset path to 'org/dataset-name' in test_load_missing_api_url.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%
None

12. tests/unit_tests/test_pandasai_init.py:184

Draft comment:
Dataset path in test_load_missing_not_found updated correctly; ensuring consistency with slug format.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%
None

13. tests/unit_tests/test_pandasai_init.py:211

Draft comment:
Updated dataset path in the successful zip extraction test to match new slug format.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%
None

Workflow ID: wflow_emveKHs5SDafuHqK

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

codecov · 2025-02-13T15:33:02Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.02%. Comparing base (b746056) to head (a5f0926).
Report is 2 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1609      +/-   ##
==========================================
+ Coverage   88.98%   89.02%   +0.03%     
==========================================
  Files          71       71              
  Lines        2687     2687              
==========================================
+ Hits         2391     2392       +1     
+ Misses        296      295       -1

Flag	Coverage Δ
unittests	`89.02% <100.00%> (+0.03%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

ellipsis-dev

❌ Changes requested. Incremental review on a5f0926 in 1 minute and 29 seconds

More details

Looked at 17 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 3 drafted comments based on config settings.

1. pandasai/__init__.py:248

Draft comment:
Removed tuple unpacking from get_validated_dataset_path. Ensure that this function now internally validates the slug format and raises an error if the format is incorrect.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
The comment is asking the author to verify/ensure something about the internal workings of another function that we can't see. This violates the rule about not asking authors to verify things. Additionally, we can't see the implementation of get_validated_dataset_path() so we can't verify if this change is correct or not. The comment is speculative about what might need to be checked.
The function name implies validation happens inside it, so maybe this is a valid concern about error handling being removed. The old code did explicitly check the return values.
Even if there's a potential issue, asking the author to "ensure" something is not actionable feedback. If there's a specific problem, it should be stated directly with evidence.
This comment should be deleted as it asks the author to verify/ensure something rather than making a specific, actionable suggestion, and we lack context about the validation function's implementation.

2. pandasai/__init__.py:247

Draft comment:
Ensure that get_validated_dataset_path now raises an error for invalid paths. Previously, the tuple values were checked to enforce the 'organization/dataset' format. With the new change, confirm that the helper performs this validation; otherwise, invalid slugs may pass silently.
Reason this comment was not posted:
Marked as duplicate.

3. pandasai/__init__.py:247

Draft comment:
Add or update unit tests to cover invalid dataset slug formats. This ensures robust validation when loading datasets.
Reason this comment was not posted:
Marked as duplicate.

Workflow ID: wflow_6WyTFnNbXF873sfU

Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

pandasai/__init__.py

…355-query * commit '61ba3530794fbe2b5739a04c26dc35e704ce69c9': fix(dataset): slug format validation on load (sinaptik-ai#1609) fix(views): transformation using raw sql (sinaptik-ai#1608) Release v3.0.0-beta.10

gventuri and others added 7 commits February 10, 2025 12:35

chore: remove leftover sqlite export from sql conector

9113e19

Merge branch 'main' of https://github.com/gventuri/pandas-ai

bce1803

Merge branch 'main' of https://github.com/gventuri/pandas-ai

f89ad4a

fix(dataset_slug): failed to validate dataset_slug format before send…

6b2d41d

…ing request

remove extras

5bdda75

fix: extra file

db7890b

validate slug name on load

e911abc

ArslanSaleem requested a review from gventuri February 13, 2025 15:18

ellipsis-dev bot reviewed Feb 13, 2025

View reviewed changes

pandasai/__init__.py Outdated Show resolved Hide resolved

fix: test cases for _ issue

cc40b33

ellipsis-dev bot reviewed Feb 13, 2025

View reviewed changes

fix: remove redundant condition

a5f0926

ellipsis-dev bot reviewed Feb 13, 2025

View reviewed changes

pandasai/__init__.py Show resolved Hide resolved

gventuri merged commit 61ba353 into main Feb 13, 2025
15 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(dataset): slug format validation on load #1609

fix(dataset): slug format validation on load #1609

ArslanSaleem commented Feb 13, 2025 •

edited by ellipsis-dev bot

Loading

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

codecov bot commented Feb 13, 2025 •

edited

Loading

ellipsis-dev bot left a comment

fix(dataset): slug format validation on load #1609

fix(dataset): slug format validation on load #1609

Conversation

ArslanSaleem commented Feb 13, 2025 • edited by ellipsis-dev bot Loading

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

codecov bot commented Feb 13, 2025 • edited Loading

Codecov Report

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ArslanSaleem commented Feb 13, 2025 •

edited by ellipsis-dev bot

Loading

codecov bot commented Feb 13, 2025 •

edited

Loading