-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(dataset): slug format validation on load #1609
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ Changes requested. Reviewed everything up to e911abc in 2 minutes and 28 seconds
More details
- Looked at
89
lines of code in3
files - Skipped
0
files when reviewing. - Skipped posting
4
drafted comments based on config settings.
1. tests/test_cli.py:110
- Draft comment:
The test for dataset name starting with a hyphen is identical to the test for ending with a hyphen. Update one of these to use a dataset name ending with a hyphen (e.g. 'my-org/INVALID-DATASET-') to ensure both edge cases are covered. - Reason this comment was not posted:
Marked as duplicate.
2. pandasai/helpers/path.py:45
- Draft comment:
Ensure the regex in validate_name_format properly checks formatting constraints as intended. The current regex disallows names starting or ending with a hyphen, which is correct, but the error message might be more specific about not allowing leading or trailing hyphens. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
The comment violates several rules: 1) It starts with "Ensure" which is asking for verification 2) It's not clearly requiring a code change 3) It's more of a suggestion/observation than a clear issue. The error messages are already clear enough about using hyphens instead of spaces.
The error messages could be more specific, and having clearer error messages is generally good practice. Maybe I'm being too strict about the "Ensure" rule.
While better error messages would be nice, the comment is phrased as a verification request rather than a clear change requirement. The existing messages are adequate for users to understand the format.
Delete this comment as it's phrased as a verification request and doesn't clearly indicate required changes. The existing error messages are sufficient.
3. pandasai/__init__.py:250
- Draft comment:
The check for empty org_name or dataset_name is redundant, as get_validated_dataset_path already validates and raises errors. Consider removing this extra check for cleaner code. - Reason this comment was not posted:
Marked as duplicate.
4. tests/test_cli.py:119
- Draft comment:
The test for a dataset name ending with a hyphen is identical to the one for starting with a hyphen. Update it to use an input like 'my-org/my-dataset-' to properly test a trailing hyphen. - Reason this comment was not posted:
Marked as duplicate.
Workflow ID: wflow_aGEeFJXC2egSsXjR
Want Ellipsis to fix these issues? Tag @ellipsis-dev
in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on cc40b33 in 2 minutes and 21 seconds
More details
- Looked at
63
lines of code in1
files - Skipped
0
files when reviewing. - Skipped posting
13
drafted comments based on config settings.
1. tests/unit_tests/test_pandasai_init.py:123
- Draft comment:
Dataset slug updated from underscore to hyphen. Ensure error messages and documentation reflect this new format. - Reason this comment was not posted:
Comment looked like it was already resolved.
2. tests/unit_tests/test_pandasai_init.py:141
- Draft comment:
Ensure the error message string matches the new slug format with hyphen. - Reason this comment was not posted:
Marked as duplicate.
3. tests/unit_tests/test_pandasai_init.py:157
- Draft comment:
Consistently update dataset path in error message. Verify both local and remote messages reflect 'org/dataset-name'. - Reason this comment was not posted:
Marked as duplicate.
4. tests/unit_tests/test_pandasai_init.py:170
- Draft comment:
Dataset path updated in test_load_missing_api_url. Ensure consistency of slug format. - Reason this comment was not posted:
Marked as duplicate.
5. tests/unit_tests/test_pandasai_init.py:184
- Draft comment:
Dataset slug updated for error check in test_load_missing_not_found. Confirm uniform usage of hyphen over underscore. - Reason this comment was not posted:
Marked as duplicate.
6. tests/unit_tests/test_pandasai_init.py:211
- Draft comment:
Ensure dataset slug in test_load_successful_zip_extraction uses the new hyphen format consistently. - Reason this comment was not posted:
Marked as duplicate.
7. tests/unit_tests/test_pandasai_init.py:160
- Draft comment:
Consider updating 'valid permits' to 'valid permissions' in error messages for clearer English. - Reason this comment was not posted:
Marked as duplicate.
8. tests/unit_tests/test_pandasai_init.py:123
- Draft comment:
Excellent consistency: the dataset path is now updated to use a hyphen ('-') instead of an underscore ('_'). This change aligns with slug format conventions. - Reason this comment was not posted:
Confidence changes required:0%
<= threshold50%
None
9. tests/unit_tests/test_pandasai_init.py:141
- Draft comment:
Dataset path string updated to 'org/dataset-name' for the not-found case. This ensures consistent slug format in the test. - Reason this comment was not posted:
Confidence changes required:0%
<= threshold50%
None
10. tests/unit_tests/test_pandasai_init.py:157
- Draft comment:
Error message matched string now reflects the updated slug. Consider rephrasing the error text for clarity, e.g., 'Please set a valid API key to fetch the dataset from the remote server.' - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50%
The current error message is actually more informative than the suggested one. It explains that: 1) The dataset wasn't found locally 2) No API key was provided 3) What action to take. The suggested message only covers point FIX README.md #3 and loses important context. The current message follows good error message practices by being specific and actionable.
I could be overvaluing verbosity - sometimes shorter, more direct messages are better. The current message might be too long.
While conciseness has value, error messages should prioritize clarity and actionability. The current message achieves this by providing complete context and clear next steps.
The comment should be deleted. The current error message is well-constructed, providing necessary context and clear guidance. The suggested change would remove valuable information without improving clarity.
11. tests/unit_tests/test_pandasai_init.py:170
- Draft comment:
Consistently update dataset path to 'org/dataset-name' in test_load_missing_api_url. - Reason this comment was not posted:
Confidence changes required:0%
<= threshold50%
None
12. tests/unit_tests/test_pandasai_init.py:184
- Draft comment:
Dataset path in test_load_missing_not_found updated correctly; ensuring consistency with slug format. - Reason this comment was not posted:
Confidence changes required:0%
<= threshold50%
None
13. tests/unit_tests/test_pandasai_init.py:211
- Draft comment:
Updated dataset path in the successful zip extraction test to match new slug format. - Reason this comment was not posted:
Confidence changes required:0%
<= threshold50%
None
Workflow ID: wflow_emveKHs5SDafuHqK
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1609 +/- ##
==========================================
+ Coverage 88.98% 89.02% +0.03%
==========================================
Files 71 71
Lines 2687 2687
==========================================
+ Hits 2391 2392 +1
+ Misses 296 295 -1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ Changes requested. Incremental review on a5f0926 in 1 minute and 29 seconds
More details
- Looked at
17
lines of code in1
files - Skipped
0
files when reviewing. - Skipped posting
3
drafted comments based on config settings.
1. pandasai/__init__.py:248
- Draft comment:
Removed tuple unpacking from get_validated_dataset_path. Ensure that this function now internally validates the slug format and raises an error if the format is incorrect. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
The comment is asking the author to verify/ensure something about the internal workings of another function that we can't see. This violates the rule about not asking authors to verify things. Additionally, we can't see the implementation of get_validated_dataset_path() so we can't verify if this change is correct or not. The comment is speculative about what might need to be checked.
The function name implies validation happens inside it, so maybe this is a valid concern about error handling being removed. The old code did explicitly check the return values.
Even if there's a potential issue, asking the author to "ensure" something is not actionable feedback. If there's a specific problem, it should be stated directly with evidence.
This comment should be deleted as it asks the author to verify/ensure something rather than making a specific, actionable suggestion, and we lack context about the validation function's implementation.
2. pandasai/__init__.py:247
- Draft comment:
Ensure that get_validated_dataset_path now raises an error for invalid paths. Previously, the tuple values were checked to enforce the 'organization/dataset' format. With the new change, confirm that the helper performs this validation; otherwise, invalid slugs may pass silently. - Reason this comment was not posted:
Marked as duplicate.
3. pandasai/__init__.py:247
- Draft comment:
Add or update unit tests to cover invalid dataset slug formats. This ensures robust validation when loading datasets. - Reason this comment was not posted:
Marked as duplicate.
Workflow ID: wflow_6WyTFnNbXF873sfU
Want Ellipsis to fix these issues? Tag @ellipsis-dev
in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
…355-query * commit '61ba3530794fbe2b5739a04c26dc35e704ce69c9': fix(dataset): slug format validation on load (sinaptik-ai#1609) fix(views): transformation using raw sql (sinaptik-ai#1608) Release v3.0.0-beta.10
Important
Enhances dataset path validation in
load()
by enforcing 'organization/dataset-name' format and updates tests accordingly.load()
inpandasai/__init__.py
now usesget_validated_dataset_path()
for dataset path validation.validate_name_format()
inhelpers/path.py
to check name format.get_validated_dataset_path()
to usevalidate_name_format()
for organization and dataset names.test_cli.py
for invalid dataset names starting or ending with hyphens.test_pandasai_init.py
to use hyphenated dataset names.This description was created by
for a5f0926. It will automatically update as commits are pushed.