User Story
In order to reprocess records with new code/code changes, data.gov admins want a "force update" feature that ignores the metadata source comparison check and updates datasets from the source.
Acceptance Criteria
[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]
GIVEN harvest code has been changed
AND a harvest source needs to be re-processed
WHEN a manual harvest is initialized with the "force update" flag
THEN all datasets from the source are processed (regardless of whether they have changed or not)
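A minimal sketch of the decision the flag changes, assuming a Python harvester and using placeholder names (`should_process`, `source_hash`, `force_update` are illustrative, not the actual harvester API):

```python
def should_process(dataset, existing_record, force_update=False):
    """Decide whether a harvested dataset needs reprocessing.

    Normally a dataset is skipped when its source metadata hash is
    unchanged; with force_update=True the comparison is ignored and
    every dataset from the source is processed.
    """
    if force_update:
        return True
    if existing_record is None:
        return True  # new dataset, always process
    return dataset["source_hash"] != existing_record["source_hash"]
```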
Background
Usually, the only reason to re-process a dataset is that its metadata has changed. However, if the harvester code or logic changes, there is reason to ignore those optimization checks and simply re-pull the data.
In the past we have sometimes managed this by clearing and re-harvesting a data source; this is inelegant, causes downtime for the datasets, and can sometimes result in URL changes for dataset pages. Updating in place is much better.
In the future we could even run a forced update across all datasets; manually re-syncing data sources once a year seems like good practice.
This will also make bug fixes easier as we go live, since code changes may be required.
Security Considerations (required)
[Any security concerns that might be implicated in the change. "None" is OK, just be explicit here!]
Sketch
Update the job database object to support this flag.
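As a rough illustration only, the flag could be persisted on the job record along these lines (SQLAlchemy-style sketch; the real model, table, and column names will differ):

```python
from sqlalchemy import Boolean, Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class HarvestJob(Base):
    __tablename__ = "harvest_job"

    id = Column(Integer, primary_key=True)
    source_id = Column(String, nullable=False)
    # When True, the metadata comparison check is skipped and every
    # dataset from the source is updated in place.
    force_update = Column(Boolean, nullable=False, default=False)
```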