User Story
In order to reprocess records with new code/code changes, data.gov admins want a "force update" feature that ignores the metadata source comparison check and updates datasets from the source.
Acceptance Criteria
[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]
GIVEN harvest code has been changed
AND a harvest source needs to be re-processed
WHEN a manual harvest is initialized with the "force update" flag
THEN all datasets from the source are processed (regardless of whether they have changed or not)
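A minimal sketch of the decision the flag changes, assuming a Python harvester and using placeholder names (`should_process`, `source_hash`, `force_update` are illustrative, not the actual harvester API):

```python
def should_process(dataset, existing_record, force_update=False):
    """Decide whether a harvested dataset needs reprocessing.

    Normally a dataset is skipped when its source metadata hash is
    unchanged; with force_update=True the comparison is ignored and
    every dataset from the source is processed.
    """
    if force_update:
        return True
    if existing_record is None:
        return True  # new dataset, always process
    return dataset["source_hash"] != existing_record["source_hash"]
```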
Background
Usually, the only reason to re-process a dataset is that its metadata has changed. However, if the harvester code or logic changes, there is reason to ignore those optimization checks and simply re-pull the data.
In the past we have sometimes managed this by clearing and re-harvesting a data source; this is inelegant, causes downtime for the datasets, and can sometimes result in URL changes for dataset pages. Updating in place is much better.
In the future we could even run a forced update across all datasets; manually re-syncing data sources once a year seems like good practice.
This will also make bug fixes easier as we go live, since code changes may be required.
Security Considerations (required)
[Any security concerns that might be implicated in the change. "None" is OK, just be explicit here!]
Sketch
Update the job database object to support this flag.
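As a rough illustration only, the flag could be persisted on the job record along these lines (SQLAlchemy-style sketch; the real model, table, and column names will differ):

```python
from sqlalchemy import Boolean, Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class HarvestJob(Base):
    __tablename__ = "harvest_job"

    id = Column(Integer, primary_key=True)
    source_id = Column(String, nullable=False)
    # When True, the metadata comparison check is skipped and every
    # dataset from the source is updated in place.
    force_update = Column(Boolean, nullable=False, default=False)
```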