Semantics of reprocessing data #33

@dpwrussell

There are several use-cases that warrant reprocessing of data:

  • Failure during the scan stage to identify a fileset, which might be fixed in a new version of the scanner.
  • Failure during the extract stage to successfully extract a fileset, which might be fixed in a new version of the extractor.
  • Failure during the scan/extract stage due to an unexpected server-side error that has since been resolved.
  • Even if the extract stage completes successfully, the extracted metadata or images might be suboptimal, and reprocessing the fileset could improve them.

The exact semantics of reprocessing need to be defined before coming up with an implementation strategy.

Questions:

  • Is the original import entirely replaced by the reprocessed one?
  • Is the original fileset entirely replaced by the reprocessed one?
  • If reprocessed imports/filesets do not replace the originals, what happens to the originals and how do we record this in the database?
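To make the last question concrete, here is a minimal sketch of one possible answer, not a proposal for the actual schema: originals are kept, and each record points at the attempt that superseded it. Every name here (`FilesetRecord`, `FilesetStore`, `superseded_by`, the status strings) is hypothetical, and the in-memory store only stands in for a real database table.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FilesetRecord:
    """Hypothetical row describing one processing attempt of a fileset."""
    id: int
    path: str
    extractor_version: str
    status: str  # assumed values, e.g. "failed" or "extracted"
    superseded_by: Optional[int] = None  # id of the newer attempt, if any


class FilesetStore:
    """In-memory stand-in for a database table (an assumption, not the real schema)."""

    def __init__(self) -> None:
        self._rows: Dict[int, FilesetRecord] = {}
        self._next_id = 1

    def add(self, path: str, extractor_version: str, status: str) -> FilesetRecord:
        """Record a new processing attempt."""
        row = FilesetRecord(self._next_id, path, extractor_version, status)
        self._rows[row.id] = row
        self._next_id += 1
        return row

    def reprocess(self, original_id: int, extractor_version: str, status: str) -> FilesetRecord:
        """Record a new attempt and mark the original as superseded, without deleting it."""
        original = self._rows[original_id]
        if original.superseded_by is not None:
            raise ValueError("only the current attempt can be reprocessed")
        new = self.add(original.path, extractor_version, status)
        original.superseded_by = new.id
        return new

    def current(self, path: str) -> Optional[FilesetRecord]:
        """Return the attempt for this path that has not been superseded."""
        for row in self._rows.values():
            if row.path == path and row.superseded_by is None:
                return row
        return None

    def history(self, path: str) -> List[FilesetRecord]:
        """Return every attempt for this path, oldest first."""
        return sorted((r for r in self._rows.values() if r.path == path), key=lambda r: r.id)
```

Under this sketch, "replace entirely" would mean deleting the original row instead of setting `superseded_by`. Keeping the row preserves an audit trail at the cost of every query having to filter on the current attempt.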
