For a complicated set of reasons, it's currently quite annoying to rerun comps for an existing model, something that we've wanted to do for each of the past two final models. Let me explain the reasons in detail:
Currently, users can only compute comps in the `interpret` stage as part of a model pipeline run. It is also extremely resource-intensive to compute comps, meaning we can generally only run it on remote infrastructure like AWS Batch unless we're running on a small subset of data. These two constraints on the comps calculation process combine to create a situation in which we have to run an entirely new model, generating new values and a new run ID in the process, if we want to regenerate comps for a final model.

This situation is particularly problematic given that model values are not fully deterministic (#373), so we can't just use a comps run in place of a final model run when publishing comps; instead, we need to take care to use a final model run for publishing values, while using a subsequent comps run for publishing comps. This creates unnecessary complexity in the process of publishing comps, and it also feels counterintuitive, since we're not actually changing anything related to the model structure itself, just rerunning comps.
I propose that we refactor our comps process and data model in order to make it possible for users to run remote jobs to recalculate comps for existing models.
High-level steps for this refactor include:
- Tweak the comps data model to allow for multiple runs using the same model run, since `model.comp` is currently unique by `(run_id, pin, card)` but the new approach will allow multiple runs per run ID
  - I think the easiest solution is probably adding an incrementing `version` column, such that subsequent runs using the same run ID increment the `version` field, and the row with the highest `version` is the published comp; if we do this we should migrate existing comps data to add `version = 1` for all existing comps (see the first SQL sketch after this list)
- Update the `interpret` pipeline stage to save comps using the new data model
- Update code that consumes comps (like PINVAL and the 2-PIN comparison doc) to handle the new data model, which we could do two ways:
  - Tweak the consuming code to query the most recent `version` for all comps
  - Switch consuming code to point to a new view called something like `model.vw_comp` that performs the version filtering logic and thereby maintains uniqueness by `(run_id, pin, card)` (see the second SQL sketch after this list)
    - This is probably the better path forward
- Make a new GitHub workflow to run comps based on an existing model run ID
  - Overall workflow logic: Pull model artifacts for the supplied run ID, then run `dvc repro interpret` to run the `interpret` stage with `comp_enable=True`, and write a set of comps with an incremented `version` (see the third SQL sketch after this list)
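To make the data model change concrete, here's a minimal SQL sketch of the migration step, assuming a warehouse that supports `ALTER TABLE` and `UPDATE` (engines without in-place updates would rewrite the table instead); everything here other than `model.comp` and `version` is illustrative:

```sql
-- Minimal migration sketch: add an incrementing version column and
-- backfill version = 1 for all existing comps, so that rows become
-- unique by (run_id, pin, card, version)
ALTER TABLE model.comp ADD COLUMN version INTEGER;

-- Engines without UPDATE support would instead rewrite the table
-- (e.g. via a CTAS that selects 1 AS version for existing rows)
UPDATE model.comp
SET version = 1
WHERE version IS NULL;
```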
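Similarly, the hypothetical `model.vw_comp` view could implement the version filtering with a window function, restoring uniqueness by `(run_id, pin, card)` for consumers. A sketch:

```sql
-- Sketch of a view that exposes only the highest-version comps per
-- (run_id, pin, card), i.e. the published comps for each run
CREATE VIEW model.vw_comp AS
SELECT *
FROM (
    SELECT
        comp.*,
        ROW_NUMBER() OVER (
            PARTITION BY run_id, pin, card
            ORDER BY version DESC
        ) AS version_rank
    FROM model.comp AS comp
) AS ranked
WHERE version_rank = 1;
```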
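Finally, the new workflow's write step would need to determine the incremented `version` for the supplied run ID. Whether this happens in SQL or in pipeline code is an open question, but one way to derive it, where `:run_id` is a placeholder for the workflow's run ID input:

```sql
-- Derive the next comps version for a rerun of an existing model;
-- :run_id is a placeholder for the run ID supplied to the workflow
SELECT COALESCE(MAX(version), 0) + 1 AS next_version
FROM model.comp
WHERE run_id = :run_id;
```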