
Effect of evaluate!(mach, ...) is unpredictable. Retrain on all data and re-evaluate? #1027

Open
ablaom opened this issue Jun 15, 2023 · 6 comments

Comments

ablaom (Member) commented Jun 15, 2023

Currently, evaluate!(mach, ...) will mutate mach because mach is repeatedly retrained on different views of the data; mach is never trained on all supplied data (or all specified rows=...).

But there is really no reasonable way for the user to guess the final state of mach, as she is unlikely to know what the last train/test fold is without looking at the code, especially in the case of acceleration=CPUProcesses (where I think it is impossible) or acceleration=CPUThreads.

So, after evaluate!(mach, ...), mach is in an ambiguous state, and is therefore of little use unless the user explicitly retrains it.

I wonder what others think of the following suggestion:

  • New default behavior: After evaluate!(mach, ...) performs its current function, mach is trained one last time on all specified rows (all rows by default) and the metrics are evaluated one last time to get training scores, which are added to the returned PerformanceEvaluation object.
  • Add a new keyword argument retrain=true which can be set to false to suppress the final retrain.

Probably we would add the keyword option to evaluate(model, data...) as well, but maybe with retrain=false as the default? (See the usage sketch below.)
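
For concreteness, here is a minimal sketch of how this might look from the user's side. The retrain keyword is the proposal, not an existing option, and the choice of model is arbitrary:

```julia
using MLJ

X, y = @load_iris
Tree = @load DecisionTreeClassifier pkg=DecisionTree verbosity=0
mach = machine(Tree(), X, y)

# Current behaviour: `mach` is left trained on whichever train/test fold
# happened to be processed last.
e = evaluate!(mach, resampling=CV(nfolds=5), measure=log_loss)

# Proposed default (retrain=true): one final retrain on all specified rows,
# with training scores added to the returned PerformanceEvaluation, so that
# `predict(mach, Xnew)` uses parameters learned on all the data:
# e = evaluate!(mach, resampling=CV(nfolds=5), measure=log_loss, retrain=true)

# Proposed opt-out:
# e = evaluate!(mach, resampling=CV(nfolds=5), measure=log_loss, retrain=false)
```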

@ExpandingMan @OkonSamuel

ablaom (Member, Author) commented Jun 15, 2023

tylerjthomas9 (Contributor) commented
I think that this is a fantastic change. As you said, the current mach returned by evaluate! is not useful. It may add some complication that evaluate and evaluate! have different default values for retrain, but I think it makes logical sense not to retrain by default with evaluate(model, data...).

CameronBieganek commented
If we set aside learning networks, it seems like we don't really need evaluate!. The non-mutating evaluate is sufficient and more intuitive.

Is there a way to write a non-mutating evaluate that works on learning networks? Or is there already a way to use evaluate on learning networks that is not immediately obvious to me? I'm rusty on MLJ, since I unfortunately have to use Python at work. :(

OkonSamuel (Member) commented

> If we set aside learning networks, it seems like we don't really need evaluate!. The non-mutating evaluate is sufficient and more intuitive.
>
> Is there a way to write a non-mutating evaluate that works on learning networks? Or is there already a way to use evaluate on learning networks that is not immediately obvious to me? I'm rusty on MLJ, since I unfortunately have to use Python at work. :(

I agree with @CameronBieganek that it would be best if evaluate just returned a performance evaluation object and did not mutate the machine.
We could still keep the evaluate! method as an internal method for convenience.
For learning networks, now that the data isn't attached to the network anymore, we can make it such that calling evaluate on a machine passes a deepcopy of the machine to the internal evaluate! method.
I don't know how this will affect models that wrap external C libraries.
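
A rough sketch of that idea (the helper name is hypothetical; this is not the actual MLJBase implementation):

```julia
# Non-mutating evaluation by delegating to `evaluate!` on a deep copy,
# leaving the caller's machine untouched.
function evaluate_nonmutating(mach; kwargs...)
    mach_copy = deepcopy(mach)  # may not copy state faithfully for models wrapping external C libraries
    return evaluate!(mach_copy; kwargs...)
end
```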

ablaom (Member, Author) commented Jun 17, 2023

This is great feedback, thanks.

I'm torn between dumping mutation altogether (advantage: simplicity) and proceeding as I propose above (convenience). Some context: in a summer project we are working on automatic logging of workflows using MLflow, and in this context it seemed natural to log a training score, and a serialised set of learned parameters, each time a model is "evaluated". This saves having to specify metrics a second time when wanting the training score.

> Is there a way to write a non-mutating evaluate that works on learning networks? Or is there already a way to use evaluate on learning networks that is not immediately obvious to me?

@CameronBieganek Not sure what you mean here. Perhaps you mean calling evaluate! on "learning network machines" (a special kind of machine), which are now deprecated? Generally the idea with learning networks is that, in serious use, they should be "exported" as stand-alone composite models. There was a simplification to this exporting process you may have missed: learning network machines, once an intermediate step in the export process, have been deprecated, as they were seen as an unnecessary complication.

ablaom (Member, Author) commented Jun 21, 2023

Okay, I now remember the reason for the existing behaviour. The use case is evaluating models that support "warm restart". If you are using Holdout() (or some other single train/test pair of row indices) and you re-evaluate the machine after changing a model hyperparameter, then, for some hyperparameters such as n_iterations, the re-evaluation will not require retraining the machine from scratch.
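
To illustrate the use case, here is a sketch using a homogeneous ensemble, whose n hyperparameter is one example of a hyperparameter supporting warm restarts (the particular model is only for illustration):

```julia
using MLJ

X, y = make_regression(200, 5)
Tree = @load DecisionTreeRegressor pkg=DecisionTree verbosity=0
forest = EnsembleModel(model=Tree(), n=100)
mach = machine(forest, X, y)

e1 = evaluate!(mach, resampling=Holdout(fraction_train=0.8), measure=rms)

# Bumping `n` and re-evaluating can reuse the learned state from the first
# call, training only the additional atoms on the same training rows. This is
# the use case served by the current mutating behaviour.
forest.n = 150
e2 = evaluate!(mach, resampling=Holdout(fraction_train=0.8), measure=rms)
```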

On another matter, perhaps a better way to get training scores is to add a resampling strategy InSample() which has the single train/test pair of row indices train=rows and test=rows.
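
A minimal sketch of what such a strategy could look like, written against MLJBase's documented interface for adding resampling strategies (InSample is the proposal here, not something already implemented):

```julia
import MLJBase

# Train and test on the same rows, so measures become in-sample (training) scores.
struct InSample <: MLJBase.ResamplingStrategy end

MLJBase.train_test_pairs(::InSample, rows) = [(rows, rows)]

# Usage: evaluate(model, X, y, resampling=InSample(), measure=log_loss)
```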
