Using hyperparameters from a checkpoint that is "weight-only" #194

sfmig · 2024-06-25T14:55:07Z

If we merge #182 with the suggested fix, we will be able to restart training from a checkpoint. However, we will always use the hparams as specified in the config .yaml file.

Would we want to have the option to use the hparams from the checkpoint?
Checkpoints in PyTorch Lightning include hyperparameters, but it is not clear to me when these are loaded.

If we don't pass config to load_from_checkpoint, we would in principle use the hparams from the checkpoint. However, this leads to a mismatch between the hparams logged in MLflow and the hparams actually used.

lightning_model = FasterRCNN.load_from_checkpoint(
	self.checkpoint_path,
	config=self.config,
)
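One way to avoid the mismatch is to decide on a single hparams source before building the model, and then both train with and log that same dict. The sketch below is only an illustration: resolve_hparams and prefer_checkpoint are hypothetical names, and it assumes the checkpoint has already been loaded into a dict (e.g. with torch.load) and that the hparams were saved as a flat mapping under Lightning's "hyper_parameters" key.

```python
from typing import Any, Dict, Mapping


def resolve_hparams(
    checkpoint: Mapping[str, Any],
    config: Mapping[str, Any],
    prefer_checkpoint: bool = False,
) -> Dict[str, Any]:
    """Return the one set of hparams to both train with and log (hypothetical helper)."""
    # Lightning writes saved hparams under "hyper_parameters" when the module
    # called save_hyperparameters(); the key may be missing otherwise.
    saved = checkpoint.get("hyper_parameters") or {}
    if prefer_checkpoint and saved:
        return dict(saved)
    # Fall back to the yaml config, which is the current behaviour.
    return dict(config)


# Stand-in for a loaded checkpoint dict; the values are made up for illustration.
checkpoint = {"state_dict": {}, "hyper_parameters": {"n_epochs": 1}}
config = {"n_epochs": 3}

hparams = resolve_hparams(checkpoint, config, prefer_checkpoint=True)
print(hparams)  # {'n_epochs': 1}
```

Whatever dict this returns would be the one passed to the model and to the MLflow logger, so the two cannot drift apart.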

To reproduce this bug:

1. Remove the config argument we pass to FasterRCNN.load_from_checkpoint().
2. Train a model for one epoch (specifying n_epochs=1 in the yaml file) and save a weights_only checkpoint.
   - The checkpoint is at the path_to_checkpoints parameter logged in MLflow (the name is last.ckpt).
3. Launch a training job that starts from that checkpoint, after editing the config file to have n_epochs=3.

In MLflow, this second training job has the same hyperparameters as the job that produced the checkpoint (so it has n_epochs=1 etc.), but in reality the job runs for as many epochs as specified in the yaml file. So it logs n_epochs=1, but runs for n_epochs=3.
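A quick way to surface this kind of discrepancy is to diff the logged hparams against the ones the job actually runs with. This is a minimal sketch, not code from the repository: diff_hparams is a hypothetical helper, both inputs are assumed to be flat dicts, and batch_size is a made-up extra parameter added only to show that matching keys are ignored.

```python
from typing import Any, Dict, Mapping, Tuple

_MISSING = "<missing>"  # placeholder for a key present on only one side


def diff_hparams(
    logged: Mapping[str, Any],
    effective: Mapping[str, Any],
) -> Dict[str, Tuple[Any, Any]]:
    """Map each differing key to (logged value, effective value)."""
    mismatches = {}
    for key in sorted(set(logged) | set(effective)):
        logged_value = logged.get(key, _MISSING)
        effective_value = effective.get(key, _MISSING)
        if logged_value != effective_value:
            mismatches[key] = (logged_value, effective_value)
    return mismatches


# Values mirror the reproduction above: MLflow shows n_epochs=1, the yaml says 3.
logged_in_mlflow = {"n_epochs": 1, "batch_size": 4}
from_yaml = {"n_epochs": 3, "batch_size": 4}

mismatches = diff_hparams(logged_in_mlflow, from_yaml)
print(mismatches)  # {'n_epochs': (1, 3)}
```

A check like this could run at the start of a resumed job and warn, or fail, when the result is non-empty.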


sfmig added the bug label Jun 25, 2024

sfmig mentioned this issue Jun 25, 2024

Adding checkpoint_path for resume training #182

Merged

2 tasks
