trainer.test(ckpt_path='best') does not work as expected
Bug description
trainer.test(model=model, ckpt_path='best') works after trainer.fit but not otherwise
We get ValueError: `.test(ckpt_path="best")` is set but `ModelCheckpoint` is not configured to save the best model.
ModelCheckpoint is configured to save the best model. In fact save_top_k=1 is the default
checkpoint_callback = ModelCheckpoint(
dirpath='/Users/adam.amster/Downloads/pl_test',
monitor='val_loss',
save_top_k=1
)
Therefore the best model was saved, and ckpt_path="best" should always work, regardless of whether it is called after fit.
What version are you seeing the problem on?
2.0+
How to reproduce the bug
import os
import lightning as L
import torch
from lightning.pytorch.callbacks import ModelCheckpoint
from torch import nn
from torch.nn import functional as F
from torch.utils.data import DataLoader, random_split
from torchmetrics import Accuracy
from torchvision import transforms
from torchvision.datasets import MNIST
PATH_DATASETS = os.environ.get("PATH_DATASETS", ".")
BATCH_SIZE = 256 if torch.cuda.is_available() else 64
class LitMNIST(L.LightningModule):
    """Minimal MNIST classifier used to reproduce the `ckpt_path='best'` issue.

    A small MLP wrapped in a LightningModule; also owns the MNIST
    download/split/dataloader hooks so `Trainer.fit`/`.test` can be called
    with the model alone.
    """

    def __init__(self, data_dir=PATH_DATASETS, hidden_size=64, learning_rate=2e-4):
        super().__init__()
        # Set our init args as class attributes
        self.data_dir = data_dir
        self.hidden_size = hidden_size
        self.learning_rate = learning_rate
        # Hardcode some dataset specific attributes
        self.num_classes = 10
        self.dims = (1, 28, 28)
        channels, width, height = self.dims
        self.transform = transforms.Compose(
            [
                transforms.ToTensor(),
                # Standard MNIST per-channel mean/std normalization constants
                transforms.Normalize((0.1307,), (0.3081,)),
            ]
        )
        # Define PyTorch model: flatten 1x28x28 image -> two hidden layers -> class logits
        self.model = nn.Sequential(
            nn.Flatten(),
            nn.Linear(channels * width * height, hidden_size),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden_size, self.num_classes),
        )
        self.val_accuracy = Accuracy(task="multiclass", num_classes=10)
        self.test_accuracy = Accuracy(task="multiclass", num_classes=10)

    def forward(self, x):
        # Return log-probabilities so F.nll_loss can be applied directly in the steps below.
        x = self.model(x)
        return F.log_softmax(x, dim=1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = F.nll_loss(logits, y)
        self.log("train_loss", loss)
        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = F.nll_loss(logits, y)
        preds = torch.argmax(logits, dim=1)
        self.val_accuracy.update(preds, y)
        # Calling self.log will surface up scalars for you in TensorBoard.
        # "val_loss" is the metric the ModelCheckpoint in the repro script monitors.
        self.log("val_loss", loss, prog_bar=True)
        self.log("val_acc", self.val_accuracy, prog_bar=True)

    def test_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = F.nll_loss(logits, y)
        preds = torch.argmax(logits, dim=1)
        self.test_accuracy.update(preds, y)
        # Calling self.log will surface up scalars for you in TensorBoard
        self.log("test_loss", loss, prog_bar=True)
        self.log("test_acc", self.test_accuracy, prog_bar=True)

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=self.learning_rate)
        return optimizer

    ####################
    # DATA RELATED HOOKS
    ####################

    def prepare_data(self):
        # download
        MNIST(self.data_dir, train=True, download=True)
        MNIST(self.data_dir, train=False, download=True)

    def setup(self, stage=None):
        # Assign train/val datasets for use in dataloaders
        if stage == "fit" or stage is None:
            mnist_full = MNIST(self.data_dir, train=True, transform=self.transform)
            self.mnist_train, self.mnist_val = random_split(mnist_full, [55000, 5000])
        # Assign test dataset for use in dataloader(s)
        if stage == "test" or stage is None:
            self.mnist_test = MNIST(self.data_dir, train=False, transform=self.transform)

    def train_dataloader(self):
        return DataLoader(self.mnist_train, batch_size=BATCH_SIZE)

    def val_dataloader(self):
        return DataLoader(self.mnist_val, batch_size=BATCH_SIZE)

    def test_dataloader(self):
        return DataLoader(self.mnist_test, batch_size=BATCH_SIZE)
if __name__ == '__main__':
    model = LitMNIST()
    # NOTE(review): the issue text above claims save_top_k=1, but this repro
    # actually passes save_top_k=2 — worth confirming which was really run.
    checkpoint_callback = ModelCheckpoint(
        dirpath='/Users/adam.amster/Downloads/pl_test',
        monitor='val_loss',
        save_top_k=2
    )
    trainer = L.Trainer(
        accelerator="auto",
        devices=1,
        max_epochs=3,
        default_root_dir='/Users/adam.amster/Downloads/pl_test',
        callbacks=[checkpoint_callback]
    )
    # first fit, then comment, then run again
    trainer.fit(model)
    # Works here because the same ModelCheckpoint instance tracked the best model during fit.
    trainer.test(model=model, ckpt_path='best')
Error messages and logs
`ValueError: `.test(ckpt_path="best")` is set but `ModelCheckpoint` is not configured to save the best model.`
Environment
<details>
<summary>Current environment</summary>
* CUDA:
- GPU: None
- available: False
- version: None
* Lightning:
- lightning: 2.0.1
- lightning-cloud: 0.5.32
- lightning-utilities: 0.8.0
- pytorch-lightning: 2.0.1
- torch: 2.0.0
- torchmetrics: 0.11.4
- torchvision: 0.15.1
* Packages:
- aiohttp: 3.8.4
- aiosignal: 1.3.1
- anyio: 3.6.2
- arrow: 1.2.3
- async-timeout: 4.0.2
- attrs: 22.2.0
- beautifulsoup4: 4.12.2
- blessed: 1.20.0
- certifi: 2022.12.7
- charset-normalizer: 3.1.0
- click: 8.1.3
- contourpy: 1.0.7
- croniter: 1.3.10
- cycler: 0.11.0
- dateutils: 0.6.12
- deepdiff: 6.3.0
- dnspython: 2.3.0
- email-validator: 1.3.1
- fastapi: 0.88.0
- filelock: 3.11.0
- fonttools: 4.39.3
- frozenlist: 1.3.3
- fsspec: 2023.4.0
- h11: 0.14.0
- httpcore: 0.16.3
- httptools: 0.5.0
- httpx: 0.23.3
- idna: 3.4
- imageio: 2.27.0
- importlib-resources: 5.12.0
- inquirer: 3.1.3
- itsdangerous: 2.1.2
- jinja2: 3.1.2
- joblib: 1.2.0
- kiwisolver: 1.4.4
- lazy-loader: 0.2
- lightning: 2.0.1
- lightning-cloud: 0.5.32
- lightning-utilities: 0.8.0
- markdown-it-py: 2.2.0
- markupsafe: 2.1.2
- matplotlib: 3.7.1
- mdurl: 0.1.2
- mpmath: 1.3.0
- multidict: 6.0.4
- networkx: 3.0
- numpy: 1.24.2
- opencv-python: 4.7.0.72
- ordered-set: 4.1.0
- orjson: 3.8.10
- packaging: 23.0
- pandas: 1.5.3
- pillow: 9.5.0
- pip: 23.0.1
- psutil: 5.9.4
- pydantic: 1.10.7
- pygments: 2.14.0
- pyjwt: 2.6.0
- pyparsing: 3.0.9
- pytesseract: 0.3.10
- python-dateutil: 2.8.2
- python-dotenv: 1.0.0
- python-editor: 1.0.4
- python-multipart: 0.0.6
- pytorch-lightning: 2.0.1
- pytz: 2023.3
- pywavelets: 1.4.1
- pyyaml: 6.0
- readchar: 4.0.5
- requests: 2.28.2
- rfc3986: 1.5.0
- rich: 13.3.3
- scikit-image: 0.20.0
- scikit-learn: 1.2.2
- scipy: 1.9.1
- seaborn: 0.12.2
- setuptools: 67.4.0
- six: 1.16.0
- sniffio: 1.3.0
- soupsieve: 2.4
- starlette: 0.22.0
- starsessions: 1.3.0
- sympy: 1.11.1
- threadpoolctl: 3.1.0
- tifffile: 2023.3.21
- torch: 2.0.0
- torchmetrics: 0.11.4
- torchvision: 0.15.1
- tqdm: 4.65.0
- traitlets: 5.9.0
- typing-extensions: 4.5.0
- ujson: 5.7.0
- urllib3: 1.26.15
- uvicorn: 0.21.1
- uvloop: 0.17.0
- watchfiles: 0.19.0
- wcwidth: 0.2.6
- websocket-client: 1.5.1
- websockets: 11.0.1
- wheel: 0.38.4
- yarl: 1.8.2
- zipp: 3.15.0
* System:
- OS: Darwin
- architecture:
- 64bit
-
- processor: i386
- python: 3.8.2
- version: Darwin Kernel Version 21.6.0: Mon Aug 22 20:17:10 PDT 2022; root:xnu-8020.140.49~2/RELEASE_X86_64
</details>
More info
No response
According to the source, after training, you can access the best model path by checkpoint_callback.best_model_path. And this internal variable is updated during the training loop, so when a new trainer instance is instantiated, it does not have that information.
Modifying the checkpoint code to get all the models in a given path would mean that the models also would have to have information about their scores in the model name when saved (as it has to pick the best one, and since it's newly instantiated, it has no idea about the training process) which seems like overhead. Sometimes there might be best k from several runs with different models, so automatically getting best models just given path might be difficult
Yeah, it seems to work as designed to me, but the ValueError probably isn't that helpful. It could explicitly tell the user that `ckpt_path="best"` only works when testing after fitting in the same run.
ValueError: .test(ckpt_path="best") is set but ModelCheckpoint has not saved any checkpoints yet. You must run .fit(model)
Could be more helpful in cases like this.
@ryan597 to me, that error message is even more confusing. The scenario is that fit has been called previously, and the best checkpoint was saved. Then you want to evaluate test performance using this best model, without calling fit. The confusion for me is that pytorch lightning is saving the best checkpoint and saving metrics. But then when we call test it is not accessing those, unless fit is called within the same run. If anything a better error message would be
NotImplementedError: Even though you have run `.fit` previously and have saved the best checkpoint, we have not implemented loading the best model without calling `.fit` in the same run. You must call `.fit` in the same run in order to use .ckpt_path='best', or must manually load the best checkpoint by passing in the path.
@aamster well your test is that you commented out the .fit() call right?
if __name__ == '__main__':
    model = LitMNIST()
    # Freshly constructed callback: it has no memory of checkpoints saved by a previous run.
    checkpoint_callback = ModelCheckpoint(
        dirpath='/Users/adam.amster/Downloads/pl_test',
        monitor='val_loss',
        save_top_k=2
    )
    trainer = L.Trainer(
        accelerator="auto",
        devices=1,
        max_epochs=3,
        default_root_dir='/Users/adam.amster/Downloads/pl_test',
        callbacks=[checkpoint_callback]
    )
    # first fit, then comment, then run again
    #trainer.fit(model)
    # Raises ValueError: with fit commented out, the callback never recorded a best model path.
    trainer.test(model=model, ckpt_path='best')
So you then instantiated the trainer and checkpoint callback and called test, without a call to fit. So ModelCheckpoint does not have a best checkpoint, because it has not saved any checkpoints.
Is this any better?
ValueError: .test(ckpt_path="best") is set but ModelCheckpoint has not saved any checkpoints yet. You must run .fit(model) or manually load the best checkpoint by passing the checkpoint path.
@ryan597 yes correct. My confusion is that you are telling the trainer where to save checkpoints, and which to save (best) as well as metrics during the training run are saved. But then it isn't using this information when you call .test even without calling .fit in the same run. Seems that if this information is persisted by the API then it should be able to use it. I disagree with any mention of ModelCheckpoint has not saved any checkpoints yet since, this is not the case, it has been saved and .fit has been run previously.
@aamster the metrics for the saved models are not saved I believe, only the model weights. So when you call .fit, it won't know which model to take up with the .test(ckpt_path="best")
@aamster The ModelCheckpoint is a newly instantiated class as this is a new python program and it has none of the previous variable or class instantiations. It has not saved any checkpoints. It does not have information of the previous run.
But to do so it could check for the last modified file in that directory.
https://github.com/Lightning-AI/lightning/blob/7aea1c6d20a12cedfdfb49a4d945a4a4717ec463/src/lightning/pytorch/trainer/connectors/checkpoint_connector.py#L157-L178
could include this instead of the value error
import glob
# Proposed fallback (fragment of a checkpoint_connector method — `self`,
# `state_fn`, `fn`, `model_provided`, `model_connected` come from the
# enclosing scope, not shown here): scan the checkpoint directory on disk
# instead of relying on in-memory callback state, which a fresh Trainer lacks.
dirpath = self.trainer.checkpoint_callback.dirpath
list_of_checkpoints = glob.glob(f'{dirpath}/*.ckpt')
if len(list_of_checkpoints) > 0:
    # Assumes the most recently created .ckpt belongs to the last fit run — TODO confirm;
    # breaks if checkpoints from several runs/models share the directory.
    latest_checkpoint = max(list_of_checkpoints, key=os.path.getctime)
    # Recover best_model_path from the ModelCheckpoint state serialized inside the checkpoint file.
    best_model_path = list(torch.load(latest_checkpoint)['callbacks'].values())[0]['best_model_path']
    return self._parse_ckpt_path(state_fn, best_model_path, model_provided, model_connected)
raise ValueError(
    f'`.{fn}(ckpt_path="best")` is set but `ModelCheckpoint` has not saved any checkpoints to {dirpath}'
)
Where now the value error is raised if there are no checkpoints in the directory. I did a quick test with this and your model for 100 epochs and saving the top_k=10, and it seems to work. However I think there is a lot more checking that has to be done due to the hack nature of this, eg.
- I'm assuming only that specific models checkpoints are there
- That the last modified checkpoint is the last created in the .fit (i.e. highest epoch)
- AND that you have only run fit once!!!
This last point is why this likely shouldn't be included as a feature.
@ryan597 what if there are checkpoints for different architectures saved? In that case, using the latest modification might not work. It should also check if the weights correspond to the current model architecture. Ex: If I run for ResNet first, and then Inception. And now I call test with ResNet, it should not only take the latest modified but also the weights that fit the architecture.
@LawJarp-A Yes I just edited with the cases it would be rather terrible for.
Really, this would need to load every checkpoint and check them all for the best_model_score, and you would still need to verify that the state_dict matches the current model. When the alternative is simply passing the checkpoint path explicitly if you haven't run fit, this doesn't make sense to implement.
I have also encountered this problem, and the strange thing is that Trainer does not save any checkpoints. my code:
# Second user's setup (fragment: `args`, `logger`, `monitor_index`, `model`,
# `data_module` are defined elsewhere in their script).
callbacks = [ModelCheckpoint(monitor='val_acc',
                             dirpath='/ckpts',  # f"{args.checkpoint_dir}/{args.save_model_name}",
                             filename='model_ckpt',
                             save_top_k=2,
                             mode="max",
                             save_last=True),
             TQDMProgressBar(refresh_rate=20),
             LearningRateMonitor(logging_interval="epoch"),
             EarlyStopping(monitor=monitor_index,
                           mode="max",
                           patience=args.max_epochs // 10,
                           check_finite=True)]
trainer = Trainer(logger=logger,
                  callbacks=callbacks,
                  accelerator="gpu",
                  max_epochs=args.max_epochs,
                  fast_dev_run=False,
                  precision=args.precision,
                  log_every_n_steps=args.flush_logs_every_n_steps)
trainer.fit(model, data_module, ckpt_path="last" if args.resume_from_ckpt else None)
# ckpt_path="best" below relies on the ModelCheckpoint above having tracked a best
# model during the .fit call in this same process.
trainer.test(model, data_module, ckpt_path="best")
trainer.predict(model, data_module, ckpt_path="best")
Hello
This isn't a bug because ckpt_path="best" isn't designed like this. The information of which file is the best is stored in the checkpoint callback and updated during training. It's not possible for this feature to load the best checkpoint if you initialize a new trainer, because just by looking at a folder of checkpoints alone you can't know which one is the best in general. The error message might be a bit misleading here and could be improved.
On the other hand, ckpt_path="last" works this way because there is a dedicated file called last.ckpt. A similar feature could be implemented to save a symlink to best.ckpt, then the trainer could load it like you desire.
@awaelchli Any update on this issue? Having a symlink to best.ckpt would be very useful!
At the very least it would be better to provide a clearer error message as suggested before.
@awaelchli So, how should we fix this problem? The error is: `ValueError: `.test(ckpt_path="best")` is set but `ModelCheckpoint` is not configured to save the best model.`