
New metrics and different Loss in TF version of Segformer

Open ebgoldstein opened this issue 2 years ago • 5 comments

Feature request

I am playing with the awesome Segformer fine-tuning example on the Keras website made by @sayakpaul that relies on HF Transformers. In this example, no loss function or metrics are specified in model.compile(). I would like to be able to add metrics (e.g., IoU, Dice, etc.) and potentially change the loss for the Segformer model.

When I tried to make these additions, the compile step failed. (From reading the Segformer paper and original code it seems like all metrics and losses need to have some form of masking?).

Any advice or info on how to implement these changes would be awesome (and I apologize in advance if I have missed the relevant docs; I did look!).

(Based on comms with @sayakpaul, I am also cc'ing @Rocketknight1.)

Motivation

Track various metrics during the fine-tuning of the Segformer model.

Your contribution

I think once I understand the solution steps, I would be able to determine whether I could contribute.

ebgoldstein avatar Mar 10 '23 20:03 ebgoldstein

Interesting! Can you document exactly what error you got with the compile step and what code you ran to cause it?

Rocketknight1 avatar Mar 13 '23 15:03 Rocketknight1

Hi @Rocketknight1, I misremembered: the error does not come from model.compile(). Compiling a model with a different loss function, added metrics, a custom loss, or custom metrics all succeeds with no error. The errors appear with model.fit().

So far I have tried to fit a model with a range of things:

  • a custom loss (my own version of Dice Loss)
  • added metrics (tf.keras.metrics.MeanIoU() and/or a (custom) Dice metric)
  • using KLDivergence loss (tf.keras.losses.KLDivergence())

All of these fail during model.fit(), and each produces its own set of errors. They all look to me like some type of tensor-shape issue, but the tracebacks are all different. To make sure it's not just me, my colleague @dbuscombe-usgs has also tried and reported similar issues (with different datasets, different numbers of classes, different TF versions, different machines, etc.). I can provide a reference dataset and the scripts I am working with if needed...
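
For concreteness, the pattern is roughly this (a simplified sketch rather than my exact script; train_dataset / val_dataset stand in for my tf.data pipelines, and the checkpoint and label count are placeholders):

import tensorflow as tf
from transformers import TFSegformerForSemanticSegmentation

# Placeholder checkpoint and label count, not my actual setup
model = TFSegformerForSemanticSegmentation.from_pretrained(
    "nvidia/mit-b0", num_labels=2
)

# Compiling with an explicit loss and/or extra metrics raises no error...
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.MeanIoU(num_classes=2)],
)

# ...but the shape-related tracebacks only show up here, during training
model.fit(train_dataset, validation_data=val_dataset, epochs=1)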

ebgoldstein avatar Mar 15 '23 19:03 ebgoldstein

Yes please! Ideally if you could give us some minimal code that reproduces the issue, that would make it much easier for us to track it down.

Also, sorry for the delay in replying here - I was away on St. Patrick's Day so I'm only getting to my GitHub backlog now!

Rocketknight1 avatar Mar 20 '23 18:03 Rocketknight1

I have done it myself. Lol, too much green beer!


Jagjeffery avatar Mar 20 '23 18:03 Jagjeffery

Hi @Rocketknight1, sorry for the delay.

Attached below is some code and example image & label pairs, all zipped up. Let me know if you prefer another format or delivery mechanism.

├── TFSegFormerExample.py
└── ExampleData
    ├── images
    └── labels

On L165 of the code is the compile step, and different versions of the model can be commented/uncommented to see the various errors:

  • L168 is the base case, where no loss function is defined - this works
  • L171 defines SparseCatLoss - this does not work
  • L174 defines KLD loss - this does not work
  • L171 defines no loss but uses meanIoU as a metric - this does not work

(These look like tensor-shape issues to me; typically I would debug this by looking at the last layer shape in model.summary(), but the output of model.summary() is not super expressive for this model. I'm not quite sure why, but maybe that is a whole different question.)

TFSegformerExample.zip

ebgoldstein avatar Mar 23 '23 21:03 ebgoldstein

Ah, I see! The issue here is caused by some specific behaviour of the SegFormer models when using inputs of this resolution. The model outputs are actually at a lower resolution than the inputs - you can check this by manually passing in a batch. The output logits come out at 128x128, whereas the input is 512x512. This results in the loss computation failing because the logit and label tensors can't be aligned with each other.

If you use the model's internal loss computation by not passing any loss argument to compile(), then logits are upscaled before applying the cross-entropy loss and training works correctly. If you want to use your own custom loss function you'll have to do something similar.
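
A rough sketch of what that could look like for a custom loss (untested, and written assuming the labels are integer masks of shape (batch, height, width); the model's internal loss also masks out the ignore index, which is omitted here):

import tensorflow as tf

def upsampled_sparse_ce(labels, logits):
    # SegFormer logits come out as (batch, num_labels, height/4, width/4),
    # e.g. 128x128 logits for a 512x512 input as described above.
    # Move channels last: (batch, h/4, w/4, num_labels)
    logits = tf.transpose(logits, perm=[0, 2, 3, 1])
    # Upscale logits to the label resolution before computing the loss
    logits = tf.image.resize(logits, size=tf.shape(labels)[1:3], method="bilinear")
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    return loss_fn(labels, logits)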

I'm not sure exactly why the output resolution for SegFormer is different from the input resolution, but it's not a bug in the Hugging Face TensorFlow implementation because the original model and our PyTorch implementation do this as well. @sayakpaul do you know why the model does that?

Rocketknight1 avatar Mar 27 '23 17:03 Rocketknight1

Thanks for that code highlight @Rocketknight1, super helpful and I understand it now; I would need a similar upsampling routine.

Related question about finding the output resolution: is there a reason that summary() does not provide info on all the layers/internal architecture of the model?

ebgoldstein avatar Mar 28 '23 00:03 ebgoldstein

@sayakpaul do you know why the model does that?

It's very likely because of how the model is designed and how it accumulates the multi-resolution features and decodes them into a segmentation map at a reduced resolution. @NielsRogge might have better insight.

i would need a similar upsampling routine.

You can check out this notebook, which has this:

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    # logits are of shape (batch_size, num_labels, height, width), so
    # we first transpose them to (batch_size, height, width, num_labels)
    logits = tf.transpose(logits, perm=[0, 2, 3, 1])
    # scale the logits to the size of the label
    logits_resized = tf.image.resize(
        logits,
        size=tf.shape(labels)[1:],
        method="bilinear",
    )
    # compute the prediction labels and compute the metric
    pred_labels = tf.argmax(logits_resized, axis=-1)
    metrics = metric.compute(
        predictions=pred_labels,
        references=labels,
        num_labels=num_labels,
        ignore_index=-1,
        reduce_labels=image_processor.do_reduce_labels,
    )
    # add per category metrics as individual key-value pairs
    per_category_accuracy = metrics.pop("per_category_accuracy").tolist()
    per_category_iou = metrics.pop("per_category_iou").tolist()

    metrics.update(
        {f"accuracy_{id2label[i]}": v for i, v in enumerate(per_category_accuracy)}
    )
    metrics.update({f"iou_{id2label[i]}": v for i, v in enumerate(per_category_iou)})
    return {"val_" + k: v for k, v in metrics.items()}
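
In that notebook, compute_metrics is plugged into model.fit() through the KerasMetricCallback from transformers, roughly like this (a sketch; train_dataset, eval_dataset, batch_size, and num_epochs are assumed to be defined as in the notebook):

from transformers.keras_callbacks import KerasMetricCallback

# Run compute_metrics over the validation set at the end of each epoch.
# eval_dataset is assumed to yield dicts that include a "labels" key.
metric_callback = KerasMetricCallback(
    metric_fn=compute_metrics,
    eval_dataset=eval_dataset,
    batch_size=batch_size,
    label_cols=["labels"],
)

model.fit(
    train_dataset,
    validation_data=eval_dataset,
    callbacks=[metric_callback],
    epochs=num_epochs,
)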

related Q to finding the output resolution - is there a reason that summary() does not provide info on all the layers/internal architecture of the model?

That is because we wrap everything as layers, and summary() has limitations like this when the whole model lives inside a single wrapped layer. We do this to support cross-loading from PyTorch (because of variable naming). @Rocketknight1 might have more to add to this.
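
Concretely, summary() only lists the few top-level layers because the backbone sits inside one wrapped layer; one way to still peek inside is to walk the nested modules directly (a small sketch, and the names you see depend on the transformers version):

# summary() only shows the top-level wrapped layers
model.summary()

# Walk the nested modules to list the internal architecture
for module in model.submodules:
    print(type(module).__name__)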

sayakpaul avatar Mar 28 '23 03:03 sayakpaul

Yeah, refactoring our TF models to make summary() more usable is absolutely on the list! Unfortunately it's quite a big list, but it's definitely there.

Rocketknight1 avatar Mar 28 '23 12:03 Rocketknight1

Awesome, thanks so much for all the helpful info @Rocketknight1 & @sayakpaul. I can close this issue now, as I understand the landscape much better and it seems the requested feature is already on your list! Thanks again, I really appreciate it!

ebgoldstein avatar Mar 28 '23 12:03 ebgoldstein