
[onert] Support training of branching models

Open ragmani opened this issue 2 years ago • 1 comments

Why

Error when the output is used as multiple inputs

In this graph, the output of the 53 Add operator is consumed by both the 54 Conv2D and 57 Add operators. During back-propagation, each branch computes its own gradient for that output. However, since there is only one tensor 121, both gradient values must be applied to that single tensor. Currently, the gradient computed later overwrites the earlier gradient value of tensor 121, which produces an incorrect loss value.

/cc @ragmani

Originally posted by @jyoungyun in https://github.com/Samsung/ONE/issues/12325#issuecomment-1916159963
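The chain rule requires summing the gradients contributed by every consumer of a tensor, not keeping only the last one. A minimal sketch of the bug described above (toy values, not onert code):

```python
# Toy fan-out: one tensor x feeds two branches, y1 = 2*x and y2 = 3*x,
# with L = y1 + y2, so the correct gradient is dL/dx = 2 + 3 = 5.
def backprop(accumulate):
    grad_x = 0.0
    branch_grads = [2.0, 3.0]   # gradients arriving from the two branches
    for g in branch_grads:
        if accumulate:
            grad_x += g         # correct: sum gradients from all consumers
        else:
            grad_x = g          # buggy: the later branch overwrites the earlier one
    return grad_x

print(backprop(accumulate=True))   # 5.0 (matches the chain rule)
print(backprop(accumulate=False))  # 3.0 (wrong: only the last branch survives)
```

With overwriting, the parameter updates upstream of the branch point are computed from an incomplete gradient, which is why the reported loss diverges.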

What

Tasks

  • Introduce DisposableTensor
    • #12877
    • #12896
    • #12931
  • Make train backend's TensorManager manage memories of DisposableTensor
    • #12874
    • #12883
    • #12930
    • #12932
    • #12944
    • #12974
  • Introduce a BackPropAccumulator layer that accumulates (sums) back-prop tensors coming from multiple branches.
    • #12878
  • Apply BackPropAccumulator
    • #12976
  • Introduce a BackPropInitializer layer that initializes (zeroes) back-prop tensors.
    • #12928
  • Apply BackPropInitializer
    • #12950
    • #12945

Draft : #12603
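A rough sketch of how the two new layers divide the work. The class names mirror the tasks above, but the bodies are hypothetical illustrations, not the actual onert implementation:

```python
class BackPropInitializer:
    """Resets a shared back-prop buffer to zero before each backward pass."""
    def run(self, backprop):
        for i in range(len(backprop)):
            backprop[i] = 0.0

class BackPropAccumulator:
    """Adds one branch's gradient into the shared back-prop buffer
    instead of overwriting it."""
    def run(self, backprop, branch_grad):
        for i, g in enumerate(branch_grad):
            backprop[i] += g

buf = [0.0, 0.0]
BackPropInitializer().run(buf)                # zero the buffer once per step
BackPropAccumulator().run(buf, [2.0, 0.5])    # gradient from the first branch
BackPropAccumulator().run(buf, [3.0, 1.5])    # gradient from the second branch
print(buf)  # [5.0, 2.0]
```

The initializer is needed because the accumulator only adds: without zeroing between steps, gradients from the previous iteration would leak into the current one.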

ragmani avatar Jan 30 '24 10:01 ragmani

I verified that a branching sample model trains correctly with #12603 when using the "mse" loss and the "sgd" optimizer.


  • tensorflow
$ python3 tensorflow_run.py -m models/ -i out/train.input.1000.bin -l out/train.output.1000.bin --data_length 1000  --batch_size 1 --epoch 5 --loss mse --optimizer sgd --learning_rate 0.001 --loss_reduction_type=sum_over_batch_size

...

=================================================================
Total params: 118,282
Trainable params: 118,282
Non-trainable params: 0
_________________________________________________________________
Epoch 1/5
1000/1000 [==============================] - 1s 581us/step - loss: 0.0340 - categorical_accuracy: 0.8040
Epoch 2/5
1000/1000 [==============================] - 1s 561us/step - loss: 0.0316 - categorical_accuracy: 0.8220
Epoch 3/5
1000/1000 [==============================] - 1s 574us/step - loss: 0.0304 - categorical_accuracy: 0.8300
Epoch 4/5
1000/1000 [==============================] - 1s 565us/step - loss: 0.0296 - categorical_accuracy: 0.8340
Epoch 5/5
1000/1000 [==============================] - 1s 569us/step - loss: 0.0289 - categorical_accuracy: 0.8360
==========================
Total time: 3.0182
  • onert
$ ./Product/x86_64-linux.release/out/bin/onert_train mnist_branched.circle --load_expected:raw out/train.output.1000.bin --load_input:raw out/train.input.1000.bin --loss 1 --loss_reduction_type 1 --optimizer 1 --learning_rate 0.001 --batch_size 1
Model Expected Filename out/train.output.1000.bin
Model Input Filename out/train.input.1000.bin
Model Filename mnist_branched.circle
== training parameter ==
- learning_rate   = 0.001
- batch_size      = 1
- loss_info       = {loss = mean squared error, reduction = sum over batch size}
- optimizer       = sgd
========================
Epoch 1/5 - time: 0.225ms/step - loss: [0] 0.0340
Epoch 2/5 - time: 0.226ms/step - loss: [0] 0.0316
Epoch 3/5 - time: 0.225ms/step - loss: [0] 0.0304
Epoch 4/5 - time: 0.226ms/step - loss: [0] 0.0296
Epoch 5/5 - time: 0.228ms/step - loss: [0] 0.0289
===================================
MODEL_LOAD   takes 0.4220 ms
PREPARE      takes 2.9550 ms
EXECUTE      takes 1154.5840 ms
- Epoch 1      takes 224.8520 ms
- Epoch 2      takes 226.3390 ms
- Epoch 3      takes 225.4660 ms
- Epoch 4      takes 225.6470 ms
- Epoch 5      takes 228.2570 ms
===================================
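For reference, the reduction printed in the training-parameter block above ("sum over batch size") divides the summed per-sample losses by the batch size. A sketch of the assumed semantics (illustrative only, not onert code; with --batch_size 1 the division is a no-op):

```python
# "mean squared error" loss with "sum over batch size" reduction:
# per-sample MSE values are summed, then divided by the batch size.
def mse_sum_over_batch_size(preds, targets):
    total = 0.0
    for p, t in zip(preds, targets):
        total += sum((pi - ti) ** 2 for pi, ti in zip(p, t)) / len(p)
    return total / len(preds)

print(mse_sum_over_batch_size([[0.0, 1.0]], [[0.5, 0.5]]))  # 0.25
```

Matching the reduction on both sides is what makes the per-epoch loss values above directly comparable between tensorflow and onert.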

However, with the "cce" loss or the adam optimizer, the loss values differ. That problem is unrelated to this issue, so it will likely be handled in a separate issue.

ragmani avatar Apr 16 '24 10:04 ragmani

Done.

ragmani avatar May 20 '24 02:05 ragmani