About "NaNs in the encoded observations"
Thanks for your contribution!
How can I fix this problem? Can you give me some advice?
By the way, will you upload the validation part, or how can I validate?
First, track down where the NaNs originate. Either the input observations contain some NaNs, or the weights became NaN during training due to instability or NaNs in the loss function.
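One generic way to track this down is to attach forward hooks that flag the first module emitting NaNs. This is a minimal PyTorch sketch, not code from the repo; `register_nan_checks` and the toy model are hypothetical:

```python
import torch
import torch.nn as nn

def register_nan_checks(model: nn.Module):
    """Attach forward hooks that raise on the first module producing NaNs."""
    def hook(module, inputs, output):
        outs = output if isinstance(output, (tuple, list)) else (output,)
        for o in outs:
            if torch.is_tensor(o) and torch.isnan(o).any():
                raise RuntimeError(
                    f"NaN detected in output of {module.__class__.__name__}"
                )
    for m in model.modules():
        m.register_forward_hook(hook)

# Toy model whose second layer is deliberately broken to emit NaNs.
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))
with torch.no_grad():
    model[1].weight.fill_(float("nan"))
register_nan_checks(model)

try:
    model(torch.randn(3, 4))
except RuntimeError as e:
    print(e)  # names the offending layer class
```

Running the real encoder once with hooks like this usually pinpoints whether the NaNs come from the inputs or appear mid-network.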
I’ve used NaNs in the dataset to flag any invalid/illegal observations (e.g., far-away trajectories, occlusions) so that there is no leakage. After the mask is generated:
https://github.com/sisl/SceneInformer/blob/efce1976e939b08eb4608f4eb679a1179926e4e5/sceneinformer/model/encoder.py#L57
They should all be set to 0 to prevent any PyTorch errors: https://github.com/sisl/SceneInformer/blob/efce1976e939b08eb4608f4eb679a1179926e4e5/sceneinformer/model/encoder.py#L72
The same logic applies to polylines, so this issue shouldn't occur unless something has been commented out.
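The mask-then-zero pattern described above can be sketched as follows; `mask_and_zero` is a hypothetical helper for illustration, not the repo's exact implementation:

```python
import torch

def mask_and_zero(obs: torch.Tensor):
    """Build a validity mask from NaN entries, then zero the NaNs so
    downstream PyTorch ops (attention, linear layers) don't propagate them."""
    # An observation is invalid if any of its features is NaN.
    valid_mask = ~torch.isnan(obs).any(dim=-1)
    # Replace NaNs with 0 before feeding the encoder; the mask keeps
    # the invalid entries from leaking into the loss.
    obs = torch.nan_to_num(obs, nan=0.0)
    return obs, valid_mask

obs = torch.tensor([[1.0, 2.0], [float("nan"), 3.0]])
clean, mask = mask_and_zero(obs)  # mask is [True, False]
```

The same idea applies per polyline: zero the flagged entries and carry the mask forward so the loss ignores them.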
A similar approach is used for the loss function (sceneinformer/model/loss.py):
https://github.com/sisl/SceneInformer/blob/efce1976e939b08eb4608f4eb679a1179926e4e5/sceneinformer/model/loss.py#L23
and decoder:
https://github.com/sisl/SceneInformer/blob/efce1976e939b08eb4608f4eb679a1179926e4e5/sceneinformer/model/decoder.py#L43
Check them as well.
If none of the above applies, then it is most likely caused by training instability, in which case typical solutions should be applied, such as lowering the learning rate, clipping gradients, or increasing precision.
I’ve included the hyperparameters in the config file for easy adjustment: https://github.com/sisl/SceneInformer/blob/efce1976e939b08eb4608f4eb679a1179926e4e5/configs/scene_informer.yaml#L113-L122
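The stability fixes above look roughly like this in a plain PyTorch training step; this is a generic sketch with made-up model and data, and the `lr`/`max_norm` values are placeholders you would tune via the config:

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 1)
# Lowered learning rate is the first knob to try against instability.
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

x, y = torch.randn(16, 8), torch.randn(16, 1)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
# Clip the global gradient norm before stepping, so a single bad
# batch can't blow the weights up to NaN.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
opt.step()
```

Increasing precision (e.g. disabling mixed precision and training in full fp32) is the heavier-handed fallback if clipping and a lower learning rate don't help.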
Hopefully, that's helpful.
> By the way, will you upload the validation part, or how can I validate?
I'll try to find the exact script I've used.
The validation script computes the FDE/ADE of the generated trajectories and compares the anchor occupancy (as in the loss function).
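The FDE/ADE metrics are standard; a minimal sketch for a single predicted trajectory (the `ade_fde` helper and tensor shapes are assumptions, not the script's actual interface):

```python
import torch

def ade_fde(pred: torch.Tensor, gt: torch.Tensor):
    """pred, gt: (T, 2) trajectories.
    ADE = mean L2 error over all timesteps; FDE = L2 error at the final step."""
    dists = torch.linalg.norm(pred - gt, dim=-1)  # (T,) per-step errors
    return dists.mean().item(), dists[-1].item()

pred = torch.tensor([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
gt   = torch.tensor([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
ade, fde = ade_fde(pred, gt)  # per-step errors 0, 1, 2 -> ADE = 1.0, FDE = 2.0
```

With multiple predicted modes, the usual convention is min-over-modes (minADE/minFDE) before averaging over the dataset.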
There is also a visualization script that you can use to get a sense of whether anything reasonable is happening, and to plot random samples from the training/validation splits during training.