GY_Zhang
Well, now I know swin-t == swin-transformer. But the last two questions still confuse me a lot.
In my environment, my result is as shown in this picture. What confuses me is that the `highest` results differ from those of `layer` 1-4 and the `combiner` layer.
Thank you for your timely reply. I trained successfully on a single GPU, and I will try your new code right now.
After changing "rag" to "memorag", I have another question: why is the value of query x["context"][0] and the value of context x["question"][0]?