BMTrain
Output hidden states and attention scores for each transformer layer
The existing TransformerBlockList cannot output the hidden states and attention scores of each transformer layer. We sometimes need these per-layer hidden states and attention scores, either to analyze them or to feed them into subsequent modules.
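The requested behavior can be sketched in plain Python, independent of BMTrain. This is a hypothetical illustration, not BMTrain's API: `BlockListWithOutputs`, `make_toy_layer`, and the `output_hidden_states` / `output_attentions` flags are names invented here, and each "layer" is assumed to be a callable returning `(new_hidden, attention_scores)`. The block list records the input to every layer plus the final output (num_layers + 1 hidden states), and one attention map per layer.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed toy types: a hidden state is a list of floats, attention is a square matrix.
Hidden = List[float]
Attention = List[List[float]]
# Assumption: a layer maps a hidden state to (new_hidden, attention_scores).
Layer = Callable[[Hidden], Tuple[Hidden, Attention]]


class BlockListWithOutputs:
    """Hypothetical block list that can also return per-layer outputs."""

    def __init__(self, layers: Sequence[Layer]) -> None:
        self.layers = list(layers)

    def forward(
        self,
        hidden: Hidden,
        output_hidden_states: bool = False,
        output_attentions: bool = False,
    ) -> Tuple[Hidden, List[Hidden], List[Attention]]:
        all_hidden: List[Hidden] = []
        all_attn: List[Attention] = []
        for layer in self.layers:
            if output_hidden_states:
                all_hidden.append(hidden)  # input to this layer
            hidden, attn = layer(hidden)
            if output_attentions:
                all_attn.append(attn)  # this layer's attention scores
        if output_hidden_states:
            all_hidden.append(hidden)  # output of the last layer
        return hidden, all_hidden, all_attn


def make_toy_layer(scale: float) -> Layer:
    """Toy layer: scales the hidden state and returns uniform attention."""

    def layer(hidden: Hidden) -> Tuple[Hidden, Attention]:
        n = len(hidden)
        attn = [[1.0 / n] * n for _ in range(n)]
        return [scale * h for h in hidden], attn

    return layer


blocks = BlockListWithOutputs([make_toy_layer(2.0), make_toy_layer(3.0)])
out, hidden_states, attentions = blocks.forward(
    [1.0, 2.0], output_hidden_states=True, output_attentions=True
)
print(out)            # [6.0, 12.0]
print(hidden_states)  # [[1.0, 2.0], [2.0, 4.0], [6.0, 12.0]]
print(len(attentions))  # 2
```

The sketch ignores autograd and memory entirely. In a real training setup, letting gradients flow back through the returned intermediate hidden states is a separate problem, which is what the follow-up note on PR #40 refers to.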
Backward through the intermediate (middle) hidden states is supported by PR #40.