[BUG] GPU memory leak, and errors when requests arrive too quickly
Describe the bug
How do I load the CPM1 model from a local checkpoint? I currently do the following:
1. Build the model (the code is from here):
    model = GPT2Model(num_layers=args.num_layers,
                      vocab_size=args.vocab_size,
                      hidden_size=args.hidden_size,
                      num_attention_heads=args.num_attention_heads,
                      embedding_dropout_prob=args.hidden_dropout,
                      attention_dropout_prob=args.attention_dropout,
                      output_dropout_prob=args.hidden_dropout,
                      max_sequence_length=args.max_position_embeddings,
                      checkpoint_activations=args.checkpoint_activations,
                      checkpoint_num_layers=args.checkpoint_num_layers,
                      parallel_output=args.parallel_output)
2. Load the state_dict from the local model file with load_state_dict.
3. Wrap the model so that it runs through bminf:
    model = bminf.wrapper(model)
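For reference, steps 2 and 3 can be sketched as follows. This is only a minimal sketch, not the method used in the report: the helper names `extract_state_dict` and `load_and_wrap` are hypothetical, and the handling of a nested "module" key is an assumption about how the local checkpoint may have been saved. Only `torch.load`, `load_state_dict`, and the `bminf.wrapper(model)` call already shown above are used.

```python
from collections.abc import Mapping
from typing import Any


def extract_state_dict(checkpoint: Any) -> Any:
    """Return the weight mapping stored in a loaded checkpoint object.

    Assumption: some training frameworks nest the weights under a "module"
    key; if that key is absent, the checkpoint itself is returned unchanged.
    """
    if isinstance(checkpoint, Mapping) and isinstance(checkpoint.get("module"), Mapping):
        return checkpoint["module"]
    return checkpoint


def load_and_wrap(model: Any, checkpoint_path: str) -> Any:
    """Load local weights into an already-built model, then wrap it with BMInf."""
    # Imported lazily so the helper above can be used without a GPU stack.
    import torch
    import bminf

    checkpoint = torch.load(checkpoint_path, map_location="cpu")
    model.load_state_dict(extract_state_dict(checkpoint))
    model.eval()  # inference only: disables dropout
    return bminf.wrapper(model)
```

With the model built as in step 1, usage would look like `model = load_and_wrap(model, "path/to/checkpoint.pt")`, where the path is a placeholder.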
Expected behavior
Screenshots
GPU memory usage before the request
GPU memory usage after the request
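To put numbers on the two readings, the memory held by tensors can be sampled around a single request. The sketch below is an illustration only: `cuda_allocated_bytes` and `track_memory` are hypothetical helper names, and it assumes the model runs under PyTorch on the current CUDA device. Note that `torch.cuda.memory_allocated()` counts live tensors, whereas tools such as nvidia-smi also include memory cached by PyTorch's allocator, so growth in nvidia-smi alone does not necessarily indicate a leak.

```python
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


def cuda_allocated_bytes() -> int:
    """Bytes held by live tensors on the current CUDA device (0 if unavailable)."""
    try:
        import torch
    except ImportError:
        return 0
    if not torch.cuda.is_available():
        return 0
    torch.cuda.synchronize()  # wait for pending kernels before sampling
    return torch.cuda.memory_allocated()


@contextmanager
def track_memory(
    label: str,
    log: Dict[str, int],
    read_bytes: Callable[[], int] = cuda_allocated_bytes,
) -> Iterator[None]:
    """Record in log[label] how many bytes the wrapped block gained or released."""
    before = read_bytes()
    try:
        yield
    finally:
        log[label] = read_bytes() - before
```

Wrapping each request in `with track_memory("request", log):` and checking whether `log["request"]` stays positive across many requests would show whether tensor memory is actually being retained.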

An error is also raised when requests are sent too quickly.
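If the error comes from several requests entering the model at the same time (an assumption; the report does not include the traceback), one possible workaround is to serialize calls with a lock. The class name `SerializedModel` is hypothetical, and `generate_fn` stands for whatever function the service calls to run inference.

```python
import threading
from typing import Any, Callable


class SerializedModel:
    """Allow only one inference call at a time to reach the wrapped function."""

    def __init__(self, generate_fn: Callable[..., Any]) -> None:
        self._generate_fn = generate_fn
        self._lock = threading.Lock()

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        # Concurrent callers block here until the current call has finished.
        with self._lock:
            return self._generate_fn(*args, **kwargs)
```

This trades throughput for safety: requests queue up instead of overlapping, which helps confirm or rule out concurrency as the cause.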

Other: How do I wrap a model loaded from transformers? I could not follow the implementation in the example.
Environment:
apex 0.1
bminf 2.0.0
deepspeed 0.3.15