Junyang Lin issues

10 issues



HF API bug with cached examples

Describe the bug: The HF demo fails with `IsADirectoryError: [Errno 21] Is a directory: 'gradio_cached_examples/'`. Is there an existing issue for this? - [X] I...

bug
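For context, Python's `open()` raises `IsADirectoryError` on Linux and macOS (Windows reports `PermissionError` instead) when it is handed a directory path. The trailing slash in `'gradio_cached_examples/'` suggests the demo passed the cache folder itself rather than a file inside it, though the truncated report does not confirm this. A minimal reproduction, plus a hypothetical guard `read_cached_example` that is not part of Gradio's API:

```python
import os
import tempfile


def read_cached_example(path: str) -> str:
    """Hypothetical helper: read one cached example file, rejecting directory
    paths with an explicit message instead of letting open() fail."""
    if os.path.isdir(path):
        raise ValueError(f"expected a file inside the cache, got a directory: {path!r}")
    with open(path, encoding="utf-8") as f:
        return f.read()


with tempfile.TemporaryDirectory() as root:
    cache_dir = os.path.join(root, "gradio_cached_examples")
    os.mkdir(cache_dir)
    try:
        open(cache_dir)  # passing the directory itself, as in the error message
    except OSError as err:
        # IsADirectoryError on Linux/macOS, PermissionError on Windows
        print(type(err).__name__)
```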

Puzzled about the attention part

`m_C = tf.reduce_sum(m_emb_C * self._encoding, 2)` `c_temp = tf.transpose(m_C, [0, 2, 1])` In this part, the first line with `reduce_sum` should turn the matrix into a 2-D one, so I think...
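The shapes are not shown in the snippet, so the following sketch assumes a MemN2N-style layout: `m_emb_C` is `[batch, memory_size, sentence_len, embedding_dim]` and `self._encoding` is a `[sentence_len, embedding_dim]` position encoding that broadcasts over the first two axes. Under that assumption, `tf.reduce_sum(..., 2)` (with the default `keepdims=False`) drops only one axis, so `m_C` is 3-D and the three-element permutation passed to `tf.transpose` is valid. The helpers below track shapes only; their names and the example sizes are illustrative.

```python
from typing import Sequence, Tuple

Shape = Tuple[int, ...]


def reduce_sum_shape(shape: Sequence[int], axis: int) -> Shape:
    """Shape after summing over `axis` with keepdims=False: that axis is dropped."""
    axis = axis % len(shape)
    return tuple(d for i, d in enumerate(shape) if i != axis)


def transpose_shape(shape: Sequence[int], perm: Sequence[int]) -> Shape:
    """Shape after transposing with `perm`, which must name every axis exactly once."""
    if sorted(perm) != list(range(len(shape))):
        raise ValueError(f"perm {list(perm)} does not match rank {len(shape)}")
    return tuple(shape[p] for p in perm)


# Assumed layout: [batch, memory_size, sentence_len, embedding_dim]
m_emb_C_shape = (32, 50, 6, 20)
m_C_shape = reduce_sum_shape(m_emb_C_shape, 2)         # (32, 50, 20): rank 3, not 2
c_temp_shape = transpose_shape(m_C_shape, [0, 2, 1])   # (32, 20, 50)
print(m_C_shape, c_temp_shape)
```

If `m_C` really were 2-D, the permutation `[0, 2, 1]` would be rejected, which is why the code only makes sense with a 4-D `m_emb_C`.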

Errors with ZeRO-2 in the encoder-decoder model

Hi, I tried to implement the encoder-decoder model BART [https://github.com/huggingface/transformers/tree/master/src/transformers/models/bart](https://github.com/huggingface/transformers/tree/master/src/transformers/models/bart) following the Megatron-LM tutorial. Everything works fine if I do not use ZeRO-2 for checkpointing. With ZeRO-2, the model...

Add qwen2

Add Qwen2 by modifying `llama.py` to add the attention bias.
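In the Hugging Face Qwen2 implementation, the query, key, and value projections carry a bias term while the output projection does not; LLaMA's attention projections have no bias by default. A minimal pure-Python sketch of that difference follows. The helper names (`linear`, `qkv_projections`, `output_projection`) are hypothetical and do not come from `llama.py`.

```python
from typing import List, Optional, Tuple

Vector = List[float]
Matrix = List[List[float]]  # row-major, shape [out_features][in_features]


def linear(x: Vector, weight: Matrix, bias: Optional[Vector] = None) -> Vector:
    """Compute weight @ x, adding bias when one is given."""
    out = [sum(w * xi for w, xi in zip(row, x)) for row in weight]
    if bias is not None:
        out = [o + b for o, b in zip(out, bias)]
    return out


def qkv_projections(
    x: Vector,
    wq: Matrix,
    wk: Matrix,
    wv: Matrix,
    bq: Optional[Vector] = None,
    bk: Optional[Vector] = None,
    bv: Optional[Vector] = None,
) -> Tuple[Vector, Vector, Vector]:
    """LLaMA-style when all biases are None; Qwen2-style when q/k/v biases are passed."""
    return linear(x, wq, bq), linear(x, wk, bk), linear(x, wv, bv)


def output_projection(h: Vector, wo: Matrix) -> Vector:
    """The output projection stays bias-free in both layouts."""
    return linear(h, wo)
```

With identity weights, a LLaMA-style call returns the input unchanged, while passing `bq=[0.5, 0.5]` shifts the query to `[1.5, 2.5]` for `x = [1.0, 2.0]`; the only change between the two layouts is whether the bias vectors exist.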

Feature/mue

Merge the MuE feature into main.

fix model names

This PR fixes incorrect model names in Qwen2-related code and docs.

Inference speed and memory footprint

I just tested the inference speed and memory footprint on my device. The statistics are here: https://qwen.readthedocs.io/en/latest/benchmark/hf_infer.html Tested on an A100 80GB. Try to use the least number of GPUs as...
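Inference speed in benchmarks like this is commonly reported as generated tokens divided by wall-clock time. A minimal, framework-agnostic timing sketch follows; `generate` is a hypothetical stand-in for a model's generation call, and on GPU one would also record peak memory (for example with `torch.cuda.max_memory_allocated()`), which this stdlib-only sketch does not do.

```python
import time
from typing import Callable, List, Tuple


def measure_generation(
    generate: Callable[[int], List[int]], max_new_tokens: int
) -> Tuple[int, float]:
    """Time one call to `generate` and return (tokens produced, tokens per second).

    `generate` is a hypothetical callable returning the list of new token ids.
    """
    start = time.perf_counter()
    tokens = generate(max_new_tokens)
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very fast stand-in generators.
    return len(tokens), len(tokens) / max(elapsed, 1e-9)


def dummy_generate(n: int) -> List[int]:
    """Stand-in generator that 'produces' n token ids."""
    return list(range(n))


count, tps = measure_generation(dummy_generate, 512)
print(f"{count} tokens at {tps:.0f} tokens/s")
```

Real measurements would also warm up the model first and average several runs, since the first call often includes one-off setup costs.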

Add support for CodeQwen due to its tokenizer

We just released CodeQwen1.5; see the [blog](https://qwenlm.github.io/blog/codeqwen1.5/) and [model](https://huggingface.co/Qwen/CodeQwen1.5-7B-Chat) for more info. Since CodeQwen1.5 uses a different tokenizer, which is based on SentencePiece, we need to make some changes...
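One way such a change can look is to pick the tokenizer backend from the files a checkpoint ships. The sketch below assumes the SentencePiece model uses the conventional filename `tokenizer.model` and that the other models ship byte-level BPE files `vocab.json` and `merges.txt`; both are assumptions, and `tokenizer_kind` is a hypothetical helper, not part of any library.

```python
from typing import Iterable


def tokenizer_kind(files: Iterable[str]) -> str:
    """Guess the tokenizer backend from a checkpoint's file names (hypothetical helper).

    'tokenizer.model' is the conventional SentencePiece model file, while
    'vocab.json' plus 'merges.txt' indicate a BPE tokenizer.
    """
    names = set(files)
    if "tokenizer.model" in names:
        return "sentencepiece"
    if {"vocab.json", "merges.txt"} <= names:
        return "bpe"
    raise ValueError(f"unrecognized tokenizer files: {sorted(names)}")


print(tokenizer_kind(["config.json", "tokenizer.model"]))   # sentencepiece
print(tokenizer_kind(["vocab.json", "merges.txt"]))         # bpe
```

Dispatching on files rather than on model names keeps the check working for future checkpoints that reuse either tokenizer format.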

Adding Expert Prototyping to FastMoE

Hi, thanks for providing an end-to-end training framework in PyTorch for MoE models. We have recently implemented MoE in TensorFlow and found that categorizing experts into different groups can...

enhancement
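A minimal sketch of grouped ("prototype") routing: the experts are split into `num_groups` groups, each token is sent to its top-1 expert within every group, and the weighted outputs of the selected experts are summed. The snippet is truncated, so the details here are assumptions: equal-sized contiguous groups, a softmax gate computed within each group, and scalar inputs and callable experts standing in for real tensors and FFN layers.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def prototype_route(logits: Sequence[float], num_groups: int) -> List[Tuple[int, float]]:
    """Return (global expert index, gate weight) for the top-1 expert of each group."""
    num_experts = len(logits)
    if num_experts % num_groups:
        raise ValueError("number of experts must be divisible by num_groups")
    size = num_experts // num_groups
    routes = []
    for g in range(num_groups):
        probs = softmax(logits[g * size:(g + 1) * size])
        best = max(range(size), key=probs.__getitem__)
        routes.append((g * size + best, probs[best]))
    return routes


def prototype_moe(
    x: float,
    logits: Sequence[float],
    experts: Sequence[Callable[[float], float]],
    num_groups: int,
) -> float:
    """Sum of gate-weighted outputs of the selected expert in every group."""
    return sum(gate * experts[idx](x) for idx, gate in prototype_route(logits, num_groups))


logits = [0.1, 2.0, 0.3, 1.5, -1.0, 0.0]
print([idx for idx, _ in prototype_route(logits, num_groups=2)])  # [1, 3]
```

With `num_groups=1` this reduces to ordinary top-1 routing over all experts; larger values activate one expert per group, so compute grows with the number of groups rather than the total number of experts.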

AttributeError: 'LSTMStateTuple' object has no attribute 'get_shape'

I ran into this error when building the graph. Below is my code for the encoding layer, copied from Stack Overflow, but it does not work for me: `def...`
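In TF1, `LSTMCell` with the default `state_is_tuple=True` returns its state as an `LSTMStateTuple`, a namedtuple of `(c, h)`, so calling `get_shape()` on the state itself raises this error; the usual fix is to use `state.h` (or `state.c`), or for a bidirectional encoder to concatenate `fw_state.h` and `bw_state.h`. Because the original code is truncated, the sketch below is framework-free: `LSTMStateTuple` is re-created as a plain namedtuple, and `FakeTensor` is a hypothetical stand-in that only tracks its shape.

```python
from collections import namedtuple
from typing import Tuple

# Stand-in for tf.nn.rnn_cell.LSTMStateTuple: a namedtuple with fields (c, h).
LSTMStateTuple = namedtuple("LSTMStateTuple", ("c", "h"))


class FakeTensor:
    """Hypothetical stand-in for a tensor; it only records its shape."""

    def __init__(self, shape: Tuple[int, ...]):
        self.shape = tuple(shape)

    def get_shape(self) -> Tuple[int, ...]:
        return self.shape


def hidden_of(state):
    """Return the tensor to pass downstream: `h` for LSTM states, the state itself otherwise."""
    return state.h if isinstance(state, LSTMStateTuple) else state


def concat_last_axis(a: FakeTensor, b: FakeTensor) -> FakeTensor:
    """Shape-level analogue of tf.concat([a, b], axis=-1)."""
    if a.shape[:-1] != b.shape[:-1]:
        raise ValueError("leading dimensions must match")
    return FakeTensor(a.shape[:-1] + (a.shape[-1] + b.shape[-1],))


fw_state = LSTMStateTuple(c=FakeTensor((32, 128)), h=FakeTensor((32, 128)))
bw_state = LSTMStateTuple(c=FakeTensor((32, 128)), h=FakeTensor((32, 128)))

# fw_state.get_shape() would raise AttributeError, as in the issue title.
encoded = concat_last_axis(hidden_of(fw_state), hidden_of(bw_state))
print(encoded.get_shape())  # (32, 256)
```

`hidden_of` also passes through single-tensor states (for example from a GRU cell), so the same encoder code can work with either cell type.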