Junyang Lin

Results: 10 issues by Junyang Lin

### Describe the bug The HF demo fails with `IsADirectoryError: [Errno 21] Is a directory: 'gradio_cached_examples/'`. ### Is there an existing issue for this? - [X] I...

bug
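Python raises `IsADirectoryError` with errno 21 (`EISDIR` on Linux) when a directory path is opened as if it were a regular file. The sketch below reproduces that error class in a temporary folder. It assumes, without confirmation from the report, that the demo passes the `gradio_cached_examples/` directory to code that expects a file path. `reproduce_is_a_directory_error` is a hypothetical helper.

```python
import errno
import os
import tempfile


def reproduce_is_a_directory_error() -> OSError:
    """Open a directory as if it were a file and return the raised error.

    Assumption: something in the demo treats the cached-examples folder as a
    file. On Linux and macOS this raises IsADirectoryError (errno EISDIR);
    Windows raises PermissionError instead.
    """
    with tempfile.TemporaryDirectory() as tmp:
        cache_dir = os.path.join(tmp, "gradio_cached_examples")
        os.makedirs(cache_dir)
        try:
            with open(cache_dir) as f:  # a directory, not a regular file
                f.read()
        except OSError as exc:
            return exc
    raise RuntimeError("expected open() on a directory to fail")


err = reproduce_is_a_directory_error()
print(type(err).__name__, err.errno == errno.EISDIR)
```

A likely fix is to check the path with `os.path.isfile` before opening it, or to build the full path to the cached file inside the directory.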

`m_C = tf.reduce_sum(m_emb_C * self._encoding, 2)` `c_temp = tf.transpose(m_C, [0, 2, 1])` In this part, the first line (with `reduce_sum`) should reduce the tensor to two dimensions, so I think...
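A little shape arithmetic settles this question. The sketch below uses plain Python, not TensorFlow. It mirrors two documented rules: `tf.reduce_sum` with the default `keepdims=False` drops the reduced axis, and `tf.transpose` needs a `perm` with exactly one entry per axis. The input shape `(32, 50, 6, 20)` (batch, memories, sentence length, embedding size) is a hypothetical example. The sketch also assumes that `m_emb_C * self._encoding` keeps the shape of `m_emb_C` through broadcasting.

```python
from typing import Sequence, Tuple

Shape = Tuple[int, ...]


def reduce_sum_shape(shape: Sequence[int], axis: int) -> Shape:
    """Shape after tf.reduce_sum(x, axis) with the default keepdims=False."""
    axis = axis % len(shape)  # normalize negative axes
    return tuple(d for i, d in enumerate(shape) if i != axis)


def transpose_shape(shape: Sequence[int], perm: Sequence[int]) -> Shape:
    """Shape after tf.transpose(x, perm); perm must list every axis exactly once."""
    if sorted(perm) != list(range(len(shape))):
        raise ValueError(f"perm {list(perm)} does not match rank {len(shape)}")
    return tuple(shape[p] for p in perm)


m_C_shape = reduce_sum_shape((32, 50, 6, 20), axis=2)   # (32, 50, 20)
c_temp_shape = transpose_shape(m_C_shape, [0, 2, 1])     # (32, 20, 50)
```

Under that assumption, reducing axis 2 of a rank-4 tensor leaves a rank-3 tensor, which is exactly what `[0, 2, 1]` requires. If `m_C` really were 2-D, the transpose would be rejected because `perm` lists more axes than the tensor has.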

Hi, I tried to implement the encoder-decoder model BART [https://github.com/huggingface/transformers/tree/master/src/transformers/models/bart](https://github.com/huggingface/transformers/tree/master/src/transformers/models/bart) following the Megatron-LM tutorial. Everything works fine when I do not use ZeRO-2 for checkpointing. With ZeRO-2, the model...

Add Qwen2 by modifying llama.py to add the attention bias.
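Plain Python can sketch the change this PR title describes, reduced to its one visible difference: an optional bias on the attention projections. The sketch assumes that the bias applies to the query, key and value projections but not to the output projection, as in the Hugging Face `transformers` Qwen2 attention module. `Linear` and `build_attention_projections` are hypothetical names for illustration, not code from `llama.py`.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Vector = List[float]
Matrix = List[List[float]]


@dataclass
class Linear:
    """Dense layer y = W x (+ b); bias=None means no bias term (Llama-style)."""
    weight: Matrix                 # [out_features][in_features]
    bias: Optional[Vector] = None

    def __call__(self, x: Vector) -> Vector:
        y = [sum(w * xi for w, xi in zip(row, x)) for row in self.weight]
        if self.bias is not None:
            y = [yi + bi for yi, bi in zip(y, self.bias)]
        return y


def build_attention_projections(hidden: int, qkv_bias: bool) -> Dict[str, Linear]:
    """Hypothetical helper: identity weights and all-ones biases, for illustration."""
    def weight() -> Matrix:
        return [[1.0 if i == j else 0.0 for j in range(hidden)] for i in range(hidden)]

    def bias() -> Optional[Vector]:
        return [1.0] * hidden if qkv_bias else None

    return {
        "q_proj": Linear(weight(), bias()),
        "k_proj": Linear(weight(), bias()),
        "v_proj": Linear(weight(), bias()),
        "o_proj": Linear(weight(), None),  # output projection stays bias-free
    }


llama_like = build_attention_projections(2, qkv_bias=False)
qwen2_like = build_attention_projections(2, qkv_bias=True)
print(llama_like["q_proj"]([3.0, 4.0]))  # [3.0, 4.0]
print(qwen2_like["q_proj"]([3.0, 4.0]))  # [4.0, 5.0]
```

Keeping the bias behind a single flag lets one attention class serve both model families.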

merge MuE feature to main

This PR fixes model-name problems in Qwen2-related code and docs.

I just tested the inference speed and memory footprint on my device (A100 80G). The statistics are here: https://qwen.readthedocs.io/en/latest/benchmark/hf_infer.html Try to use the least number of GPUs as...
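Weight memory alone gives a rough estimate of how few A100 80G cards a model needs. The helper below is a hypothetical back-of-the-envelope sketch, not the method behind the linked benchmark. It assumes 2 bytes per parameter for bf16/fp16 weights, treats 80G as 80 GiB, and reserves 10% of each card as headroom.

```python
import math


def min_gpus_for_weights(num_params: int, bytes_per_param: float,
                         gpu_mem_gib: float = 80.0,
                         usable_fraction: float = 0.9) -> int:
    """Rough lower bound on the GPUs needed just to hold the model weights.

    Ignores activations and the KV cache, which grow with batch size and
    sequence length, so a real run may need more. usable_fraction is an
    assumed headroom factor, not a measured value.
    """
    weight_bytes = num_params * bytes_per_param
    usable_bytes = gpu_mem_gib * 1024**3 * usable_fraction
    return max(1, math.ceil(weight_bytes / usable_bytes))


# Hypothetical model sizes with bf16 weights (2 bytes per parameter).
print(min_gpus_for_weights(7_000_000_000, 2))   # 1
print(min_gpus_for_weights(72_000_000_000, 2))  # 2
```

Treat the result as a floor. Measured peak memory, like the figures on the linked page, is what decides the actual GPU count.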

We just released CodeQwen1.5; see the [blog](https://qwenlm.github.io/blog/codeqwen1.5/) and [model](https://huggingface.co/Qwen/CodeQwen1.5-7B-Chat) for more info. Since CodeQwen1.5 uses a different tokenizer, based on SentencePiece, we need to make some changes...

Hi, thanks for providing an end-to-end training framework in PyTorch for MoE models. We recently implemented MoE in TensorFlow and found that categorizing experts into different groups can...

enhancement
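The snippet does not say how the experts are grouped, so the following is only one hypothetical way to "categorize experts into groups": two-level top-1 routing. A token first picks a group, scored by the mean router score of its experts, and then picks the best expert inside that group. The contiguous, equal-size grouping and the function name `grouped_top1_route` are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def grouped_top1_route(scores: Sequence[float], num_groups: int) -> Tuple[int, int]:
    """Two-level routing for one token: pick a group, then an expert inside it.

    Hypothetical scheme: experts are split into `num_groups` contiguous,
    equally sized groups; each group is scored by the mean router score of
    its experts, and the token goes to the best expert in the winning group.
    Returns (group_index, expert_index).
    """
    num_experts = len(scores)
    if num_groups <= 0 or num_experts == 0 or num_experts % num_groups != 0:
        raise ValueError("num_experts must be a positive multiple of num_groups")
    size = num_experts // num_groups
    groups: List[List[int]] = [list(range(g * size, (g + 1) * size)) for g in range(num_groups)]

    def group_score(g: int) -> float:
        return sum(scores[e] for e in groups[g]) / size

    best_group = max(range(num_groups), key=group_score)
    best_expert = max(groups[best_group], key=lambda e: scores[e])
    return best_group, best_expert


# Flat top-1 would pick expert 0 (score 0.9); grouped routing picks expert 2.
print(grouped_top1_route([0.9, 0.0, 0.6, 0.5], num_groups=2))  # (1, 2)
```

Scoring groups by their mean, rather than by their best expert, is what makes the result differ from flat top-1 routing. With a max-based group score, the two schemes would always agree.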

I hit this error when building the graph. Below is my code for the encoding layer, copied from Stack Overflow, but it does not work for me... ``` def...