rsong0606
@jeejeelee Hey Jee! I added the chat template as you described above, but I noticed slower inference speed compared to other models I've experimented with before, like Llama 2. Do you...
@Eric-mingjie Thanks Eric, mine has 24 GB of GPU memory. Given that at least 14 GB would be used to load the model, I should still have ~10 GB left on the NVIDIA L4....
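For reference, the numbers above line up with a quick back-of-envelope estimate: a 7B-parameter model in fp16 takes about 2 bytes per parameter just for the weights. This is a minimal sketch of that arithmetic (the 7B/fp16 figures are my assumption, not stated in the thread):

```python
def weight_memory_gb(num_params_billion, bytes_per_param=2):
    """Approximate memory needed just for model weights, in GB.

    bytes_per_param=2 assumes fp16/bf16 weights; use 4 for fp32,
    or ~0.5-1 for 4/8-bit quantized models.
    """
    return num_params_billion * 1e9 * bytes_per_param / 1e9


# Assumed example: a 7B model in fp16 on a 24 GB NVIDIA L4.
weights_gb = weight_memory_gb(7)      # ~14 GB for the weights alone
headroom_gb = 24 - weights_gb         # ~10 GB left for KV cache, activations, etc.
print(weights_gb, headroom_gb)
```

Note that the leftover ~10 GB must also cover the KV cache and activation buffers, which grow with batch size and sequence length, so the usable headroom in practice is smaller than this estimate suggests.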
@simlaharma I had a similar issue. Check this post; it worked for me: https://github.com/huggingface/datasets/issues/6746