Humza Sami
@HamidShojanazeri Please check
> NCCL_P2P_DISABLE=1 CUDA_VISIBLE_DEVICES=2,3 python -m vllm.entrypoints.api_server --tensor-parallel-size 2 --host 127.0.0.1

@MasKong Can you elaborate on this a bit? This is my simple codebase and I want to use GPUs 1 and 3....
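As a minimal sketch (not from the thread itself) of how `CUDA_VISIBLE_DEVICES` selects GPUs: setting it to `1,3` exposes physical GPUs 1 and 3, which the process then sees renumbered as devices 0 and 1. The same applies to the vLLM command quoted above.

```python
import os

# Must be set before CUDA is initialized; physical GPUs 1 and 3 then
# appear to this process as cuda:0 and cuda:1.
os.environ["CUDA_VISIBLE_DEVICES"] = "1,3"

import torch

print(torch.cuda.device_count())      # expected: 2
print(torch.cuda.get_device_name(0))  # physical GPU 1
print(torch.cuda.get_device_name(1))  # physical GPU 3
```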
Use `` as the end-of-string token in generation.
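The token itself is missing above (for Llama-family models it is typically `</s>`, exposed as `tokenizer.eos_token`). A minimal sketch, assuming a Hugging Face tokenizer/model pair, of wiring the stop token into generation via `eos_token_id`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "codellama/CodeLlama-7b-hf"  # assumption: any causal LM works here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("def fib(n):", return_tensors="pt").to(model.device)
# Generation stops as soon as the end-of-string token is produced.
out = model.generate(**inputs, max_new_tokens=128, eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```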
Please share your cuDNN and CUDA toolkit versions along with your GPU model. Although the following code block detects the GPU automatically:

```python
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)
```
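A quick way to collect those versions, using standard `torch` introspection calls:

```python
import torch

print("torch:", torch.__version__)
print("CUDA toolkit (as built):", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())
for i in range(torch.cuda.device_count()):
    print(f"GPU {i}:", torch.cuda.get_device_name(i))
```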
```python
# add_special_tokens expects a dict, not a bare list
tokenizer.add_special_tokens({"additional_special_tokens": ["[BOST]"]})
```
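If a new token is added this way, the model's embedding matrix usually needs to grow with it; a minimal follow-up, assuming a standard Hugging Face model object:

```python
# Resize input (and tied output) embeddings to cover the new vocabulary entry.
model.resize_token_embeddings(len(tokenizer))
```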
In the Huggingface generation pipeline, are you using the Instruct prompt format? `[INST] user message 1 [/INST] response 1 [INST] user message 2 [/INST] response 2` In example_chat_completion.py, this...
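As a rough sketch of assembling that multi-turn format in code (assuming the Llama-2-style template; the `<s>`/`</s>` markers and the omission of a system prompt are simplifications of the full format):

```python
def build_llama2_prompt(turns):
    """turns: list of (user_message, assistant_response) pairs; pass None
    as the last response to ask the model for the next reply."""
    prompt = ""
    for user_msg, response in turns:
        prompt += f"<s>[INST] {user_msg} [/INST]"
        if response is not None:
            prompt += f" {response} </s>"
    return prompt

prompt = build_llama2_prompt([
    ("user message 1", "response 1"),
    ("user message 2", None),
])
print(prompt)
```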
Can you share your inference code?
@for-just-we It would be helpful if you posted your inference code here. In the meantime, could you try this **code snippet** and check whether it produces better results? ``` from...
As far as I know, `codellama-Python` is not meant for infilling; please refer to its documentation. This model is not fine-tuned on an infilling dataset; it is fine-tuned only on next...
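By contrast, the base CodeLlama checkpoints (without the `-Python` suffix) were trained with an infilling objective. A sketch of the `<FILL_ME>` convention from the Hugging Face CodeLlama integration, assuming `codellama/CodeLlama-7b-hf`:

```python
import torch
from transformers import CodeLlamaTokenizer, LlamaForCausalLM

# Base checkpoint (not -Python), which supports infilling.
model_name = "codellama/CodeLlama-7b-hf"
tokenizer = CodeLlamaTokenizer.from_pretrained(model_name)
model = LlamaForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

# <FILL_ME> marks the hole; the tokenizer rewrites the prompt into the
# model's prefix/suffix/middle infilling format.
prompt = '''def remove_non_ascii(s: str) -> str:
    """ <FILL_ME>
    return result
'''
input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"].to(model.device)
generated = model.generate(input_ids, max_new_tokens=128)
filling = tokenizer.batch_decode(
    generated[:, input_ids.shape[1]:], skip_special_tokens=True
)[0]
print(prompt.replace("<FILL_ME>", filling))
```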
This base model can produce a total of 4096 tokens. You can set `max_new_tokens` up to 4096, but note that this 4096-token budget includes the prompt tokens as well.
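A small sketch of budgeting generation against that window (assuming a Hugging Face tokenizer/model pair with a 4096-token context):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

CONTEXT_WINDOW = 4096  # shared by the prompt and the generated tokens

model_name = "codellama/CodeLlama-7b-hf"  # assumption: any 4096-context model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("def quicksort(arr):", return_tensors="pt").to(model.device)
prompt_len = inputs["input_ids"].shape[1]

# Cap generation so prompt + new tokens never exceed the context window.
out = model.generate(**inputs, max_new_tokens=CONTEXT_WINDOW - prompt_len)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```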