Jan Ploski
> There is a check to stop processing if less than 1s of audio remains:
>
> https://github.com/ggerganov/whisper.cpp/blob/a750868428868abd437e228ae5cab763ef3dc387/whisper.cpp#L5271-L5277
>
> I've figured it helps in most situations, but obviously can...
> For everyone's convenience, I've uploaded **llama models converted with the latest transformers git head** here:
>
> **7B** - https://huggingface.co/yahma/llama-7b-hf
> **13B** - https://huggingface.co/yahma/llama-13b-hf

Unfortunately, unlike the decapoda-research/llama-7b-hf model the...
> So to sum it up, it would be nice to have a test configuration which can execute in the free Google Colab notebook - which I know is technically...
Try passing in `n_ctx=2048` as a parameter.
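For reference, a minimal sketch of where such a parameter could go, assuming the llama-cpp-python `Llama` wrapper is in use (the exact interface depends on which bindings the question was about, and the model path below is only a placeholder):

```python
# Minimal sketch, assuming llama-cpp-python; the model path is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-7b/ggml-model-q4_0.bin",  # hypothetical local path
    n_ctx=2048,  # enlarge the context window beyond the default
)

out = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
print(out["choices"][0]["text"])
```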
> `llm_tokenizer_bpe::tokenize` seems to be subtly broken

I implemented an independent port of the [gpt2-tokenizer](https://github.com/openai/gpt-2/blob/master/src/encoder.py#L55-L101) (will share the code if someone is interested) and it shows the same...
> I could imagine this to be a hairy problem, because I'd assume a couple of models have been trained with the fast tokenizers?

Yes, I suppose everyone uses the fast...
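As an aside (not from the thread itself), one way to check whether the fast and slow Hugging Face tokenizers disagree for a given model is simply to encode the same string with both; the model id and test string below are only placeholders:

```python
# Compare fast vs. slow Hugging Face tokenizers; "gpt2" is only a placeholder
# model id, and the test string is arbitrary.
from transformers import AutoTokenizer

model_id = "gpt2"
text = "Hello  world... \tindentation and accents: éè"

fast = AutoTokenizer.from_pretrained(model_id, use_fast=True)
slow = AutoTokenizer.from_pretrained(model_id, use_fast=False)

fast_ids = fast.encode(text)
slow_ids = slow.encode(text)

print("fast:", fast_ids)
print("slow:", slow_ids)
print("identical:", fast_ids == slow_ids)
```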
Note: to be able to use this test script (or alpaca-lora training, really) in the free version of Google Colab, I had to split the yahma/llama-7b-hf model into more shards because of...
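For anyone wanting to do something similar, a minimal sketch of re-sharding a Hugging Face checkpoint with `transformers` (the output directory and the 2 GB shard size are illustrative values, not necessarily what was used here):

```python
# Sketch: re-save a Hugging Face checkpoint with a smaller max shard size so
# that each shard fits comfortably into RAM on the free Colab tier.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

src = "yahma/llama-7b-hf"
dst = "./llama-7b-hf-resharded"  # hypothetical output directory

model = AutoModelForCausalLM.from_pretrained(src, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(src)

model.save_pretrained(dst, max_shard_size="2GB")  # smaller shards, more files
tokenizer.save_pretrained(dst)
```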
I should add that this is a very worthwhile PR, and something like this should be referenced from the front-page documentation. I wasted a LOT of time running into...
Thanks for getting back and for naming the customizations in particular. I started an experimental branch (https://github.com/jploski/ggml/tree/mpt-experiment/examples/mpt) and figured out that ALiBi would be needed in place of LLaMA's RoPE...
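For anyone following along, here is a rough sketch (not the ggml code) of what ALiBi does differently from RoPE: instead of rotating the query/key vectors by position, it adds a fixed per-head linear penalty to the attention scores before the softmax. The slope formula assumes a power-of-two head count, and the tensor shapes are purely illustrative:

```python
# Rough sketch of ALiBi attention biasing (illustrative shapes, not ggml code).
import torch

def alibi_slopes(n_heads: int) -> torch.Tensor:
    # Geometric sequence 2^(-8/n), 2^(-16/n), ..., 2^(-8); assumes n_heads is
    # a power of two, as in the ALiBi paper.
    start = 2.0 ** (-8.0 / n_heads)
    return torch.tensor([start ** (i + 1) for i in range(n_heads)])

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    # bias[h, i, j] = slope[h] * (j - i): zero on the diagonal, increasingly
    # negative the further key j lies in the past relative to query i.
    slopes = alibi_slopes(n_heads)                    # (n_heads,)
    pos = torch.arange(seq_len)
    dist = pos[None, :] - pos[:, None]                # (seq, seq), value j - i
    return slopes[:, None, None] * dist[None, :, :]   # (n_heads, seq, seq)

# Usage: add to raw attention scores, then apply the causal mask and softmax.
n_heads, seq_len = 8, 16
scores = torch.randn(n_heads, seq_len, seq_len)
scores = scores + alibi_bias(n_heads, seq_len)
```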