grapevine-AI
But, on 1x NVIDIA 3090 (with DDR4 offload), IQ3_S and IQ3_M are slower than IQ4_XS (about 0.5x the speed). It seems that only NVIDIA GPUs can handle IQ3 at high speed.
You should revise ``convert_hf_to_gguf.py`` lines 4086-4091 in the ``DeepseekV2Model`` class:
```python
if self.hparams.get("rope_scaling") is not None and "factor" in self.hparams["rope_scaling"]:
    if self.hparams["rope_scaling"].get("type") == "yarn":
        self.gguf_writer.add_rope_scaling_type(gguf.RopeScalingType.YARN)
        self.gguf_writer.add_rope_scaling_factor(self.hparams["rope_scaling"]["factor"])
        self.gguf_writer.add_rope_scaling_orig_ctx_len(self.hparams["rope_scaling"]["original_max_position_embeddings"])
        self.gguf_writer.add_rope_scaling_yarn_log_mul(0.1 * self.hparams["rope_scaling"]["mscale_all_dim"])
```
...
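For context, here is a minimal sketch of how that guard behaves. The ``rope_scaling`` dicts below are illustrative placeholders, not the real config values of any particular model:
```python
# Illustrative rope_scaling dicts, not real model configs.
yarn_cfg = {
    "rope_scaling": {
        "type": "yarn",
        "factor": 40,
        "original_max_position_embeddings": 4096,
        "mscale_all_dim": 1.0,
    }
}
plain_cfg = {"rope_scaling": {"type": "yarn"}}  # no "factor" key

for hparams in (yarn_cfg, plain_cfg):
    rs = hparams.get("rope_scaling")
    # Checking "factor" first skips configs that lack it, avoiding a KeyError.
    if rs is not None and "factor" in rs and rs.get("type") == "yarn":
        print("write YARN keys, yarn_log_mul =", 0.1 * rs["mscale_all_dim"])
    else:
        print("skip YARN metadata")
```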
I encountered the exact same error when running ``sycl-ls`` on Windows 11. Interestingly, I did not face this problem with oneAPI 2024.0; the command worked without any issues after a...
You should use ``tokenizer.vocab_size`` instead of ``len(tokenizer.vocab)``.
Would you change it to ``tokenizer.get_vocab().values()``? Or, if the failing line is ``assert max(tokenizer.vocab.values()) < vocab_size``, you can simply delete it; the assert is not necessary.
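A minimal sketch of why these two counts can disagree, assuming a Hugging Face ``transformers`` tokenizer; the ``gpt2`` model name and the added token are just examples:
```python
# Why vocab_size and get_vocab() can disagree once tokens have been added.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.add_tokens(["<my_new_token>"])  # simulate an added token

print(tokenizer.vocab_size)                  # base vocabulary only
print(len(tokenizer.get_vocab()))            # base + added tokens
print(max(tokenizer.get_vocab().values()))   # can be >= vocab_size,
# which is exactly what makes the assert above fire spuriously
```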
Yes, llama.cpp has not implemented Moonlight's pre-tokenizer yet, but you can substitute another model's pre-tokenizer. If you want to do that, you should add this code at line 702:
```python
if chkhsh...
```
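The truncated snippet follows the hash-matching pattern of ``get_vocab_base_pre()`` in ``convert_hf_to_gguf.py``. Below is a hedged sketch of that pattern; the hash string is a placeholder, not Moonlight's real checksum, and reusing ``deepseek-v3`` as the substitute pre-tokenizer is an assumption, not a confirmed match:
```python
# Hypothetical sketch, assuming the conventions of get_vocab_base_pre():
# chkhsh is a sha256 over the tokenization of a fixed probe string, and
# res names a pre-tokenizer that llama.cpp already implements.
import hashlib

def pick_pre_tokenizer(tokenizer, probe_text: str) -> str:
    chktok = tokenizer.encode(probe_text)
    chkhsh = hashlib.sha256(str(chktok).encode()).hexdigest()

    res = None
    # Placeholder hash, not Moonlight's real checksum; fill in the value
    # the conversion script prints for your tokenizer.
    if chkhsh == "<moonlight-hash-here>":
        # Substituting "deepseek-v3" is an assumption, not a confirmed match.
        res = "deepseek-v3"
    if res is None:
        raise NotImplementedError("unknown pre-tokenizer")
    return res
```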