## Issues by Ali Naeimi (2 results)

### What happened? When using speculative decoding in llama-server with different context sizes for the target model (-c) and the draft model (-cd), where the draft context is smaller...
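
A minimal reproduction sketch of that flag combination (model paths, sizes, and the port are placeholders, not taken from the issue; `-c` and `-cd` are llama-server's context-size flags for the target and draft models):

```bash
# Sketch: target context larger than draft context (placeholder paths/values)
./llama-server \
  -m  models/target-7b-q4_k_m.gguf \
  -md models/draft-1b-q4_k_m.gguf \
  -c  8192 \
  -cd 2048 \
  --port 8080
```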

### What happened? llama-server produces no output and hangs when the CUDA memory allocation for the draft model's KV cache fails (OOM), whereas mainline llama.cpp crashes properly. On RTX...
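
One plausible way to provoke that allocation failure, as a hedged sketch: fully offload both models to the GPU and request a draft context too large for the remaining VRAM (paths, layer counts, and sizes are placeholders, not taken from the issue):

```bash
# Sketch: oversized draft KV cache forcing a CUDA OOM (placeholder values)
./llama-server \
  -m   models/target-7b-q4_k_m.gguf \
  -md  models/draft-1b-q4_k_m.gguf \
  -ngl 99 -ngld 99 \
  -c   4096 \
  -cd  131072
```

Per the report, the expected behavior on this failure is an immediate crash with an allocation error, not a silent hang.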