justdoit

Results: 13 comments by justdoit

Excuse me, may I ask whether there is a Triton implementation of the build_tree function?

I am using the latest code for offline testing. At the beginning of the cycle, tps = 180 tokens/s, but after a few cycles there is a serious decrease in...

How can I produce GPTQ int8 weights for v3/r1? Should I use the function per_token_group_quant_int8 to produce int8 block-wise weights?