shan zhou

Results 3 issues of shan zhou

**Using the pmem allocator in the WDL model, in either libpmem or memkind mode, causes the error "./tensorflow/core/framework/embedding/value_ptr.h:273] Unsupport FreqCounter in subclass of ValuePtrBase Aborted (core dumped)"** **Here are the...

I am trying to run the model on CPU in offline mode, but it depends on the flash_attn package, which must be compiled with nvcc on a GPU. I am wondering...

When running vllm serving with 16 threads on the DeepSeek-Distill-Qwen-7b model, the result for the prompt below is wrong. Versions: xfastertransformer 1.8.2, vllm-xft 0.5.5.0. The result is correct when running 12...