BitNet

Official inference framework for 1-bit LLMs

Results 227 BitNet issues

I have spent several days trying to get this to work. I have tried everything under the sun and no matter what it just does not work. At the end...

Bring up the LLM in server mode with the command `python run_inference_server.py -m --host 0.0.0.0 --port 5000`. When connecting to the server through the API endpoint `http://localhost:5000/completion` with payload `{"prompt": "}`...
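For anyone trying to reproduce the report above, the request can be sketched with the Python standard library. This is a minimal sketch, not the reporter's client: the payload in the report is truncated, so only the `prompt` field is used here, and the helper name `build_completion_request` is hypothetical.

```python
import json
import urllib.request


def build_completion_request(prompt, host="localhost", port=5000):
    """Build (but do not send) a POST request for the /completion endpoint.

    Hypothetical helper: the host, port, and path come from the report;
    the payload carries only "prompt" because the rest is truncated there.
    """
    url = f"http://{host}:{port}/completion"
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends it, which requires the server from the report to be running locally.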

**Context and Purpose:** This PR automatically remediates a security vulnerability:

- **Description:** Functions reliant on pickle can result in arbitrary code execution. Consider loading from `state_dict`, using fickling, or switching...
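To illustrate the risk the PR describes, here is a minimal standard-library sketch of one possible mitigation: an unpickler that refuses to resolve any global, so a payload cannot name a callable to run. This is not the change the PR makes (its diff is not shown here), and `NoGlobalsUnpickler` and `safe_loads` are hypothetical names.

```python
import io
import pickle


class NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that rejects every global lookup (hypothetical helper).

    Plain containers, strings, and numbers still load because they are
    encoded with dedicated opcodes and never reach find_class.
    """

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def safe_loads(data):
    """Load pickled bytes while refusing anything that references a global."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()
```

A payload whose `__reduce__` points at a callable fails with `pickle.UnpicklingError` instead of executing it, while data built only from built-in containers round-trips unchanged.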

I have some questions about gpu/test.py. (1)

```
input0 = torch.randint(-128, 127, (1, K), dtype=torch.int8, device='cuda')
input_np = input0.cpu().to(torch.int32).numpy()
weight_np = weight.cpu().to(torch.int32).T.numpy()
out_np = np.matmul(input_np, weight_np)
out_np = torch.tensor(out_np).cuda().to(torch.bfloat16)
s = torch.ones(1, dtype=torch.bfloat16, device='cuda')
ws...
```

Hi, love the GPU kernels! Any plans to add them to the HF transformers library?

```
torch.cuda.synchronize()
stats.phase("decode" if use_cuda_graphs else "total")
eos_id = self.tokenizer.eot_id
```

Error: `AttributeError: 'Tokenizer' object has no attribute 'eot_id'. Did you mean: 'eos_id'?` Is the code wrong? Should it be eos_id...
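Until the attribute name is confirmed, a defensive lookup is one way to keep the script running. The sketch below is a hypothetical helper, not code from the repository, and it assumes that `eos_id` is an acceptable stop token when `eot_id` is absent. The report does not establish that, since the two may denote different tokens.

```python
def resolve_stop_token_id(tokenizer):
    """Return tokenizer.eot_id if present, otherwise tokenizer.eos_id.

    Hypothetical helper: treating eos_id as a stand-in for eot_id is an
    assumption that should be checked against the tokenizer in use.
    """
    for name in ("eot_id", "eos_id"):
        value = getattr(tokenizer, name, None)
        if value is not None:
            return value
    raise AttributeError("tokenizer exposes neither 'eot_id' nor 'eos_id'")
```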