About the 'custom == np' comparison in gpu/test.py

Open Singapore-mor opened this issue 8 months ago • 0 comments

I have a few questions about gpu/test.py. (1)

            input0 = torch.randint(-128, 127, (1, K), dtype=torch.int8, device='cuda')
            input_np = input0.cpu().to(torch.int32).numpy()
            weight_np = weight.cpu().to(torch.int32).T.numpy()
            out_np = np.matmul(input_np,weight_np)
            out_np = torch.tensor(out_np).cuda().to(torch.bfloat16)

            s = torch.ones(1, dtype=torch.bfloat16, device='cuda')
            ws = torch.ones(6, dtype=torch.bfloat16, device='cuda')

            ret = torch.empty((1,N), dtype=torch.bfloat16, device=input0.device)
            out = bitnet_int8xint2_linear(input0, weight_compressed, s, ws, ret)
            print(f'custom == np {torch.all(out==out_np)}')

Sorry, I don't understand the purpose of comparing 'out_np' and 'out', because 'out_np' is computed from 'input0' and 'weight', which are int8, not bf16.

(2) In the function 'convert_weight_int8_to_int2', why is the step 'weight = weight+2' needed?

(3) After we get permutated_weight, the weight is still int8, not int2. In the 'compress_int2_to_int8' function, do we just use the lowest two bits of each value as the int2? Is this correct?

(4) Why do we load model_state_fp16.pt for prefill and model_state_int2.pt for decode?
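To make questions (2) and (3) concrete, this is how I currently picture the shift and the packing. It is only a sketch under my own assumptions: the weights are ternary (-1, 0, 1), `pack_ternary` and `unpack_ternary` are hypothetical names, and the bit order is a guess, not necessarily the layout used by 'compress_int2_to_int8'.

```python
# Illustrative only: names and bit layout are assumptions, not the repository's code.

def pack_ternary(weights):
    """Pack ternary weights (-1, 0, 1) four per byte, after shifting each by +2."""
    assert len(weights) % 4 == 0, "this sketch expects a multiple of four weights"
    packed = []
    for i in range(0, len(weights), 4):
        byte = 0
        for j, w in enumerate(weights[i:i + 4]):
            code = w + 2                      # -1, 0, 1 -> 1, 2, 3 (non-negative, fits in 2 bits)
            byte |= (code & 0b11) << (2 * j)  # keep only the lowest two bits of each code
        packed.append(byte)
    return packed


def unpack_ternary(packed):
    """Invert pack_ternary: extract each 2-bit code and undo the +2 shift."""
    weights = []
    for byte in packed:
        for j in range(4):
            code = (byte >> (2 * j)) & 0b11
            weights.append(code - 2)
    return weights


weights = [-1, 0, 1, 1, 0, 0, -1, 1]
packed = pack_ternary(weights)
restored = unpack_ternary(packed)
print(packed, restored == weights)
```

If this picture is right, the '+2' would only be there to make every value non-negative so that it fits in two unsigned bits, and the kernel would have to subtract 2 again (or otherwise correct for the offset) when it unpacks. Please correct me if the real code works differently.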

May 26 '25 07:05 Singapore-mor