Getting triton compiler error while running the inference code mentioned in the README.

Open sahiljoshi515 opened this issue 10 months ago • 0 comments

triton.compiler.errors.CompilationError: at 114:14:
        else:
            if EVEN_HEADDIM:
                k = tl.load(k_ptrs + start_n * stride_kn, mask=(start_n + offs_n)[:, None] < seqlen_k, other=0.0)
            else:
                k = tl.load(k_ptrs + start_n * stride_kn, mask=((start_n + offs_n)[:, None] < seqlen_k) & (offs_d[None, :] < headdim), other=0.0)
        qk = tl.zeros([BLOCK_M, BLOCK_N], dtype=tl.float32)
        qk += tl.dot(q, k, trans_b=True)

Hi, I am not sure why I get this error. I am simply running the code below:

`import torch
from transformers import AutoTokenizer, AutoModel
from transformers.models.bert.configuration_bert import BertConfig

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

config = BertConfig.from_pretrained("zhihan1996/DNABERT-2-117M")
tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True, config=config)
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True, config=config)
model.to(device)
model.eval()

dna = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
inputs = tokenizer(dna, return_tensors="pt")["input_ids"].to(device)
hidden_states = model(inputs)[0]  # [1, sequence_length, 768]

# embedding with mean pooling
embedding_mean = torch.mean(hidden_states[0], dim=0)
print(embedding_mean.shape)  # expect to be 768

# embedding with max pooling
embedding_max = torch.max(hidden_states[0], dim=0)[0]
print(embedding_max.shape)  # expect to be 768`
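In case it helps with reproducing this, here is a minimal sketch (not from the README) that reports which Triton version, if any, is installed. The trace stops at `tl.dot(q, k, trans_b=True)`, and my assumption is that newer Triton releases no longer accept the `trans_b` keyword, so the installed version seems worth including in the report. The helper name `installed_version` is hypothetical, and the snippet uses only the Python standard library.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the packages most relevant to the compilation error above.
# Assumption: the Triton version is what decides whether `trans_b` is accepted.
for name in ("triton", "torch", "transformers"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

This only gathers version information; it does not change how the model is loaded or run.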

If anybody has solved this, please let me know. Thank you for your awesome work!

Mar 22 '25 23:03 sahiljoshi515