Getting triton compiler error while running the inference code mentioned in the README.
triton.compiler.errors.CompilationError: at 114:14:
    else:
        if EVEN_HEADDIM:
            k = tl.load(k_ptrs + start_n * stride_kn, mask=(start_n + offs_n)[:, None] < seqlen_k, other=0.0)
        else:
            k = tl.load(k_ptrs + start_n * stride_kn, mask=((start_n + offs_n)[:, None] < seqlen_k) & (offs_d[None, :] < headdim), other=0.0)
    qk = tl.zeros([BLOCK_M, BLOCK_N], dtype=tl.float32)
    qk += tl.dot(q, k, trans_b=True)
Hi, I am not sure why I get this error. I am simply running the code below:
`import torch
from transformers import AutoTokenizer, AutoModel
from transformers.models.bert.configuration_bert import BertConfig

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

config = BertConfig.from_pretrained("zhihan1996/DNABERT-2-117M")
tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True, config=config)
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True, config=config)
model.to(device)
model.eval()

dna = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
inputs = tokenizer(dna, return_tensors = 'pt')["input_ids"].to(device)
hidden_states = model(inputs)[0]  # [1, sequence_length, 768]

# embedding with mean pooling
embedding_mean = torch.mean(hidden_states[0], dim=0)
print(embedding_mean.shape)  # expect to be 768

# embedding with max pooling
embedding_max = torch.max(hidden_states[0], dim=0)[0]
print(embedding_max.shape)  # expect to be 768
`
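The last line of the traceback is the `tl.dot(q, k, trans_b=True)` call, so the installed Triton version may be relevant here. In case it helps with reproducing this, below is a minimal sketch for collecting the versions of the packages involved. The package list and the helper name `installed_versions` are my own assumptions for illustration; they do not come from the README.

```python
from importlib import metadata

# Packages that seem relevant to the error above; this list is an assumption.
PACKAGES = ("torch", "triton", "transformers")


def installed_versions(packages=PACKAGES):
    """Return a dict mapping each package name to its installed version.

    Packages that are not installed map to None instead of raising.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running this in the same environment as the inference code prints one line per package, which can be pasted into the report.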
If anybody has solved this please let me know. Thank you for your awesome work!