
train error

Open ZTYyy opened this issue 2 years ago • 8 comments


This is my script:

```python
import gzip
import random

import numpy as np
import torch
import torch.optim as optim
import tqdm
from torch.utils.data import DataLoader, Dataset

from long_net.model import LongNetTransformer, AutoregressiveWrapper
from zeta.optim import StableAdamWUnfused

# constants

NUM_BATCHES = int(1e5)
BATCH_SIZE = 4
GRADIENT_ACCUMULATE_EVERY = 4
LEARNING_RATE = 2e-4
VALIDATE_EVERY = 100
GENERATE_EVERY = 500
GENERATE_LENGTH = 512
SEQ_LEN = 8196

# helpers

def cycle(loader):
    while True:
        for data in loader:
            yield data

def decode_token(token):
    return str(chr(max(32, token)))

def decode_tokens(tokens):
    return "".join(list(map(decode_token, tokens)))

# instantiate GPT-like decoder model

model = LongNetTransformer(num_tokens=256, dim=512, depth=8)
model = AutoregressiveWrapper(model, max_seq_len=SEQ_LEN)
model.cuda()

# prepare enwik8 data

with open("./MGYG000002546-uvig-560334.txt") as file:
    X = np.fromstring(file.read(int(95e6)), dtype=np.uint8)
    trX, vaX = np.split(X, [int(90e6)])
    data_train, data_val = torch.from_numpy(trX), torch.from_numpy(vaX)

class TextSamplerDataset(Dataset):
    def __init__(self, data, seq_len):
        super().__init__()
        self.data = data
        self.seq_len = seq_len

    def __getitem__(self, index):
        rand_start = torch.randint(0, self.data.size(0) - self.seq_len, (1,))
        full_seq = self.data[rand_start : rand_start + self.seq_len + 1].long()
        return full_seq  # .cuda()

    def __len__(self):
        return self.data.size(0) // self.seq_len

train_dataset = TextSamplerDataset(data_train, SEQ_LEN)
val_dataset = TextSamplerDataset(data_val, SEQ_LEN)
train_loader = cycle(DataLoader(train_dataset, batch_size=BATCH_SIZE))
val_loader = cycle(DataLoader(val_dataset, batch_size=BATCH_SIZE))

# optimizer

optim = StableAdamWUnfused(model.parameters(), lr=LEARNING_RATE)  # note: shadows the torch.optim import above

# training

for i in tqdm.tqdm(range(NUM_BATCHES), mininterval=10.0, desc="training"):
    model.train()

    for __ in range(GRADIENT_ACCUMULATE_EVERY):
        loss = model(next(train_loader))
        loss.backward()

    print(f"training loss: {loss.item()}")
    torch.nn.utils.clip_grad_norm_(model.parameters(), 0.5)
    optim.step()
    optim.zero_grad()

    if i % VALIDATE_EVERY == 0:
        model.eval()
        with torch.no_grad():
            loss = model(next(val_loader))
            print(f"validation loss: {loss.item()}")

    if i % GENERATE_EVERY == 0:
        model.eval()
        inp = random.choice(val_dataset)[:-1]
        prime = decode_tokens(inp)
        print("%s \n\n %s" % (prime, "*" * 100))

        sample = model.generate(inp[None, ...], GENERATE_LENGTH)
        output_str = decode_tokens(sample[0])
        print(output_str)
```

ZTYyy avatar Dec 23 '23 07:12 ZTYyy

The error is in `model.cuda()`; you can take that off or use `model.to("cpu")` instead.
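
For instance, a minimal CPU-only variant of the relevant lines (assuming the rest of the script stays the same):

```python
# run everything on CPU: drop model.cuda() and pin the model to the CPU device
model = AutoregressiveWrapper(model, max_seq_len=SEQ_LEN)
model = model.to("cpu")  # batches from the DataLoader are already CPU tensors in this script
```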

kyegomez avatar Dec 23 '23 15:12 kyegomez

Thank you, but I tried both and got the same error.

ZTYyy avatar Dec 23 '23 15:12 ZTYyy

@ZTYyy, can you please show me the stack trace?

kyegomez avatar Dec 23 '23 20:12 kyegomez

Sorry, I don't know how to give you more of the trace than this:

```
2023-12-24 12:16:41,972 - root - ERROR - forward() takes 2 positional arguments but 4 were given
Traceback (most recent call last):
  File "/public/home/wangycgroup/public/02_Data/Internal/phage/train.py", line 91, in <module>
    loss = model(next(train_loader))
  File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/long_net/model.py", line 325, in forward
    logits = self.net(x_inp, **kwargs)
  File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/long_net/model.py", line 271, in forward
    x = self.transformer(x)
  File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/long_net/model.py", line 244, in forward
    x = block(x) + x
  File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/long_net/model.py", line 205, in forward
    attn = self.attn(q, k, v)
  File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
TypeError: forward() takes 2 positional arguments but 4 were given
```

ZTYyy avatar Dec 24 '23 04:12 ZTYyy

I want to try training this model on my genomic data, because it is the only model I have found that can take a complete genome as input (I am using a bacterial genome around 5 million base pairs long).

ZTYyy avatar Dec 25 '23 03:12 ZTYyy

I think the problem is in the line `attn = self.attn(q, k, v)` in model.py. `self.attn` is a `DilatedAttention` instance, and its `forward()` only accepts one input, `def forward(self, x: torch.Tensor) -> torch.Tensor:`, but three are given here.
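
For illustration, here is a minimal, self-contained sketch of the mismatch; the `DilatedAttention` body below is a hypothetical stand-in, only the `forward` signature is taken from model.py:

```python
import torch
import torch.nn as nn

# stand-in for long_net's DilatedAttention: forward takes a single tensor,
# mirroring the signature quoted above (the real attention internals are omitted)
class DilatedAttention(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x

attn = DilatedAttention()
q = k = v = torch.randn(1, 8, 512)

try:
    attn(q, k, v)  # mirrors model.py line 205: three inputs to a one-input forward
except TypeError as e:
    print(e)  # forward() takes 2 positional arguments but 4 were given

out = attn(q)  # a single tensor matches the signature and succeeds
```

So the fix would be either to change the call site at line 205 to pass a single tensor, or to extend `DilatedAttention.forward` to accept `(q, k, v)`.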

ZTYyy avatar Dec 25 '23 06:12 ZTYyy

@ZTYyy Please attempt to run your script once more; I believe the error has been eliminated.

kyegomez avatar Jan 07 '24 03:01 kyegomez