Video-guided-Machine-Translation

Evaluation Error: RuntimeError: rnn: hx is not contiguous

Open LinuxBeginner opened this issue 5 years ago • 6 comments

Training was successful. Data: vatex_training_v1.0.json, vatex_validation_v1.0.json, vatex_public_test_english_v1.1.json

System: Google Colab GPU

When I try to run python eval.py, it shows the following error:

```
Vocab size src/tgt:10523/2907 train/val/test size: 254/30/59
************ Start eval... ************
Use epoch 34 as the best model for testing
Traceback (most recent call last):
  File "eval.py", line 123, in <module>
    main(args)
  File "eval.py", line 63, in main
    eval(test_loader, encoder, decoder, cp_file, tok_tgt, result_path)
  File "eval.py", line 90, in eval
    preds, pred_lengths = decoder.beam_decoding(srccap, init_hidden, src_out, vid_out, args.MAX_INPUT_LENGTH, beam_size=5)
  File "/content/drive/My Drive/MMT/MMTvatex/Video-guided-Machine-Translation/model.py", line 208, in beam_decoding
    output, hidden_i, attn_weights = self.onestep(output, hidden_i, src_out_i, vid_out_i, src_mask_i)
  File "/content/drive/My Drive/MMT/MMTvatex/Video-guided-Machine-Translation/model.py", line 110, in onestep
    output, hidden = self.decoder(rnn_input, last_hidden)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/rnn.py", line 570, in forward
    self.dropout, self.training, self.bidirectional, self.batch_first)
RuntimeError: rnn: hx is not contiguous
```

Could you please tell me why this is happening? Thank you.

LinuxBeginner avatar Jun 06 '20 04:06 LinuxBeginner

Hi, can you try calling contiguous() on the inputs before feeding them into the decoder LSTM? The code works on our end, so we cannot debug it.
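For example, something along these lines (a minimal sketch, not tested against this repo; it assumes a CUDA device, since the contiguity check lives in the cuDNN LSTM path):

```python
import torch
from torch import nn

# Toy stand-in for the decoder LSTM (batch_first, two layers, as in model.py).
lstm = nn.LSTM(input_size=8, hidden_size=16, num_layers=2, batch_first=True).cuda()

rnn_input = torch.randn(5, 1, 8).cuda()            # (beam_size, 1, input_size)
h = torch.randn(2, 1, 16).cuda().expand(2, 5, 16)  # expand() gives a non-contiguous view
c = torch.randn(2, 1, 16).cuda().expand(2, 5, 16)

# Feeding (h, c) directly can raise "rnn: hx is not contiguous" on the cuDNN path;
# calling .contiguous() first materializes contiguous copies and avoids it.
output, hidden = lstm(rnn_input.contiguous(), (h.contiguous(), c.contiguous()))
print(output.shape)  # torch.Size([5, 1, 16])
```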

eric-xw avatar Jun 08 '20 17:06 eric-xw

Hi Eric, contiguous() is already called at lines 169-173 in model.py:


```python
src_out_i = src_out[i].unsqueeze(0).expand(beam_size, src_out.size(1), src_out.size(2)).contiguous()  # (bs, seq_len, N)
vid_out_i = vid_out[i].unsqueeze(0).expand(beam_size, vid_out.size(1), vid_out.size(2)).contiguous()
src_mask_i = src_mask[i].unsqueeze(0).expand(beam_size, src_mask.size(1)).contiguous()
hidden_i = [_[:, i, :].unsqueeze(1).expand(_.size(0), beam_size, _.size(2)).contiguous() for _ in
            hidden]  # (n_layers, bs, 1024)
```

But it is still not working. There was no issue during training; the error only appears when running eval.py. Please advise.

LinuxBeginner avatar Jun 09 '20 04:06 LinuxBeginner

Reading the error log, the issue occurs when calling the LSTM at line 110, so try calling contiguous() on rnn_input and last_hidden.
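For example, something like this just before the LSTM call (a sketch with a hypothetical helper, not a confirmed fix):

```python
def make_contiguous(rnn_input, last_hidden):
    # Hypothetical helper: return contiguous copies of the LSTM input and
    # its (h, c) hidden-state tuple before handing them to the cuDNN LSTM.
    return rnn_input.contiguous(), tuple(h.contiguous() for h in last_hidden)

# Usage inside Decoder.onestep (around model.py line 110):
# rnn_input, last_hidden = make_contiguous(rnn_input, last_hidden)
# output, hidden = self.decoder(rnn_input, last_hidden)
```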

eric-xw avatar Jun 09 '20 05:06 eric-xw

Before the line `output, hidden_i, attn_weights = self.onestep(output, hidden_i, src_out_i, vid_out_i, src_mask_i)`, I added `.contiguous()` after `output` and `hidden_i`, i.e. `output = torch.from_numpy(outputs).cuda().contiguous()` and, in `from_numpy`, `return [torch.from_numpy(state).cuda().contiguous() for state in states]`, and it works. Apart from this, I find the code in beam_decoding very hard to figure out. It is hugely different from the code in inference, which I had expected to be similar.
The second `output, hidden_i, attn_weights = self.onestep(output, hidden_i, src_out_i, vid_out_i, src_mask_i)` call can probably be deleted.
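In context, the changes look like this inside beam_decoding (as described above, the .contiguous() calls are the additions):

```python
# Just before the second self.onestep(...) call in Decoder.beam_decoding:
output = torch.from_numpy(outputs).cuda().contiguous()
hidden_i = self.from_numpy(new_state_d)

# And in the Decoder.from_numpy helper:
def from_numpy(self, states):
    return [torch.from_numpy(state).cuda().contiguous() for state in states]
```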

bozhenhhu avatar Aug 03 '20 15:08 bozhenhhu

@bozhenhhu I've tried the method you've suggested, but the code still does not work :(

```python
import math
import torch
import random
import numpy as np
from torch import nn
import torch.nn.functional as F

from utils import sos_idx, eos_idx


class SoftDotAttention(nn.Module):
    def __init__(self, dim_ctx, dim_h):
        '''Initialize layer.'''
        super(SoftDotAttention, self).__init__()
        self.linear_in = nn.Linear(dim_h, dim_ctx, bias=False)
        self.sm = nn.Softmax(dim=1)

    def forward(self, context, h, mask=None):
        '''Propagate h through the network.
        h: batch x dim
        context: batch x seq_len x dim
        mask: batch x seq_len indices to be masked
        '''
        target = self.linear_in(h).unsqueeze(2)  # batch x dim x 1
        # Get attention
        attn = torch.bmm(context, target).squeeze(2)  # batch x seq_len
        if mask is not None:
            # -Inf masking prior to the softmax
            attn.data.masked_fill_(mask, -float('inf'))
        attn = self.sm(attn)
        attn3 = attn.view(attn.size(0), 1, attn.size(1))  # batch x 1 x seq_len
        weighted_ctx = torch.bmm(attn3, context)  # batch x dim
        return weighted_ctx, attn


class Encoder(nn.Module):
    def __init__(self, vocab_size, embed_size, hidden_size, n_layers=2, dropout=0.5):
        super(Encoder, self).__init__()
        self.hidden_size = hidden_size
        self.embed_size = embed_size
        self.src_embed = nn.Embedding(vocab_size, embed_size)
        self.src_encoder = nn.LSTM(input_size=embed_size, hidden_size=hidden_size // 2, num_layers=n_layers,
                                   dropout=dropout, batch_first=True, bidirectional=True)

        self.frame_embed = nn.Linear(1024, self.embed_size)
        self.video_encoder = nn.LSTM(input_size=embed_size, hidden_size=hidden_size // 2, num_layers=n_layers,
                                     dropout=dropout, batch_first=True, bidirectional=True)

        self.dropout = nn.Dropout(dropout, inplace=True)

    def forward(self, src, vid, src_hidden=None, vid_hidden=None):
        batch_size = src.size(0)

        src_embedded = self.src_embed(src)
        src_out, src_states = self.src_encoder(src_embedded, src_hidden)
        src_h = src_states[0].permute(1, 0, 2).contiguous().view(
            batch_size, 2, -1).permute(1, 0, 2)
        src_c = src_states[1].permute(1, 0, 2).contiguous().view(
            batch_size, 2, -1).permute(1, 0, 2)

        vid_embedded = self.frame_embed(vid)
        vid_out, vid_states = self.video_encoder(vid_embedded, vid_hidden)

        vid_h = vid_states[0].permute(1, 0, 2).contiguous().view(
            batch_size, 2, -1).permute(1, 0, 2)
        vid_c = vid_states[0].permute(1, 0, 2).contiguous().view(
            batch_size, 2, -1).permute(1, 0, 2)

        init_h = torch.cat((src_h, vid_h), 2)
        init_c = torch.cat((src_c, vid_c), 2)

        return src_out, (init_h, init_c), vid_out


class Decoder(nn.Module):
    def __init__(self, embed_size, hidden_size, vocab_size, n_layers=2, dropout=0.5):
        super(Decoder, self).__init__()
        self.embed_size = embed_size
        self.hidden_size = hidden_size
        self.n_layers = n_layers
        self.vocab_size = vocab_size

        self.embed = nn.Embedding(vocab_size, embed_size)
        self.dropout = nn.Dropout(dropout, inplace=True)
        self.src_attention = SoftDotAttention(embed_size, hidden_size)
        self.vid_attention = SoftDotAttention(embed_size, hidden_size)

        self.decoder = nn.LSTM(embed_size * 3, hidden_size,
                               n_layers, dropout=dropout, batch_first=True)

        self.fc = nn.Sequential(nn.Linear(self.hidden_size, self.embed_size),
                                nn.Tanh(),
                                nn.Dropout(p=dropout),
                                nn.Linear(embed_size, vocab_size))

    def onestep(self, input, last_hidden, src_out, vid_out, src_mask):
        '''
        input: (B,)
        '''
        # Get the embedding of the current input word (last output word)
        embedded = self.embed(input).unsqueeze(1)  # (B, 1, N)
        embedded = self.dropout(embedded)
        # Calculate attention weights and apply to encoder outputs
        src_ctx, src_attn = self.src_attention(src_out, last_hidden[0][0], mask=src_mask)  # src_ctx: (mb, 1, dim) attn: (mb, 1, seqlen)
        vid_ctx, vid_attn = self.vid_attention(vid_out, last_hidden[0][0])
        # Combine embedded input word and attended context, run through RNN
        rnn_input = torch.cat([embedded, src_ctx, vid_ctx], 2)  # (mb, 1, input_size)

        output, hidden = self.decoder(rnn_input, last_hidden)
        output = output.squeeze(1)  # (B, 1, N) -> (B, N)
        output = self.fc(output)
        return output, hidden, (src_attn, vid_attn)

    def forward(self, src, trg, init_hidden, src_out, vid_out, max_len, teacher_forcing_ratio):
        batch_size = trg.size(0)
        src_mask = (src == 0)  # mask paddings

        outputs = torch.zeros(batch_size, max_len, self.vocab_size).cuda()

        hidden = (init_hidden[0][:self.n_layers].contiguous(), init_hidden[1][:self.n_layers].contiguous())
        output = trg.data[:, 0]  # <sos>
        for t in range(1, max_len):
            output, hidden, attn_weights = self.onestep(output, hidden, src_out, vid_out, src_mask)  # (mb, vocab) (1, mb, N) (mb, 1, seqlen)
            outputs[:, t, :] = output
            is_teacher = random.random() < teacher_forcing_ratio
            top1 = output.data.max(1)[1]
            output = (trg.data[:, t] if is_teacher else top1).cuda()  # output should be indices to feed into nn.Embedding at the next step
        return outputs

    def inference(self, src, trg, init_hidden, src_out, vid_out, max_len, teacher_forcing_ratio=0):
        '''
        Greedy decoding
        '''
        batch_size = trg.size(0)
        src_mask = (src == 0)  # mask paddings

        outputs = torch.zeros(batch_size, max_len, self.vocab_size).cuda()

        hidden = (init_hidden[0][:self.n_layers].contiguous(), init_hidden[1][:self.n_layers].contiguous())
        output = trg.data[:, 0]  # <sos>
        pred_lengths = [0] * batch_size
        for t in range(1, max_len):
            output, hidden, attn_weights = self.onestep(output, hidden, src_out, vid_out, src_mask)  # (mb, vocab) (1, mb, N) (mb, 1, seqlen)
            outputs[:, t, :] = output
            is_teacher = random.random() < teacher_forcing_ratio
            top1 = output.data.max(1)[1]

            output = (trg.data[:, t] if is_teacher else top1).cuda()

            for i in range(batch_size):
                if output[i] == 3 and pred_lengths[i] == 0:
                    pred_lengths[i] = t
        for i in range(batch_size):
            if pred_lengths[i] == 0:
                pred_lengths[i] = max_len
        return outputs, pred_lengths

    def beam_decoding(self, src, init_hidden, src_out, vid_out, max_len, beam_size=5):
        batch_size = src.size(0)
        src_mask = (src == 0)  # mask padding
        hidden = (init_hidden[0][:self.n_layers].contiguous(), init_hidden[1][:self.n_layers].contiguous())

        seq = torch.LongTensor(max_len, batch_size).zero_()
        seq_log_probs = torch.FloatTensor(max_len, batch_size)

        for i in range(batch_size):
            # treat the problem as having a batch size of beam_size
            src_out_i = src_out[i].unsqueeze(0).expand(beam_size, src_out.size(1), src_out.size(2)).contiguous()  # (bs, seq_len, N)
            vid_out_i = vid_out[i].unsqueeze(0).expand(beam_size, vid_out.size(1), vid_out.size(2)).contiguous()
            src_mask_i = src_mask[i].unsqueeze(0).expand(beam_size, src_mask.size(1)).contiguous()
            hidden_i = [_[:, i, :].unsqueeze(1).expand(_.size(0), beam_size, _.size(2)).contiguous() for _ in
                        hidden]  # (n_layers, bs, 1024)

            output = torch.LongTensor([sos_idx] * beam_size).cuda()

            output, hidden_i, attn_weights = self.onestep(output, hidden_i, src_out_i, vid_out_i, src_mask_i)  # (mb, vocab) (1, mb, N) (mb, 1, seqlen)
            log_probs = F.log_softmax(output, dim=1)
            log_probs[:, -1] = log_probs[:, -1] - 1000
            neg_log_probs = -log_probs

            all_outputs = np.ones((1, beam_size), dtype='int32')
            all_masks = np.ones_like(all_outputs, dtype="float32")
            all_costs = np.zeros_like(all_outputs, dtype="float32")

            for j in range(max_len):
                if all_masks[-1].sum() == 0:
                    break

                next_costs = (
                    all_costs[-1, :, None] + neg_log_probs.data.cpu().numpy() * all_masks[-1, :, None])
                (finished,) = np.where(all_masks[-1] == 0)
                next_costs[finished, 1:] = np.inf

                (indexes, outputs), chosen_costs = self._smallest(
                    next_costs, beam_size, only_first_row=j == 0)

                new_state_d = [_.data.cpu().numpy()[:, indexes, :]
                               for _ in hidden_i]

                all_outputs = all_outputs[:, indexes]
                all_masks = all_masks[:, indexes]
                all_costs = all_costs[:, indexes]

                output = torch.from_numpy(outputs).cuda().contiguous()
                hidden_i = self.from_numpy(new_state_d)
                output, hidden_i, attn_weights = self.onestep(output, hidden_i, src_out_i, vid_out_i, src_mask_i)
                log_probs = F.log_softmax(output, dim=1)

                log_probs[:, -1] = log_probs[:, -1] - 1000
                neg_log_probs = -log_probs

                all_outputs = np.vstack([all_outputs, outputs[None, :]])
                all_costs = np.vstack([all_costs, chosen_costs[None, :]])
                mask = outputs != 0
                all_masks = np.vstack([all_masks, mask[None, :]])

            all_outputs = all_outputs[1:]
            all_costs = all_costs[1:] - all_costs[:-1]
            all_masks = all_masks[:-1]
            costs = all_costs.sum(axis=0)
            lengths = all_masks.sum(axis=0)
            normalized_cost = costs / lengths
            best_idx = np.argmin(normalized_cost)
            seq[:all_outputs.shape[0], i] = torch.from_numpy(
                all_outputs[:, best_idx])
            seq_log_probs[:all_costs.shape[0], i] = torch.from_numpy(
                all_costs[:, best_idx])

        seq, seq_log_probs = seq.transpose(0, 1), seq_log_probs.transpose(0, 1)

        pred_lengths = [0] * batch_size
        for i in range(batch_size):
            if sum(seq[i] == eos_idx) == 0:
                pred_lengths[i] = max_len
            else:
                pred_lengths[i] = (seq[i] == eos_idx).nonzero()[0][0]
        # return the samples and their log likelihoods
        return seq, pred_lengths  # seq_log_probs

    def from_numpy(self, states):
        return [torch.from_numpy(state).cuda().contiguous() for state in states]

    @staticmethod
    def _smallest(matrix, k, only_first_row=False):
        if only_first_row:
            flatten = matrix[:1, :].flatten()
        else:
            flatten = matrix.flatten()
        args = np.argpartition(flatten, k)[:k]
        args = args[np.argsort(flatten[args])]
        return np.unravel_index(args, matrix.shape), flatten[args]
```

This is the code I ran. What did I do wrong?

hynbjn avatar Nov 21 '22 06:11 hynbjn

> @bozhenhhu I've tried the method you've suggested, but the code still does not work :( [full model.py quoted above] What did I do wrong?

Do you have the same environment as this repository, i.e. the listed prerequisites? It has been a long time since this model was published, and many packages have been updated since, which may result in incompatibilities. Why not try more up-to-date methods?
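For example, a quick way to compare your installed versions against the repo's prerequisites before debugging further (the expected versions are the ones listed in the README):

```python
import sys
import torch

# Print the versions that usually matter for reproducing this repo.
print("python:", sys.version.split()[0])
print("torch :", torch.__version__)
print("CUDA  :", torch.version.cuda)
print("cuDNN :", torch.backends.cudnn.version())
```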

bozhenhhu avatar Nov 21 '22 11:11 bozhenhhu