The model massively overfits
Hi,
thanks a lot for releasing a 3rd-party implementation of the paper. Nevertheless, I'm afraid there is a problem with your code, or at least the hyperparameters are not well chosen. This can be seen by looking at the validation error as follows:
```python
n_epoch = 100
batches_per_epoch = 100

data_test = gen(n_objects, True)

losses = []
losses_test = []
for epoch in range(n_epoch):
    for _ in range(batches_per_epoch):
        objects, sender_relations, receiver_relations, relation_info, target = get_batch(data, 30)
        predicted = interaction_network(objects, sender_relations, receiver_relations, relation_info)
        loss = criterion(predicted, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        losses.append(np.sqrt(loss.item()))  # .item() replaces the old loss.data[0]

    # evaluate on the held-out data once per epoch
    objects, sender_relations, receiver_relations, relation_info, target = get_batch(data_test, 30)
    predicted = interaction_network(objects, sender_relations, receiver_relations, relation_info)
    losses_test.append(np.sqrt(criterion(predicted, target).item()))

    clear_output(True)
    plt.figure(figsize=(20, 5))
    plt.subplot(131)
    plt.title('Epoch %s RMS Train Error %s' % (epoch, np.mean(losses[-100:])))
    plt.plot(losses)
    plt.subplot(132)
    plt.title('Epoch %s RMS Test Error %s' % (epoch, np.mean(losses_test[-100:])))
    plt.plot(losses_test)
    plt.show()
```
I got a train RMS of 2.3 but a validation RMS of 209.6. Update: this is mostly because you train on just a single "scene", so the network never sees any other masses and therefore cannot generalise to them.
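A minimal sketch of the kind of training data that would avoid this: sample many scenes, each with freshly drawn masses, instead of a single fixed one. `gen_scene` here is a hypothetical stand-in for the repo's `gen()` (its exact feature layout is an assumption), just to illustrate the idea:

```python
import numpy as np

# Hypothetical sketch: train on many scenes so the network sees varied masses.
# gen_scene is an illustrative stand-in for the repo's gen().
def gen_scene(n_objects, rng):
    masses = rng.uniform(0.5, 10.0, size=n_objects)   # new masses every scene
    state = rng.standard_normal((n_objects, 4))       # [x, y, vx, vy]
    return np.concatenate([masses[:, None], state], axis=1)

rng = np.random.default_rng(0)
scenes = [gen_scene(6, rng) for _ in range(100)]      # 100 scenes, 100 mass sets
print(scenes[0].shape)  # (6, 5): [mass, x, y, vx, vy] per object
```

Training batches drawn across such a pool would then cover a range of masses rather than one configuration.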
Hi @mys007 and @higgsfield. Have you by any chance gotten any further with this?
I wonder how this network is used once trained. This is my attempt:
```python
n_steps = len(data)

# Preallocate space for positions and velocities.
speed_predictions = torch.from_numpy(np.zeros_like(data[:, :, 3:]))
pos_predictions = torch.from_numpy(np.zeros_like(data[:, :, 1:3]))
prev_state = torch.Tensor(data[0:1]).cuda()

with torch.no_grad():
    for ii in range(n_steps):
        speed_prediction = interaction_network(prev_state, sender_relations_1, receiver_relations_1, relation_info_1)
        pos_prediction = prev_state[0, :, 1:3] + speed_prediction * diff_t
        speed_predictions[ii] = speed_prediction
        pos_predictions[ii] = pos_prediction
        # feed the prediction back in as the next state
        prev_state[0, :, 1:3] = pos_prediction
        prev_state[0, :, 3:] = speed_prediction

ii = 1
plt.plot(data[:, ii, 1], data[:, ii, 2], label='real')
plt.plot(pos_predictions[:, ii, 0], pos_predictions[:, ii, 1], label='predicted')
plt.legend()
```
In a few words: since we are predicting the speed, I assume we need to build the next state by adding speed * time to the previous position. But like this, the result is awful...
Since I'm using the same data used for training, even if the network is completely overfitting, this result should be good.
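The position update above is a plain Euler integration step, x_{t+1} = x_t + v_t * dt. A self-contained sketch of just that step (with illustrative names and a constant velocity, not the repo's model):

```python
import numpy as np

# Minimal Euler rollout: positions rebuilt by integrating velocities,
# as in the loop above. Names and values are illustrative only.
diff_t = 0.001
pos = np.array([[0.0, 0.0]])   # one object at the origin
vel = np.array([[1.0, 2.0]])   # constant "predicted" velocity

positions = []
for _ in range(1000):
    pos = pos + vel * diff_t   # x_{t+1} = x_t + v_t * dt
    positions.append(pos.copy())

# After 1000 steps of dt=0.001, the object has moved by exactly vel * 1.0.
print(positions[-1])  # approximately [[1.0, 2.0]]
```

With a learned model in the loop, small per-step velocity errors compound through this integration, which is why a rollout can look awful even when one-step errors are modest.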
Hi @rpicatoste. I haven't been working on this since, but here are two tips for debugging:
- Check whether even the first prediction is correct. If it isn't exactly right, then the integrated measurements are likely not part of the training set.
- You could try modifying the code to predict positions instead of speeds.
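The first tip can be sketched as a one-step error check before doing any rollout. `predict_step` below is a hypothetical stand-in for a trained `interaction_network`, and the data layout `[time, object, (mass, x, y, vx, vy)]` is an assumption:

```python
import numpy as np

# Sketch of tip 1: measure one-step prediction error against ground truth
# before rolling out. predict_step is a hypothetical stand-in for the model.
def predict_step(state):
    # toy "model": assume the velocity stays constant for one step
    return state[:, 3:]

rng = np.random.default_rng(1)
data = rng.standard_normal((50, 6, 5))   # [time, object, (mass, x, y, vx, vy)]

one_step_rms = []
for t in range(len(data) - 1):
    pred_v = predict_step(data[t])       # predicted next velocities
    true_v = data[t + 1][:, 3:]          # ground-truth next velocities
    one_step_rms.append(np.sqrt(np.mean((pred_v - true_v) ** 2)))

# If the error is already large at t = 0, the rollout cannot possibly work.
print(len(one_step_rms))  # 49
```

If the one-step error is small but the rollout still diverges, the problem is accumulated integration drift rather than the model itself.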