Saved model EoF

Open BogdanCiambur opened this issue 2 years ago • 3 comments

Hello,

Very nice work!

When trying to run the MoleculeNet example script, I get an error after training is complete, at line 85 of train_graph:

magic_number = pickle_module.load(f, **pickle_load_args)
if magic_number != MAGIC_NUMBER:
    raise RuntimeError("Invalid magic number; corrupt file?")

I am using torch with python version 3.9.12

An additional question: what's the quickest way to use this code in inference mode?

Thank you in advance, B

BogdanCiambur avatar Jun 21 '23 07:06 BogdanCiambur

Hi, B. This error seems to be related to an inconsistency between the save and load processes. It would be clearer if you could share the parts of the original code that you changed (if any), especially the code that saves the model. Also, what is the name of your saved model file?
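One quick way to narrow this down is to inspect the saved file before loading it. The sketch below is only an illustration: `describe_checkpoint` is a hypothetical helper, not part of FFiNet or PyTorch. It relies on two facts: unpickling an empty file raises an EOFError, and `torch.save` has written zip archives by default since PyTorch 1.6.

```python
import os
import zipfile


def describe_checkpoint(path):
    """Classify a checkpoint file without unpickling it (hypothetical helper)."""
    if not os.path.isfile(path):
        return "missing"
    if os.path.getsize(path) == 0:
        # An empty file makes the unpickler run out of input (EOFError).
        return "empty"
    if zipfile.is_zipfile(path):
        # Default torch.save format since PyTorch 1.6.
        return "zip"
    # Legacy torch format, a plain pickle, or something else entirely.
    return "other"
```

If this reports "empty" for the saved model file, the save step itself most likely did not finish writing, and the load error would only be a symptom.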

As for inference mode, here is some code that may help:

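# imports (FFiNet and DataLoader are imported as in the repository's training script)
import torch
import torch.nn as nn

# set model hyper-parameters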
model_args = {
    'hidden_dim': 16,
    'hidden_layers': 2,
    'num_heads': 8,
    'activation': nn.PReLU(), 
    'dropout': 0.2,
    'prediction_layers': 1,
    'prediction_dropout': 0.1,
    'prediction_hidden_dim': 256,
}

# create model
input_dim = 66 if task != 'pdbbind' else 65
model = FFiNet(
            feature_per_layer=[input_dim] + [model_args['hidden_dim']] * model_args['hidden_layers'], 
            num_heads=model_args['num_heads'], 
            pred_hidden_dim=model_args['prediction_hidden_dim'], 
            pred_dropout=model_args['prediction_dropout'], 
            pred_layers=model_args['prediction_layers'], 
            activation=model_args['activation'], 
            dropout=model_args['dropout'],
            num_tasks=train_args.num_tasks
        )

# load model parameters
model.load_state_dict(torch.load(saved_model_path)) # saved_model_path: e.g., './saved_models/model.pt'

# load the data and wrap it in a data loader
dataset = torch.load(data_path)  # data_path: e.g. './data/freesolv.pt'
data_loader = DataLoader(dataset, batch_size=1000000000)  # one batch holding the whole dataset
batch = next(iter(data_loader))

# run in evaluation mode (disables dropout) without tracking gradients
model.eval()
with torch.no_grad():
    output = model(batch)

I hope the above information helps, and I am glad to keep following up on this problem.

fate1997 avatar Jun 21 '23 15:06 fate1997

Hello, thank you very much for your reply.

The only change I've made to the example code is a different input dataset (I've also created some folders that were missing, in /data or /train_evaluate).

I am using Python 3.8 with a torch version compatible with the Mac M1 CPU; could that be related?

I also haven't given the model any particular name; is there an option for that? The saved model is called FFiNetModel_(0).pt

In /train/evaluate there is a csv file called FFiNetModel_(0).csv, I guess this is the output of the last successful epoch before the EOF Error is triggered?

If you think it would be useful I can send the notebook

Thanks again for your time

Best wishes, B

BogdanCiambur avatar Jul 10 '23 14:07 BogdanCiambur

I'm sorry for not getting back to you sooner.

I tested the code under python=3.8.3 and it runs OK, so the Python version is not the problem.

As for the saved model name, it is odd that nothing follows "FFiNetModel_"; the name of the dataset (i.e. the file name in data_path of the evaluate function) should appear there. You may be running FFiNet on your own dataset. Could you try running the code with the default settings and see whether it runs OK?
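To illustrate how an empty dataset name could arise, here is a minimal sketch. It is not the repository's naming code: `dataset_stem`, `model_file_name`, and the naming pattern are assumptions inferred only from the file name FFiNetModel_(0).pt. It shows that a data_path ending in a directory separator yields an empty stem, and it fails early instead of saving under an incomplete name.

```python
import os


def dataset_stem(data_path):
    """Return the file name of data_path without directory or extension."""
    return os.path.splitext(os.path.basename(data_path))[0]


def model_file_name(data_path, run=0):
    # Hypothetical naming scheme, assumed for illustration only.
    stem = dataset_stem(data_path)
    if not stem:
        raise ValueError(f"no dataset name could be derived from {data_path!r}")
    return f"FFiNetModel_{stem}({run}).pt"
```

For example, './data/freesolv.pt' gives the stem 'freesolv', while './data/' gives an empty stem and raises. Printing the stem of your own data_path may show whether something similar is happening in your run.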

It would also be helpful if you could send me the notebook and a screenshot of the folder tree.

fate1997 avatar Aug 07 '23 02:08 fate1997