
How to load and use the Transformers version of the model

schmidtj3 opened this issue 1 year ago · 7 comments

@robinzixuan Hello authors, I came across this arXiv paper, which mentions the use of this model, and I would like to know how to use it to reproduce the retrieval results in the paper.

Specifically, I'm looking at the magicslabnu/OutEffHop_bert_base model card (the one used in the paper?) on the Hugging Face Hub. Could you provide instructions on how to load and use this model (with the Transformers package) and how to reproduce the results in the above-mentioned paper?

Thank you!

schmidtj3 · Jun 05 '24 05:06

Thanks for your message. We will upload the model file to Hugging Face next week, and you will be able to use it shortly after. For now, though, you can already reproduce the results. As we mention in the paper, attention is a special case of the Hopfield model, and BERT is built on the attention architecture, so you can simply change the vanilla softmax to softmax_1 in the BERT model to get the OutEffHop version of BERT. After that, you can reproduce the results from the Hugging Face checkpoints (load the model from Hugging Face with the changed architecture). If you have more questions, feel free to contact me directly at [email protected]

robinzixuan · Jun 05 '24 06:06
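
For concreteness, softmax_1 here is the softmax with an extra 1 in the denominator, equivalently a vanilla softmax with one extra, always-zero logit appended and then discarded. A minimal PyTorch sketch of it, editorial rather than the authors' exact code:

```python
import torch

def softmax_1(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """softmax_1(x)_i = exp(x_i) / (1 + sum_j exp(x_j)).

    Equivalent to a vanilla softmax over the logits with one extra
    constant-zero logit appended along `dim` and then discarded.
    """
    # Numerically stable: shift by the max over the logits and the implicit 0.
    m = x.amax(dim=dim, keepdim=True).clamp(min=0)
    e = (x - m).exp()
    return e / ((-m).exp() + e.sum(dim=dim, keepdim=True))
```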

Thank you for your response! Could you also list the steps for loading the model from the Hugging Face Hub? And just to check: does this mean that model weights reproducing the retrieval results in the paper will be uploaded to the Hub next week? Thank you!

schmidtj3 · Jun 05 '24 07:06

Sorry for the confusion: the model weights are already on Hugging Face. But as you know, if we load those weights directly into the model, Transformers will give us the vanilla version of the model (BERT), so we need to change the code for that.

robinzixuan · Jun 05 '24 07:06
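
For reference, the plain loading path the comment refers to would look like the following (a minimal sketch, assuming the published checkpoint is an MLM-style BERT checkpoint; it fetches the weights, but as noted above it instantiates stock BERT attention with the vanilla softmax):

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Loads the weights from the Hub into the unmodified BERT architecture;
# the attention still uses the vanilla softmax, not softmax_1.
tokenizer = AutoTokenizer.from_pretrained("magicslabnu/OutEffHop_bert_base")
model = AutoModelForMaskedLM.from_pretrained("magicslabnu/OutEffHop_bert_base")
```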

modeling_bert.py.zip — you can use this file directly for the OutEffHop version of the BERT model. In our experiment, we use hooks to replace the softmax with softmax_1; you can do the same.

robinzixuan · Jun 05 '24 08:06
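
The attached file is the authors' modified modeling code. As an illustration of the replacement idea (an editorial sketch, not the authors' hook implementation): Hugging Face's BERT attention routes through torch.nn.functional.softmax, so a crude global swap can be done by monkey-patching that call:

```python
import torch
import torch.nn.functional as F

def softmax_1(x, dim=-1):
    # exp(x_i) / (1 + sum_j exp(x_j)), computed stably (see the sketch above).
    m = x.amax(dim=dim, keepdim=True).clamp(min=0)
    e = (x - m).exp()
    return e / ((-m).exp() + e.sum(dim=dim, keepdim=True))

_vanilla_softmax = F.softmax  # keep a handle so the patch can be undone

def _softmax_1_dispatch(input, dim=None, *args, **kwargs):
    # Redirect every F.softmax call (dtype and other kwargs are ignored here).
    return softmax_1(input, dim=-1 if dim is None else dim)

# Apply before the forward pass; restore with `F.softmax = _vanilla_softmax`.
F.softmax = _softmax_1_dispatch
```

This affects every softmax in the process, so the module-level replacement shown in the snippet below is the cleaner route.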

Sorry for the delay in updating Hugging Face; I have my qualifying exam this week. In the meantime, here is the relevant code:

```python
if model_args.model_name_or_path:
    torch_dtype = (
        model_args.torch_dtype
        if model_args.torch_dtype in ["auto", None]
        else getattr(torch, model_args.torch_dtype)
    )
    model = AutoModelForMaskedLM.from_pretrained(
        model_args.model_name_or_path,
        from_tf=bool(".ckpt" in model_args.model_name_or_path),
        config=config,
        cache_dir=model_args.cache_dir,
        revision=model_args.model_revision,
        token=model_args.token,
        trust_remote_code=model_args.trust_remote_code,
        torch_dtype=torch_dtype,
        low_cpu_mem_usage=model_args.low_cpu_mem_usage,
    )
else:
    logger.info("Training new model from scratch")
    model = AutoModelForMaskedLM.from_config(
        config, trust_remote_code=model_args.trust_remote_code
    )

# >> replace the self-attention module with ours
# NOTE: currently assumes BERT
for layer_idx in range(len(model.bert.encoder.layer)):
    old_self = model.bert.encoder.layer[layer_idx].attention.self
    print("----------------------------------------------------------")
    print("Inside BERT custom attention")
    print("----------------------------------------------------------")
    new_self = BertUnpadSelfAttentionWithExtras(
        config,
        position_embedding_type=None,
        softmax_fn=SOFTMAX_MAPPING["softmax1"],
        ssm_eps=None,
        tau=None,
        max_seq_length=data_args.max_seq_length,
        skip_attn=False,
        fine_tuning=False,
    )

    # copy the loaded (vanilla) weights into the new attention module
    if model_args.model_name_or_path is not None:
        new_self.load_state_dict(old_self.state_dict(), strict=False)
    model.bert.encoder.layer[layer_idx].attention.self = new_self

print(model)
```
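
(Note: `BertUnpadSelfAttentionWithExtras` and `SOFTMAX_MAPPING` presumably come from the attached modeling_bert.py, while `model_args`, `data_args`, `config`, and `logger` come from a Transformers `run_mlm.py`-style training script; the snippet is not self-contained on its own.)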

robinzixuan · Jun 05 '24 08:06

@robinzixuan Thanks for including these implementations! Would it be possible to also provide the fine-tuned weights (HR w/ training in the paper) for reproducing the retrieval results?

schmidtj3 · Jun 05 '24 08:06

I think you can find the related code in the theory verification part.

robinzixuan · Jun 05 '24 18:06