Generation fails with a transformer model trained with Iterative Product Quantization
🐛 Bug
I am unable to generate translations with a transformer model trained according to the Iterative Product Quantization instructions (https://github.com/pytorch/fairseq/tree/master/examples/quant_noise)
From the error below, it seems that fairseq-generate builds a non-quantized model but reads parameters from a quantized checkpoint.
Could anyone help me solve this problem?
To Reproduce
Steps to reproduce the behavior (always include the command you ran):
- Preprocess any training dataset
fairseq-preprocess --source-lang en --target-lang it --trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test --destdir data-bin/prova_en__it
- Train a 1-layer transformer with Quant-Noise
CUDA_VISIBLE_DEVICES=0 fairseq-train data-bin/prova_en__it/ --task translation --arch transformer --max-tokens 512 --optimizer adam --encoder-layers 1 --decoder-layers 1 --max-update 100 --quant-noise-pq 0.1 --quant-noise-pq-block-size 8
- Apply Iterative Product Quantization
CUDA_VISIBLE_DEVICES=0 fairseq-train data-bin/prova_en__it/ --task translation --arch transformer --save-dir checkpoints/transformer_finetuned --max-tokens 512 --optimizer adam --encoder-layers 1 --decoder-layers 1 --max-update 200 --quantization-config-path ./transformer_quantization_config.yaml --restore-file checkpoints/model.pt
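For context on what this step produces: iPQ replaces each quantized layer's dense weight with a codebook of centroids plus per-block assignments, which is why the resulting checkpoint contains `centroids`/`assignments`/`counts` keys instead of `weight` (see the error further down). A minimal numpy sketch of the idea; the toy centroid selection here is just for illustration and is not fairseq's actual k-means implementation:

```python
import numpy as np

def pq_quantize(weight, block_size=4, n_centroids=8):
    """Toy product quantization: split the weight into fixed-size blocks,
    map every block to its nearest centroid, and keep only the compressed
    representation (centroids + assignments + counts)."""
    blocks = weight.reshape(-1, block_size)          # all blocks, stacked
    # Toy centroid choice: the first n_centroids blocks (fairseq uses k-means).
    centroids = blocks[:n_centroids].copy()
    dists = ((blocks[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    assignments = dists.argmin(axis=1)               # one centroid id per block
    counts = np.bincount(assignments, minlength=n_centroids)
    return centroids, assignments, counts

def pq_reconstruct(centroids, assignments, shape):
    """Rebuild an approximate dense weight from the compressed form."""
    return centroids[assignments].reshape(shape)

w = np.random.randn(16, 8).astype(np.float32)        # 128 values -> 32 blocks
centroids, assignments, counts = pq_quantize(w)
w_hat = pq_reconstruct(centroids, assignments, w.shape)
assert w_hat.shape == w.shape
```

At generation time the model therefore needs to be built with the same quantization config, otherwise it has no idea how to consume the compressed parameters.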
- Generate translation
CUDA_VISIBLE_DEVICES=0 fairseq-generate data-bin/prova_en__it/ --path checkpoints/transformer_finetuned/checkpoint_best.pt --batch-size 128 --beam 5
- See error
2020-12-03 14:04:48 | INFO | fairseq_cli.generate | Namespace(all_gather_list_size=16384, batch_size=12, batch_size_valid=12, beam=5, bf16=False, bpe=None, broadcast_buffers=False, bucket_cap_mb=25, checkpoint_shard_count=1, checkpoint_suffix='', constraints=None, cpu=False, criterion='cross_entropy', curriculum=0, data='data-bin/prova_en__it/', data_buffer_size=10, dataset_impl=None, ddp_backend='c10d', decoding_format=None, device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=1, distributed_wrapper='DDP', diverse_beam_groups=-1, diverse_beam_strength=0.5, diversity_rate=-1.0, empty_cache_freq=0, eval_bleu=False, eval_bleu_args=None, eval_bleu_detok='space', eval_bleu_detok_args=None, eval_bleu_print_samples=False, eval_bleu_remove_bpe=None, eval_tokenized_bleu=False, fast_stat_sync=False, find_unused_parameters=False, fix_batches_to_gpus=False, fixed_validation_seed=None, force_anneal=None, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, gen_subset='test', iter_decode_eos_penalty=0.0, iter_decode_force_max_iter=False, iter_decode_max_iter=10, iter_decode_with_beam=1, iter_decode_with_external_reranker=False, left_pad_source='True', left_pad_target='False', lenpen=1, lm_path=None, lm_weight=0.0, load_alignments=False, local_rank=0, localsgd_frequency=3, log_format=None, log_interval=100, lr_scheduler='fixed', lr_shrink=0.1, match_source_len=False, max_len_a=0, max_len_b=200, max_source_positions=1024, max_target_positions=1024, max_tokens=None, max_tokens_valid=None, memory_efficient_bf16=False, memory_efficient_fp16=False, min_len=1, min_loss_scale=0.0001, model_overrides='{}', model_parallel_size=1, nbest=1, no_beamable_mm=False, no_early_stop=False, no_progress_bar=False, no_repeat_ngram_size=0, no_seed_provided=False, nprocs_per_node=2, num_batch_buckets=0, 
num_shards=1, num_workers=1, optimizer=None, path='checkpoints/transformer_finetuned/checkpoint_best.pt', pipeline_balance=None, pipeline_checkpoint='never', pipeline_chunks=0, pipeline_decoder_balance=None, pipeline_decoder_devices=None, pipeline_devices=None, pipeline_encoder_balance=None, pipeline_encoder_devices=None, pipeline_model_parallel=False, prefix_size=0, print_alignment=False, print_step=False, profile=False, quantization_config_path=None, quiet=False, remove_bpe=None, replace_unk=None, required_batch_size_multiple=8, required_seq_len_multiple=1, results_path=None, retain_dropout=False, retain_dropout_modules=None, retain_iter_history=False, sacrebleu=False, sampling=False, sampling_topk=-1, sampling_topp=-1.0, score_reference=False, scoring='bleu', seed=1, shard_id=0, skip_invalid_size_inputs_valid_test=False, slowmo_algorithm='LocalSGD', slowmo_momentum=None, source_lang=None, target_lang=None, task='translation', temperature=1.0, tensorboard_logdir=None, threshold_loss_scale=None, tokenizer=None, tpu=False, train_subset='train', truncate_source=False, unkpen=0, unnormalized=False, upsample_primary=1, user_dir=None, valid_subset='valid', validate_after_updates=0, validate_interval=1, validate_interval_updates=0, warmup_updates=0, zero_sharding='none')
2020-12-03 14:04:48 | INFO | fairseq.tasks.translation | [en] dictionary: 6784 types
2020-12-03 14:04:48 | INFO | fairseq.tasks.translation | [it] dictionary: 8240 types
2020-12-03 14:04:48 | INFO | fairseq.data.data_utils | loaded 24 examples from: data-bin/prova_en__it/test.en-it.en
2020-12-03 14:04:48 | INFO | fairseq.data.data_utils | loaded 24 examples from: data-bin/prova_en__it/test.en-it.it
2020-12-03 14:04:48 | INFO | fairseq.tasks.translation | data-bin/prova_en__it/ test en-it 24 examples
2020-12-03 14:04:48 | INFO | fairseq_cli.generate | loading model(s) from checkpoints/transformer_finetuned/checkpoint_best.pt
2020-12-03 14:04:49 | INFO | fairseq.quantization_utils | quantize_model_scalar args:Namespace(activation_dropout=0.0, activation_fn='relu', adam_betas='(0.9, 0.999)', adam_eps=1e-08, adaptive_input=False, adaptive_softmax_cutoff=None, adaptive_softmax_dropout=0, all_gather_list_size=16384, arch='transformer', attention_dropout=0.0, batch_size=None, batch_size_valid=None, best_checkpoint_metric='loss', bf16=False, bpe=None, broadcast_buffers=False, bucket_cap_mb=25, checkpoint_shard_count=1, checkpoint_suffix='', clip_norm=25.0, cpu=False, criterion='cross_entropy', cross_self_attention=False, curriculum=0, data='data-bin/prova_en__it/', data_buffer_size=10, dataset_impl=None, ddp_backend='c10d', decoder_attention_heads=8, decoder_embed_dim=512, decoder_embed_path=None, decoder_ffn_embed_dim=2048, decoder_input_dim=512, decoder_layerdrop=0, decoder_layers=1, decoder_layers_to_keep=None, decoder_learned_pos=False, decoder_normalize_before=False, decoder_output_dim=512, device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_num_procs=1, distributed_port=-1, distributed_rank=0, distributed_world_size=1, distributed_wrapper='DDP', dropout=0.1, empty_cache_freq=0, encoder_attention_heads=8, encoder_embed_dim=512, encoder_embed_path=None, encoder_ffn_embed_dim=2048, encoder_layerdrop=0, encoder_layers=1, encoder_layers_to_keep=None, encoder_learned_pos=False, encoder_normalize_before=False, eval_bleu=False, eval_bleu_args=None, eval_bleu_detok='space', eval_bleu_detok_args=None, eval_bleu_print_samples=False, eval_bleu_remove_bpe=None, eval_tokenized_bleu=False, fast_stat_sync=False, find_unused_parameters=False, finetune_from_model=None, fix_batches_to_gpus=False, fixed_validation_seed=None, force_anneal=None, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, gen_subset='test', keep_best_checkpoints=-1, keep_interval_updates=-1, 
keep_last_epochs=-1, layernorm_embedding=False, left_pad_source=True, left_pad_target=False, load_alignments=False, local_rank=0, localsgd_frequency=3, log_format=None, log_interval=100, lr=[0.25], lr_scheduler='fixed', lr_shrink=0.1, max_epoch=0, max_source_positions=1024, max_target_positions=1024, max_tokens=512, max_tokens_valid=512, max_update=300, maximize_best_checkpoint_metric=False, memory_efficient_bf16=False, memory_efficient_fp16=False, min_loss_scale=0.0001, min_lr=-1.0, model_parallel_size=1, no_cross_attention=False, no_epoch_checkpoints=False, no_last_checkpoints=False, no_progress_bar=False, no_save=False, no_save_optimizer_state=False, no_scale_embedding=False, no_seed_provided=False, no_token_positional_embeddings=False, nprocs_per_node=1, num_batch_buckets=0, num_shards=1, num_workers=1, optimizer='adam', optimizer_overrides='{}', patience=-1, pipeline_balance=None, pipeline_checkpoint='never', pipeline_chunks=0, pipeline_decoder_balance=None, pipeline_decoder_devices=None, pipeline_devices=None, pipeline_encoder_balance=None, pipeline_encoder_devices=None, pipeline_model_parallel=False, profile=False, quant_noise_pq=0, quant_noise_pq_block_size=8, quant_noise_scalar=0, quantization_config_path='./transformer_quantization_config.yaml', required_batch_size_multiple=8, required_seq_len_multiple=1, reset_dataloader=False, reset_lr_scheduler=False, reset_meters=False, reset_optimizer=False, restore_file='checkpoint_last.pt', save_dir='checkpoints/transformer_finetuned', save_interval=1, save_interval_updates=0, scoring='bleu', seed=1, sentence_avg=False, shard_id=0, share_all_embeddings=False, share_decoder_input_output_embed=False, skip_invalid_size_inputs_valid_test=False, slowmo_algorithm='LocalSGD', slowmo_momentum=None, source_lang='en', stop_time_hours=0, target_lang='it', task='translation', tensorboard_logdir=None, threshold_loss_scale=None, tie_adaptive_weights=False, tokenizer=None, tpu=False, train_subset='train', truncate_source=False, 
update_freq=[1], upsample_primary=1, use_bmuf=False, use_old_adam=False, user_dir=None, valid_subset='valid', validate_after_updates=0, validate_interval=1, validate_interval_updates=0, warmup_updates=0, weight_decay=0.0, zero_sharding='none')
2020-12-03 14:04:49 | INFO | fairseq.quantization_utils | quantize_model_scalar quant_noise_scalar:0
Traceback (most recent call last):
File "/home/ubuntu/workspace/experiments/QUANTIZED_TRASFORMER/fairseq_0.10.0/bin/fairseq-generate", line 8, in <module>
sys.exit(cli_main())
File "/home/ubuntu/workspace/experiments/QUANTIZED_TRASFORMER/fairseq_0.10.0/lib/python3.6/site-packages/fairseq_cli/generate.py", line 379, in cli_main
main(args)
File "/home/ubuntu/workspace/experiments/QUANTIZED_TRASFORMER/fairseq_0.10.0/lib/python3.6/site-packages/fairseq_cli/generate.py", line 41, in main
return _main(args, sys.stdout)
File "/home/ubuntu/workspace/experiments/QUANTIZED_TRASFORMER/fairseq_0.10.0/lib/python3.6/site-packages/fairseq_cli/generate.py", line 94, in _main
num_shards=args.checkpoint_shard_count,
File "/home/ubuntu/workspace/experiments/QUANTIZED_TRASFORMER/fairseq_0.10.0/lib/python3.6/site-packages/fairseq/checkpoint_utils.py", line 256, in load_model_ensemble
num_shards,
File "/home/ubuntu/workspace/experiments/QUANTIZED_TRASFORMER/fairseq_0.10.0/lib/python3.6/site-packages/fairseq/checkpoint_utils.py", line 287, in load_model_ensemble_and_task
model.load_state_dict(state["model"], strict=strict, args=args)
File "/home/ubuntu/workspace/experiments/QUANTIZED_TRASFORMER/fairseq_0.10.0/lib/python3.6/site-packages/fairseq/models/fairseq_model.py", line 98, in load_state_dict
return super().load_state_dict(new_state_dict, strict)
File "/home/ubuntu/workspace/experiments/QUANTIZED_TRASFORMER/fairseq_0.10.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1052, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for TransformerModel:
Missing key(s) in state_dict: "encoder.embed_tokens.weight", "decoder.embed_tokens.weight", "decoder.layers.0.self_attn.k_proj.weight", "decoder.layers.0.self_attn.v_proj.weight", "decoder.layers.0.self_attn.q_proj.weight", "decoder.layers.0.self_attn.out_proj.weight", "decoder.layers.0.fc1.weight", "decoder.layers.0.fc2.weight".
Unexpected key(s) in state_dict: "encoder.embed_tokens.centroids", "encoder.embed_tokens.assignments", "encoder.embed_tokens.counts", "decoder.embed_tokens.centroids", "decoder.embed_tokens.assignments", "decoder.embed_tokens.counts", "decoder.layers.0.self_attn.k_proj.centroids", "decoder.layers.0.self_attn.k_proj.assignments", "decoder.layers.0.self_attn.k_proj.counts", "decoder.layers.0.self_attn.v_proj.centroids", "decoder.layers.0.self_attn.v_proj.assignments", "decoder.layers.0.self_attn.v_proj.counts", "decoder.layers.0.self_attn.q_proj.centroids", "decoder.layers.0.self_attn.q_proj.assignments", "decoder.layers.0.self_attn.q_proj.counts", "decoder.layers.0.self_attn.out_proj.centroids", "decoder.layers.0.self_attn.out_proj.assignments", "decoder.layers.0.self_attn.out_proj.counts", "decoder.layers.0.fc1.centroids", "decoder.layers.0.fc1.assignments", "decoder.layers.0.fc1.counts", "decoder.layers.0.fc2.centroids", "decoder.layers.0.fc2.assignments", "decoder.layers.0.fc2.counts".
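The error is the strict key check in `Module.load_state_dict`: fairseq-generate built a plain (non-quantized) TransformerModel, since `quantization_config_path=None` in the generate namespace above, so it expects `*.weight` tensors, while the checkpoint stores the iPQ triplets. A self-contained sketch of the check that raises (hypothetical key names abbreviated from the traceback):

```python
def check_state_dict(expected_keys, loaded_keys):
    """Mimic the strict key comparison in torch's Module.load_state_dict:
    report keys the model wants but the checkpoint lacks, and vice versa."""
    missing = sorted(set(expected_keys) - set(loaded_keys))
    unexpected = sorted(set(loaded_keys) - set(expected_keys))
    return missing, unexpected

# A non-quantized model expects dense weights...
expected = ["decoder.layers.0.fc1.weight"]
# ...but an iPQ checkpoint stores the compressed representation instead.
loaded = [
    "decoder.layers.0.fc1.centroids",
    "decoder.layers.0.fc1.assignments",
    "decoder.layers.0.fc1.counts",
]
missing, unexpected = check_state_dict(expected, loaded)
assert missing == ["decoder.layers.0.fc1.weight"]
assert len(unexpected) == 3
```

Since the generate-time namespace above also shows `model_overrides='{}'`, one thing that might be worth trying (an untested guess on my part, not a confirmed fix) is forcing the config through at load time, e.g. `fairseq-generate ... --model-overrides '{"quantization_config_path": "./transformer_quantization_config.yaml"}'`.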
Code sample
I am using the standard fairseq code.
Expected behavior
The expected output is the standard output, similar to what I get with the non-quantized model:
S-0 I think that internal security .....
T-0 Ritengo che la sicurezza ....
H-0 -2.4014294147491455 impresa impresa ....
D-0 -2.4014294147491455 impresa impresa ....
P-0 -1.5273 -1.4997 -1.4997 ......
Environment
- fairseq Version: 0.10.0
- PyTorch Version: 1.7.0
- numpy version: 1.19.4
- OS: Ubuntu 18.04.5 LTS (x86_64)
- How you installed fairseq: pip
- Python version: 3.6 (64-bit runtime)
- CUDA Version: 11.0
- Nvidia driver version: 450.66
- cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
- GPU models and configuration: 2 x GeForce RTX 2080 Ti
@nicolabertoldi did you manage to solve this problem? I'm stuck with the same issue.
I'm troubled by it, too.
I'm troubled by it, too.
Me too; please let me know if you solve it.