Generation fails with a transformer model trained with Iterative Product Quantization
🐛 Bug
I am unable to generate translations with a transformer model trained according to the Iterative Product Quantization instructions (https://github.com/pytorch/fairseq/tree/master/examples/quant_noise)
From the error below, it seems that fairseq-generate builds a non-quantized model but reads parameters from a quantized checkpoint.
Could anyone help me solve this problem?
To Reproduce
Steps to reproduce the behavior (always include the command you ran):
- Preprocess any training dataset
fairseq-preprocess --source-lang en --target-lang it --trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test --destdir data-bin/prova_en__it
- Train a 1-layer transformer with Quant-Noise
CUDA_VISIBLE_DEVICES=0 fairseq-train data-bin/prova_en__it/ --task translation --arch transformer --max-tokens 512 --optimizer adam --encoder-layers 1 --decoder-layers 1 --max-update 100 --quant-noise-pq 0.1 --quant-noise-pq-block-size 8
- Apply Iterative Product Quantization
CUDA_VISIBLE_DEVICES=0 fairseq-train data-bin/prova_en__it/ --task translation --arch transformer --save-dir checkpoints/transformer_finetuned --max-tokens 512 --optimizer adam --encoder-layers 1 --decoder-layers 1 --max-update 200 --quantization-config-path ./transformer_quantization_config.yaml --restore-file checkpoints/model.pt
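For context on what this step produces: iPQ replaces each quantized layer's dense weight with a codebook of centroids plus per-block assignments, which is why the resulting checkpoint contains `centroids`/`assignments`/`counts` keys instead of `weight` (see the error further down). A minimal numpy sketch of the idea; the toy centroid selection here is just for illustration and is not fairseq's actual k-means implementation:

```python
import numpy as np

def pq_quantize(weight, block_size=4, n_centroids=8):
    """Toy product quantization: split the weight into fixed-size blocks,
    map every block to its nearest centroid, and keep only the compressed
    representation (centroids + assignments + counts)."""
    blocks = weight.reshape(-1, block_size)          # all blocks, stacked
    # Toy centroid choice: the first n_centroids blocks (fairseq uses k-means).
    centroids = blocks[:n_centroids].copy()
    dists = ((blocks[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    assignments = dists.argmin(axis=1)               # one centroid id per block
    counts = np.bincount(assignments, minlength=n_centroids)
    return centroids, assignments, counts

def pq_reconstruct(centroids, assignments, shape):
    """Rebuild an approximate dense weight from the compressed form."""
    return centroids[assignments].reshape(shape)

w = np.random.randn(16, 8).astype(np.float32)        # 128 values -> 32 blocks
centroids, assignments, counts = pq_quantize(w)
w_hat = pq_reconstruct(centroids, assignments, w.shape)
assert w_hat.shape == w.shape
```

At generation time the model therefore needs to be built with the same quantization config, otherwise it has no idea how to consume the compressed parameters.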
- Generate translation
CUDA_VISIBLE_DEVICES=0 fairseq-generate data-bin/prova_en__it/ --path checkpoints/transformer_finetuned/checkpoint_best.pt --batch-size 128 --beam 5
- See error
2020-12-03 14:04:48 | INFO | fairseq_cli.generate | Namespace(all_gather_list_size=16384, batch_size=12, batch_size_valid=12, beam=5, bf16=False, bpe=None, broadcast_buffers=False, bucket_cap_mb=25, checkpoint_shard_count=1, checkpoint_suffix='', constraints=None, cpu=False, criterion='cross_entropy', curriculum=0, data='data-bin/prova_en__it/', data_buffer_size=10, dataset_impl=None, ddp_backend='c10d', decoding_format=None, device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=1, distributed_wrapper='DDP', diverse_beam_groups=-1, diverse_beam_strength=0.5, diversity_rate=-1.0, empty_cache_freq=0, eval_bleu=False, eval_bleu_args=None, eval_bleu_detok='space', eval_bleu_detok_args=None, eval_bleu_print_samples=False, eval_bleu_remove_bpe=None, eval_tokenized_bleu=False, fast_stat_sync=False, find_unused_parameters=False, fix_batches_to_gpus=False, fixed_validation_seed=None, force_anneal=None, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, gen_subset='test', iter_decode_eos_penalty=0.0, iter_decode_force_max_iter=False, iter_decode_max_iter=10, iter_decode_with_beam=1, iter_decode_with_external_reranker=False, left_pad_source='True', left_pad_target='False', lenpen=1, lm_path=None, lm_weight=0.0, load_alignments=False, local_rank=0, localsgd_frequency=3, log_format=None, log_interval=100, lr_scheduler='fixed', lr_shrink=0.1, match_source_len=False, max_len_a=0, max_len_b=200, max_source_positions=1024, max_target_positions=1024, max_tokens=None, max_tokens_valid=None, memory_efficient_bf16=False, memory_efficient_fp16=False, min_len=1, min_loss_scale=0.0001, model_overrides='{}', model_parallel_size=1, nbest=1, no_beamable_mm=False, no_early_stop=False, no_progress_bar=False, no_repeat_ngram_size=0, no_seed_provided=False, nprocs_per_node=2, num_batch_buckets=0, 
num_shards=1, num_workers=1, optimizer=None, path='checkpoints/transformer_finetuned/checkpoint_best.pt', pipeline_balance=None, pipeline_checkpoint='never', pipeline_chunks=0, pipeline_decoder_balance=None, pipeline_decoder_devices=None, pipeline_devices=None, pipeline_encoder_balance=None, pipeline_encoder_devices=None, pipeline_model_parallel=False, prefix_size=0, print_alignment=False, print_step=False, profile=False, quantization_config_path=None, quiet=False, remove_bpe=None, replace_unk=None, required_batch_size_multiple=8, required_seq_len_multiple=1, results_path=None, retain_dropout=False, retain_dropout_modules=None, retain_iter_history=False, sacrebleu=False, sampling=False, sampling_topk=-1, sampling_topp=-1.0, score_reference=False, scoring='bleu', seed=1, shard_id=0, skip_invalid_size_inputs_valid_test=False, slowmo_algorithm='LocalSGD', slowmo_momentum=None, source_lang=None, target_lang=None, task='translation', temperature=1.0, tensorboard_logdir=None, threshold_loss_scale=None, tokenizer=None, tpu=False, train_subset='train', truncate_source=False, unkpen=0, unnormalized=False, upsample_primary=1, user_dir=None, valid_subset='valid', validate_after_updates=0, validate_interval=1, validate_interval_updates=0, warmup_updates=0, zero_sharding='none')
2020-12-03 14:04:48 | INFO | fairseq.tasks.translation | [en] dictionary: 6784 types
2020-12-03 14:04:48 | INFO | fairseq.tasks.translation | [it] dictionary: 8240 types
2020-12-03 14:04:48 | INFO | fairseq.data.data_utils | loaded 24 examples from: data-bin/prova_en__it/test.en-it.en
2020-12-03 14:04:48 | INFO | fairseq.data.data_utils | loaded 24 examples from: data-bin/prova_en__it/test.en-it.it
2020-12-03 14:04:48 | INFO | fairseq.tasks.translation | data-bin/prova_en__it/ test en-it 24 examples
2020-12-03 14:04:48 | INFO | fairseq_cli.generate | loading model(s) from checkpoints/transformer_finetuned/checkpoint_best.pt
2020-12-03 14:04:49 | INFO | fairseq.quantization_utils | quantize_model_scalar args:Namespace(activation_dropout=0.0, activation_fn='relu', adam_betas='(0.9, 0.999)', adam_eps=1e-08, adaptive_input=False, adaptive_softmax_cutoff=None, adaptive_softmax_dropout=0, all_gather_list_size=16384, arch='transformer', attention_dropout=0.0, batch_size=None, batch_size_valid=None, best_checkpoint_metric='loss', bf16=False, bpe=None, broadcast_buffers=False, bucket_cap_mb=25, checkpoint_shard_count=1, checkpoint_suffix='', clip_norm=25.0, cpu=False, criterion='cross_entropy', cross_self_attention=False, curriculum=0, data='data-bin/prova_en__it/', data_buffer_size=10, dataset_impl=None, ddp_backend='c10d', decoder_attention_heads=8, decoder_embed_dim=512, decoder_embed_path=None, decoder_ffn_embed_dim=2048, decoder_input_dim=512, decoder_layerdrop=0, decoder_layers=1, decoder_layers_to_keep=None, decoder_learned_pos=False, decoder_normalize_before=False, decoder_output_dim=512, device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_num_procs=1, distributed_port=-1, distributed_rank=0, distributed_world_size=1, distributed_wrapper='DDP', dropout=0.1, empty_cache_freq=0, encoder_attention_heads=8, encoder_embed_dim=512, encoder_embed_path=None, encoder_ffn_embed_dim=2048, encoder_layerdrop=0, encoder_layers=1, encoder_layers_to_keep=None, encoder_learned_pos=False, encoder_normalize_before=False, eval_bleu=False, eval_bleu_args=None, eval_bleu_detok='space', eval_bleu_detok_args=None, eval_bleu_print_samples=False, eval_bleu_remove_bpe=None, eval_tokenized_bleu=False, fast_stat_sync=False, find_unused_parameters=False, finetune_from_model=None, fix_batches_to_gpus=False, fixed_validation_seed=None, force_anneal=None, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, gen_subset='test', keep_best_checkpoints=-1, keep_interval_updates=-1, 
keep_last_epochs=-1, layernorm_embedding=False, left_pad_source=True, left_pad_target=False, load_alignments=False, local_rank=0, localsgd_frequency=3, log_format=None, log_interval=100, lr=[0.25], lr_scheduler='fixed', lr_shrink=0.1, max_epoch=0, max_source_positions=1024, max_target_positions=1024, max_tokens=512, max_tokens_valid=512, max_update=300, maximize_best_checkpoint_metric=False, memory_efficient_bf16=False, memory_efficient_fp16=False, min_loss_scale=0.0001, min_lr=-1.0, model_parallel_size=1, no_cross_attention=False, no_epoch_checkpoints=False, no_last_checkpoints=False, no_progress_bar=False, no_save=False, no_save_optimizer_state=False, no_scale_embedding=False, no_seed_provided=False, no_token_positional_embeddings=False, nprocs_per_node=1, num_batch_buckets=0, num_shards=1, num_workers=1, optimizer='adam', optimizer_overrides='{}', patience=-1, pipeline_balance=None, pipeline_checkpoint='never', pipeline_chunks=0, pipeline_decoder_balance=None, pipeline_decoder_devices=None, pipeline_devices=None, pipeline_encoder_balance=None, pipeline_encoder_devices=None, pipeline_model_parallel=False, profile=False, quant_noise_pq=0, quant_noise_pq_block_size=8, quant_noise_scalar=0, quantization_config_path='./transformer_quantization_config.yaml', required_batch_size_multiple=8, required_seq_len_multiple=1, reset_dataloader=False, reset_lr_scheduler=False, reset_meters=False, reset_optimizer=False, restore_file='checkpoint_last.pt', save_dir='checkpoints/transformer_finetuned', save_interval=1, save_interval_updates=0, scoring='bleu', seed=1, sentence_avg=False, shard_id=0, share_all_embeddings=False, share_decoder_input_output_embed=False, skip_invalid_size_inputs_valid_test=False, slowmo_algorithm='LocalSGD', slowmo_momentum=None, source_lang='en', stop_time_hours=0, target_lang='it', task='translation', tensorboard_logdir=None, threshold_loss_scale=None, tie_adaptive_weights=False, tokenizer=None, tpu=False, train_subset='train', truncate_source=False, 
update_freq=[1], upsample_primary=1, use_bmuf=False, use_old_adam=False, user_dir=None, valid_subset='valid', validate_after_updates=0, validate_interval=1, validate_interval_updates=0, warmup_updates=0, weight_decay=0.0, zero_sharding='none')
2020-12-03 14:04:49 | INFO | fairseq.quantization_utils | quantize_model_scalar quant_noise_scalar:0
Traceback (most recent call last):
File "/home/ubuntu/workspace/experiments/QUANTIZED_TRASFORMER/fairseq_0.10.0/bin/fairseq-generate", line 8, in <module>
sys.exit(cli_main())
File "/home/ubuntu/workspace/experiments/QUANTIZED_TRASFORMER/fairseq_0.10.0/lib/python3.6/site-packages/fairseq_cli/generate.py", line 379, in cli_main
main(args)
File "/home/ubuntu/workspace/experiments/QUANTIZED_TRASFORMER/fairseq_0.10.0/lib/python3.6/site-packages/fairseq_cli/generate.py", line 41, in main
return _main(args, sys.stdout)
File "/home/ubuntu/workspace/experiments/QUANTIZED_TRASFORMER/fairseq_0.10.0/lib/python3.6/site-packages/fairseq_cli/generate.py", line 94, in _main
num_shards=args.checkpoint_shard_count,
File "/home/ubuntu/workspace/experiments/QUANTIZED_TRASFORMER/fairseq_0.10.0/lib/python3.6/site-packages/fairseq/checkpoint_utils.py", line 256, in load_model_ensemble
num_shards,
File "/home/ubuntu/workspace/experiments/QUANTIZED_TRASFORMER/fairseq_0.10.0/lib/python3.6/site-packages/fairseq/checkpoint_utils.py", line 287, in load_model_ensemble_and_task
model.load_state_dict(state["model"], strict=strict, args=args)
File "/home/ubuntu/workspace/experiments/QUANTIZED_TRASFORMER/fairseq_0.10.0/lib/python3.6/site-packages/fairseq/models/fairseq_model.py", line 98, in load_state_dict
return super().load_state_dict(new_state_dict, strict)
File "/home/ubuntu/workspace/experiments/QUANTIZED_TRASFORMER/fairseq_0.10.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1052, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for TransformerModel:
Missing key(s) in state_dict: "encoder.embed_tokens.weight", "decoder.embed_tokens.weight", "decoder.layers.0.self_attn.k_proj.weight", "decoder.layers.0.self_attn.v_proj.weight", "decoder.layers.0.self_attn.q_proj.weight", "decoder.layers.0.self_attn.out_proj.weight", "decoder.layers.0.fc1.weight", "decoder.layers.0.fc2.weight".
Unexpected key(s) in state_dict: "encoder.embed_tokens.centroids", "encoder.embed_tokens.assignments", "encoder.embed_tokens.counts", "decoder.embed_tokens.centroids", "decoder.embed_tokens.assignments", "decoder.embed_tokens.counts", "decoder.layers.0.self_attn.k_proj.centroids", "decoder.layers.0.self_attn.k_proj.assignments", "decoder.layers.0.self_attn.k_proj.counts", "decoder.layers.0.self_attn.v_proj.centroids", "decoder.layers.0.self_attn.v_proj.assignments", "decoder.layers.0.self_attn.v_proj.counts", "decoder.layers.0.self_attn.q_proj.centroids", "decoder.layers.0.self_attn.q_proj.assignments", "decoder.layers.0.self_attn.q_proj.counts", "decoder.layers.0.self_attn.out_proj.centroids", "decoder.layers.0.self_attn.out_proj.assignments", "decoder.layers.0.self_attn.out_proj.counts", "decoder.layers.0.fc1.centroids", "decoder.layers.0.fc1.assignments", "decoder.layers.0.fc1.counts", "decoder.layers.0.fc2.centroids", "decoder.layers.0.fc2.assignments", "decoder.layers.0.fc2.counts".
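The error is the strict key check in `Module.load_state_dict`: fairseq-generate built a plain (non-quantized) TransformerModel, since `quantization_config_path=None` in the generate namespace above, so it expects `*.weight` tensors, while the checkpoint stores the iPQ triplets. A self-contained sketch of the check that raises (hypothetical key names abbreviated from the traceback):

```python
def check_state_dict(expected_keys, loaded_keys):
    """Mimic the strict key comparison in torch's Module.load_state_dict:
    report keys the model wants but the checkpoint lacks, and vice versa."""
    missing = sorted(set(expected_keys) - set(loaded_keys))
    unexpected = sorted(set(loaded_keys) - set(expected_keys))
    return missing, unexpected

# A non-quantized model expects dense weights...
expected = ["decoder.layers.0.fc1.weight"]
# ...but an iPQ checkpoint stores the compressed representation instead.
loaded = [
    "decoder.layers.0.fc1.centroids",
    "decoder.layers.0.fc1.assignments",
    "decoder.layers.0.fc1.counts",
]
missing, unexpected = check_state_dict(expected, loaded)
assert missing == ["decoder.layers.0.fc1.weight"]
assert len(unexpected) == 3
```

Since the generate-time namespace above also shows `model_overrides='{}'`, one thing that might be worth trying (an untested guess on my part, not a confirmed fix) is forcing the config through at load time, e.g. `fairseq-generate ... --model-overrides '{"quantization_config_path": "./transformer_quantization_config.yaml"}'`.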
Code sample
I am using the standard fairseq code.
Expected behavior
The expected output is the standard output, similar to what I get with the non-quantized model:
S-0 I think that internal security .....
T-0 Ritengo che la sicurezza ....
H-0 -2.4014294147491455 impresa impresa ....
D-0 -2.4014294147491455 impresa impresa ....
P-0 -1.5273 -1.4997 -1.4997 ......
Environment
- fairseq Version: 0.10.0
- PyTorch Version: 1.7.0
- numpy version: 1.19.4
- OS: Ubuntu 18.04.5 LTS (x86_64)
- How you installed fairseq: pip
- Python version: 3.6 (64-bit runtime)
- CUDA Version: 11.0
- Nvidia driver version: 450.66
- cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
- GPU models and configuration: 2 x GeForce RTX 2080 Ti
@nicolabertoldi did you manage to solve this problem? I'm stuck with the same issue.
I'm troubled by it, too.
I'm troubled by it, too.
Me too; please let me know if you solve it.