
nvcc Unsupported gpu architecture error

Open rkoystart opened this issue 4 years ago • 4 comments

Hi, I have installed lightseq, fairseq, and sacremoses using the following commands:

pip install lightseq==2.0.2
pip install fairseq
pip install sacremoses
conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c pytorch
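
For reference, the lightseq CUDA kernels are JIT-compiled by torch.utils.cpp_extension with the system nvcc (the conda cudatoolkit package ships runtime libraries only, not the compiler), so it helps to compare the CUDA versions that are actually in play. A minimal check, assuming nvcc lives under /usr/local/cuda as in the build log below:

nvidia-smi                                                               # CUDA version supported by the driver
python -c "import torch; print(torch.__version__, torch.version.cuda)"  # CUDA version PyTorch was built with
/usr/local/cuda/bin/nvcc --version                                       # system compiler used for the JIT build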

After all these installations, when I run

sh examples/training/fairseq/ls_fairseq_wmt14en2de.sh

I am getting the following error.

+ THIS_DIR=/data/rkoy/lightseq/lightseq/examples/training/fairseq
+ cd /data/rkoy/lightseq/lightseq/examples/training/fairseq/../../..                                                                                                                                  
+ lightseq-train wmt14_en_de/ --task translation --arch ls_transformer_wmt_en_de_big --optimizer ls_adam --adam-betas (0.9, 0.98) --clip-norm 0.0 --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 40
00 --weight-decay 0.0001 --criterion ls_label_smoothed_cross_entropy --label-smoothing 0.1 --max-tokens 8192 --eval-bleu --eval-bleu-args {"beam": 5, "max_len_a": 1.2, "max_len_b": 10} --eval-bleu-detok 
moses --eval-bleu-remove-bpe --eval-bleu-print-samples --best-checkpoint-metric bleu --maximize-best-checkpoint-metric                                                                                     
2021-09-27 22:26:20 | INFO | fairseq.distributed_utils | distributed init (rank 0): tcp://localhost:11236                                                                                                  
2021-09-27 22:26:20 | INFO | fairseq.distributed_utils | distributed init (rank 5): tcp://localhost:11236                                                                                                  
2021-09-27 22:26:20 | INFO | fairseq.distributed_utils | distributed init (rank 7): tcp://localhost:11236                                                                                                  
2021-09-27 22:26:21 | INFO | fairseq.distributed_utils | distributed init (rank 4): tcp://localhost:11236                                                                                                  
2021-09-27 22:26:23 | INFO | fairseq.distributed_utils | distributed init (rank 2): tcp://localhost:11236                                                                                                  
2021-09-27 22:26:24 | INFO | fairseq.distributed_utils | distributed init (rank 3): tcp://localhost:11236                                                                                                  
2021-09-27 22:26:24 | INFO | fairseq.distributed_utils | distributed init (rank 6): tcp://localhost:11236                                                                                                  
2021-09-27 22:26:24 | INFO | fairseq.distributed_utils | distributed init (rank 1): tcp://localhost:11236                                                                                                  
2021-09-27 22:26:29 | INFO | fairseq.distributed_utils | initialized host rkoy-gpu-machine as rank 6                                                                                                            
2021-09-27 22:26:29 | INFO | fairseq.distributed_utils | initialized host rkoy-gpu-machine as rank 7                                                                                                            
2021-09-27 22:26:29 | INFO | fairseq.distributed_utils | initialized host rkoy-gpu-machine as rank 0                                                                                                            
2021-09-27 22:26:29 | INFO | fairseq.distributed_utils | initialized host rkoy-gpu-machine as rank 2                                                                                                            
2021-09-27 22:26:29 | INFO | fairseq.distributed_utils | initialized host rkoy-gpu-machine as rank 1                                                                                                            
2021-09-27 22:26:29 | INFO | fairseq.distributed_utils | initialized host rkoy-gpu-machine as rank 3
2021-09-27 22:26:29 | INFO | fairseq.distributed_utils | initialized host rkoy-gpu-machine as rank 5
2021-09-27 22:26:29 | INFO | fairseq.distributed_utils | initialized host rkoy-gpu-machine as rank 4
2021-09-27 22:26:29 | INFO | fairseq_cli.train | Namespace(activation_dropout=0.0, activation_fn='relu', adam_betas='(0.9, 0.98)', adam_eps=1e-08, adaptive_input=False, adaptive_softmax_cutoff=None, adap
tive_softmax_dropout=0, all_gather_list_size=16384, arch='ls_transformer_wmt_en_de_big', attention_dropout=0.1, batch_size=None, batch_size_valid=None, best_checkpoint_metric='bleu', bf16=False, bpe=None
, broadcast_buffers=False, bucket_cap_mb=25, checkpoint_activations=False, checkpoint_shard_count=1, checkpoint_suffix='', clip_norm=0.0, cpu=False, criterion='ls_label_smoothed_cross_entropy', cross_sel
f_attention=False, curriculum=0, data='wmt14_en_de/', data_buffer_size=10, dataset_impl=None, ddp_backend='c10d', decoder_attention_heads=16, decoder_embed_dim=1024, decoder_embed_path=None, decoder_ffn_
embed_dim=4096, decoder_input_dim=1024, decoder_layerdrop=0, decoder_layers=6, decoder_layers_to_keep=None, decoder_learned_pos=False, decoder_normalize_before=False, decoder_output_dim=1024, device_id=0
, disable_validation=False, distributed_backend='nccl', distributed_init_method='tcp://localhost:11236', distributed_no_spawn=False, distributed_num_procs=8, distributed_port=-1, distributed_rank=0, dist
ributed_world_size=8, distributed_wrapper='DDP', dropout=0.3, empty_cache_freq=0, encoder_attention_heads=16, encoder_embed_dim=1024, encoder_embed_path=None, encoder_ffn_embed_dim=4096, encoder_layerdro
p=0, encoder_layers=6, encoder_layers_to_keep=None, encoder_learned_pos=False, encoder_normalize_before=False, eval_bleu=True, eval_bleu_args='{"beam": 5, "max_len_a": 1.2, "max_len_b": 10}', eval_bleu_d
etok='moses', eval_bleu_detok_args=None, eval_bleu_print_samples=True, eval_bleu_remove_bpe='@@ ', eval_tokenized_bleu=False, fast_stat_sync=False, find_unused_parameters=False, finetune_from_model=None,
 fix_batches_to_gpus=False, fixed_validation_seed=None, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, gen_subset='test', ignore_prefix_si
ze=0, keep_best_checkpoints=-1, keep_interval_updates=-1, keep_last_epochs=-1, label_smoothing=0.1, layernorm_embedding=False, left_pad_source='True', left_pad_target='False', load_alignments=False, loca
lsgd_frequency=3, log_format=None, log_interval=100, lr=[0.0005], lr_scheduler='inverse_sqrt', max_epoch=0, max_source_positions=1024, max_target_positions=1024, max_tokens=8192, max_tokens_valid=8192, m
ax_update=0, maximize_best_checkpoint_metric=True, memory_efficient_bf16=False, memory_efficient_fp16=False, min_loss_scale=0.0001, min_lr=-1.0, min_params_to_wrap=100000000, model_parallel_size=1, no_cr
oss_attention=False, no_epoch_checkpoints=False, no_last_checkpoints=False, no_progress_bar=False, no_save=False, no_save_optimizer_state=False, no_scale_embedding=False, no_seed_provided=False, no_token
_positional_embeddings=False, nprocs_per_node=8, num_batch_buckets=0, num_shards=1, num_workers=1, offload_activations=False, optimizer='ls_adam', optimizer_overrides='{}', patience=-1, pipeline_balance=
None, pipeline_checkpoint='never', pipeline_chunks=0, pipeline_decoder_balance=None, pipeline_decoder_devices=None, pipeline_devices=None, pipeline_encoder_balance=None, pipeline_encoder_devices=None, pi
peline_model_parallel=False, profile=False, quant_noise_pq=0, quant_noise_pq_block_size=8, quant_noise_scalar=0, quantization_config_path=None, report_accuracy=False, required_batch_size_multiple=8, requ
ired_seq_len_multiple=1, reset_dataloader=False, reset_lr_scheduler=False, reset_meters=False, reset_optimizer=False, restore_file='checkpoint_last.pt', save_dir='checkpoints', save_interval=1, save_inte
rval_updates=0, scoring='bleu', seed=1, sentence_avg=False, shard_id=0, share_all_embeddings=False, share_decoder_input_output_embed=False, skip_invalid_size_inputs_valid_test=False, slowmo_algorithm='Lo
calSGD', slowmo_momentum=None, source_lang=None, stop_time_hours=0, target_lang=None, task='translation', tensorboard_logdir=None, threshold_loss_scale=None, tie_adaptive_weights=False, tokenizer=None, t
pu=False, train_subset='train', truncate_source=False, update_freq=[1], upsample_primary=1, use_bmuf=False, use_old_adam=False, user_dir='/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packag
es/examples/training/fairseq/fs_modules', valid_subset='valid', validate_after_updates=0, validate_interval=1, validate_interval_updates=0, warmup_init_lr=-1, warmup_updates=4000, weight_decay=0.0001, ze
ro_sharding='none')
2021-09-27 22:26:29 | INFO | fairseq.tasks.translation | [en] dictionary: 40480 types
2021-09-27 22:26:29 | INFO | fairseq.tasks.translation | [de] dictionary: 42720 types
2021-09-27 22:26:29 | INFO | fairseq.data.data_utils | loaded 39414 examples from: wmt14_en_de/valid.en-de.en
2021-09-27 22:26:29 | INFO | fairseq.data.data_utils | loaded 39414 examples from: wmt14_en_de/valid.en-de.de
2021-09-27 22:26:29 | INFO | fairseq.tasks.translation | wmt14_en_de/ valid en-de 39414 examples
Using /home/rkoy/.cache/torch_extensions as PyTorch extensions root...
Using /home/rkoy/.cache/torch_extensions as PyTorch extensions root...
Using /home/rkoy/.cache/torch_extensions as PyTorch extensions root...
Using /home/rkoy/.cache/torch_extensions as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/rkoy/.cache/torch_extensions/lightseq_layers/build.ninja...
Building extension module lightseq_layers...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
Using /home/rkoy/.cache/torch_extensions as PyTorch extensions root...
[1/15] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi101
1\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/c
src/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torc
h/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include
/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO
_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__C
UDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels
/cublas_wrappers.cu -o cublas_wrappers.cuda.o 
FAILED: cublas_wrappers.cuda.o 
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/
data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/ops
/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torch/csrc/
api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/THC -i
system /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_
OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_
HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/cublas
_wrappers.cu -o cublas_wrappers.cuda.o 
nvcc fatal   : Unsupported gpu architecture 'compute_86'
[2/15] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi101
1\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/c
src/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torc
h/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include
/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO
_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__C
UDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels
/transform_kernels.cu -o transform_kernels.cuda.o  
FAILED: transform_kernels.cuda.o 
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/
data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/ops
/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torch/csrc/
api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/THC -i
system /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_
OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_
HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/transf
orm_kernels.cu -o transform_kernels.cuda.o 
nvcc fatal   : Unsupported gpu architecture 'compute_86'
[3/15] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi101
1\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/c
src/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torc
h/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include
/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO
_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__C
UDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels
/dropout_kernels.cu -o dropout_kernels.cuda.o 
FAILED: dropout_kernels.cuda.o 
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/
data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/ops
/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torch/csrc/
api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/THC -i
system /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_
OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_
HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/dropou
t_kernels.cu -o dropout_kernels.cuda.o 
nvcc fatal   : Unsupported gpu architecture 'compute_86'
[4/15] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi101
1\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/c
src/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torc
h/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include
/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO
_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__C
UDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels
/normalize_kernels.cu -o normalize_kernels.cuda.o  
FAILED: normalize_kernels.cuda.o 
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/
data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/ops
/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torch/csrc/
api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/THC -i
system /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_
OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_
HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/normal
ize_kernels.cu -o normalize_kernels.cuda.o 
nvcc fatal   : Unsupported gpu architecture 'compute_86'
[5/15] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi101
1\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/c
src/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torc
h/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include
/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO
_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__C
UDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels
/softmax_kernels.cu -o softmax_kernels.cuda.o 
FAILED: softmax_kernels.cuda.o 
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/
data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/ops
/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torch/csrc/
api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/THC -i
system /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_
OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_
HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/softma
x_kernels.cu -o softmax_kernels.cuda.o 
nvcc fatal   : Unsupported gpu architecture 'compute_86'
[6/15] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi101
1\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/c
src/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torc
h/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include
/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO
_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__C
UDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels
/cuda_util.cu -o cuda_util.cuda.o 
FAILED: cuda_util.cuda.o 
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/
data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/ops
/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torch/csrc/
api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/THC -i
system /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_
OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_
HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/cuda_u
til.cu -o cuda_util.cuda.o 
nvcc fatal   : Unsupported gpu architecture 'compute_86'
[7/15] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi101
1\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/c
src/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torc
h/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include
/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO
_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__C
UDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels
/general_kernels.cu -o general_kernels.cuda.o 
FAILED: general_kernels.cuda.o 
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/
data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/ops
/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torch/csrc/
api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/THC -i
system /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_
OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_
HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/genera
l_kernels.cu -o general_kernels.cuda.o 
nvcc fatal   : Unsupported gpu architecture 'compute_86'
[8/15] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi101
1\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/c
src/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torc
h/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include
/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO
_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__C
UDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels
/embedding_kernels.cu -o embedding_kernels.cuda.o  
FAILED: embedding_kernels.cuda.o 
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/
data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/ops
/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torch/csrc/
api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/THC -i
system /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_
OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_
HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/embedd
ing_kernels.cu -o embedding_kernels.cuda.o 
nvcc fatal   : Unsupported gpu architecture 'compute_86'
[9/15] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi101
1\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/c
src/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torc
h/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include
/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO
_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__C
UDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels
/cross_entropy.cu -o cross_entropy.cuda.o 
FAILED: cross_entropy.cuda.o 
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/
data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/ops
/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torch/csrc/
api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/THC -i
system /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_
OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_
HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/cross_
entropy.cu -o cross_entropy.cuda.o 
nvcc fatal   : Unsupported gpu architecture 'compute_86'
Using /home/rkoy/.cache/torch_extensions as PyTorch extensions root...
Using /home/rkoy/.cache/torch_extensions as PyTorch extensions root...
Using /home/rkoy/.cache/torch_extensions as PyTorch extensions root...
[10/15] c++ -MMD -MF cross_entropy_layer.o.d -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI
=\"_cxxabi1011\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lights
eq/training/csrc/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch
/include/torch/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/
torch/include/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -c /data/r
avi-9151/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/ops/cross_entropy_layer.cpp -o cross_entropy_layer.o 
[11/15] c++ -MMD -MF transformer_embedding_layer.o.d -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_B
UILD_ABI=\"_cxxabi1011\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-package
s/lightseq/training/csrc/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packag
es/torch/include/torch/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-p
ackages/torch/include/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -c
 /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/ops/transformer_embedding_layer.cpp -o transformer_embedding_layer.o 
[12/15] c++ -MMD -MF transformer_encoder_layer.o.d -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUI
LD_ABI=\"_cxxabi1011\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/
lightseq/training/csrc/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages
/torch/include/torch/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-pac
kages/torch/include/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -c /
data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/ops/transformer_encoder_layer.cpp -o transformer_encoder_layer.o 
[13/15] c++ -MMD -MF transformer_decoder_layer.o.d -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUI
LD_ABI=\"_cxxabi1011\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/
lightseq/training/csrc/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages
/torch/include/torch/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-pac
kages/torch/include/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -c /
data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/ops/transformer_decoder_layer.cpp -o transformer_decoder_layer.o 
[14/15] c++ -MMD -MF pybind_op.o.d -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi
1011\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/trainin
g/csrc/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/t
orch/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/incl
ude/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -c /data/rkoy/a
naconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/torch/pybind_op.cpp -o pybind_op.o 
ninja: build stopped: subcommand failed.
Loading extension module lightseq_layers...
Loading extension module lightseq_layers...
Loading extension module lightseq_layers...
Loading extension module lightseq_layers...
Loading extension module lightseq_layers...
Loading extension module lightseq_layers...
Loading extension module lightseq_layers...
Traceback (most recent call last):
  File "/data/rkoy/anaconda3/envs/lightseq/bin/lightseq-train", line 8, in <module>
    sys.exit(ls_cli_main())
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/examples/training/fairseq/lightseq_fairseq_train_cli.py", line 10, in ls_cli_main
    cli_main(*args, **kwargs)
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/fairseq_cli/train.py", line 352, in cli_main
    distributed_utils.call_main(args, main)
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/fairseq/distributed_utils.py", line 283, in call_main
    torch.multiprocessing.spawn(
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
    while not context.join():
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 118, in join
    raise Exception(msg)
Exception: 

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1533, in _run_ninja_build
    subprocess.run(
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/fairseq/distributed_utils.py", line 270, in distributed_main
    main(args, **kwargs)
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/fairseq_cli/train.py", line 68, in main
    model = task.build_model(args)
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/fairseq/tasks/translation.py", line 327, in build_model
    model = super().build_model(args)
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/fairseq/tasks/fairseq_task.py", line 547, in build_model
    model = models.build_model(args, self)
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/fairseq/models/__init__.py", line 58, in build_model
    return ARCH_MODEL_REGISTRY[model_cfg.arch].build_model(model_cfg, task)
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/examples/training/fairseq/fs_modules/ls_transformer.py", line 136, in build_model
    encoder_embed_tokens = cls.build_embedding(
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/examples/training/fairseq/fs_modules/ls_transformer.py", line 159, in build_embedding
    emb = LSTransformerEmbeddingLayer(config)
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/ops/pytorch/transformer_embedding_layer.py", line 96, in __init__
    transformer_cuda_module = TransformerBuilder().load()
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/ops/pytorch/builder/builder.py", line 203, in load
    return self.jit_load(verbose)
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/ops/pytorch/builder/builder.py", line 231, in jit_load
    op_module = load(
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 986, in load
    return _jit_compile(
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1193, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1297, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1555, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'lightseq_layers'

And these are the details of my GPU:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  RTX A6000           Off  | 00000000:01:00.0 Off |                  Off |
| 30%   53C    P2    71W / 300W |    363MiB / 48685MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  RTX A6000           Off  | 00000000:25:00.0 Off |                  Off |
| 30%   36C    P8    26W / 300W |      2MiB / 48685MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
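
For reference, the RTX A6000 reports compute capability 8.6, which is what produces the -gencode=arch=compute_86,code=sm_86 flags in the build log; nvcc only accepts compute_86 starting with CUDA 11.1. A quick way to confirm the capability from the same Python environment (standard PyTorch calls, shown here as a sketch):

python -c "import torch; print(torch.cuda.get_device_name(0), torch.cuda.get_device_capability(0))"   # expected something like: RTX A6000 (8, 6)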

This is the content of examples/training/fairseq/ls_fairseq_wmt14en2de.sh:

#!/usr/bin/env bash
set -ex
THIS_DIR=$(dirname $(readlink -f $0))
cd $THIS_DIR/../../..

#if [ ! -d "/tmp/wmt14_en_de" ]; then
#    echo "Downloading dataset"
#    wget http://sf3-ttcdn-tos.pstatp.com/obj/nlp-opensource/lightseq/wmt_data/databin_wmt14_en_de.tar.gz -P /tmp
#    tar -zxvf /tmp/databin_wmt14_en_de.tar.gz -C /tmp && rm /tmp/databin_wmt14_en_de.tar.gz
#fi

lightseq-train wmt14_en_de/ \
    --task translation \
    --arch ls_transformer_wmt_en_de_big \
    --optimizer ls_adam --adam-betas '(0.9, 0.98)' \
    --clip-norm 0.0 \
    --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 --weight-decay 0.0001 \
    --criterion ls_label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 8192 \
    --eval-bleu \
    --eval-bleu-args '{"beam": 5, "max_len_a": 1.2, "max_len_b": 10}' \
    --eval-bleu-detok moses \
    --eval-bleu-remove-bpe \
    --eval-bleu-print-samples \
    --best-checkpoint-metric bleu \
    --maximize-best-checkpoint-metric \

The only changes I made were to download the dataset manually, update the path in the command accordingly, and change the architecture.

Please let me know what mistake I have made. Thanks in advance!

rkoystart commented Sep 27 '21 17:09

It seems like a torch extension problem. You can try this solution: https://github.com/torch/torch7/issues/1190#issuecomment-498934400
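
In case the extension build cache itself is in a bad state, one common workaround (a sketch only; it may not be exactly what the linked comment describes) is to delete the JIT build cache that appears in the log above and rerun the script so the extension is rebuilt from scratch:

rm -rf /home/rkoy/.cache/torch_extensions/lightseq_layers
sh examples/training/fairseq/ls_fairseq_wmt14en2de.sh

Note that the rebuild will still fail with the same compute_86 error unless the underlying nvcc version issue is also addressed.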

Taka152 commented Sep 28 '21 05:09

I have 2 doubts:

  1. Apart from "nvcc fatal : Unsupported gpu architecture 'compute_86'", it is also saying "RuntimeError: Error building extension 'lightseq_layers'". Is that also because of the torch version?

  2. Regarding the suggestion "It seems like a torch extension problem, you can try this solution torch/torch7#1190 (comment)": the mentioned repo seems to still be in the development phase, and I always prefer installing PyTorch from https://pytorch.org/get-started/previous-versions/. I would like to know which torch or cudatoolkit version I need to install on the GPU machine, given the GPU specs I have already mentioned.

rkoystart commented Sep 28 '21 11:09

Could you check your nvcc version? https://forums.developer.nvidia.com/t/nvcc-fatal-unsupported-gpu-architecture-compute-86/161424/5?u=xiongying152
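
As a concrete check (a sketch assuming the /usr/local/cuda layout seen in the build log): an nvcc older than release 11.1 cannot generate sm_86 code for the A6000, even though the driver reports CUDA 11.2, so the fix would be to install a CUDA toolkit >= 11.1 and point the JIT build at it, e.g.:

/usr/local/cuda/bin/nvcc --version        # if this prints release 11.0 or older, compute_86 is unsupported
export CUDA_HOME=/usr/local/cuda-11.1     # hypothetical path of a newer toolkit install
export PATH=$CUDA_HOME/bin:$PATH          # torch.utils.cpp_extension honors CUDA_HOME when locating nvcc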

Taka152 commented Oct 09 '21 03:10

I have encountered the same problem. Has it been solved now?

Andrewlesson commented Nov 18 '21 06:11