nvcc Unsupported gpu architecture error
Hi, I have installed lightseq, fairseq, and sacremoses using the following commands:
pip install lightseq==2.0.2
pip install fairseq
pip install sacremoses
conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c pytorch
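A possible sanity check at this point, offered as a sketch under assumptions rather than a confirmed diagnosis: the build log further down invokes /usr/local/cuda/bin/nvcc (the system toolkit, not the conda cudatoolkit), and to my knowledge nvcc accepts the compute_86 / sm_86 targets only from CUDA 11.1 onward. The helper names parse_nvcc_release, supports_sm86 and check_local_nvcc below are hypothetical and only illustrate comparing the local nvcc release against that threshold.

```python
import re
import shutil
import subprocess

# Assumption: compute_86 / sm_86 are accepted by nvcc from CUDA 11.1 onward.
MIN_RELEASE_FOR_SM86 = (11, 1)


def parse_nvcc_release(version_text):
    """Extract (major, minor) from `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", version_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def supports_sm86(release):
    """True if the given (major, minor) release meets the assumed threshold."""
    return release is not None and release >= MIN_RELEASE_FOR_SM86


def check_local_nvcc(nvcc_path="/usr/local/cuda/bin/nvcc"):
    """Query nvcc at the path seen in the build log, falling back to PATH."""
    nvcc = shutil.which(nvcc_path) or shutil.which("nvcc")
    if nvcc is None:
        return None  # no CUDA compiler found on this machine
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = check_local_nvcc()
    print("nvcc release:", release, "| accepts compute_86:", supports_sm86(release))
```

If the reported release is below 11.1, the mismatch between the GPU architecture and the system toolkit would be one plausible explanation for the failure shown below.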
After these installations, when I run
sh examples/training/fairseq/ls_fairseq_wmt14en2de.sh
I get the following error:
+ THIS_DIR=/data/rkoy/lightseq/lightseq/examples/training/fairseq
+ cd /data/rkoy/lightseq/lightseq/examples/training/fairseq/../../..
+ lightseq-train wmt14_en_de/ --task translation --arch ls_transformer_wmt_en_de_big --optimizer ls_adam --adam-betas (0.9, 0.98) --clip-norm 0.0 --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 --weight-decay 0.0001 --criterion ls_label_smoothed_cross_entropy --label-smoothing 0.1 --max-tokens 8192 --eval-bleu --eval-bleu-args {"beam": 5, "max_len_a": 1.2, "max_len_b": 10} --eval-bleu-detok moses --eval-bleu-remove-bpe --eval-bleu-print-samples --best-checkpoint-metric bleu --maximize-best-checkpoint-metric
2021-09-27 22:26:20 | INFO | fairseq.distributed_utils | distributed init (rank 0): tcp://localhost:11236
2021-09-27 22:26:20 | INFO | fairseq.distributed_utils | distributed init (rank 5): tcp://localhost:11236
2021-09-27 22:26:20 | INFO | fairseq.distributed_utils | distributed init (rank 7): tcp://localhost:11236
2021-09-27 22:26:21 | INFO | fairseq.distributed_utils | distributed init (rank 4): tcp://localhost:11236
2021-09-27 22:26:23 | INFO | fairseq.distributed_utils | distributed init (rank 2): tcp://localhost:11236
2021-09-27 22:26:24 | INFO | fairseq.distributed_utils | distributed init (rank 3): tcp://localhost:11236
2021-09-27 22:26:24 | INFO | fairseq.distributed_utils | distributed init (rank 6): tcp://localhost:11236
2021-09-27 22:26:24 | INFO | fairseq.distributed_utils | distributed init (rank 1): tcp://localhost:11236
2021-09-27 22:26:29 | INFO | fairseq.distributed_utils | initialized host rkoy-gpu-machine as rank 6
2021-09-27 22:26:29 | INFO | fairseq.distributed_utils | initialized host rkoy-gpu-machine as rank 7
2021-09-27 22:26:29 | INFO | fairseq.distributed_utils | initialized host rkoy-gpu-machine as rank 0
2021-09-27 22:26:29 | INFO | fairseq.distributed_utils | initialized host rkoy-gpu-machine as rank 2
2021-09-27 22:26:29 | INFO | fairseq.distributed_utils | initialized host rkoy-gpu-machine as rank 1
2021-09-27 22:26:29 | INFO | fairseq.distributed_utils | initialized host rkoy-gpu-machine as rank 3
2021-09-27 22:26:29 | INFO | fairseq.distributed_utils | initialized host rkoy-gpu-machine as rank 5
2021-09-27 22:26:29 | INFO | fairseq.distributed_utils | initialized host rkoy-gpu-machine as rank 4
2021-09-27 22:26:29 | INFO | fairseq_cli.train | Namespace(activation_dropout=0.0, activation_fn='relu', adam_betas='(0.9, 0.98)', adam_eps=1e-08, adaptive_input=False, adaptive_softmax_cutoff=None, adaptive_softmax_dropout=0, all_gather_list_size=16384, arch='ls_transformer_wmt_en_de_big', attention_dropout=0.1, batch_size=None, batch_size_valid=None, best_checkpoint_metric='bleu', bf16=False, bpe=None, broadcast_buffers=False, bucket_cap_mb=25, checkpoint_activations=False, checkpoint_shard_count=1, checkpoint_suffix='', clip_norm=0.0, cpu=False, criterion='ls_label_smoothed_cross_entropy', cross_self_attention=False, curriculum=0, data='wmt14_en_de/', data_buffer_size=10, dataset_impl=None, ddp_backend='c10d', decoder_attention_heads=16, decoder_embed_dim=1024, decoder_embed_path=None, decoder_ffn_embed_dim=4096, decoder_input_dim=1024, decoder_layerdrop=0, decoder_layers=6, decoder_layers_to_keep=None, decoder_learned_pos=False, decoder_normalize_before=False, decoder_output_dim=1024, device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method='tcp://localhost:11236', distributed_no_spawn=False, distributed_num_procs=8, distributed_port=-1, distributed_rank=0, distributed_world_size=8, distributed_wrapper='DDP', dropout=0.3, empty_cache_freq=0, encoder_attention_heads=16, encoder_embed_dim=1024, encoder_embed_path=None, encoder_ffn_embed_dim=4096, encoder_layerdrop=0, encoder_layers=6, encoder_layers_to_keep=None, encoder_learned_pos=False, encoder_normalize_before=False, eval_bleu=True, eval_bleu_args='{"beam": 5, "max_len_a": 1.2, "max_len_b": 10}', eval_bleu_detok='moses', eval_bleu_detok_args=None, eval_bleu_print_samples=True, eval_bleu_remove_bpe='@@ ', eval_tokenized_bleu=False, fast_stat_sync=False, find_unused_parameters=False, finetune_from_model=None, fix_batches_to_gpus=False, fixed_validation_seed=None, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, gen_subset='test', ignore_prefix_size=0, keep_best_checkpoints=-1, keep_interval_updates=-1, keep_last_epochs=-1, label_smoothing=0.1, layernorm_embedding=False, left_pad_source='True', left_pad_target='False', load_alignments=False, localsgd_frequency=3, log_format=None, log_interval=100, lr=[0.0005], lr_scheduler='inverse_sqrt', max_epoch=0, max_source_positions=1024, max_target_positions=1024, max_tokens=8192, max_tokens_valid=8192, max_update=0, maximize_best_checkpoint_metric=True, memory_efficient_bf16=False, memory_efficient_fp16=False, min_loss_scale=0.0001, min_lr=-1.0, min_params_to_wrap=100000000, model_parallel_size=1, no_cross_attention=False, no_epoch_checkpoints=False, no_last_checkpoints=False, no_progress_bar=False, no_save=False, no_save_optimizer_state=False, no_scale_embedding=False, no_seed_provided=False, no_token_positional_embeddings=False, nprocs_per_node=8, num_batch_buckets=0, num_shards=1, num_workers=1, offload_activations=False, optimizer='ls_adam', optimizer_overrides='{}', patience=-1, pipeline_balance=None, pipeline_checkpoint='never', pipeline_chunks=0, pipeline_decoder_balance=None, pipeline_decoder_devices=None, pipeline_devices=None, pipeline_encoder_balance=None, pipeline_encoder_devices=None, pipeline_model_parallel=False, profile=False, quant_noise_pq=0, quant_noise_pq_block_size=8, quant_noise_scalar=0, quantization_config_path=None, report_accuracy=False, required_batch_size_multiple=8, required_seq_len_multiple=1, reset_dataloader=False, reset_lr_scheduler=False, reset_meters=False, reset_optimizer=False, restore_file='checkpoint_last.pt', save_dir='checkpoints', save_interval=1, save_interval_updates=0, scoring='bleu', seed=1, sentence_avg=False, shard_id=0, share_all_embeddings=False, share_decoder_input_output_embed=False, skip_invalid_size_inputs_valid_test=False, slowmo_algorithm='LocalSGD', slowmo_momentum=None, source_lang=None, stop_time_hours=0, target_lang=None, task='translation', tensorboard_logdir=None, threshold_loss_scale=None, tie_adaptive_weights=False, tokenizer=None, tpu=False, train_subset='train', truncate_source=False, update_freq=[1], upsample_primary=1, use_bmuf=False, use_old_adam=False, user_dir='/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/examples/training/fairseq/fs_modules', valid_subset='valid', validate_after_updates=0, validate_interval=1, validate_interval_updates=0, warmup_init_lr=-1, warmup_updates=4000, weight_decay=0.0001, zero_sharding='none')
2021-09-27 22:26:29 | INFO | fairseq.tasks.translation | [en] dictionary: 40480 types
2021-09-27 22:26:29 | INFO | fairseq.tasks.translation | [de] dictionary: 42720 types
2021-09-27 22:26:29 | INFO | fairseq.data.data_utils | loaded 39414 examples from: wmt14_en_de/valid.en-de.en
2021-09-27 22:26:29 | INFO | fairseq.data.data_utils | loaded 39414 examples from: wmt14_en_de/valid.en-de.de
2021-09-27 22:26:29 | INFO | fairseq.tasks.translation | wmt14_en_de/ valid en-de 39414 examples
Using /home/rkoy/.cache/torch_extensions as PyTorch extensions root...
Using /home/rkoy/.cache/torch_extensions as PyTorch extensions root...
Using /home/rkoy/.cache/torch_extensions as PyTorch extensions root...
Using /home/rkoy/.cache/torch_extensions as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/rkoy/.cache/torch_extensions/lightseq_layers/build.ninja...
Building extension module lightseq_layers...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
Using /home/rkoy/.cache/torch_extensions as PyTorch extensions root...
[1/15] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi101
1\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/c
src/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torc
h/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include
/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO
_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__C
UDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels
/cublas_wrappers.cu -o cublas_wrappers.cuda.o
FAILED: cublas_wrappers.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/
data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/ops
/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torch/csrc/
api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/THC -i
system /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_
OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_
HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/cublas
_wrappers.cu -o cublas_wrappers.cuda.o
nvcc fatal : Unsupported gpu architecture 'compute_86'
[2/15] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi101
1\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/c
src/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torc
h/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include
/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO
_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__C
UDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels
/transform_kernels.cu -o transform_kernels.cuda.o
FAILED: transform_kernels.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/
data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/ops
/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torch/csrc/
api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/THC -i
system /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_
OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_
HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/transf
orm_kernels.cu -o transform_kernels.cuda.o
nvcc fatal : Unsupported gpu architecture 'compute_86'
[3/15] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi101
1\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/c
src/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torc
h/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include
/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO
_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__C
UDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels
/dropout_kernels.cu -o dropout_kernels.cuda.o
FAILED: dropout_kernels.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/
data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/ops
/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torch/csrc/
api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/THC -i
system /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_
OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_
HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/dropou
t_kernels.cu -o dropout_kernels.cuda.o
nvcc fatal : Unsupported gpu architecture 'compute_86'
[4/15] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi101
1\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/c
src/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torc
h/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include
/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO
_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__C
UDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels
/normalize_kernels.cu -o normalize_kernels.cuda.o
FAILED: normalize_kernels.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/
data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/ops
/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torch/csrc/
api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/THC -i
system /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_
OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_
HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/normal
ize_kernels.cu -o normalize_kernels.cuda.o
nvcc fatal : Unsupported gpu architecture 'compute_86'
[5/15] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi101
1\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/c
src/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torc
h/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include
/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO
_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__C
UDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels
/softmax_kernels.cu -o softmax_kernels.cuda.o
FAILED: softmax_kernels.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/
data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/ops
/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torch/csrc/
api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/THC -i
system /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_
OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_
HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/softma
x_kernels.cu -o softmax_kernels.cuda.o
nvcc fatal : Unsupported gpu architecture 'compute_86'
[6/15] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi101
1\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/c
src/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torc
h/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include
/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO
_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__C
UDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels
/cuda_util.cu -o cuda_util.cuda.o
FAILED: cuda_util.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/
data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/ops
/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torch/csrc/
api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/THC -i
system /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_
OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_
HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/cuda_u
til.cu -o cuda_util.cuda.o
nvcc fatal : Unsupported gpu architecture 'compute_86'
[7/15] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi101
1\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/c
src/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torc
h/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include
/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO
_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__C
UDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels
/general_kernels.cu -o general_kernels.cuda.o
FAILED: general_kernels.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/
data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/ops
/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torch/csrc/
api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/THC -i
system /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_
OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_
HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/genera
l_kernels.cu -o general_kernels.cuda.o
nvcc fatal : Unsupported gpu architecture 'compute_86'
[8/15] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi101
1\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/c
src/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torc
h/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include
/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO
_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__C
UDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels
/embedding_kernels.cu -o embedding_kernels.cuda.o
FAILED: embedding_kernels.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/
data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/ops
/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torch/csrc/
api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/THC -i
system /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_
OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_
HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/embedd
ing_kernels.cu -o embedding_kernels.cuda.o
nvcc fatal : Unsupported gpu architecture 'compute_86'
[9/15] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi101
1\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/c
src/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torc
h/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include
/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO
_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__C
UDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels
/cross_entropy.cu -o cross_entropy.cuda.o
FAILED: cross_entropy.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/
data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/ops
/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/torch/csrc/
api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/THC -i
system /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_
OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_
HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/cross_
entropy.cu -o cross_entropy.cuda.o
nvcc fatal : Unsupported gpu architecture 'compute_86'
Using /home/rkoy/.cache/torch_extensions as PyTorch extensions root...
Using /home/rkoy/.cache/torch_extensions as PyTorch extensions root...
Using /home/rkoy/.cache/torch_extensions as PyTorch extensions root...
[10/15] c++ -MMD -MF cross_entropy_layer.o.d -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI
=\"_cxxabi1011\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lights
eq/training/csrc/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch
/include/torch/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/
torch/include/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -c /data/r
avi-9151/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/ops/cross_entropy_layer.cpp -o cross_entropy_layer.o
[11/15] c++ -MMD -MF transformer_embedding_layer.o.d -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_B
UILD_ABI=\"_cxxabi1011\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-package
s/lightseq/training/csrc/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packag
es/torch/include/torch/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-p
ackages/torch/include/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -c
/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/ops/transformer_embedding_layer.cpp -o transformer_embedding_layer.o
[12/15] c++ -MMD -MF transformer_encoder_layer.o.d -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUI
LD_ABI=\"_cxxabi1011\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/
lightseq/training/csrc/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages
/torch/include/torch/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-pac
kages/torch/include/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -c /
data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/ops/transformer_encoder_layer.cpp -o transformer_encoder_layer.o
[13/15] c++ -MMD -MF transformer_decoder_layer.o.d -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUI
LD_ABI=\"_cxxabi1011\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/
lightseq/training/csrc/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages
/torch/include/torch/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-pac
kages/torch/include/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -c /
data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/ops/transformer_decoder_layer.cpp -o transformer_decoder_layer.o
[14/15] c++ -MMD -MF pybind_op.o.d -DTORCH_EXTENSION_NAME=lightseq_layers -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi
1011\" -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/kernels/includes -I/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/trainin
g/csrc/ops/includes -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/t
orch/csrc/api/include -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/include/TH -isystem /data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/incl
ude/THC -isystem /usr/local/cuda/include -isystem /data/rkoy/anaconda3/envs/lightseq/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -c /data/rkoy/a
naconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/csrc/torch/pybind_op.cpp -o pybind_op.o
ninja: build stopped: subcommand failed.
Loading extension module lightseq_layers...
Loading extension module lightseq_layers...
Loading extension module lightseq_layers...
Loading extension module lightseq_layers...
Loading extension module lightseq_layers...
Loading extension module lightseq_layers...
Loading extension module lightseq_layers...
Traceback (most recent call last):
File "/data/rkoy/anaconda3/envs/lightseq/bin/lightseq-train", line 8, in <module>
sys.exit(ls_cli_main())
File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/examples/training/fairseq/lightseq_fairseq_train_cli.py", line 10, in ls_cli_main
cli_main(*args, **kwargs)
File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/fairseq_cli/train.py", line 352, in cli_main
distributed_utils.call_main(args, main)
File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/fairseq/distributed_utils.py", line 283, in call_main
torch.multiprocessing.spawn(
File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
while not context.join():
File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 118, in join
raise Exception(msg)
Exception:
-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1533, in _run_ninja_build
subprocess.run(
File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/fairseq/distributed_utils.py", line 270, in distributed_main
main(args, **kwargs)
File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/fairseq_cli/train.py", line 68, in main
model = task.build_model(args)
File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/fairseq/tasks/translation.py", line 327, in build_model
model = super().build_model(args)
File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/fairseq/tasks/fairseq_task.py", line 547, in build_model
model = models.build_model(args, self)
File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/fairseq/models/__init__.py", line 58, in build_model
return ARCH_MODEL_REGISTRY[model_cfg.arch].build_model(model_cfg, task)
File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/examples/training/fairseq/fs_modules/ls_transformer.py", line 136, in build_model
encoder_embed_tokens = cls.build_embedding(
File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/examples/training/fairseq/fs_modules/ls_transformer.py", line 159, in build_embedding
emb = LSTransformerEmbeddingLayer(config)
File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/ops/pytorch/transformer_embedding_layer.py", line 96, in __init__
transformer_cuda_module = TransformerBuilder().load()
File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/ops/pytorch/builder/builder.py", line 203, in load
return self.jit_load(verbose)
File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/lightseq/training/ops/pytorch/builder/builder.py", line 231, in jit_load
op_module = load(
File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 986, in load
return _jit_compile(
File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1193, in _jit_compile
_write_ninja_file_and_build_library(
File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1297, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/data/rkoy/anaconda3/envs/lightseq/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1555, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'lightseq_layers'
And these are the details of my GPUs:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03 Driver Version: 460.91.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 RTX A6000 Off | 00000000:01:00.0 Off | Off |
| 30% 53C P2 71W / 300W | 363MiB / 48685MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 RTX A6000 Off | 00000000:25:00.0 Off | Off |
| 30% 36C P8 26W / 300W | 2MiB / 48685MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
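For reference, the `compute_86` in the failing nvcc command comes from the GPU's compute capability (8.6 for this card, as far as I know). Below is a minimal sketch of that mapping. The helper names and the `MIN_CUDA_FOR_ARCH` table are hypothetical and written by hand for illustration; they are not part of PyTorch or LightSeq. The only real library call is `torch.cuda.get_device_capability`, and it is used only if torch is importable.

```python
# Sketch: how a device's compute capability relates to the nvcc flags in the log.
# MIN_CUDA_FOR_ARCH is a hand-written table (an assumption for illustration),
# listing the first CUDA toolkit release whose nvcc accepts each architecture.
MIN_CUDA_FOR_ARCH = {
    (7, 0): (9, 0),
    (7, 5): (10, 0),
    (8, 0): (11, 0),
    (8, 6): (11, 1),
}


def gencode_flag(major, minor):
    """Build the -gencode flag for a given compute capability."""
    arch = f"{major}{minor}"
    return f"-gencode=arch=compute_{arch},code=sm_{arch}"


def min_cuda_release(major, minor):
    """Return the minimum toolkit release for the capability, or None if unknown."""
    return MIN_CUDA_FOR_ARCH.get((major, minor))


if __name__ == "__main__":
    capability = None
    try:
        import torch

        if torch.cuda.is_available():
            capability = torch.cuda.get_device_capability(0)
    except Exception:
        pass
    if capability is None:
        capability = (8, 6)  # assumed value for the RTX A6000 listed above
    print(gencode_flag(*capability), "needs nvcc >=", min_cuda_release(*capability))
```

If the nvcc under /usr/local/cuda is older than the release printed here, the build would be expected to stop with the same "Unsupported gpu architecture" message.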
This is the content of examples/training/fairseq/ls_fairseq_wmt14en2de.sh:
#!/usr/bin/env bash
set -ex
THIS_DIR=$(dirname $(readlink -f $0))
cd $THIS_DIR/../../..
#if [ ! -d "/tmp/wmt14_en_de" ]; then
# echo "Downloading dataset"
# wget http://sf3-ttcdn-tos.pstatp.com/obj/nlp-opensource/lightseq/wmt_data/databin_wmt14_en_de.tar.gz -P /tmp
# tar -zxvf /tmp/databin_wmt14_en_de.tar.gz -C /tmp && rm /tmp/databin_wmt14_en_de.tar.gz
#fi
lightseq-train wmt14_en_de/ \
--task translation \
--arch ls_transformer_wmt_en_de_big \
--optimizer ls_adam --adam-betas '(0.9, 0.98)' \
--clip-norm 0.0 \
--lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 --weight-decay 0.0001 \
--criterion ls_label_smoothed_cross_entropy --label-smoothing 0.1 \
--max-tokens 8192 \
--eval-bleu \
--eval-bleu-args '{"beam": 5, "max_len_a": 1.2, "max_len_b": 10}' \
--eval-bleu-detok moses \
--eval-bleu-remove-bpe \
--eval-bleu-print-samples \
--best-checkpoint-metric bleu \
--maximize-best-checkpoint-metric \
The only changes I made were downloading the dataset manually, changing the dataset path in the command accordingly, and changing the architecture.
Please let me know what mistake I have made. Thanks in advance!
It seems like a torch extension problem. You can try this solution: https://github.com/torch/torch7/issues/1190#issuecomment-498934400
I have two questions:
- Apart from nvcc fatal : Unsupported gpu architecture 'compute_86', it also says RuntimeError: Error building extension 'lightseq_layers'. Is that also because of the torch version?
- Regarding "It seems like a torch extension problem, you can try this solution torch/torch7#1190 (comment)": the mentioned repo seems to still be in the development phase, and I always prefer installing PyTorch from https://pytorch.org/get-started/previous-versions/. I would like to know which torch version or cudatoolkit version I need to install on the GPU machine, based on the GPU specs I have already mentioned.
Could you check your nvcc version? https://forums.developer.nvidia.com/t/nvcc-fatal-unsupported-gpu-architecture-compute-86/161424/5?u=xiongying152
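One way to do that check is sketched below. It runs `nvcc --version`, pulls out the release number, and compares it with 11.1, which to my knowledge is the first toolkit release that accepts `compute_86`. The function names are hypothetical, and the fallback path /usr/local/cuda/bin/nvcc is taken from the build log above; adjust it if your toolkit lives elsewhere.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(version_text):
    """Return (major, minor) from `nvcc --version` output, or None if not found."""
    match = re.search(r"release (\d+)\.(\d+)", version_text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def supports_compute_86(release):
    """True if the toolkit release is new enough for compute_86 (assumed: 11.1+)."""
    return release is not None and release >= (11, 1)


if __name__ == "__main__":
    # Prefer the nvcc on PATH; otherwise fall back to the path seen in the log.
    nvcc = shutil.which("nvcc") or "/usr/local/cuda/bin/nvcc"
    try:
        result = subprocess.run(
            [nvcc, "--version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        print(f"could not run {nvcc}")
    else:
        release = parse_nvcc_release(result.stdout)
        verdict = "ok" if supports_compute_86(release) else "too old"
        print(f"{nvcc}: release {release}, compute_86 support: {verdict}")
```

Note that the "CUDA Version" shown by nvidia-smi describes what the driver supports, not which toolkit is installed, so it can differ from the release that nvcc reports.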
I have encountered the same problem. Has it been solved?