[Museformer] Ninja errors during inference
The following errors occurred during inference:
2023-06-13 17:13:11 | INFO | fairseq_cli.interactive | Type the input sentence and press return:
Traceback (most recent call last):
File "/root/miniconda3/bin/fairseq-interactive", line 8, in <module>
sys.exit(cli_main())
File "/root/miniconda3/lib/python3.8/site-packages/fairseq_cli/interactive.py", line 307, in cli_main
distributed_utils.call_main(args, main)
File "/root/miniconda3/lib/python3.8/site-packages/fairseq/distributed_utils.py", line 301, in call_main
main(args, **kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/fairseq_cli/interactive.py", line 223, in main
translations = task.inference_step(
File "/root/miniconda3/lib/python3.8/site-packages/fairseq/tasks/language_modeling.py", line 313, in inference_step
return generator.generate(
File "/root/miniconda3/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/fairseq/sequence_generator.py", line 177, in generate
return self._generate(sample, **kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/fairseq/sequence_generator.py", line 312, in _generate
lprobs, avg_attn_scores = self.model.forward_decoder(
File "/root/miniconda3/lib/python3.8/site-packages/fairseq/sequence_generator.py", line 830, in forward_decoder
decoder_out = model.decoder.forward(tokens, encoder_out=encoder_out)
File "/root/museformer/museformer_decoder.py", line 413, in forward
x, extra = self.extract_features(
File "/root/museformer/museformer_decoder.py", line 582, in extract_features
reg_bar_pos_ids = construct_reg_bar_ids(reg_chunk_ranges, num_chunks, reg_len) # (bsz, reg_len)
File "/root/museformer/museformer_decoder.py", line 29, in construct_reg_bar_ids
sample_bar_ids = range_fill(sample_ranges, torch.arange(1, sample_num_chunk + 1, device=device),
File "/root/museformer/kernels/range_fill/main.py", line 60, in range_fill
return range_fill_cuda(ranges, values, seq_len, pad_value, dtype=dtype)
File "/root/museformer/kernels/range_fill/main.py", line 72, in range_fill_cuda
module = load_cuda_module()
File "/root/museformer/kernels/range_fill/main.py", line 24, in load_cuda_module
module = load(name='range_fill_cuda',
File "/root/miniconda3/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1144, in load
return _jit_compile(
File "/root/miniconda3/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1357, in _jit_compile
_write_ninja_file_and_build_library(
File "/root/miniconda3/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1438, in _write_ninja_file_and_build_library
verify_ninja_availability()
File "/root/miniconda3/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1494, in verify_ninja_availability
raise RuntimeError("Ninja is required to load C++ extensions")
RuntimeError: Ninja is required to load C++ extensions
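For context, the `verify_ninja_availability()` call in the traceback essentially checks whether a working `ninja` binary can be found on `PATH`. A minimal way to reproduce that check (the helper name `ninja_available` is mine, not PyTorch's):

```python
import shutil

def ninja_available() -> bool:
    # torch.utils.cpp_extension.verify_ninja_availability() effectively
    # asks the same question: can a `ninja` binary be found on PATH?
    return shutil.which("ninja") is not None

print(ninja_available())
```

If this prints `False` even after `pip install ninja`, the install location is likely not on the `PATH` seen by the process running `fairseq-interactive`.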
I tried to fix this manually with `pip install ninja`, but then got:
2023-06-13 17:14:38 | INFO | fairseq_cli.interactive | Type the input sentence and press return:
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1740, in _run_ninja_build
subprocess.run(
File "/root/miniconda3/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/root/miniconda3/bin/fairseq-interactive", line 8, in <module>
sys.exit(cli_main())
File "/root/miniconda3/lib/python3.8/site-packages/fairseq_cli/interactive.py", line 307, in cli_main
distributed_utils.call_main(args, main)
File "/root/miniconda3/lib/python3.8/site-packages/fairseq/distributed_utils.py", line 301, in call_main
main(args, **kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/fairseq_cli/interactive.py", line 223, in main
translations = task.inference_step(
File "/root/miniconda3/lib/python3.8/site-packages/fairseq/tasks/language_modeling.py", line 313, in inference_step
return generator.generate(
File "/root/miniconda3/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/fairseq/sequence_generator.py", line 177, in generate
return self._generate(sample, **kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/fairseq/sequence_generator.py", line 312, in _generate
lprobs, avg_attn_scores = self.model.forward_decoder(
File "/root/miniconda3/lib/python3.8/site-packages/fairseq/sequence_generator.py", line 830, in forward_decoder
decoder_out = model.decoder.forward(tokens, encoder_out=encoder_out)
File "/root/museformer/museformer_decoder.py", line 413, in forward
x, extra = self.extract_features(
File "/root/museformer/museformer_decoder.py", line 582, in extract_features
reg_bar_pos_ids = construct_reg_bar_ids(reg_chunk_ranges, num_chunks, reg_len) # (bsz, reg_len)
File "/root/museformer/museformer_decoder.py", line 29, in construct_reg_bar_ids
sample_bar_ids = range_fill(sample_ranges, torch.arange(1, sample_num_chunk + 1, device=device),
File "/root/museformer/kernels/range_fill/main.py", line 60, in range_fill
return range_fill_cuda(ranges, values, seq_len, pad_value, dtype=dtype)
File "/root/museformer/kernels/range_fill/main.py", line 72, in range_fill_cuda
module = load_cuda_module()
File "/root/museformer/kernels/range_fill/main.py", line 24, in load_cuda_module
module = load(name='range_fill_cuda',
File "/root/miniconda3/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1144, in load
return _jit_compile(
File "/root/miniconda3/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1357, in _jit_compile
_write_ninja_file_and_build_library(
File "/root/miniconda3/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1469, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/root/miniconda3/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1756, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'range_fill_cuda': [1/3] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=range_fill_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /root/miniconda3/lib/python3.8/site-packages/torch/include -isystem /root/miniconda3/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /root/miniconda3/lib/python3.8/site-packages/torch/include/TH -isystem /root/miniconda3/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /root/miniconda3/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -std=c++14 -c /root/museformer/kernels/range_fill/cuda_src/range_fill.cu -o range_fill.cuda.o
FAILED: range_fill.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=range_fill_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /root/miniconda3/lib/python3.8/site-packages/torch/include -isystem /root/miniconda3/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /root/miniconda3/lib/python3.8/site-packages/torch/include/TH -isystem /root/miniconda3/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /root/miniconda3/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -std=c++14 -c /root/museformer/kernels/range_fill/cuda_src/range_fill.cu -o range_fill.cuda.o
/root/museformer/kernels/range_fill/cuda_src/range_fill.cu(62): error: identifier "THCudaCheck" is undefined
1 error detected in the compilation of "/root/museformer/kernels/range_fill/cuda_src/range_fill.cu".
[2/3] c++ -MMD -MF range_fill.o.d -DTORCH_EXTENSION_NAME=range_fill_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /root/miniconda3/lib/python3.8/site-packages/torch/include -isystem /root/miniconda3/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /root/miniconda3/lib/python3.8/site-packages/torch/include/TH -isystem /root/miniconda3/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /root/miniconda3/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O2 -c /root/museformer/kernels/range_fill/cuda_src/range_fill.cpp -o range_fill.o
ninja: build stopped: subcommand failed.
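The actual compile failure here is `THCudaCheck` being undefined: that macro belonged to the legacy THC API, which was removed in recent PyTorch releases (the build cache path `py38_cu113` suggests a fairly new torch). A hedged sketch of the kind of one-line patch to `range_fill.cu` that addresses this, assuming the call site only checks the last kernel-launch error (I have not verified this against the Museformer source):

```cpp
// Sketch only: replacing the removed THC macro with its c10 equivalent.
#include <cuda_runtime.h>
#include <c10/cuda/CUDAException.h>  // provides C10_CUDA_CHECK

void check_last_kernel_launch() {
  // before (fails to compile on newer PyTorch, where THC was removed):
  //   THCudaCheck(cudaGetLastError());
  // after:
  C10_CUDA_CHECK(cudaGetLastError());
}
```

Alternatively, pinning an older PyTorch that still ships the THC headers avoids touching the kernel source at all.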
I tried again with CUDA 11.8 but it still failed. Some of the installed packages:
- CUDA 11.3
- Python 3.8.10
- fairseq: 0.10.2
- tensorboardX: 2.2

The dataset has been preprocessed and the metadata edited; after my track compression (into 6 tracks), the data size is around 28,000.
I also tried editing PyTorch's `cpp_extension.py`, replacing `['ninja', '-v']` with `['ninja', '--version']`, but then got:
ImportError: /root/.cache/torch_extensions/py38_cu113/range_fill_cuda/range_fill_cuda.so: cannot open shared object file: No such file or directory
which is the same as https://github.com/microsoft/muzic/issues/110, even though I have already installed ninja via pip.
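One thing worth noting about this `cannot open shared object file` error: a previously failed JIT build can leave the cache directory behind without the compiled `.so`, so every later `load()` fails with exactly this ImportError until the cache is cleared. A small sketch of the cleanup (the helper name is mine; point `root` at your `~/.cache/torch_extensions`):

```python
import pathlib
import shutil

def clear_extension_cache(root: pathlib.Path, name: str = "range_fill_cuda") -> int:
    """Remove stale JIT-build directories for `name` under a torch_extensions
    cache root, forcing a clean rebuild on the next
    torch.utils.cpp_extension.load() call. Returns how many were removed."""
    removed = 0
    for d in root.glob(f"*/{name}"):  # e.g. py38_cu113/range_fill_cuda
        shutil.rmtree(d, ignore_errors=True)
        removed += 1
    return removed
```

Equivalently, deleting `~/.cache/torch_extensions/py38_cu113/range_fill_cuda` by hand before retrying has the same effect.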
@btyu Hi there
The compilation problem was solved by using Docker, but generation still fails.
During inference, it gets stuck at the following line (no error is reported, and it runs forever):
2023-06-18 15:43:48 | INFO | fairseq_cli.interactive | Type the input sentence and press return:
So I ran the evaluation instead, with this log:
2023-06-18 15:03:30 | INFO | museformer.museformer_lm_task | loaded 2155 samples for test
2023-06-18 15:11:36 | INFO | test | | valid on 'test' subset | loss 3.337 | ppl 10.11 | wps 0 | wpb 2.14513e+07 | bsz 2155
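Incidentally, that log is internally consistent: fairseq reports language-model loss in base 2, so the perplexity is 2 raised to the loss:

```python
loss = 3.337          # loss from the evaluation log above
ppl = 2 ** loss       # fairseq's LM loss is log2, so ppl = 2**loss
print(round(ppl, 2))  # → 10.11, matching the reported ppl
```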
My system is Ubuntu 20.04 LTS (64-bit) with CUDA 11.4.3, cuDNN 8.2.4, and GPU driver version 470.82.01.
In the Docker environment, I use the default checkpoint and source code. I preprocessed the MIDI files myself (28,251 files) and rewrote the metadata into train (24,580 lines), test (2,155), and valid (1,516) splits. Since I compress everything into 6 tracks, there are no varied time signatures or instruments.
Also, during fairseq-interactive the GPU is not fully utilized, so I guess it is not simply slow (I have waited for up to an hour QAQ).
As for the results, they are similar to #113, so I guess there might be some problem with arbitrary datasets...?