" ValueError: max() arg is an empty sequence " while converting mamba 2 hybrid checkpoint to nemo
Describe the bug
As described in the title, the conversion fails after finishing all of the installs and building NeMo and Megatron-LM from source; the model itself was trained with Megatron-LM.
Steps/Code to reproduce bug
Running scripts/checkpoint_converters/convert_mamba2_pyt_to_nemo.py on the Megatron-LM-trained checkpoint produces the following output:
[NeMo W 2024-08-16 12:43:58 nemo_logging:349] /workspace/megatron/Megatron-LM/megatron/core/tensor_parallel/layers.py:280: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
def forward(ctx, input, weight, bias, allreduce_dgrad):
[NeMo W 2024-08-16 12:43:58 nemo_logging:349] /workspace/megatron/Megatron-LM/megatron/core/tensor_parallel/layers.py:290: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
def backward(ctx, grad_output):
[NeMo W 2024-08-16 12:43:58 nemo_logging:349] /workspace/megatron/Megatron-LM/megatron/core/tensor_parallel/layers.py:380: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
def forward(
[NeMo W 2024-08-16 12:43:58 nemo_logging:349] /workspace/megatron/Megatron-LM/megatron/core/tensor_parallel/layers.py:419: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
def backward(ctx, grad_output):
[WARNING | megatron.core.dist_checkpointing.strategies.zarr]: `zarr` distributed checkpoint backend is deprecated. Please switch to PyTorch Distributed format (`torch_dist`).
[NeMo W 2024-08-16 12:43:59 nemo_logging:349] /workspace/megatron/Megatron-LM/megatron/core/dist_checkpointing/strategies/torch.py:22: DeprecationWarning: `torch.distributed._sharded_tensor` will be deprecated, use `torch.distributed._shard.sharded_tensor` instead
from torch.distributed._sharded_tensor import ShardedTensor as TorchShardedTensor
[NeMo W 2024-08-16 12:43:59 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/modelopt/torch/quantization/tensor_quant.py:84: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
scaled_e4m3_abstract = torch.library.impl_abstract("trt::quantize_fp8")(
[NeMo W 2024-08-16 12:44:02 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/selective_scan_interface.py:164: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
def forward(ctx, xz, conv1d_weight, conv1d_bias, x_proj_weight, delta_proj_weight,
[NeMo W 2024-08-16 12:44:02 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/selective_scan_interface.py:240: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
def backward(ctx, dout):
[NeMo W 2024-08-16 12:44:02 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/triton/layer_norm.py:959: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
def forward(
[NeMo W 2024-08-16 12:44:02 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/triton/layer_norm.py:1018: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
def backward(ctx, dout, *args):
[NeMo W 2024-08-16 12:44:02 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/distributed/tensor_parallel.py:26: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
def forward(ctx, x, weight, bias, process_group=None, sequence_parallel=True):
[NeMo W 2024-08-16 12:44:02 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/distributed/tensor_parallel.py:62: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
def backward(ctx, grad_output):
[NeMo W 2024-08-16 12:44:03 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/triton/ssd_combined.py:736: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
def forward(ctx, zxbcdt, conv1d_weight, conv1d_bias, dt_bias, A, D, chunk_size, initial_states=None, seq_idx=None, dt_limit=(0.0, float("inf")), return_final_states=False, activation="silu",
[NeMo W 2024-08-16 12:44:03 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/triton/ssd_combined.py:814: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
def backward(ctx, dout, *args):
Traceback (most recent call last):
  File "/workspace/nemo/NeMo/scripts/checkpoint_converters/convert_mamba2_pyt_to_nemo.py", line 190, in <module>
    convert(args)
  File "/workspace/nemo/NeMo/scripts/checkpoint_converters/convert_mamba2_pyt_to_nemo.py", line 115, in convert
    num_layers = max(layer_numbers) + 1
ValueError: max() arg is an empty sequence
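The FutureWarnings above are only deprecation notices and appear unrelated to the crash. The crash itself reduces to calling max() on an empty sequence: line 115 of the converter derives num_layers from layer indices collected out of the checkpoint, and when nothing is collected (for example, because the checkpoint layout or key naming does not match what the script expects), max() raises. A minimal Python sketch of that failure mode, using a hypothetical key pattern that is not taken from the converter:

import re

# Hypothetical key pattern, for illustration only; the converter's actual
# parsing of checkpoint keys may differ.
LAYER_KEY = re.compile(r"decoder\.layers\.(\d+)\.")

def infer_num_layers(state_dict_keys):
    layer_numbers = [int(m.group(1)) for k in state_dict_keys if (m := LAYER_KEY.search(k))]
    # With no matching keys (e.g. an unexpected or sharded checkpoint layout),
    # layer_numbers is empty and max() raises:
    # ValueError: max() arg is an empty sequence
    return max(layer_numbers) + 1

print(infer_num_layers(["decoder.layers.0.mixer.A_log", "decoder.layers.1.mixer.A_log"]))  # 2
# infer_num_layers(["optimizer.state.step"])  # raises ValueError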
Expected behavior
Expected the Megatron-LM-trained Mamba model to be converted to .nemo format for fine-tuning.
Environment overview (please complete the following information)
- Environment location: Ubuntu, Docker, FluidStack VM with 2 x A100 80GB GPUs.
- Method of NeMo install: installed from source, with Megatron-LM also built from source.
- If method of install is [Docker], provide docker pull & docker run commands used:
Docker pull command:
sudo docker pull nvcr.io/nvidia/pytorch:24.07-py3
Docker run command:
docker run --gpus all -it --rm --ipc=host \
--shm-size=40g \
-v /ephemeral/megatron:/workspace/megatron \
-v /ephemeral/data:/workspace/dataset/data \
-v /ephemeral/outfix:/workspace/dataset/outfix \
-v /ephemeral/tok:/workspace/dataset/tok \
-v /ephemeral/checkpoints:/workspace/checkpoints \
-v /ephemeral/nemo:/workspace/nemo \
nvcr.io/nvidia/pytorch:24.07-py3
Environment details
If NVIDIA docker image is used you don't need to specify these. Otherwise, please provide:
- OS version: Ubuntu 22.04.3 LTS
- PyTorch version: 2.4
- Python version: 3.10.12
Additional context
NVIDIA PyTorch container: 24.07 (assuming training was done with 24.03). GPUs: 2 x A100 80GB.
Followed the steps here: tutorials/llm/mamba/mamba.rst
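For reference, the converter was invoked roughly as that tutorial describes; the paths below are placeholders, and the exact flag set should be checked against the tutorial and the script's --help rather than taken from this sketch:

python /workspace/nemo/NeMo/scripts/checkpoint_converters/convert_mamba2_pyt_to_nemo.py \
    --input_name_or_path /workspace/checkpoints/<megatron_mamba_checkpoint> \
    --output_path /workspace/checkpoints/mamba2_hybrid.nemo \
    --mamba_ssm_ngroups 8 \
    --precision bf16 \
    --tokenizer_model_dir /workspace/dataset/tok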
Hi @SkanderBS2024, I see you are not using the NeMo container nvcr.io/nvidia/nemo:24.07 but are instead mounting NeMo into the PyTorch container. I tested the conversion script in nvcr.io/nvidia/nemo:24.07 and it works fine. However, the latest main needs an update, for which I have raised a PR: https://github.com/NVIDIA/NeMo/pull/10224. You can either check out this PR or use the 24.07 NeMo container. Thanks for reporting the issue!
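For anyone hitting the same error, a minimal sketch of launching the NeMo 24.07 container instead, reusing the mounts from the docker run command above (adjust paths to your setup):

sudo docker pull nvcr.io/nvidia/nemo:24.07
docker run --gpus all -it --rm --ipc=host \
    --shm-size=40g \
    -v /ephemeral/checkpoints:/workspace/checkpoints \
    -v /ephemeral/data:/workspace/dataset/data \
    -v /ephemeral/tok:/workspace/dataset/tok \
    nvcr.io/nvidia/nemo:24.07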
Hello @JRD971000, yes, I switched to the nvcr.io/nvidia/nemo:24.07 container and everything worked fine. Thank you for your response.
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been inactive for 7 days since being marked as stale.