nccl:send not found
Describe the Bug
When I run the PyTorch converter, it fails because the nccl:send comm_type is not supported. Is there any plan to support it, or is this comm_type not expected in the trace?
admin@admin: ~/llm/chakra(main)$ python3 -m chakra.et_converter.et_converter --input_type PyTorch --input_filename et_plus/profile_et_rank_0_plus.json --output_filename et_plus/profile_chakra.0.et
Traceback (most recent call last):
  File "/home/admin/miniconda3/lib/python3.12/site-packages/chakra/et_converter/et_converter.py", line 89, in main
    converter.convert()
  File "/home/admin/miniconda3/lib/python3.12/site-packages/chakra/et_converter/pytorch2chakra_converter.py", line 169, in convert
    collective_comm_type = self.get_collective_comm_type(pytorch_node.name)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/admin/miniconda3/lib/python3.12/site-packages/chakra/et_converter/pytorch2chakra_converter.py", line 395, in get_collective_comm_type
    raise ValueError(f"'{name}' not found in collective communication mapping. "
ValueError: 'nccl:send' not found in collective communication mapping. Please add this collective communication name to the mapping.
Supported collective communication types are listed here. Chakra does not currently recognize nccl:send as a collective communication type, and the Chakra working group still has to decide whether to add SEND and RECV as new collective types. We understand that these operations appear in the collected traces, but we do not have a working solution yet. In the meantime, you can make local changes to support the SEND and RECV types on your own. If that works and makes sense, you can create a PR.
Thanks for reporting this issue.
@TaekyungHeo - we probably need to handle this as COMM_SEND_NODE, right? Wdyt? nccl:send is a point-to-point operation, so it cannot be a collective.
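A minimal sketch of that idea, in case it helps the discussion: route send/recv names to point-to-point node kinds before falling back to the collective lookup. Everything here is hypothetical and is not the converter's actual code: `CommNodeKind`, `COLLECTIVE_OPS`, and `classify_comm_op` are names made up for illustration, the collective names in the set are only examples, and `COMM_RECV_NODE` is assumed by analogy with `COMM_SEND_NODE`.

```python
from enum import Enum


class CommNodeKind(Enum):
    """Hypothetical stand-in for the node kinds a converter could emit."""
    COMM_COLL_NODE = "COMM_COLL_NODE"
    COMM_SEND_NODE = "COMM_SEND_NODE"
    COMM_RECV_NODE = "COMM_RECV_NODE"  # assumed by analogy with COMM_SEND_NODE


# Illustrative only: not the converter's real collective mapping.
COLLECTIVE_OPS = {"all_reduce", "all_gather", "reduce_scatter", "broadcast", "all_to_all"}


def classify_comm_op(name: str) -> CommNodeKind:
    """Decide whether a trace op name is point-to-point or collective."""
    # "nccl:send" -> "send"; names without a backend prefix pass through unchanged.
    op = name.lower().split(":", 1)[-1]
    if op == "send":
        return CommNodeKind.COMM_SEND_NODE
    if op == "recv":
        return CommNodeKind.COMM_RECV_NODE
    if op in COLLECTIVE_OPS:
        return CommNodeKind.COMM_COLL_NODE
    raise ValueError(f"'{name}' is neither a known collective nor a point-to-point op")
```

With a check like this in front of the collective lookup, `classify_comm_op("nccl:send")` would yield a send node instead of raising the ValueError above, while unknown names would still fail loudly.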
@qyysjtu this issue should be fixed now - #PR112