Extending LBANN Distconv Interface
The LBANN Distconv adapter for layers requires that only the first input tensor to a distconv-enabled layer may be a non-DiHydrogen tensor. We raise an error if any other input tensor requires a copy to a DiHydrogen tensor. The checks are performed at:
https://github.com/LLNL/lbann/blob/3b0ea84e2e0b86d14f466d9abe7c60e8b026e84a/src/layers/data_type_distconv_adapter.cpp#L329
https://github.com/LLNL/lbann/blob/3b0ea84e2e0b86d14f466d9abe7c60e8b026e84a/src/layers/data_type_distconv_adapter.cpp#L646
https://github.com/LLNL/lbann/blob/3b0ea84e2e0b86d14f466d9abe7c60e8b026e84a/src/layers/data_type_distconv_adapter.cpp#L787
https://github.com/LLNL/lbann/blob/3b0ea84e2e0b86d14f466d9abe7c60e8b026e84a/src/layers/data_type_distconv_adapter.cpp#L812
https://github.com/LLNL/lbann/blob/3b0ea84e2e0b86d14f466d9abe7c60e8b026e84a/src/layers/data_type_distconv_adapter.cpp#L836
https://github.com/LLNL/lbann/blob/3b0ea84e2e0b86d14f466d9abe7c60e8b026e84a/src/layers/data_type_distconv_adapter.cpp#L861
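To make the restriction concrete, here is a minimal sketch of the rule those checks enforce. It is not LBANN code: `ParentInput` and `check_parent_copies` are hypothetical stand-ins that only record whether each parent would need a copy into a DiHydrogen tensor, and the error message is illustrative.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified model of one input of a distconv-enabled layer.
// This is NOT LBANN's adapter type; it only tracks whether the parent's
// output would have to be copied into a DiHydrogen tensor.
struct ParentInput {
  bool needs_copy;
};

// Models the current rule: input 0 may require a copy, but any other input
// that requires one is rejected with an error.
void check_parent_copies(const std::vector<ParentInput>& parents) {
  for (std::size_t i = 1; i < parents.size(); ++i) {
    if (parents[i].needs_copy) {
      throw std::runtime_error("input " + std::to_string(i) +
                               " requires a copy to a DiHydrogen tensor, "
                               "which is only supported for input 0");
    }
  }
}
```

Under this model, a two-input layer such as MatMul whose second parent is not already a DiHydrogen tensor is rejected even though its first input alone would be accepted.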
While these checks worked for the original DC layers (Convolution, MSE, ReLU), newer DC layers such as Scatter, Gather, and MatMul generally have more than one input that may need to be copied to a DiHydrogen tensor, so ideally we should support multiple parent tensors that require a copy. Simply removing the checks caused CI tests to fail.
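One possible shape for multi-parent support is to keep one copy buffer per input that needs it, instead of a single buffer reserved for input 0. The sketch below is only an illustration under stated assumptions: `Tensor`, `is_dihydrogen`, and `prepare_inputs` are hypothetical names, a "tensor" is just a `std::vector<float>`, and the plain data copy stands in for whatever redistribution LBANN actually performs. It does not address why removing the checks broke CI.

```cpp
#include <vector>

// Hypothetical stand-in for a layer input: data plus a flag saying whether it
// already lives in a DiHydrogen tensor. Not an LBANN or DiHydrogen type.
struct Tensor {
  bool is_dihydrogen;
  std::vector<float> data;
};

// Returns one DiHydrogen-ready tensor per parent. Parents that already are
// DiHydrogen tensors are used directly; every other parent gets its own copy,
// owned by `copies` so it outlives the call.
std::vector<const Tensor*> prepare_inputs(const std::vector<Tensor>& parents,
                                          std::vector<Tensor>& copies) {
  copies.clear();
  // At most one copy per parent, so reserving up front guarantees push_back
  // never reallocates and the pointers taken below stay valid.
  copies.reserve(parents.size());
  std::vector<const Tensor*> inputs;
  inputs.reserve(parents.size());
  for (const Tensor& p : parents) {
    if (p.is_dihydrogen) {
      inputs.push_back(&p);
    } else {
      // Placeholder for the real copy/redistribution into a DiHydrogen tensor.
      copies.push_back(Tensor{true, p.data});
      inputs.push_back(&copies.back());
    }
  }
  return inputs;
}
```

The design choice here is that copy ownership is per input rather than tied to index 0, which is the assumption the current checks bake in.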
A possible workaround, using an Identity layer as a copy layer, also has issues: see #2126.