IB validation benchmark should support mixed IB device naming schemes
This applies to the latest SuperBench code.
The current SuperBench IB validation benchmark assumes that the IB device name is the same on every node. The user must specify this name, or a default is used. The command passed to the IB test tool (e.g. ib_write_bw) is built here: https://github.com/microsoft/superbenchmark/blob/main/superbench/benchmarks/micro_benchmarks/ib_validation_performance.py#L310
This design breaks in environments where IB device names are not consistent across VMs. For example, one VM may name its IB device mlx5_0 while another names it mlx5_ib0. There is currently no way to run the ib-validation benchmark across such VMs together.
Expected:
The IB validation benchmark should work when some VMs name the IB device mlx5_0 and others name it mlx5_ib0 (or something else). One possible design: in the run config YAML, the user specifies the index of the IB device (e.g. 0, 1, 2), and SuperBench resolves the actual device name (e.g. mlx5_0, mlx5_ib0) on each VM at runtime. 'ibstat -l' lists the IB device names.
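A minimal Python sketch of the index-based design, assuming the index list is given as a comma-separated string such as "0,1,2" and that the text output of ibstat -l (one device name per line) has already been captured on each VM. parse_ibstat_list and resolve_devices are hypothetical names, not SuperBench functions. Note that the result is only as stable as the order in which ibstat -l reports devices, which is the limitation discussed next.

```python
from typing import List


def parse_ibstat_list(output: str) -> List[str]:
    """Split `ibstat -l` output (one device name per line) into a list."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def resolve_devices(indices: str, output: str) -> List[str]:
    """Map an index list such as "0,1,2" to this VM's IB device names.

    Hypothetical helper; raises IndexError for an index outside the
    range of devices listed on this VM.
    """
    devices = parse_ibstat_list(output)
    names = []
    for token in indices.split(","):
        idx = int(token)
        if not 0 <= idx < len(devices):
            raise IndexError(f"IB device index {idx} out of range ({len(devices)} devices)")
        names.append(devices[idx])
    return names


if __name__ == "__main__":
    # Illustrative `ibstat -l` outputs from two VMs that name devices differently.
    vm_a = "mlx5_0\nmlx5_1\n"
    vm_b = "mlx5_ib0\nmlx5_ib1\n"
    print(resolve_devices("0,1", vm_a))  # ['mlx5_0', 'mlx5_1']
    print(resolve_devices("0,1", vm_b))  # ['mlx5_ib0', 'mlx5_ib1']
```

The same index list yields mlx5_0/mlx5_1 on one VM and mlx5_ib0/mlx5_ib1 on the other, which is the behavior this issue asks for.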
Because ibstat -l does not guarantee a deterministic order (alphabetical, alphanumeric, or PCIe order) when there are multiple IB devices (see this thread), we currently support specifying the device programmatically (e.g. with ibstat -l, ibdev2netdev, or another command) instead of by an enumerated index.
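Since the order from ibstat -l is not guaranteed, one way a user-supplied expression could impose a stable order is to sort the names itself. The Python sketch below assumes alphanumeric ("natural") order is what the user wants, so that mlx5_2 sorts before mlx5_10. natural_key and device_for_rank are hypothetical helpers for illustration, not part of SuperBench.

```python
import re
from typing import List, Union


def natural_key(name: str) -> List[Union[int, str]]:
    """Sort key that compares digit runs numerically, so mlx5_2 < mlx5_10."""
    # re.split with a capturing group alternates text, digits, text, ...,
    # so odd positions are always digit runs and compare as integers.
    parts = re.split(r"([0-9]+)", name)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


def device_for_rank(output: str, local_rank: int) -> str:
    """Pick the device for a local rank after sorting names in natural order."""
    devices = sorted(
        (line.strip() for line in output.splitlines() if line.strip()),
        key=natural_key,
    )
    if not 0 <= local_rank < len(devices):
        raise IndexError(f"local rank {local_rank} has no IB device ({len(devices)} listed)")
    return devices[local_rank]


if __name__ == "__main__":
    # Illustrative, deliberately unordered `ibstat -l` output.
    sample = "mlx5_10\nmlx5_2\nmlx5_1\n"
    print([device_for_rank(sample, r) for r in range(3)])  # ['mlx5_1', 'mlx5_2', 'mlx5_10']
```

In a shell expression, GNU sort -V gives a comparable version-aware ordering. Whether alphanumeric order is the right choice depends on how devices map to GPUs and NUMA nodes on a given VM, which is why the choice is left to the user rather than fixed by an index.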
Here is an example that uses the default ibstat -l order:
ib-traffic:
  parameters:
    ib_dev: >-
      '$(ibstat -l | sed -n $((LOCAL_RANK+1))p)'
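The Python sketch below mirrors what the ib_dev expression evaluates to on each VM: it takes the ibstat -l output and keeps line LOCAL_RANK+1, so local rank 0 gets the first listed device, rank 1 the second, and so on. It assumes LOCAL_RANK is an environment variable set per process when the benchmark runs (the snippet falls back to 0 if it is absent); select_line is a hypothetical name used only for illustration.

```python
import os


def select_line(output: str, local_rank: int) -> str:
    """Emulate `sed -n $((LOCAL_RANK+1))p`: return line LOCAL_RANK+1 (1-based).

    Like sed, this yields an empty string when the requested line is past
    the end of the output.
    """
    lines = output.splitlines()
    if 0 <= local_rank < len(lines):
        return lines[local_rank]
    return ""


if __name__ == "__main__":
    # LOCAL_RANK is assumed to be set per process; default to 0 here.
    rank = int(os.environ.get("LOCAL_RANK", "0"))
    sample = "mlx5_ib0\nmlx5_ib1\nmlx5_ib2\n"  # illustrative `ibstat -l` output
    print(select_line(sample, rank))
```

One consequence worth checking: if a VM lists fewer IB devices than it has local ranks, the expression resolves to an empty string for the extra ranks, just as sed prints nothing for a missing line.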