ib validation benchmark should support mixed IB device naming schema

Open · LiweiPeng opened this issue 3 years ago · 1 comment

This applies to the latest superbench code.

The current superbench IB validation benchmark assumes that IB device names are consistent across nodes. The user must specify the device name, or a default one is used. That single name is then passed to the IB command (e.g. `ib_write_bw`) when its command line is built: https://github.com/microsoft/superbenchmark/blob/main/superbench/benchmarks/micro_benchmarks/ib_validation_performance.py#L310

This design has a problem: in some environments, IB device names are not consistent. For example, some VMs name the IB device `mlx5_0`, while others name it `mlx5_ib0`. There is currently no way to run the ib-validation benchmark on these VMs together.
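To make the failure mode concrete, here is a minimal Python sketch, assuming every node receives the same `ib_dev` string. `build_ib_write_bw_cmd` and `hosts_missing_device` are hypothetical helpers written for illustration, not superbench's actual command builder (that lives in the linked file), and the host names and device lists are made up to mirror the example above. Only the `-d` (device) flag of `ib_write_bw` is used.

```python
"""Sketch: one shared ib_dev name vs. VMs with different IB device names.

All helper names, hosts, and device lists are hypothetical, for illustration only.
"""
from __future__ import annotations


def build_ib_write_bw_cmd(ib_dev: str, server: str | None = None) -> str:
    """Build a minimal ib_write_bw command line for one device name.

    -d selects the local IB device; adding a server host makes this the client side.
    """
    cmd = f"ib_write_bw -d {ib_dev}"
    if server is not None:
        cmd += f" {server}"
    return cmd


def hosts_missing_device(ib_dev: str, devices_by_host: dict[str, list[str]]) -> list[str]:
    """Return the hosts whose local IB device list does not contain ib_dev."""
    return [host for host, devs in devices_by_host.items() if ib_dev not in devs]


if __name__ == "__main__":
    # Hypothetical cluster: one VM exposes mlx5_0, the other mlx5_ib0.
    devices_by_host = {"vm-a": ["mlx5_0"], "vm-b": ["mlx5_ib0"]}
    ib_dev = "mlx5_0"  # the single name every node receives
    print(build_ib_write_bw_cmd(ib_dev, server="vm-a"))
    print(hosts_missing_device(ib_dev, devices_by_host))
```

With `mlx5_0` as the shared name, `vm-b` has no matching device, so any `ib_write_bw` run on it would fail; switching the shared name to `mlx5_ib0` just moves the failure to `vm-a`.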

Expected:
The IB validation benchmark should work even if some VMs name the IB device `mlx5_0` and others name it `mlx5_ib0` (or something else). One possible design: in the run config YAML, the user specifies the index of the IB device (e.g. 0, 1, 2), and superbench resolves the actual device name at runtime on each VM (e.g. `mlx5_0`, `mlx5_ib0`). `ibstat -l` lists the IB device names.
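A minimal Python sketch of the index-based design proposed above, assuming `ibstat -l` prints one CA name per line. `parse_ibstat_list`, `resolve_ib_device`, and `list_ib_devices` are hypothetical names, not existing superbench functions; the subprocess call needs `ibstat` (from infiniband-diags) installed on the VM, so the demo below uses sample output strings instead.

```python
"""Sketch of the proposed design: resolve an IB device index to a local name.

parse_ibstat_list / resolve_ib_device / list_ib_devices are hypothetical helpers,
not superbench APIs.
"""
from __future__ import annotations

import subprocess


def parse_ibstat_list(output: str) -> list[str]:
    """Split `ibstat -l` output (assumed one CA name per line) into a list."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def resolve_ib_device(index: int, ibstat_output: str) -> str:
    """Map a zero-based index from the run config to this VM's device name."""
    names = parse_ibstat_list(ibstat_output)
    if not 0 <= index < len(names):
        raise IndexError(f"IB device index {index} out of range; found {names}")
    return names[index]


def list_ib_devices() -> str:
    """Run `ibstat -l` on the local VM and return its raw stdout."""
    return subprocess.run(
        ["ibstat", "-l"], capture_output=True, text=True, check=True
    ).stdout


if __name__ == "__main__":
    # Offline demo with sample listings from two differently named VMs.
    print(resolve_ib_device(0, "mlx5_0\nmlx5_1\n"))
    print(resolve_ib_device(0, "mlx5_ib0\nmlx5_ib1\n"))
    # On a real VM with ibstat available:
    # print(resolve_ib_device(0, list_ib_devices()))
```

The same config index yields `mlx5_0` on one VM and `mlx5_ib0` on the other. Note that this maps an index to whatever position `ibstat -l` reports, so it is only meaningful if that listing order is consistent across VMs.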

Aug 22 '22 23:08 LiweiPeng

Because `ibstat -l` does not guarantee a deterministic order (alphabetical, alphanumeric, or PCIe order) for multiple IB devices (see this thread), we currently support specifying the device programmatically (e.g. with `ibstat -l`, `ibdev2netdev`, or other commands) rather than by an enumerated index.

Here's an example that uses the default `ibstat -l` order:

```yaml
ib-traffic:
  parameters:
    # Each local rank takes line LOCAL_RANK+1 of `ibstat -l` as its device
    ib_dev: >-
      '$(ibstat -l | sed -n $((LOCAL_RANK+1))p)'
```
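Since `ib_dev` can be computed by commands other than `ibstat -l`, here is one hedged illustration in Python that selects a device by its IPoIB netdev name from `ibdev2netdev` output instead of by list position. It assumes the default output line format `<ibdev> port <n> ==> <netdev> (<state>)` and a hypothetical convention that local rank N should use netdev `ibN`; `parse_ibdev2netdev` and `ib_dev_for_rank` are illustrative names, not superbench APIs.

```python
"""Sketch: pick ib_dev by netdev name from `ibdev2netdev` output.

Assumes the default line format "<ibdev> port <n> ==> <netdev> (<state>)";
the ib<LOCAL_RANK> netdev convention is also an assumption, not a superbench rule.
"""
from __future__ import annotations

import re

# Matches lines such as "mlx5_ib0 port 1 ==> ib0 (Up)".
LINE_RE = re.compile(r"^(\S+)\s+port\s+(\d+)\s+==>\s+(\S+)\s+\((\w+)\)")


def parse_ibdev2netdev(output: str) -> dict[str, str]:
    """Return a {netdev: ibdev} mapping; lines not in the assumed format are skipped."""
    mapping = {}
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            ibdev, _port, netdev, _state = match.groups()
            mapping[netdev] = ibdev
    return mapping


def ib_dev_for_rank(output: str, local_rank: int) -> str:
    """Select the IB device bound to netdev ib<local_rank> (hypothetical convention)."""
    netdev = f"ib{local_rank}"
    mapping = parse_ibdev2netdev(output)
    if netdev not in mapping:
        raise KeyError(f"no IB device bound to {netdev}; found {sorted(mapping)}")
    return mapping[netdev]


if __name__ == "__main__":
    # Sample output deliberately listed out of order.
    sample = "mlx5_ib1 port 1 ==> ib1 (Up)\nmlx5_ib0 port 1 ==> ib0 (Up)\n"
    for rank in (0, 1):
        print(rank, ib_dev_for_rank(sample, rank))
```

Looking devices up by netdev name avoids depending on the order in which devices are listed, though it moves the assumption to how netdevs are named on each VM.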

Aug 31 '22 10:08 abuccts