valentin petrov

Results 24 issues of valentin petrov

## What Unifies pipelining parameters. Adds ucc_pipeline_params_t, an interface for users to set them, and a cfg var parser. ## Why ? Each time we add another pipelined alg we...

Ready-for-Review

## What Properly handles potential failures during the TL context_create_epilog call. ## Why ? Current behavior: if context_create_epilog fails -> ucc context creation fails -> job fails. Expected behavior:...

Ready-for-Review

## What Adds a new TL/MLX5: the minimal necessary TL iface stubs without much actual implementation (to be added in subsequent PRs). Adds an option to pass the negation sign "^" to --with-tls. Default list...

Ready-for-Review

## What Potential alternative to #596. This PR implements all the reductions (dt/ops) in the ec/cuda executor for persistent mode. It is done by making "device template" functions (common...

Ready-for-Review

## What Adds pipelining support for the RAB allreduce algorithm in CL/HIER. ## Why ? Potential perf improvement. E.g., we can use TL/SHM for larger msg sizes with pipelining (when single...

Ready-for-Review

## What Fixes config file parsing with respect to inherited variables. ## Why ? When a variable in the config table is inherited from a parent table (e.g., TLS var...

WIP - Don't Merge

## What Implements Shared PD initialization in TL/MLX5

## What Custom IB WQEs implementation: transpose, wait_on_data, umr, rdma

Ready-for-Review