[ETFeeder] Resolve deps based on not only data_deps, but also ctrl_deps

Open changhai0109 opened this issue 2 years ago • 6 comments

Summary

In ETFeederNode, add an all_deps field that holds the union of data_deps and ctrl_deps. Add unreleased_deps to track the parents of a node that have not been issued yet. Update dependency resolution to use all_deps instead of data_deps.
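As a rough illustration of the bookkeeping described above, here is a minimal Python sketch. It is not the actual ETFeeder code: the `Node` class and the `resolve` function are hypothetical stand-ins, and only the names data_deps, ctrl_deps, all_deps and unreleased_deps come from this PR.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for ETFeederNode, for illustration only."""
    id: int
    data_deps: list = field(default_factory=list)
    ctrl_deps: list = field(default_factory=list)

    @property
    def all_deps(self):
        # Union of data and control parents.
        return set(self.data_deps) | set(self.ctrl_deps)


def resolve(nodes):
    """Return node ids in an issue order that respects all_deps."""
    # unreleased_deps: for each node, the parents that are not issued yet.
    unreleased_deps = {n.id: set(n.all_deps) for n in nodes}
    children = {n.id: [] for n in nodes}
    for n in nodes:
        for parent in n.all_deps:
            children[parent].append(n.id)

    ready = sorted(i for i, deps in unreleased_deps.items() if not deps)
    order = []
    while ready:
        current = ready.pop(0)
        order.append(current)
        # Issuing a node releases it from each child's unreleased set.
        for child in children[current]:
            unreleased_deps[child].discard(current)
            if not unreleased_deps[child]:
                ready.append(child)
    return order
```

With `Node(0)`, `Node(1, ctrl_deps=[0])` and `Node(2, data_deps=[0, 1])`, node 1 only becomes ready after node 0 is issued, even though it has no data dependency on it.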

Test Plan

Add the following function to et_generator:

def two_comp_nodes_ctrl_dependent(num_npus: int, runtime: int) -> None:
    for npu_id in range(num_npus):
        output_filename = f"two_comp_nodes_ctrl_dependent.{npu_id}.et"
        with open(output_filename, "wb") as et:
            encode_message(et, GlobalMetadata(version="0.0.4"))

            parent_node = get_node("COMP_NODE", COMP_NODE)
            parent_node.duration_micros = runtime
            parent_node.attr.append(ChakraAttr(name="is_cpu_op", bool_val=False))
            encode_message(et, parent_node)

            child_node = get_node("COMP_NODE", COMP_NODE)
            child_node.duration_micros = runtime
            child_node.ctrl_deps.append(parent_node.id)
            child_node.attr.append(ChakraAttr(name="is_cpu_op", bool_val=False))
            encode_message(et, child_node)

Register this function in main as follows:

def main():
    ...
    two_comp_nodes_ctrl_dependent(args.num_npus, args.default_runtime)
    ...

Run et_generator to produce two_comp_nodes_ctrl_dependent.{npu_id}.et, then run the resulting traces with astrasim.

changhai0109 avatar Dec 12 '23 00:12 changhai0109

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

github-actions[bot] avatar Dec 12 '23 00:12 github-actions[bot]

@changhai0109, does taking both control dependencies and data dependencies into account improve simulation accuracy? Are there cases where an ordering satisfies the data dependencies but violates the control dependencies?

JoongunPark avatar Jan 30 '24 22:01 JoongunPark

@changhai0109, I appreciate your contribution to Chakra.

I believe we should solely rely on data dependencies, rather than control dependencies. Control dependencies refer to dependencies used in the host Chakra execution traces (PyTorch execution traces). Although the final Chakra host + device execution traces include control dependencies, these are encoded for compatibility purposes rather than for simulation. Thus, it would be preferable to avoid depending on control dependencies for simulation purposes.

TaekyungHeo avatar Feb 06 '24 15:02 TaekyungHeo

@changhai0109, please answer @TaekyungHeo's and @JoongunPark's questions. We can review this PR and see if this is required. As things stand, we do not need to encode these additional deps for simulation use cases.

srinivas212 avatar Feb 06 '24 20:02 srinivas212

@TaekyungHeo @JoongunPark

I understand that we do not need to rely on ctrl deps for now. However, supporting ctrl deps might be beneficial for people who study scheduling problems.

So instead of supporting only data deps, how about adding a flag in ETFeeder so that users can choose whether to enable ctrl deps? Current usages would simply leave ctrl deps disabled.
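To make the proposal concrete, here is a rough Python sketch of what such a flag could look like. The function `ready_nodes`, the `use_ctrl_deps` parameter and the dictionary layout are all hypothetical and do not exist in ETFeeder; the two-node trace is modeled on the test plan, with node ids 0 and 1 assumed.

```python
def ready_nodes(nodes, issued, use_ctrl_deps=False):
    """Return ids of unissued nodes whose tracked parents are all issued.

    nodes: dict mapping node id -> {"data_deps": [...], "ctrl_deps": [...]}
    issued: set of node ids that have already been issued
    use_ctrl_deps: hypothetical flag; False keeps the data-only behavior
    """
    ready = []
    for node_id, node in nodes.items():
        if node_id in issued:
            continue
        deps = set(node["data_deps"])
        if use_ctrl_deps:
            deps |= set(node["ctrl_deps"])
        if deps <= issued:
            ready.append(node_id)
    return ready


# Two compute nodes; node 1 has only a control dependency on node 0.
trace = {
    0: {"data_deps": [], "ctrl_deps": []},
    1: {"data_deps": [], "ctrl_deps": [0]},
}
print(ready_nodes(trace, issued=set()))                      # [0, 1]
print(ready_nodes(trace, issued=set(), use_ctrl_deps=True))  # [0]
```

With the flag off, both nodes are ready at once; with it on, node 1 waits for node 0. Whether that difference matters for a given study is exactly the open question in this thread.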

changhai0109 avatar Feb 07 '24 15:02 changhai0109

Your suggestion to add a flag in ETFeeder for enabling control dependencies raises an interesting point. Have you seen any examples or scenarios where the scheduling process is significantly impacted by the choice of dependency option?

JoongunPark avatar Feb 13 '24 23:02 JoongunPark