
Utilize ctrl_deps for operator dependencies in simulation

Open · TaekyungHeo opened this issue 1 year ago · 1 comment

Summary

Previously, the data_deps field was used to encode operator dependencies in simulations. However, data_deps should be reserved for encoding data dependencies, not execution-order dependencies between operators. This commit therefore updates pytorch2chakra_converter.py to use ctrl_deps for operator dependencies instead.
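To illustrate the distinction, here is a minimal sketch of the converter's dependency encoding. The `Node` class below is a hypothetical stand-in for the Chakra protobuf node, not the real schema; it only shows the intended split between the two fields:

```python
# Sketch: simulation ordering goes in ctrl_deps; data_deps is reserved
# for true producer/consumer data dependencies.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    id: int
    data_deps: List[int] = field(default_factory=list)  # data-flow edges only
    ctrl_deps: List[int] = field(default_factory=list)  # execution-order edges

def chain_for_simulation(nodes: List[Node]) -> None:
    """Encode sequential execution order as control dependencies."""
    for prev, curr in zip(nodes, nodes[1:]):
        # Before this fix, the edge was appended to data_deps instead.
        curr.ctrl_deps.append(prev.id)

ops = [Node(id=i) for i in range(3)]
chain_for_simulation(ops)
print([n.ctrl_deps for n in ops])  # -> [[], [0], [1]]
```

Note that `data_deps` stays empty here: ordering edges no longer pollute the data-dependency field.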

Test Plan

$ cd ~/param
$ cd param/train/comms/pt
$ pip install .
$ cd ../../compute/python
$ pip install -r requirements.txt
$ python setup.py install
$ python tools/trace_link.py --pytorch-et-file ~/llama_pytorch_et/llama_et_0.json --kineto-file ~/llama_kineto/worker0_step_12.1697596714999.pt.trace.json --output-file ~/rank0.json &
$ python tools/trace_link.py --pytorch-et-file ~/llama_pytorch_et/llama_et_1.json --kineto-file ~/llama_kineto/worker1_step_12.1697596715001.pt.trace.json --output-file ~/rank1.json &
$ python tools/trace_link.py --pytorch-et-file ~/llama_pytorch_et/llama_et_2.json --kineto-file ~/llama_kineto/worker2_step_12.1697596714848.pt.trace.json --output-file ~/rank2.json &
$ python tools/trace_link.py --pytorch-et-file ~/llama_pytorch_et/llama_et_3.json --kineto-file ~/llama_kineto/worker3_step_12.1697596714880.pt.trace.json --output-file ~/rank3.json &
$ python tools/trace_link.py --pytorch-et-file ~/llama_pytorch_et/llama_et_4.json --kineto-file ~/llama_kineto/worker4_step_12.1697596714944.pt.trace.json --output-file ~/rank4.json &
$ python tools/trace_link.py --pytorch-et-file ~/llama_pytorch_et/llama_et_5.json --kineto-file ~/llama_kineto/worker5_step_12.1697596714871.pt.trace.json --output-file ~/rank5.json &
$ python tools/trace_link.py --pytorch-et-file ~/llama_pytorch_et/llama_et_6.json --kineto-file ~/llama_kineto/worker6_step_12.1697596714614.pt.trace.json --output-file ~/rank6.json &
$ python tools/trace_link.py --pytorch-et-file ~/llama_pytorch_et/llama_et_7.json --kineto-file ~/llama_kineto/worker7_step_12.1697596714853.pt.trace.json --output-file ~/rank7.json &
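The eight trace_link invocations differ only in the rank index and the per-worker Kineto timestamp. A loop sketch of the same step, assuming each worker directory contains exactly one step-12 trace so the timestamp can be matched with a glob:

```shell
# Link PyTorch ET and Kineto traces for ranks 0-7 in one loop.
# The Kineto filenames embed per-worker timestamps, so we glob for
# them instead of hard-coding each one.
for i in 0 1 2 3 4 5 6 7; do
  kineto=$(echo "$HOME"/llama_kineto/worker${i}_step_12.*.pt.trace.json)
  python tools/trace_link.py \
    --pytorch-et-file "$HOME/llama_pytorch_et/llama_et_${i}.json" \
    --kineto-file "$kineto" \
    --output-file "$HOME/rank${i}.json" &
done
wait  # block until all eight background linkers finish
```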

$ cd ~/chakra
$ pip install .
$ python3 -m chakra.et_converter.et_converter --input_type PyTorch --input_filename ~/rank0.json --output_filename ~/rank.0.et --num_dims 1
$ python3 -m chakra.et_converter.et_converter --input_type PyTorch --input_filename ~/rank1.json --output_filename ~/rank.1.et --num_dims 1
$ python3 -m chakra.et_converter.et_converter --input_type PyTorch --input_filename ~/rank2.json --output_filename ~/rank.2.et --num_dims 1
$ python3 -m chakra.et_converter.et_converter --input_type PyTorch --input_filename ~/rank3.json --output_filename ~/rank.3.et --num_dims 1
$ python3 -m chakra.et_converter.et_converter --input_type PyTorch --input_filename ~/rank4.json --output_filename ~/rank.4.et --num_dims 1
$ python3 -m chakra.et_converter.et_converter --input_type PyTorch --input_filename ~/rank5.json --output_filename ~/rank.5.et --num_dims 1
$ python3 -m chakra.et_converter.et_converter --input_type PyTorch --input_filename ~/rank6.json --output_filename ~/rank.6.et --num_dims 1
$ python3 -m chakra.et_converter.et_converter --input_type PyTorch --input_filename ~/rank7.json --output_filename ~/rank.7.et --num_dims 1
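The converter commands likewise vary only in the rank index; a loop sketch under the same file-naming assumptions as above:

```shell
# Convert each linked trace (~/rank{i}.json) to a Chakra ET (~/rank.{i}.et).
for i in 0 1 2 3 4 5 6 7; do
  python3 -m chakra.et_converter.et_converter \
    --input_type PyTorch \
    --input_filename "$HOME/rank${i}.json" \
    --output_filename "$HOME/rank.${i}.et" \
    --num_dims 1
done
```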

$ cd ~/astra-sim
$ ./build/astra_analytical/build.sh
$ ./build/astra_analytical/build/bin/AstraSim_Analytical_Congestion_Unaware \
    --workload-configuration=/Users/theo/rank \
    --system-configuration=./inputs/system/Switch.json \
    --network-configuration=./inputs/network/analytical/Switch.yml \
    --remote-memory-configuration=./inputs/remote_memory/analytical/no_memory_expansion.json
ring of node 0, id: 0 dimension: local total nodes in ring: 8 index in ring: 0 offset: 1total nodes in ring: 8
ring of node 0, id: 0 dimension: local total nodes in ring: 8 index in ring: 0 offset: 1total nodes in ring: 8
ring of node 0, id: 0 dimension: local total nodes in ring: 8 index in ring: 0 offset: 1total nodes in ring: 8
ring of node 0, id: 0 dimension: local total nodes in ring: 8 index in ring: 0 offset: 1total nodes in ring: 8
sys[2] finished, 7213509000 cycles
sys[6] finished, 7226613000 cycles
sys[0] finished, 7269182000 cycles
sys[4] finished, 7276689000 cycles
sys[1] finished, 7340042000 cycles
sys[7] finished, 7367494000 cycles
sys[5] finished, 7374663000 cycles
sys[3] finished, 7375565000 cycles

TaekyungHeo · Feb 22 '24 21:02

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

github-actions[bot] · Feb 22 '24 21:02