daceml
A Data-Centric Compiler for Machine Learning
I'm trying to add and register a custom implementation for an ONNX op externally, i.e. from outside the daceml codebase. For context, the op in question is `ONNXMul`. I've tried following the code...
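For reference, here is a minimal sketch of how such a registration could look, modeled on how daceml registers its own built-in "pure" implementations. The import paths, the `op_implementation` decorator arguments, and the `program_for_node` call are assumptions based on that pattern and may differ in the installed version; the class and implementation names are made up for illustration.

```python
from dace import SDFG, SDFGState

# Assumed import locations, modeled on daceml's built-in "pure"
# implementations; check the installed version for the exact paths.
from daceml.onnx.forward_implementation_abc import ONNXForward
from daceml.onnx.op_implementations.utils import op_implementation, program_for_node


@op_implementation(op="Mul", name="my_mul_impl")
class MyMulImplementation(ONNXForward):
    """Hypothetical externally registered implementation for ONNXMul."""

    @staticmethod
    def forward(node, state: SDFGState, sdfg: SDFG):
        # Parameter names must match the ONNX Mul connectors (A, B -> C).
        def prog(A, B, C):
            C[:] = A * B

        # Build an SDFG for this node from the dace program above.
        return program_for_node(prog, sdfg, state, node)
```

After registration, the implementation would typically be selected by its name on the `ONNXMul` library node (e.g. via the node's `implementation` property, as with other DaCe library nodes); the exact selection mechanism may vary between daceml versions.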
Implementing module replacements:
* A `torch.Module` becomes a placeholder torch function, then a placeholder op in the ONNX graph, which gets expanded to a replacement implementation (see the sketch after this list).
* The main replacement mechanism is in `daceml/onnx/nodes/replacement.py`...
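To make the first step concrete, below is a generic sketch (not the code in `replacement.py`) of how a placeholder torch function can emit a placeholder op into the exported ONNX graph, using the standard `torch.autograd.Function.symbolic` export hook. The `daceml::ReplacementPlaceholder` op name and the class are hypothetical.

```python
import torch


class ReplacementPlaceholder(torch.autograd.Function):
    """Stands in for a replaced torch module during ONNX export."""

    @staticmethod
    def forward(ctx, x):
        # Eager-mode stand-in; the real computation is supplied later by
        # the replacement implementation that expands the placeholder op.
        return x

    @staticmethod
    def symbolic(g, x):
        # Emit a custom (placeholder) node into the ONNX graph, which can
        # then be expanded to the replacement implementation.
        # Exporting a custom domain usually also requires passing
        # custom_opsets={"daceml": 1} to torch.onnx.export.
        return g.op("daceml::ReplacementPlaceholder", x)


def placeholder_fn(x):
    return ReplacementPlaceholder.apply(x)
```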
I hit the following problem while running example/**plot_cuda_mish.py**. Why does it occur, and could you please help me fix it? ``` File "/home/daceml/daceml/venv/lib/python3.8/site-packages/dace/codegen/compiler.py", line 227, in configure_and_compile _run_liveoutput("cmake --build...
Previously they lowered to mpi.BlockScatter and mpi.BlockGather from dace.
These nodes abstract mapping an array onto a process grid. They fill a similar role to BlockScatter and BlockGather in dace, but with a more array-programming-focused API. This means...
This change adds the DistributedMemlet library node and the scheduling function for distributed computation. This allows you to distribute the work in the top-level map of the SDFG by specifying...
This allows us to support reductions with their initialization states. The idea is that nested SDFGs are required to be scheduled such that there is no communication within them. The...
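As an illustration of that scheduling constraint (plain mpi4py, not daceml code): initializing an accumulator and reducing local data is communication-free, so it is the kind of work a nested SDFG may contain, while the cross-rank combine stays outside it.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Local data owned by this rank (stand-in for one block of a distributed array).
local_block = np.full(4, float(rank))

# "Initialization state" plus local reduction: no communication involved,
# so this part could live inside a nested SDFG.
partial = np.zeros(1)
partial[0] = local_block.sum()

# The cross-rank combine (communication) happens outside the nested part.
total = np.zeros(1)
comm.Allreduce(partial, total, op=MPI.SUM)

if rank == 0:
    print("global sum:", total[0])
```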
I am getting the following errors when I run examples/plot_fpga_lenet.py on the current master branch: ```bash $ python examples/plot_fpga_lenet.py /u1/ruckman/anaconda3/envs/dace-ml-dev/lib/python3.9/site-packages/dace/sdfg/validation.py:321: UserWarning: WARNING: Use of uninitialized transient "onnxCOLONCOLONRelu_11" in state...
Otherwise, parallelism opportunities are missed as blockDim.x is usually small on its own.
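For context, here is an illustrative Numba CUDA sketch (not daceml's generated code) of the indexing difference: a kernel indexed only by `threadIdx.x` is bounded by `blockDim.x` (typically a few hundred threads), while combining it with `blockIdx.x` spans the whole grid.

```python
from numba import cuda


@cuda.jit
def scale_block_only(x):
    # Only blockDim.x elements can ever be touched, regardless of grid size.
    i = cuda.threadIdx.x
    if i < x.shape[0]:
        x[i] *= 2.0


@cuda.jit
def scale_grid(x):
    # blockIdx.x * blockDim.x + threadIdx.x covers the full iteration space
    # when launched as scale_grid[num_blocks, threads_per_block](x).
    i = cuda.blockIdx.x * cuda.blockDim.x + cuda.threadIdx.x
    if i < x.shape[0]:
        x[i] *= 2.0
```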