hecmay comments

Results 85 comments of hecmay

New synthesis mode required

> > I suppose that should only work for the one-read-one-write case. We can support this specific one-read-one-write case without generating any local buffer; the local buffer is generated for other...

New synthesis mode required

> > > > Agreed. The abstraction is not clear enough, and it's too much work for users to use `.to` for each of the arguments. > > > >...

New synthesis mode required

@chhzh123 Just had a discussion with Sean. I will create a `ZeroCopy` mode for the `.to` primitive, so that you will be able to generate a kernel function without any read/write nested...

New synthesis mode required

After I add the aforementioned features, I will create a simple primitive for compute placement. This would make our life easier: we do not have to call so many `.to`...

Enhance the graph partitioning algorithm to determine host-device bound with incomplete placement information

The basic partitioning is working. We have around 5 test cases for it. Yeah, we can add more test cases first to ensure that it is robust enough.

Issue with systolic array based large matrix multiplication

Hi jessewjx, 1. The version you are using does not really work on large matrices. For a 1024x1024 GEMM, it tries to implement a systolic array with 1024x1024 PEs, which is...

Issue with systolic array based large matrix multiplication

Yeah. You can parameterize the algorithm as matrix A (4096x4096) multiplied by B (4096x1).
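
To illustrate the parameterization described in this comment, here is a minimal pure-Python sketch of a GEMM routine called with a single-column B, which makes it a matrix-vector product. The `gemm` helper and the small `SIZE` are assumptions for illustration only; they are not part of HeteroCL or the linked systolic array sample.

```python
# Hypothetical helper (not a HeteroCL API): a plain triple-loop GEMM
# that computes C = A x B for A of shape (m, k) and B of shape (k, n).
def gemm(A, B):
    m, k = len(A), len(A[0])
    assert len(B) == k, "inner dimensions must match"
    n = len(B[0])
    C = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0
            for p in range(k):
                acc += A[i][p] * B[p][j]
            C[i][j] = acc
    return C


SIZE = 4  # stands in for 4096 so the sketch runs instantly

# A is SIZE x SIZE; B is SIZE x 1, i.e. a column vector stored as a matrix.
A = [[i + j for j in range(SIZE)] for i in range(SIZE)]
B = [[v] for v in (1, 2, 3, 4)]

C = gemm(A, B)  # C has shape SIZE x 1
```

Because the second dimension of B is 1, the same GEMM code path is reused without a separate matrix-vector kernel.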

Issue with systolic array based large matrix multiplication

Hi @jessewjx. Please see the systolic gemm here: https://github.com/Hecmay/heterocl/blob/fix/samples/systolic_array/systolic_array_module_stream.py It may take a long time for this PR to be merged. You can simply pull back and compile the `fix`...

Stack Overflow for Recursive IR Traversal

Bumping this up since we have been running into the issue recently. The building and simulation process can take a surprisingly long time with an increasing number of compute units (CUs) in the...
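
The issue title refers to stack overflow during recursive IR traversal. As a generic illustration of one common remedy, the sketch below replaces recursion with an explicit stack. The `Node` class and `visit_preorder` function are hypothetical stand-ins, not HeteroCL's actual IR classes or visitors, and this is not necessarily how the issue was resolved.

```python
# Hypothetical tree node; a real IR would carry operands, types, etc.
class Node:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []


def visit_preorder(root):
    """Pre-order traversal using an explicit stack instead of recursion."""
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node.name)
        # Push children in reverse so the leftmost child is visited first.
        stack.extend(reversed(node.children))
    return order


# Build a chain deeper than Python's default recursion limit (1000 frames).
DEPTH = 10_000
root = Node("n0")
cur = root
for i in range(1, DEPTH):
    child = Node(f"n{i}")
    cur.children.append(child)
    cur = child

names = visit_preorder(root)  # completes without RecursionError
```

The traversal depth is bounded by heap memory rather than by the call stack, so deeply nested structures no longer overflow.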

reuse_at failed with SegFault after adding a void wrapper stage

Here is the initial attempt at the `.to()` revamp: https://github.com/Hecmay/heterocl/tree/fix