Artem Kroviakov
**Describe the bug** Element-wise multiplication of SparseTensor is not consistent with the documentation. The documentation states: _If you add two sparse tensors, this will add two features. In case where there is...
This PR introduces asynchronous (batched) data fetching for L0 GPUs. Its purpose is to reduce the end-to-end execution time of a workload. ____________ ### Why? We have recursive materializations (from...
This PR introduces an index structure for the free buffers of a slab, which keeps data fetching time roughly constant. Example: 1000 fragments, 15 columns, (we observe GPU as...
As of now, HDK's heterogeneity looks like this: - We have X fragments; when we schedule them on a GPU, it will receive X kernels and X fragments, and execute kernels...
The code in this branch is supposed to link against a separately built shared library and currently represents the HDK side of talking to that library. The current (ugly) workflow...