Yi Dong

Results 9 issues of Yi Dong

# What does this PR do ? To handle very large datasets, e.g. hundreds of gigabytes to terabytes of compressed raw data, we need multiple nodes to create the sharding index and...

# What does this PR do ? Implemented a fused softmax GPU kernel that can be a drop-in replacement for `apex`'s `FusedScaleMaskSoftmax`. It addresses the following issues: 1. Removed...

I followed the example at https://github.com/jupyterlab/extension-examples/tree/master/advanced/kernel-output. The ipywidgets display fine in the notebook cells, but not in a `SimplifiedOutputArea` from `@jupyterlab/outputarea` that was created manually. I checked the rendermime, it...

In `ScaledMaskedSoftmax`, if all the keys are masked for a certain query element, the attention is currently distributed uniformly among all the key elements. It means it will average...

bug
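The all-masked behavior described in the `ScaledMaskedSoftmax` issue above can be reproduced with a minimal pure-Python sketch. It assumes masking is implemented by overwriting masked scores with a large negative constant before the softmax; the `MASK_FILL` value and the `masked_softmax` helper are hypothetical and are not taken from the library's source.

```python
import math

# Assumption: masked positions are filled with a large negative constant.
MASK_FILL = -10000.0


def masked_softmax(scores, mask, scale=1.0):
    """Softmax over one query row; mask[i] is True where key i is masked out."""
    filled = [MASK_FILL if m else s * scale for s, m in zip(scores, mask)]
    peak = max(filled)  # subtract the row maximum for numerical stability
    exps = [math.exp(v - peak) for v in filled]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.5, 3.0]

# Two keys masked: the masked keys receive zero attention.
partly_masked = masked_softmax(scores, [False, False, True, True])

# Every key masked: all filled scores are equal, so the result is uniform.
fully_masked = masked_softmax(scores, [True, True, True, True])

print(partly_masked)
print(fully_masked)
```

Under this assumption, a fully masked row carries no information about the original scores, so each key receives weight `1 / len(scores)` rather than zero.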

While working on the NeMo Megatron T5 model, I noticed a tensor calculation error. I narrowed it down to the `scaled_masked_softmax` method and made a reproducer...

I filed the same issue [#2926](https://github.com/jupyter-widgets/ipywidgets/issues/2926) at the ipywidgets repo. No response yet. I followed the example at https://github.com/jupyterlab/extension-examples/tree/master/advanced/kernel-output. The ipywidgets display fine in the notebook cells, but...

I found a mismatch between the fused softmax kernel calculation and the PyTorch softmax calculation. I use the pytorch:22.07 container or the latest apex master branch. Here is how to...

bug

Currently, inference requires the query tensor to have length 1. However, there are use cases where the query tensor length is > 1. Note, this fix requires 1. TE to use...

stale

**Is your feature request related to a problem? Please describe.** EWM is a very popular method used in time series analysis, especially in the domain of FSI. cuIndicator is using...

feature request
libcudf
Python
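For readers unfamiliar with EWM (exponentially weighted mean), here is a minimal sketch of the recursive form, which corresponds to the `adjust=False` variant in pandas. The `ewm_mean` helper and the choice of `alpha` are illustrative assumptions only; this is not the API proposed in the feature request.

```python
def ewm_mean(values, alpha):
    """Recursive exponentially weighted mean.

    y[0] = x[0]
    y[t] = (1 - alpha) * y[t - 1] + alpha * x[t]
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    out = []
    for x in values:
        if not out:
            out.append(float(x))  # the first output is the first observation
        else:
            out.append((1.0 - alpha) * out[-1] + alpha * x)
    return out


smoothed = ewm_mean([1.0, 2.0, 3.0, 4.0], alpha=0.5)
print(smoothed)  # [1.0, 1.5, 2.25, 3.125]
```

A larger `alpha` weights recent observations more heavily; `alpha=1.0` returns the input unchanged.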