Ziyue Huang

20 comments by Ziyue Huang

This is not a killer problem: the backend can switch to a faster kernel whenever the attention pattern has an optimized handcrafted kernel. Block-sparse attention does not seem very appealing to me, ...

Waiting for https://github.com/apache/incubator-mxnet/pull/19387 to be merged.

benchmark script:

```python
import numpy as np
from numpy.testing import assert_allclose
import mxnet as mx
from gluonnlp.attention_cell import masked_softmax, MultiHeadAttentionCell, MultiHeadSlidingWindowAttentionCell
import time

def test_multi_head_sliding_window_dot_attention_cell():
    def gen_sliding_window_mask_full(batch_size, seq_length, w, symmetric, ...
```
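For readers of the snippet: a minimal sketch of what a dense sliding-window mask generator like the truncated `gen_sliding_window_mask_full` above presumably computes (my reconstruction, not the actual helper): query position `i` may attend to keys in `[i - w, i + w]` when `symmetric`, and `[i - w, i]` otherwise.

```python
import numpy as np

# Assumed reconstruction, not the actual gluon-nlp helper: a dense boolean
# mask where query position i attends to keys inside its sliding window.
def gen_sliding_window_mask(batch_size, seq_length, w, symmetric=True):
    mask = np.zeros((batch_size, seq_length, seq_length), dtype=bool)
    for i in range(seq_length):
        lo = max(0, i - w)
        hi = min(seq_length, i + w + 1) if symmetric else i + 1
        mask[:, i, lo:hi] = True
    return mask

# Small demo: a banded attention pattern for seq_length=5, w=1.
print(gen_sliding_window_mask(1, 5, 1)[0].astype(int))
```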

Just found that the API behavior on the BytePS master branch changed recently... Not sure whether that is intended or a bug. Tracked here: https://github.com/bytedance/byteps/issues/292.

Sorry for the late reply. @szha The core dump due to undefined symbols is fixed as of the 0820 wheel, though I didn't record which symbols were undefined. For the segfault, below is the...

May I ask where "1.2 Structured Streaming: Analysis of Output Modes" can be found now?

@ymjiang Hi, did you test the accuracy of the BERT model trained with https://github.com/byteps/examples/blob/master/mxnet/bert-large? It seems that in this script the NSP loss is normalized (by batch_size) on each...
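To make the normalization concern concrete, a toy arithmetic check (illustrative numbers, not code from the script): if each worker normalizes its NSP loss by its local batch_size and the results are then summed across workers, the total over-counts the intended global average by a factor of num_workers.

```python
import numpy as np

rng = np.random.default_rng(0)
num_workers, local_batch_size = 4, 8
# Per-worker NSP losses, each already normalized by the local batch_size.
local_means = [rng.random(local_batch_size).mean() for _ in range(num_workers)]

summed = sum(local_means)           # summing across workers
global_mean = summed / num_workers  # the intended global average
# The sum over-counts by exactly num_workers.
print(summed, num_workers * global_mean)
```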

Why not multiply the gradients by `num_workers` at the end of `_allreduce_grads`? That would let this API compute the sum, which is consistent with the previous API (and...
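A quick check of the arithmetic behind this suggestion (illustrative numbers, not BytePS code): an averaging all-reduce followed by multiplying `num_workers` back reproduces the summed gradient exactly.

```python
import numpy as np

num_workers = 4
grads = [np.random.rand(3) for _ in range(num_workers)]  # one gradient per worker

averaged = sum(grads) / num_workers  # what the averaging all-reduce returns
recovered = averaged * num_workers   # multiply num_workers back at the end
assert np.allclose(recovered, sum(grads))
```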

Let me summarize the API behavior before and after this PR; feel free to correct me if I make any mistakes :)

- `allreduce_grads` computes the average instead of the...

https://github.com/bytedance/byteps/blob/b8948f0927/byteps/mxnet/__init__.py#L201 only takes effect in `step` or `update`, so `bps.trainer.allreduce_grads` will compute the sum. Calling `allreduce_grads` followed by `update` is allowed by the MXNet API and heavily used in gluon-nlp.
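For concreteness, roughly the pattern I mean, sketched with BytePS's `DistributedTrainer` and gluonnlp's `clip_grad_global_norm` (a sketch assuming a launched BytePS job; the model and batch are placeholders): since the rescale linked above applies only inside `update`, the clipping step in between observes the summed gradients.

```python
import mxnet as mx
import gluonnlp as nlp
import byteps.mxnet as bps

bps.init()  # assumes the script is started via the BytePS launcher
net = mx.gluon.nn.Dense(2)  # placeholder model
net.initialize()
params = net.collect_params()
trainer = bps.DistributedTrainer(params, 'sgd', {'learning_rate': 0.01})
loss_fn = mx.gluon.loss.SoftmaxCrossEntropyLoss()

data = mx.nd.random.uniform(shape=(4, 8))  # placeholder batch
label = mx.nd.array([0, 1, 0, 1])
with mx.autograd.record():
    loss = loss_fn(net(data), label)
loss.backward()

trainer.allreduce_grads()  # gradients hold the cross-worker sum at this point
nlp.utils.clip_grad_global_norm(
    [p for p in params.values() if p.grad_req != 'null'], 1.0)  # clipping sees the sum
trainer.update(4)  # the 1/num_workers rescale applies only here
```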