Jay Gu comments


Tokenizer gets stuck on bad-match regex matching

Another input I found that causes it to get stuck in the while loop is: String test = "7h... nh? a nhìu th?t nhìu ^^~..........................................................................." Looks like the "....." pattern is the problem?

Tokenizer gets stuck on bad-match regex matching

This is not the only bad example. Also, I don't think the problem is the while loop itself; it might be due to catastrophic regex backtracking, which can take years.
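To make the catastrophic-matching idea concrete, here is a minimal sketch in Python. The tokenizer's actual regex is not shown in this thread, so both patterns below are hypothetical stand-ins: `NESTED` uses a nested quantifier over dots, and `FLAT` accepts the same strings without nesting. The helper names `timed_match` and `compare` are also made up for illustration.

```python
import re
import time

# Hypothetical patterns for illustration only; the tokenizer's real regex
# is not shown in this thread.
NESTED = re.compile(r"^(\.+)+$")  # nested quantifiers: prone to backtracking
FLAT = re.compile(r"^\.+$")       # accepts the same strings, no nesting


def timed_match(pattern, text):
    """Return (match_or_None, elapsed_seconds) for one match attempt."""
    start = time.perf_counter()
    result = pattern.match(text)
    return result, time.perf_counter() - start


def compare(n):
    """Match a run of n dots followed by a character that forces failure."""
    text = "." * n + "~"
    nested_result, nested_time = timed_match(NESTED, text)
    flat_result, flat_time = timed_match(FLAT, text)
    return nested_result, nested_time, flat_result, flat_time


if __name__ == "__main__":
    # Keep n small: with a backtracking engine, the nested pattern's work
    # can grow exponentially with the length of the dot run.
    for n in (10, 14, 18, 20):
        _, nested_time, _, flat_time = compare(n)
        print(f"n={n:2d}  nested={nested_time:.4f}s  flat={flat_time:.6f}s")
```

If the real pattern has a similar shape, a long run of dots followed by a non-matching character would be enough to make a single match call appear to hang, even though the surrounding loop is fine.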

Tokenizer gets stuck on bad-match regex matching

Cool. I'm going to test it on the daily.10k tonight.

On Tue, Oct 23, 2012 at 1:16 AM, brendano wrote:
> I just ran it on 1489999 or so...

Tokenizer gets stuck on bad-match regex matching

It passed the daily.10k test with 14720000 tweets. I will try a larger dataset later tomorrow.

Jay

On Tue, Oct 23, 2012 at 1:25 AM, Haijie Gu wrote:
>...

Unfused Multihead attention TensorRT 9.2 is 2x slower than PyTorch 2.2 on GPU A100-SXM4-40GB

```
import torch
from torch.nn import Parameter, Module

aten = torch.ops.aten
linear = aten.linear.default
reshape = aten.reshape.default
permute = aten.permute.default
sdp = aten.scaled_dot_product_attention.default

def mha(x, Wq, Wk, Wv, Wo, heads):...
```

buildSerializedNetwork failure of TensorRT 10.1 on GPU A10G/3070 - `Error Code 2: Internal Error (Assertion mConfig.caskKlibMapPtr failed. )`

This issue is caused by `builder.reset()`. I have a better repro:

```python
import tensorrt as trt

def test(use_reset = False):
    builder = trt.Builder(trt.Logger())
    config = builder.create_builder_config()
    if use_reset:
        builder.reset() #...
```

Engine build failure "cuda misaligned address" of TensorRT 10.1 when running fp16 group normalization with particular value of `num_groups`

@lix19937 Here's a repro showing that even with np.float32 set for `dummy_w`, `dummy_b`, `scale`, and `bias`, it still fails with `Error Code 1: Cuda Runtime (misaligned address)`.

```python
import tensorrt as...
```

Engine build failure "cuda misaligned address" of TensorRT 10.1 when running fp16 group normalization with particular value of `num_groups`

> Can you use torch api to export an onnx of `affine_group_norm`, then use trtexec convert ? @haijieg

@lix19937 No, I want to directly control how the network is built using...

[DOC]: Could you add the speed comparison with Triton/ThunderKittens?

@EricLina thank you for your interest. We don't have plans to publish any benchmarks, but we have some [samples](https://github.com/NVIDIA/cutile-python/blob/7a86253d22446a24d68083584f01267325157891/samples/AttentionFMHA.py#L410) to get you started on running your own comparison. The...

Custom backward pass?

Hi @vgoklani, you can find a backward kernel example [here](https://github.com/NVIDIA/cutile-python/blob/main/samples/LayerNorm.py#L167).