Results: 13 issues by Void Main

Is there any way to show users all of the available achievements registered by the developer, so that the user has a roadmap to work toward?

Add the Chrome T-Rex rush game to PLE. It should be a fun game for learning & playing. :smile: Here's a random agent playing the game: ![t-rex-with-random-agent](https://user-images.githubusercontent.com/552990/81060770-b86d4a00-8f05-11ea-8087-a3bca372a3df.gif) ## Game Spec...
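For reference, a minimal random-agent loop against PLE's standard interface; `FlappyBird` is used as a stand-in since the T-Rex game isn't part of PLE yet, and the new game would presumably be driven the same way.

```python
import random
from ple import PLE
from ple.games.flappybird import FlappyBird  # stand-in: the T-Rex game is not in PLE yet

game = FlappyBird()
env = PLE(game, fps=30, display_screen=False)
env.init()

actions = env.getActionSet()  # valid key codes for this game (includes None = no-op)
for episode in range(3):
    env.reset_game()
    total_reward = 0.0
    while not env.game_over():
        # act() advances the game by one frame and returns the reward
        total_reward += env.act(random.choice(actions))
    print(f"episode {episode}: reward {total_reward:.1f}")
```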

# Description For some models there may be a `None` output from the scripted model; for example, in `torchvision.inception_v3`, the second output is a `None` constant. The current implementation throws an error, as reported... (a minimal repro sketch is included below)

component: conversion
WIP
cla signed
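A minimal repro sketch for the report above (flags and input shape are illustrative; assumes a recent torch/torchvision):

```python
import torch
import torchvision

# In eval mode inception_v3 skips its auxiliary classifier, so the scripted
# model returns InceptionOutputs(logits, None): the second output is a None
# constant, which is what the conversion trips over.
model = torchvision.models.inception_v3(aux_logits=True, init_weights=False).eval()
scripted = torch.jit.script(model)

out = scripted(torch.randn(1, 3, 299, 299))
print(out[0].shape)  # logits, e.g. torch.Size([1, 1000])
print(out[1])        # aux_logits -> None
```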

## ❓ Question I'm trying to optimize Hugging Face's BERT base uncased model using Torch-TensorRT. The code works after disabling full compilation (`require_full_compilation=False`), and the average latency is ~10ms on... (a sketch of the setup is included below)

question
performance
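A sketch of the setup being described; the model variant, sequence length, and precision flags below are assumptions rather than values from the report:

```python
import torch
import torch_tensorrt
from transformers import BertModel

# Trace BERT and compile with Torch-TensorRT, allowing partial compilation
# (require_full_compilation=False) since full compilation fails here.
model = BertModel.from_pretrained("bert-base-uncased", torchscript=True).eval().cuda()
seq_len = 128
example_inputs = (
    torch.randint(0, 30522, (1, seq_len), dtype=torch.int32, device="cuda"),  # input_ids
    torch.ones(1, seq_len, dtype=torch.int32, device="cuda"),                 # attention_mask
)
traced = torch.jit.trace(model, example_inputs)

trt_model = torch_tensorrt.compile(
    traced,
    inputs=[
        torch_tensorrt.Input(shape=[1, seq_len], dtype=torch.int32),
        torch_tensorrt.Input(shape=[1, seq_len], dtype=torch.int32),
    ],
    enabled_precisions={torch.float16},
    require_full_compilation=False,   # full compilation disabled, per the report
    truncate_long_and_double=True,    # BERT uses int64 ids, which TensorRT can't ingest directly
)
```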

It seems there is no @user mention notification when replying?

I'm using the sample code from [tutorial 6](https://triton-lang.org/master/getting-started/tutorials/06-fused-attention.html) and measuring the performance on an A100 (measurement sketch below). Here's the bwd latency graph (ran twice; the results look similar): ![CleanShot 2022-11-28 at 10 58...

help wanted
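A sketch of how that backward latency is typically measured, assuming tutorial 06 has been saved locally as `fused_attention.py`; the `attention` signature and the shapes vary across Triton versions, so treat this as illustrative:

```python
import torch
import triton
from fused_attention import attention  # tutorial 06 saved locally (assumption)

# Time one backward pass of the fused attention op with do_bench, the same
# helper the tutorial's perf_report benchmark uses.
BATCH, HEADS, SEQ, DIM = 4, 48, 4096, 64
q, k, v = (
    torch.randn(BATCH, HEADS, SEQ, DIM, device="cuda", dtype=torch.float16, requires_grad=True)
    for _ in range(3)
)
sm_scale = 1.3

o = attention(q, k, v, sm_scale)  # note: newer tutorial versions add a `causal` argument
do = torch.randn_like(o)
ms = triton.testing.do_bench(lambda: o.backward(do, retain_graph=True))
print(f"bwd: {ms:.3f} ms")
```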

Hi guys, I wonder why it takes 119 GFLOPs for DiT-XL/2 to generate 256x256 images. According to my calculation, it should be over 228 GFLOPs; can anyone please kindly point...
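For reference, a rough back-of-envelope using the DiT-XL config from the paper (depth 28, hidden 1152, MLP ratio 4; patch size 2 on the 32x32 latent gives 256 tokens) and ignoring the adaLN/conditioning layers; the ~119 vs. ~228 gap is consistent with counting MACs versus FLOPs, though that is only a guess:

```python
# DiT-XL/2 @ 256x256: depth 28, hidden 1152, MLP ratio 4, 256 tokens (32x32 latent, patch 2)
L, d, N, mlp = 28, 1152, 256, 4

per_layer_macs = (
    4 * N * d * d          # q/k/v + output projections
    + 2 * N * N * d        # attention scores + weighted sum over values
    + 2 * mlp * N * d * d  # the two MLP linears
)
total_macs = L * per_layer_macs
print(f"~{total_macs / 1e9:.0f} GMACs  (~{2 * total_macs / 1e9:.0f} GFLOPs at 2 FLOPs per MAC)")
# prints roughly 118 GMACs / 237 GFLOPs
```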

Implement LLaMA as requested in issue #506. ## Steps to use First, convert the llama-7b-hf weights from Hugging Face with `huggingface_llama_convert.py`: `python3 huggingface_llama_convert.py -saved_dir=/path/to/export/folder/ -in_file=/path/to/llama-7b-hf -infer_gpu_num=1 -weight_data_type=fp16 -model_name=llama_7b` Next, compile and...

Hi community, I'm building a Triton kernel that first loads some discontinuous indices from one tensor and then loads the actual data at those indices from another tensor. I'm trying to implement...
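A minimal sketch of that two-step load (gather) pattern; the names and block size are illustrative, not taken from the issue:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def gather_kernel(idx_ptr, src_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n_elements
    idx = tl.load(idx_ptr + offs, mask=mask, other=0)    # step 1: load the discontinuous indices
    vals = tl.load(src_ptr + idx, mask=mask, other=0.0)  # step 2: gather the actual data at those indices
    tl.store(out_ptr + offs, vals, mask=mask)

src = torch.randn(4096, device="cuda")
idx = torch.randint(0, src.numel(), (1024,), device="cuda")
out = torch.empty(idx.numel(), device="cuda", dtype=src.dtype)
grid = (triton.cdiv(idx.numel(), 256),)
gather_kernel[grid](idx, src, out, idx.numel(), BLOCK=256)
assert torch.equal(out, src[idx])
```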

Hey team, I'm suffering from high Triton kernel launch overhead. Here's my nsys capture: ![CleanShot 2023-11-10 at 10 28 53](https://github.com/openai/triton/assets/552990/d62f05c8-b00c-43fc-b1c7-0680a9988706) The kernel executes in around 80us on the GPU; however, it takes 220us...
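For context, a rough way to reproduce that kind of GPU-time vs. wall-clock gap outside of nsys; the kernel below is a trivial stand-in, not the one from the capture:

```python
import time
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    offs = tl.program_id(axis=0) * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n_elements
    x = tl.load(x_ptr + offs, mask=mask)
    y = tl.load(y_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, x + y, mask=mask)

x = torch.randn(1 << 22, device="cuda")
y = torch.randn_like(x)
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
launch = lambda: add_kernel[grid](x, y, out, x.numel(), BLOCK=1024)

launch()
torch.cuda.synchronize()                  # warm up so JIT compilation isn't timed

gpu_ms = triton.testing.do_bench(launch)  # GPU execution time per launch

n_iters = 1000
t0 = time.perf_counter()
for _ in range(n_iters):
    launch()                              # enqueue only, no sync inside the loop
torch.cuda.synchronize()
wall_us = (time.perf_counter() - t0) * 1e6 / n_iters
print(f"GPU: {gpu_ms * 1e3:.0f} us/launch, wall clock: {wall_us:.0f} us/launch")
```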