Pearl Li
**Tldr: Add implementations of [FlashAttention](https://arxiv.org/abs/2205.14135) using OpenAI's Triton language.**

**Background**:

- FlashAttention: an IO-aware exact attention algorithm that uses tiling to reduce the number of memory reads/writes between GPU high-bandwidth memory and on-chip SRAM, yielding a 15% end-to-end speedup...
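For intuition, here is a minimal NumPy sketch of the tiling / online-softmax idea that FlashAttention builds on. It is a plain reference illustration of the math, not the Triton kernel; the `block` size and variable names are illustrative assumptions.

```python
# Reference sketch of tiled attention with online softmax (illustrative only;
# the real FlashAttention kernel fuses this into on-chip GPU computation).
import numpy as np

def tiled_attention(Q, K, V, block=64):
    N, d = Q.shape
    O = np.zeros((N, d))
    for i in range(0, N, block):                 # one tile of query rows
        q = Q[i:i + block]
        m = np.full(q.shape[0], -np.inf)         # running row-wise max
        l = np.zeros(q.shape[0])                 # running softmax denominator
        acc = np.zeros((q.shape[0], d))          # unnormalized output accumulator
        for j in range(0, N, block):             # stream over key/value tiles
            s = q @ K[j:j + block].T / np.sqrt(d)
            m_new = np.maximum(m, s.max(axis=1))
            p = np.exp(s - m_new[:, None])
            scale = np.exp(m - m_new)            # rescale older statistics
            l = l * scale + p.sum(axis=1)
            acc = acc * scale[:, None] + p @ V[j:j + block]
            m = m_new
        O[i:i + block] = acc / l[:, None]        # normalize once per query tile
    return O
```

Because each key/value tile is visited once and only running statistics (`m`, `l`, `acc`) are kept, the full N x N score matrix is never materialized in slow memory, which is the source of the IO savings.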
**Patch description**

1. Moved all logic inside [._check_final_chat_data()](https://github.com/facebookresearch/ParlAI/blob/989e29ff8d7a9606e2bbc7db7290b58fe9b49017/parlai/crowdsourcing/tasks/model_chat/utils.py#L398) and [._check_output_key()](https://github.com/facebookresearch/ParlAI/blob/989e29ff8d7a9606e2bbc7db7290b58fe9b49017/parlai/crowdsourcing/tasks/model_chat/utils.py#L385) in the class `AbstractModelChatTest` into [._remove_non_deterministic_keys()](https://github.com/facebookresearch/ParlAI/blob/989e29ff8d7a9606e2bbc7db7290b58fe9b49017/parlai/crowdsourcing/tasks/model_chat/utils.py#L338).
2. Changed 4 unit tests, `test_model_chat.py`, `test_model_image_chat.py`, `test_demo_chat.py`, and `test_qa_data_collection.py`, to use pytest regressions,...
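For readers unfamiliar with pytest regressions: the plugin's `data_regression` fixture serializes a dict to a file on the first run and diffs against that file on later runs. A hypothetical sketch of the pattern (the test name and data below are illustrative, not the actual ParlAI tests):

```python
# Hypothetical example of the pytest-regressions pattern; requires the
# pytest-regressions plugin, which provides the data_regression fixture.
def test_final_chat_data(data_regression):
    # Stand-in for chat data after non-deterministic keys are removed.
    chat_data = {"dialog": [{"text": "hi", "agent_idx": 0}]}
    # First run writes a .yml file next to the test; later runs diff against it.
    data_regression.check(chat_data)
```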
Add a tutorial on how to add a custom CUDA C++ kernel to ParlAI
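Since ParlAI is built on PyTorch, one plausible route for such a tutorial is PyTorch's JIT C++/CUDA extension loader. A minimal sketch, assuming a working CUDA toolchain; the kernel, extension, and function names are hypothetical:

```python
# Hypothetical example: JIT-compiling a tiny CUDA kernel from Python with
# torch.utils.cpp_extension.load_inline (needs nvcc available at runtime).
import torch
from torch.utils.cpp_extension import load_inline

cpp_src = "torch::Tensor scale(torch::Tensor x, float a);"  # binding declaration

cuda_src = r"""
__global__ void scale_kernel(const float* x, float* y, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i];
}

torch::Tensor scale(torch::Tensor x, float a) {
    auto y = torch::empty_like(x);
    int n = x.numel();
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale_kernel<<<blocks, threads>>>(x.data_ptr<float>(), y.data_ptr<float>(), a, n);
    return y;
}
"""

ext = load_inline(name="scale_ext", cpp_sources=cpp_src,
                  cuda_sources=cuda_src, functions=["scale"])

x = torch.ones(1024, device="cuda")
print(ext.scale(x, 2.0)[:4])  # -> tensor([2., 2., 2., 2.], device='cuda:0')
```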
I am trying to debug my fused attention code and would like to print some intermediate values, such as the computed pointers as well as the values loaded from them.
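One option, assuming a Triton release recent enough to have `tl.device_print`, is to print from inside the kernel; a minimal sketch with an illustrative copy kernel rather than the fused attention code:

```python
# Illustrative kernel showing device-side printing in Triton (assumes a
# version of Triton that provides tl.device_print).
import torch
import triton
import triton.language as tl

@triton.jit
def debug_kernel(x_ptr, out_ptr, N, BLOCK: tl.constexpr):
    pid = tl.program_id(0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < N
    tl.device_print("offsets: ", offs)    # the pointer offsets being used
    x = tl.load(x_ptr + offs, mask=mask, other=0.0)
    tl.device_print("loaded: ", x)        # the values loaded from the pointers
    tl.store(out_ptr + offs, x, mask=mask)

x = torch.arange(8, device="cuda", dtype=torch.float32)
out = torch.empty_like(x)
debug_kernel[(1,)](x, out, x.numel(), BLOCK=8)
torch.cuda.synchronize()  # make sure device-side prints are flushed
```

Recent Triton versions also ship an interpreter mode (set the `TRITON_INTERPRET=1` environment variable) that runs kernels on the CPU, where ordinary `print` and Python debuggers work.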
I am curious why the fused attention code only works with the A100. Is there a way to make it work on other GPUs, such as the Quadro GP100?