Bearnardd
# What does this PR do? Fixes the bug mentioned in the [issue](https://github.com/huggingface/transformers/issues/17355) by transitioning from `np.random` to `jax.random`. It also adds several minor changes to be able to...
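The core difference behind such a transition is that `np.random` draws from hidden global state, while `jax.random` requires an explicit PRNG key for every draw, with independent streams derived by splitting keys. A stdlib-only sketch of the two styles (the `split` helper here is hypothetical, standing in for `jax.random.split`):

```python
import random

# Global-state style (np.random): each draw mutates hidden shared state,
# so a call's result depends on every call made before it.
random.seed(0)
a = random.randint(0, 9)

def split(seed, n):
    # Hypothetical helper mimicking jax.random.split: derive n child
    # seeds from one parent seed, with no shared mutable state.
    return [hash((seed, i)) & 0xFFFFFFFF for i in range(n)]

# Explicit-key style (what jax.random enforces): every draw consumes an
# explicit key, so the same key always reproduces the same draw.
k1, k2 = split(0, 2)
x = random.Random(k1).randint(0, 9)
x_again = random.Random(k1).randint(0, 9)  # same key -> same value
```

The explicit-key style is what makes JAX code reproducible under `jit` and `vmap`: randomness is a function of its inputs, never of call order.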
Fixes https://github.com/huggingface/transformers/issues/23055 # What does this PR do? Adds control over the use of random attention in `BigBird` based on the current mode (training/eval). ## Who can review? @sanchit-gandhi @ydshieh
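The mode gating described above can be sketched as follows; this is a hypothetical helper, not the actual `transformers` code, and the function name and parameters are illustrative:

```python
import random

def random_block_indices(n_blocks, n_rand, seed, deterministic):
    """Gate BigBird-style random attention on the current mode (sketch).

    In eval mode (deterministic=True) the random component is disabled so
    outputs are stable and equivalence tests can compare frameworks; in
    training mode we sample random block indices without replacement.
    """
    if deterministic:
        return []                      # eval: no random attention blocks
    rng = random.Random(seed)          # training: seeded, reproducible draw
    return rng.sample(range(n_blocks), n_rand)
```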
### Reproduction The `Pytorch->Flax` and `Flax->Pytorch` equivalence tests were failing; at the moment they are skipped by https://github.com/huggingface/transformers/pull/23040 ### Expected behavior While working on https://github.com/huggingface/transformers/pull/21023 I found that there...
Hi! In the original paper implementation they use dims `[1:]`: `x = x_padded[1:].view_as(x)` [their code](https://github.com/kimiyoung/transformer-xl/blob/master/pytorch/mem_transformer.py#L201), but in your implementation you use `[:-1]`: `x = x_padded[:-1].view_as(x)` [your code](https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/transformers/xl/relative_mha.py#LL38C5-L38C33)...
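For context, the line in question is Transformer-XL's pad-and-reshape relative-shift trick: left-pad one zero column, reinterpret the `(q, k+1)` buffer as `(k+1, q)`, drop the first row, and read it back as `(q, k)`, which shifts row `i` so that column `j` holds the score for relative distance `j - i`. A pure-Python sketch using nested lists in place of tensors (flatten/reshape in C order just reinterprets the flat buffer, so dropping the first row of `(k+1, q)` is dropping the first `q` flat elements):

```python
def rel_shift(x):
    """Transformer-XL relative shift (the `[1:]` variant), on nested lists."""
    q, k = len(x), len(x[0])
    flat = []
    for row in x:
        flat.append(0)      # zero-pad one column on the left -> (q, k+1)
        flat.extend(row)
    flat = flat[q:]         # view as (k+1, q), drop the first row
    return [flat[i * k:(i + 1) * k] for i in range(q)]  # view back as (q, k)
```

After the shift, `result[i][i]` equals `x[i][k-1]`, i.e. every query's score for relative distance 0 lands on the diagonal, which is the alignment the attention term needs.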
Fixes https://github.com/hwchase17/langchain/issues/7384 * adds a default relevance function used by `_similarity_search_with_relevance_scores` @rlancemartin, @eyurtsev
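One common shape for such a default, sketched below under the assumption of unit-normalized embeddings (so euclidean distance between non-opposed vectors stays near `[0, sqrt(2)]`); this is an illustrative mapping, not necessarily LangChain's exact formula:

```python
import math

def euclidean_relevance_score(distance: float) -> float:
    """Map a euclidean distance to a relevance score in [0, 1] (sketch).

    distance 0 (identical vectors) -> relevance 1;
    distance sqrt(2) (orthogonal unit vectors) -> relevance 0.
    """
    return 1.0 - distance / math.sqrt(2)
```

Whatever the exact mapping, the key property a vector store needs is monotonicity: smaller distance must always mean higher relevance, so ranking by either is equivalent.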
Hi there! I've recently delved into the codebase and noticed room for style improvements. Currently there is considerable inconsistency in style, both within individual files...