Rishabh Agarwal
Isn't it necessary to set `trainable=False` on the variable `batch` in the function `momentum_optimzer`? I.e., `batch = tf.Variable(0,trainable=False,name=global_step)` instead of just `batch = tf.Variable(0)`.
I ran ~50000 games between the `Speedy Player` (1st player) and the `20 seconds Championship Player` (2nd player), and `Speedy Player` won ~50% of the games, which was quite...
I was wondering if the open-source library [`rliable`](https://github.com/google-research/rliable), corresponding to our [NeurIPS 2021 best paper](https://agarwl.github.io/rliable/), for reliably evaluating and reporting performance on ML and RL benchmarks, especially when using a handful...
Right now, we only support loading data from numpy arrays. It would be nice if there was a helper function to convert a dataframe of scores to numpy arrays. Some...
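A minimal sketch of what such a helper could look like, assuming a long-format dataframe with one row per (algorithm, task, run) score. The function name `dataframe_to_score_dict` and the column names `algorithm`, `task`, `run`, and `score` are hypothetical and not part of the library; the output layout (one `(num_runs, num_tasks)` array per algorithm) is the assumed target format.

```python
import numpy as np
import pandas as pd


def dataframe_to_score_dict(df, algo_col="algorithm", task_col="task",
                            run_col="run", score_col="score"):
    """Convert a long-format score dataframe into {algorithm: 2-D array}.

    Hypothetical helper: each returned array has shape (num_runs, num_tasks),
    with rows ordered by run id and columns ordered by task name.
    """
    score_dict = {}
    for algo, group in df.groupby(algo_col):
        # One row per run, one column per task; pivot raises on duplicates.
        matrix = group.pivot(index=run_col, columns=task_col, values=score_col)
        matrix = matrix.sort_index(axis=0).sort_index(axis=1)
        if matrix.isna().any().any():
            raise ValueError(f"Missing scores for algorithm {algo!r}")
        score_dict[algo] = matrix.to_numpy(dtype=float)
    return score_dict


# Toy data: 2 algorithms x 2 tasks x 2 runs.
df = pd.DataFrame({
    "algorithm": ["A"] * 4 + ["B"] * 4,
    "task": ["t1", "t2", "t1", "t2"] * 2,
    "run": [0, 0, 1, 1] * 2,
    "score": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
})
scores = dataframe_to_score_dict(df)
```

Raising on missing entries rather than filling them in is a deliberate choice in this sketch: silently imputed scores would bias any aggregate computed from the arrays.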
Fix incorrect use of XOR operator to use correct POWER operator. In Python "2 ^ N" means "2 XOR N" whereas "2 ** N" means "2 to the power N"....
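The difference between the two operators is easy to check directly. The snippet below is only an illustration with an arbitrarily chosen `n = 10`; it is not taken from the patched code.

```python
n = 10  # arbitrary example value

# "^" is bitwise XOR: 0b0010 ^ 0b1010 == 0b1000
xor_result = 2 ^ n

# "**" is exponentiation: 2 to the power 10
power_result = 2 ** n

print(xor_result)    # 8
print(power_result)  # 1024
```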
I tried replicating the results provided in the [paper](https://arxiv.org/abs/1910.05396); however, I am getting much higher performance for normal PPO (around 44.1%) and lower performance with Rand Conv (53.34%)....
It seems that the paper uses MC approximation = 10 during evaluation, but this is not implemented in the codebase. Can you provide pointers on how to do and or...
Many-shot prompting seems to help on several of the existing tasks here (BIG-Bench Hard, MATH, GSM8K, GPQA). Not sure what the process is, but it seems worth a mention...
Several recent papers on this topic, from Anthropic (Claude-2), GDM (our work with Gemini 1.5 Pro with 1M context), CMU (open models) and Stanford (multimodal ICL): https://arxiv.org/abs/2404.11018 https://www.anthropic.com/research/many-shot-jailbreaking https://arxiv.org/abs/2405.00200 https://arxiv.org/abs/2405.09798