Rishabh Agarwal issues

Results: 9 issues by Rishabh Agarwal

global_step in momentum_optimzer

Shouldn't the variable `batch` in the function `momentum_optimzer` be created with `trainable=False`? That is, `batch = tf.Variable(0, trainable=False, name='global_step')` instead of just `batch = tf.Variable(0)`.

Improving the static evaluation function used by Speedy Player

I ran ~50000 games between the `Speedy Player` (1st player) and the `20 seconds Championship Player` (2nd player), and `Speedy Player` won ~50% of the games, which was quite...

Adding link to `rliable`?

I was wondering if the open-source library [`rliable`](https://github.com/google-research/rliable), corresponding to our [NeurIPS 2021 best paper](https://agarwl.github.io/rliable/), for reliably evaluating and reporting performance on ML and RL benchmarks, especially when using a handful...

Add support for loading data from pandas dataframe

Right now, we only support loading data from numpy arrays. It would be nice if there was a helper function to convert a dataframe of scores to numpy arrays. Some...

Labels: enhancement, good first issue, help wanted
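A rough sketch of what such a helper could look like, assuming a long-format dataframe with one row per (algorithm, run, task) and assuming the target is a dictionary mapping each algorithm name to a `(num_runs, num_tasks)` array. The function name `dataframe_to_score_dict` and the column names are hypothetical and not part of any existing API.

```python
import numpy as np
import pandas as pd


def dataframe_to_score_dict(df, algo_col="algorithm", run_col="run",
                            task_col="task", score_col="score"):
    """Hypothetical helper: long-format scores -> {algorithm: 2D array}.

    Each returned array has shape (num_runs, num_tasks). The column
    names are assumptions about how the dataframe is laid out.
    """
    score_dict = {}
    for algo, group in df.groupby(algo_col):
        # Rows become runs and columns become tasks.
        matrix = group.pivot(index=run_col, columns=task_col, values=score_col)
        # Sort both axes so every algorithm uses the same run/task order.
        matrix = matrix.sort_index().sort_index(axis=1)
        score_dict[algo] = matrix.to_numpy(dtype=float)
    return score_dict


# Toy data: 2 algorithms, 2 runs, 2 tasks.
df = pd.DataFrame({
    "algorithm": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "run": [0, 0, 1, 1, 0, 0, 1, 1],
    "task": ["t1", "t2", "t1", "t2", "t1", "t2", "t1", "t2"],
    "score": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
})
scores = dataframe_to_score_dict(df)
print(scores["A"].shape)
```

Note that `pivot` raises an error if a (run, task) pair appears twice for the same algorithm, which is probably the desired behavior for score tables.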

Update bloom_filter.py

Fix incorrect use of XOR operator to use correct POWER operator. In Python "2 ^ N" means "2 XOR N" whereas "2 ** N" means "2 to the power N"....
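A small illustration of the difference described above. The function `bit_array_size` is a hypothetical stand-in for illustration only, not the code in `bloom_filter.py`.

```python
n = 10

# `^` is bitwise XOR: 0b0010 ^ 0b1010 == 0b1000
xor_result = 2 ^ n

# `**` is exponentiation: 2 to the power 10
power_result = 2 ** n


def bit_array_size(num_bits_exponent):
    """Hypothetical example: a size of 2**k slots, written correctly."""
    return 2 ** num_bits_exponent


print(xor_result)         # 8
print(power_result)       # 1024
print(bit_array_size(n))  # 1024
```

For powers of two specifically, `1 << n` gives the same value as `2 ** n` for non-negative integer `n`.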

Not able to reproduce results using the code provided

I tried replicating the results reported in the [paper](https://arxiv.org/abs/1910.05396). However, I am getting much higher performance for normal PPO (around 44.1%) and lower performance with Rand Conv (53.34%)....

MC sampling not included for test eval

It seems that the paper uses MC approximation = 10 during evaluation, but this is not implemented in the codebase. Can you provide pointers on how to do and or...
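One possible reading, sketched under the assumption that "MC approximation = 10" means averaging the policy's action probabilities over 10 independently randomized copies of the observation. All names here (`mc_action_probs`, `toy_randomize`, `toy_policy`) are hypothetical stand-ins and not the repository's code.

```python
import numpy as np


def mc_action_probs(policy_fn, randomize_fn, obs, num_samples=10, rng=None):
    """Average action probabilities over `num_samples` randomized inputs."""
    rng = np.random.default_rng() if rng is None else rng
    probs = [policy_fn(randomize_fn(obs, rng)) for _ in range(num_samples)]
    return np.mean(probs, axis=0)


def toy_randomize(obs, rng):
    # Stand-in for a random input transformation (assumption: additive noise).
    return obs + rng.normal(scale=0.1, size=obs.shape)


def toy_policy(obs):
    # Stand-in policy: softmax over three hand-made logits.
    logits = np.array([obs.sum(), -obs.sum(), 0.0])
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()


obs = np.ones(4)
probs = mc_action_probs(toy_policy, toy_randomize, obs,
                        num_samples=10, rng=np.random.default_rng(0))
action = int(np.argmax(probs))
print(probs, action)
```

In a real evaluation loop, `randomize_fn` would be replaced by whatever randomization the model applies at training time, and `policy_fn` by the trained policy's forward pass.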

Many-shot ICL

Many-shot prompting seems to help on several of the existing tasks here (BIG-Bench Hard, MATH, GSM8K, GPQA). Not sure what the process is, but it seems worth a mention...

Many-shot ICL / prompting?

There are several recent papers on this topic from Anthropic (Claude-2), GDM (our work with Gemini 1.5 Pro with 1M context), CMU (open models), and Stanford (multimodal ICL): https://arxiv.org/abs/2404.11018, https://www.anthropic.com/research/many-shot-jailbreaking, https://arxiv.org/abs/2405.00200, https://arxiv.org/abs/2405.09798