Luke Nitish Kumar comments

Results: 3 comments of Luke Nitish Kumar

Improvements to MGSM

The filter on the native-language tasks also needs updating: it currently uses the English format for the answer. https://github.com/EleutherAI/lm-evaluation-harness/blob/a72babbfbddd9195748351892dced4f82fccbc0d/lm_eval/tasks/mgsm/native_cot/cot_yaml#L28 Hugging Face dataset French few-shot example: Question :...
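To illustrate the mismatch, here is a minimal sketch of why an English-format answer pattern misses a native-language answer. The regex, the `ANSWER_PREFIXES` table, the French phrase "La réponse est", and the sample output are all assumptions made for illustration; they are not copied from the linked YAML or from the dataset.

```python
import re

# Hypothetical English-style pattern; the real filter in the linked YAML may differ.
ENGLISH_PATTERN = re.compile(r"The answer is (-?[0-9.,]+)")

# Hypothetical per-language answer prefixes, for illustration only.
ANSWER_PREFIXES = {
    "en": "The answer is",
    "fr": "La réponse est",
}


def extract_answer(text, lang):
    """Return the number following the language-specific prefix, or None."""
    prefix = re.escape(ANSWER_PREFIXES[lang])
    match = re.search(prefix + r"\s*(-?[0-9.,]+)", text)
    if match is None:
        return None
    # Drop sentence punctuation that the character class also captures.
    return match.group(1).rstrip(".,")


# Made-up model output in French.
french_output = "Il reste 5 balles. La réponse est 5."

english_miss = ENGLISH_PATTERN.search(french_output)  # English pattern finds nothing
french_hit = extract_answer(french_output, "fr")      # language-aware lookup succeeds
```

A per-language prefix table like this would keep one extraction function while letting each task supply its own answer phrase.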

Support easy concatenation of datasets

Here is a sample training data mix I'm using...
```
training:
  type: blended
  datasets:
    - type: file
      path: /mnt/datasets/tokenized/Mistral-Nemo-Base-2407/starcoderdata/java/fast_llm_config.yaml
    - type: file
      path: /mnt/datasets/tokenized/Mistral-Nemo-Base-2407/starcoderdata/javascript/fast_llm_config.yaml
    - type: file
      path: /mnt/datasets/tokenized/Mistral-Nemo-Base-2407/starcoderdata/python/fast_llm_config.yaml
    - type:...
```

Simplify config validation

A couple of requests/questions: 1. Is it possible to run this validation before launching a job, like a mock run? When new configs are tested, it would be a quick turn...
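The kind of mock run asked about in point 1 could look roughly like the sketch below: validate the whole config up front, report every problem at once, and only then decide whether to launch. The field names in `REQUIRED_FIELDS`, and the `validate_config` and `launch` functions, are hypothetical and do not correspond to any real project API.

```python
# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {
    "batch_size": int,
    "learning_rate": float,
    "dataset_path": str,
}


def validate_config(config):
    """Return a list of problems; an empty list means the config passed."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in config:
            errors.append(f"missing field: {name}")
        elif not isinstance(config[name], expected):
            actual = type(config[name]).__name__
            errors.append(f"{name}: expected {expected.__name__}, got {actual}")
    return errors


def launch(config, dry_run=False):
    """Validate first; with dry_run=True, stop before any job is started."""
    errors = validate_config(config)
    if errors:
        raise ValueError("; ".join(errors))
    if dry_run:
        return "config ok, nothing launched"
    return "job launched"  # placeholder for the real launch step
```

Calling `launch(config, dry_run=True)` on a new config would surface all validation errors in one pass without allocating any resources.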