Damien Tanner
Just been reading about ThoughtSource. I'm excited to see this dataset :)
based on this proposal of the format being called Hugging Face MessagesList format. I am going to update this PR and move the changes from the sharegpt format to a...
> Sorry about the conflicts, hopefully you can sort them. Please get @dctanner to review this soon-ish, so we can avoid further branch conflicts. I'm on holiday for another...
Looking good Linah. I've been able to do a brief review, but I've asked @Alaatohamy to do a more in depth review because she has more context to review the...
👍 I'll edit the story.
Looking forward to the cocktails @BrianArbuckle! I've settled on https://huggingface.co/datasets/recipe_nlg being the best dataset. In particular the items labelled 'Gathered', which are higher quality (fewer mistakes in measurement units). I've...
@BrianArbuckle simplest is usually best :) Maybe just keep these extra things as columns in your db (even though most will be blank). As I understand it, when preparing the...
I like the jsonl format proposed. It seems sensible to have a consistent format for all our data, and json in general is more flexible if we do want to...
I am also having this error when doing a Mistral QLoRA using `sample_packing: true`, `val_set_size: 0.1` and the dataset https://huggingface.co/datasets/sablo/dolly_curated (14k samples). The two ways I've made it work are...
Same issue here. Is there a previous version that is known to work with the TinyLlama and Phi Demos?