Damien Tanner

10 comments by Damien Tanner

Just been reading about ThoughtSource. I'm excited to see this dataset :)

Based on this proposal to call the format the Hugging Face MessagesList format, I am going to update this PR and move the changes from the sharegpt format to a...

> Sorry about the conflicts, hopefully you can sort them. Please get @dctanner to review this soon-ish, so we can avoid further branch conflicts.

I'm on holiday for another...

Looking good, Linah. I've been able to do a brief review, but I've asked @Alaatohamy to do a more in-depth review because she has more context to review the...

👍 I'll edit the story.

Looking forward to the cocktails @BrianArbuckle! I've settled on https://huggingface.co/datasets/recipe_nlg being the best dataset. In particular the items labelled 'Gathered', which are higher quality (fewer mistakes in measurement units). I've...

@BrianArbuckle simplest is usually best :) Maybe just keep these extra things as columns in your db (even though most will be blank). As I understand it, when preparing the...

I like the JSONL format proposed. It seems sensible to have a consistent format for all our data, and JSON in general is more flexible if we do want to...

I am also getting this error when doing a Mistral QLoRA using `sample_packing: true`, `val_set_size: 0.1` and the dataset https://huggingface.co/datasets/sablo/dolly_curated (14k samples). The two ways I've made it work are...

Same issue here. Is there a previous version that is known to work with the TinyLlama and Phi Demos?