Tim Isbister
Wondering the same. Surprised this hasn't been demonstrated!
But do we really need to manually add domain-specific (out-of-vocabulary) words? Isn't the purpose of word pieces that they can, in theory, construct new words by combining...
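A quick illustration of that point, as a sketch only: with a standard Hugging Face WordPiece tokenizer (`bert-base-cased` is just an arbitrary example model), an unseen domain word gets decomposed into known subword pieces rather than mapped to `[UNK]`.

```python
# Sketch: how a WordPiece tokenizer builds an out-of-vocab domain word
# from existing subword pieces. Model name is an arbitrary example.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

# "pharmacovigilance" is unlikely to be a single vocabulary entry,
# so it is assembled from several word pieces instead of becoming [UNK].
print(tokenizer.tokenize("pharmacovigilance"))
# The exact split depends on the vocabulary of the tokenizer you use.
```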
Same here:

```
deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 7 but the current world size is 8. Automatic adjustment of ZeRO's optimizer state partitioning with...
```
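One possible workaround for this kind of DP world-size mismatch (a sketch only, not a confirmed fix for this exact case) is to consolidate the ZeRO-sharded checkpoint into a single fp32 state dict and resume from that, since the consolidated weights are no longer tied to the original partitioning. The paths and tag below are placeholders; optimizer state is not recovered this way.

```python
# Sketch: consolidate a ZeRO checkpoint so the weights can be loaded under a
# different world size. Only model weights survive; optimizer state is lost.
import torch
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

# Placeholder checkpoint directory and tag.
state_dict = get_fp32_state_dict_from_zero_checkpoint(
    "path/to/checkpoints", tag="global_step1000"
)
torch.save(state_dict, "consolidated_fp32.pt")

# Then start a fresh run on 8 GPUs and load "consolidated_fp32.pt" into the
# model before wrapping it with deepspeed.initialize().
```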
I will look at this!
# gpt-sw3-20b

```json
{"dataset": "swerec", "task": "sentiment-classification", "dataset_languages": ["sv"], "model": "AI-Sweden-Models/gpt-sw3-20b", "results": {"raw": {"test": [{"mcc": 0.7889415012271835, "macro_f1": 0.7977647031307488}, {"mcc": 0.7482306507426133, "macro_f1": 0.7624724867558679}, {"mcc": 0.7385365394017152, "macro_f1": 0.7661884945460464}, {"mcc": 0.7953827412453802, "macro_f1": 0.8079224208867918},...
```
# gpt-sw3-20b-instruct

```json
{"dataset": "swerec", "task": "sentiment-classification", "dataset_languages": ["sv"], "model": "AI-Sweden-Models/gpt-sw3-20b-instruct", "results": {"raw": {"test": [{"mcc": 0.6906651751805405, "macro_f1": 0.7226691138787769}, {"mcc": 0.6822053393033132, "macro_f1": 0.7131633530990946}, {"mcc": 0.5852277661641855, "macro_f1": 0.6392850178395086}, {"mcc": 0.7515938029833797, "macro_f1": 0.7649167883743853},...
```
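For reference, a small sketch of how the raw per-run scores in dumps like these can be summarized (mean and standard deviation over the test runs). The filename is a placeholder; the structure assumed here simply mirrors the JSON above.

```python
# Sketch: summarize per-run MCC / macro-F1 scores from a results dump like the
# ones above. The filename is a placeholder.
import json
import statistics

with open("gpt-sw3-20b_swerec.json") as f:
    results = json.load(f)

runs = results["results"]["raw"]["test"]
for metric in ("mcc", "macro_f1"):
    scores = [run[metric] for run in runs]
    print(f"{metric}: {statistics.mean(scores):.4f} ± {statistics.stdev(scores):.4f}")
```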
I have the same issue during training, trying to fine-tune on OpenHermes-2.5. I've tried:

```yaml
datasets:
  - path: teknium/OpenHermes-2.5
    conversation: chatml
    type: sharegpt
```

```bash
[2024-05-23 14:57:44,691] [INFO] [axolotl.load_tokenizer:294] [PID:3848826]...
```
@dakinggg Thanks for your reply! I'm not using streaming. This is my dataloader:

```yaml
train_loader:
  name: text
  dataset:
    local: ${data_local}
    remote: ${data_remote}
    split:
    tokenizer_name: ${tokenizer_name}
    max_seq_len: ${max_seq_len}
    shuffle: true
    mlm_probability:...
```
@dakinggg Even with `streaming: false`? I tried setting `streaming: true` for a different training run, where I had to turn off sequence packing, and that yielded much lower tokens/s. But pretty...