Fan
Hi! Thanks for reporting this issue. There are a few likely reasons for it: 1. I noticed you are using GPT-3.5-turbo-16k, which sometimes won't perfectly follow our system prompt....
Hi~ I also wonder whether there is a way to start a server with multiple GPUs, e.g., if I want to start the server using `llama-7b-chat`, can I simply set `tp-size=8`...
Thanks for your quick reply!!😊😊 I understand the inter-GPU communication cost now, and indeed the 7B model works just fine on a single GPU. So can I say that data parallel is...
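For readers landing on this thread: a minimal sketch of the two launch modes being discussed, assuming an SGLang-style `launch_server` entry point. The module path, model path, and flag names (`--tp-size`, `--dp-size`) are assumptions here and should be checked against your installed version:

```shell
# Tensor parallelism: shard ONE copy of the model across 8 GPUs.
# Useful when the model is too large for a single GPU; incurs
# inter-GPU communication on every forward pass.
python -m sglang.launch_server \
  --model-path meta-llama/Llama-2-7b-chat-hf \
  --tp-size 8

# Data parallelism: run 8 independent replicas, one per GPU.
# Raises request throughput for a model that already fits on
# a single GPU (as a 7B model typically does).
python -m sglang.launch_server \
  --model-path meta-llama/Llama-2-7b-chat-hf \
  --dp-size 8
```

For a 7B model that fits on one GPU, the data-parallel mode avoids the inter-GPU communication cost that tensor parallelism pays, which matches the reasoning in the reply above.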
Got it! Thanks!
Hi~ I really need this feature! One question: when I use the same tasks to evaluate models (with the same architecture but from different runs), will the evaluation samples (docs and...
That's cool! Thanks for your explanation!
Thank you for your kind words!😄 Since we haven't conducted experiments on multilingual data, I don't have a definitive answer, but I think ProX could work better given proper SFT...
Hi @mtasic85, thank you for your interest in ProX! We will try it on code data in the coming days; however, I can't confirm the exact timeline yet. Unlike our...
@mtasic85 If you are still interested in a large-scale, high-quality code dataset, you may find our new [MegaMath](https://huggingface.co/datasets/LLM360/MegaMath) dataset helpful, especially the [megamath-code](https://huggingface.co/datasets/LLM360/MegaMath/tree/main/megamath-code) subset. Although our primary goal is to...