Stella Biderman

284 comments by Stella Biderman

@zhuzilin Yeah, DeepSpeed’s garbage collection is, well, garbage. We actually have an auxiliary tool `tools/kill_all.sh` that kills all DeepSpeed processes across all connected machines because of how common it is...

@zhuzilin I've set aside time to add support for muP to GPT-NeoX this week and would love to check out your code. Where can I find it? Perhaps you can...

Reposting from chat for documentation: EleutherAI found the same and removed fused layernorm from GPT-NeoX: https://github.com/EleutherAI/gpt-neox/pull/428

`git clone https://huggingface.co/bigscience/gpt2-350m-en/tree/megatron-deepspeed` is failing. It says repo not found. I can download the HF version without issue.

> @StellaAthena Have you made progress with this issue? If not, perhaps I'll take a jab at it!
>
> @stas00 Will the unit test run with the CI? I'm...

@jzf2101 Are there additional changes that need to be made to this PR or can it be merged?

Did your comparison fp16 model use `zero`? I notice that you're not using it here.

@Quentin-Anthony Can you list which DeeperSpeed features would be lost with this move?

> > @Quentin-Anthony Can you list which DeeperSpeed features would be lost with this move?
>
> Small stuff like logging format, some more detailed timers, and the forward hooks...

It seems to me that the code is structured in a way that fundamentally assumes that durations are discrete. To change this, one would have to change the way durations...