German Abramov

Results: 21 issues by German Abramov

In the original BKRGA paper, and in your REGAL paper, there is a fitness function that we want to minimize. In the original device placement task we want to minimize...
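For context, a minimal sketch of what a fitness function to minimize might look like in a device placement setting. This is not the BKRGA/REGAL objective; all names and the cost model (bottleneck compute plus cross-device communication) are illustrative assumptions.

```python
# Hypothetical fitness function for device placement, to be MINIMIZED.
# placement: {op: device_id}; compute_cost: {op: seconds};
# comm_cost: {(op_a, op_b): seconds paid only if the two ops sit on different devices}.
def fitness(placement, compute_cost, comm_cost):
    per_device = {}
    for op, dev in placement.items():
        per_device[dev] = per_device.get(dev, 0.0) + compute_cost[op]
    compute = max(per_device.values())  # the slowest device bounds the step time
    comm = sum(cost for (a, b), cost in comm_cost.items()
               if placement[a] != placement[b])
    return compute + comm

placement = {"matmul": 0, "softmax": 1}
print(fitness(placement,
              {"matmul": 2.0, "softmax": 1.0},
              {("matmul", "softmax"): 0.5}))  # → 2.5
```

A genetic algorithm such as BKRGA would then evolve `placement` candidates, keeping those with lower fitness.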

I'm checking the `extract_model` method, and I see you use DFS to traverse the graph [link](https://github.com/onnx/onnx/blob/fe2433d3dd677ed5c582e194a49a632e707d0543/onnx/utils.py#L132), implemented via recursion [link](https://github.com/onnx/onnx/blob/fe2433d3dd677ed5c582e194a49a632e707d0543/onnx/utils.py#L53). What do you think: if we try to use...
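A common motivation for this kind of question is that recursive DFS can hit Python's recursion limit on very deep graphs. A minimal sketch of the iterative alternative, using an explicit stack over a plain adjacency dict (a stand-in for the real ONNX node structure, not the `onnx.utils` code itself):

```python
# Iterative DFS with an explicit stack, avoiding deep recursion.
# `graph` is an adjacency dict: {node: [neighbour, ...]}.
def dfs_iterative(graph, start):
    visited = set()
    stack = [start]
    order = []
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        # reversed() keeps the visit order close to the recursive version
        for neighbour in reversed(graph.get(node, [])):
            if neighbour not in visited:
                stack.append(neighbour)
    return order

graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(dfs_iterative(graph, "a"))  # → ['a', 'b', 'd', 'c']
```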

enhancement
stale

As far as I know, `pytorch2onnx` from `mmseg` doesn't work correctly with the `SegFormer` model. I have converted it, and the accuracy is really bad (essentially random). So I have tried to export it...

I'm observing my optimizer metrics while MPT trains, and some blocks are inf, e.g. `Train cosine/update_grad/model._fsdp_wrapped_module.transformer.blocks.9._fsdp_wrapped_module.ffn.down_proj.weight: inf`. Is this OK, or why does this happen? Is this a known issue? I guess...
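One way to narrow this down is to scan the logged metrics for non-finite values and see whether the `inf` is confined to particular blocks. A minimal sketch (not llm-foundry's actual logging code; the metric names are illustrative):

```python
import math

# Return the names of metrics whose values are inf or NaN.
def find_nonfinite(metrics):
    return [name for name, value in metrics.items()
            if not math.isfinite(value)]

metrics = {
    "cosine/update_grad/blocks.8.ffn.down_proj.weight": 0.93,
    "cosine/update_grad/blocks.9.ffn.down_proj.weight": float("inf"),
}
print(find_nonfinite(metrics))
# → ['cosine/update_grad/blocks.9.ffn.down_proj.weight']
```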

I'm trying to use `hf_generate.py`; why doesn't it work with the flag `--attn_impl triton`? I also changed `config.attn_config['attn_impl'] = 'triton'` from `torch` in `convert_composer_to_hf.py`. ```ValueError: Requirements for `attn_impl: triton` not installed....

I have trained a 125M MPT on a small dataset; I generated outputs via `inference/hf_generate.py` (after first converting from Composer to HF), and it gives me some value from `eval/eval.py` with...

Hi! You have a script for preparing data in `scripts/train`, which is `python ../data_prep/convert_dataset_hf.py --dataset c4 --data_subset en --out_root ./my-copy-c4 --splits train_small val_small --concat_tokens 2048 --tokenizer EleutherAI/gpt-neox-20b --eos_text ''`. Can...

Hi! Using the Docker image `mosaicml/llm-foundry:2.0.1_cu118-latest`, I'm training MPT-125M with your default parameters, and my loss explodes after some number of steps. I have also added a 2k-step warmup. It...

I'm trying to convert the C4 dataset with your `convert_hf` code. [Here](https://huggingface.co/datasets/allenai/c4) they say the `en` subset is 305 GB, but if I give `c4` and `en` as arguments, it looks like...

In your training config, we can choose `data_local` or `data_remote`. If I use `data_remote` on S3, which behavior do I get: does the training loop read directly from remote S3, or does it first transfer...
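For reference, a hedged sketch of how the two fields from the question might be set together in a YAML config. The field names come from the question above; the paths, bucket name, and exact layout are illustrative assumptions, not llm-foundry's canonical config.

```yaml
# Illustrative only: local staging directory plus remote S3 source.
data_local: ./my-copy-c4          # where shards live (or get cached) on disk
data_remote: s3://my-bucket/c4    # the remote case the question asks about
```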

enhancement