Binh Tang

39 comments by Binh Tang

While we're still waiting for a code change, is there a temporary workaround? I tried adding a CSS rule as suggested in https://github.com/bstriner/keras-tqdm/issues/21 and all...
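
For reference, here is a minimal sketch of applying such a CSS rule from inside the notebook itself; the selector is paraphrased from the discussion in that issue and is an assumption that may need adjusting for your Jupyter/JupyterLab version:

```python
from IPython.display import HTML, display

# Sketch of the CSS-based workaround discussed in bstriner/keras-tqdm#21,
# injected from the notebook itself. The selector below is an assumption
# and may differ depending on your Jupyter/JupyterLab version.
display(HTML("""
<style>
.p-Widget.jp-OutputPrompt.jp-OutputArea-prompt:empty {
    padding: 0;
    border: 0;
}
</style>
"""))
```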

@byshiue Would you mind confirming whether weight-only quantization works with the GPT 175B model without mixed precision? I have been able to get reasonable outputs using an OPT 175B checkpoint,...

@mady143 The RuntimeError indicates you're out of GPU memory. You can try to reduce the batch size.
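
For example, something along these lines (a hypothetical helper, not part of the codebase; on recent PyTorch versions, CUDA OOM raises `torch.cuda.OutOfMemoryError`):

```python
import torch

# Hypothetical helper: retry a forward pass with progressively smaller
# batches when CUDA runs out of memory. `model` and `batch` are
# illustrative stand-ins for your own objects.
def forward_with_smaller_batches(model, batch, min_batch_size=1):
    batch_size = batch.size(0)
    while batch_size >= min_batch_size:
        try:
            return model(batch[:batch_size])
        except torch.cuda.OutOfMemoryError:
            torch.cuda.empty_cache()  # release cached blocks before retrying
            batch_size //= 2
    raise RuntimeError("Out of GPU memory even at the minimum batch size")
```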

@gulzainali98 It appears that the weights you're using are not compatible with `ModelParallelTransformerLanguageModel`, which expects the K/Q/V projection weights to be combined into a single matrix. If you still have issues, I recommend trying one of...
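
For illustration, merging separate projections into a combined one could look roughly like the sketch below. The key names and layout are assumptions, not the exact metaseq checkpoint format; the real layer may additionally interleave weights per attention head:

```python
import torch

# Minimal sketch of combining separate q/k/v projection weights into the
# single matrix a merged-QKV layer expects. Key names are illustrative,
# and the actual layout (e.g., per-head interleaving) may differ.
def merge_qkv_weights(state_dict, prefix):
    q = state_dict.pop(f"{prefix}.q_proj.weight")
    k = state_dict.pop(f"{prefix}.k_proj.weight")
    v = state_dict.pop(f"{prefix}.v_proj.weight")
    # Linear weights are (out_features, in_features), so stack along dim 0.
    state_dict[f"{prefix}.qkv_proj.weight"] = torch.cat([q, k, v], dim=0)
    return state_dict
```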

I think the first issue can be fixed by a one-line change (see this [OmegaConf documentation](https://omegaconf.readthedocs.io/en/2.1_branch/usage.html#struct-flag)):

```python
import omegaconf

with omegaconf.open_dict(cfg):
    setattr(cfg["model"], "inference", True)
```

@jxmsML The `qkv_proj` can be found in [ModelParallelMultiheadAttention](https://github.com/facebookresearch/metaseq/blob/f2cd36798793604cf51ab8b8a2cb167c964f9667/metaseq/model_parallel/modules/multihead_attention.py#L185), which is [enabled by default](https://github.com/facebookresearch/metaseq/blob/f2cd36798793604cf51ab8b8a2cb167c964f9667/metaseq/model_parallel/modules/multihead_attention.py#L91) when used with Megatron. This is in contrast with the individual `q_proj`, `k_proj`, `v_proj` in [MultiheadAttention](https://github.com/facebookresearch/metaseq/blob/f2cd36798793604cf51ab8b8a2cb167c964f9667/metaseq/modules/multihead_attention.py#L21). The...
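
As a quick way to tell which layout a given checkpoint uses, something like this sketch works (the path and the top-level `"model"` key are placeholders for your own file):

```python
import torch

# Quick check (sketch): does a checkpoint store attention weights as a
# combined `qkv_proj` or as separate `q_proj`/`k_proj`/`v_proj`?
# "checkpoint.pt" and the top-level "model" key are placeholders.
state_dict = torch.load("checkpoint.pt", map_location="cpu")["model"]
combined = sorted(k for k in state_dict if ".qkv_proj." in k)
separate = sorted(k for k in state_dict
                  if any(p in k for p in (".q_proj.", ".k_proj.", ".v_proj.")))
print(f"{len(combined)} combined keys, {len(separate)} separate keys")
```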

> So far I can't find such logic of concatenating the `q_proj`, `v_proj`, `k_proj` into `qkv_proj` in metaseq.

I think the weights are concatenated by default for `ModelParallelTransformerLanguageModel`, and we...

It seems to me that distributed process groups weren't initialized properly. In addition to Punit's suggestion, can you also quickly check if Slurm environment variables have been inherited correctly (e.g....
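
Something like this quick check, run inside each worker process, would confirm it (these are the usual Slurm and torch.distributed variable names):

```python
import os

# Sanity check (sketch): print the environment variables a Slurm-launched
# distributed job typically relies on, to confirm that the worker
# processes inherited them.
for var in ("SLURM_PROCID", "SLURM_NTASKS", "SLURM_NODEID",
            "SLURM_LOCALID", "MASTER_ADDR", "MASTER_PORT"):
    print(f"{var}={os.environ.get(var)}")
```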

> I don't think I have Slurm installed but I don't think that's the issue. It seems that the suffix `-model_part-0` is missing.

You're right, Slurm isn't required and might...

Please see the updated PR for a script to benchmark the generator interface in terms of latency and peak GPU memory usage. We also add a function to collect GPU...
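
For context, the measurement boils down to something like this sketch; `generator.generate(prompts)` is a stand-in for the actual interface in the PR:

```python
import time
import torch

# Minimal sketch of the kind of measurement the benchmark script makes:
# wall-clock latency and peak GPU memory for one generation call.
def benchmark(generator, prompts):
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()  # make sure pending kernels don't skew timing
    start = time.perf_counter()
    outputs = generator.generate(prompts)
    torch.cuda.synchronize()
    latency_s = time.perf_counter() - start
    peak_mem_gib = torch.cuda.max_memory_allocated() / 1024 ** 3
    return outputs, latency_s, peak_mem_gib
```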