Kyungmin Lee

Search results: 9 issues by Kyungmin Lee

`len(DeepspeedDataLoader)` is currently equal to `len(data_sampler) = len(dataset) / data_parallel_world_size`. However, PyTorch's `len(DataLoader)` is equal to `len(dataloader) = len(data_sampler) / batch_size`.
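A minimal sketch of the mismatch described above, using hypothetical sizes (the variable names mirror the issue, not DeepSpeed's actual internals):

```python
import math

# Hypothetical sizes chosen only to illustrate the length mismatch.
dataset_len = 1000
data_parallel_world_size = 4
batch_size = 8

# Per-rank sampler length, as in the issue text.
sampler_len = dataset_len // data_parallel_world_size  # 250

# What the issue says DeepspeedDataLoader reports: the raw sampler
# length, i.e. the batch-size division is missing.
deepspeed_len = sampler_len  # 250

# What torch's DataLoader would report for the same sampler: number of
# batches, rounded up for the final partial batch.
dataloader_len = math.ceil(sampler_len / batch_size)  # 32

print(deepspeed_len, dataloader_len)
```

So code that iterates based on `len(dataloader)` would see a length `batch_size` times larger than expected under this behavior.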

In ZeRO stage 1 it works; I used `from_pretrained("facebook/bart-base")` as the backbone with transformers==4.2.1. In ZeRO stage 2, `backward` stops working and hangs like an infinite loop on some processes. In ZeRO stage 3, ``` [2022-05-18 14:33:12,828]...

bug

Fix typo in the docker run script: https://github.com/triton-inference-server/fastertransformer_backend#rebuilding-fastertransformer-backend-optional

Hi, I'm following the [setup guide](https://github.com/triton-inference-server/fastertransformer_backend#setup). I found a bug and solved it.

```
docker run -it \
    --shm-size=1g --ulimit memlock=-1 \
    -v ${WORKSPACE}:/workspace \
    --name ft_backend_builder \
    ${TRITON_DOCKER_IMAGE}...
```

I am trying to continue training my model from a checkpoint using paxml. Does paxml not support `restore_checkpoint_dir` or `restore_checkpoint_step` in train mode?

### System Info

- CPU architecture: x86_64
- CPU/Host memory size: 1 TB
- GPU name: NVIDIA A100-40G
- TensorRT-LLM branch: main, v0.9.0, 118b3d7
- CUDA: 12.3
- NVIDIA driver: 545.23.08...

bug

```python
llm = LLM('/app/models/tensorrt_llm', skip_tokenizer_init=True)
sampling_params = SamplingParams(end_id=2, return_context_logits=True, max_new_tokens=1)
results = llm.generate([[32, 12, 24, 54, 6, 747]], sampling_params=sampling_params)
print(results)
print(results[0].context_logits)
```

```
GenerationResult(request_id=1, prompt_token_ids=[32, 12, 24, 54, 6, 747], outputs=[CompletionOutput(index=0, text='', token_ids=[], cumulative_logprob=None,...
```

stale

Converting EXAONE without setting `TRTLLM_DISABLE_UNIFIED_CONVERTER=1` causes this error. I tried [EXAONE-3.5-32B-Instruct](https://huggingface.co/LGAI-EXAONE/EXAONE-3.5-32B-Instruct) as well as the 2.4B and 7.8B variants. `model_name = hf_model_or_dir` does not contain 'exaone', yet if the model is EXAONE, `config.architecture` is always [ExaoneForCausalLM](https://huggingface.co/LGAI-EXAONE/EXAONE-3.5-32B-Instruct/blob/main/config.json#L4)....
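A hedged sketch of the detection gap described above: if the converter keys off the model directory string rather than the config's architecture field, a local checkpoint path that does not contain "exaone" slips past the check even though the config identifies the model unambiguously. The path and variable names here are illustrative, not TensorRT-LLM's actual code:

```python
# Hypothetical local checkpoint path: the user's directory name need not
# mention "exaone" at all.
hf_model_or_dir = "/models/my-local-checkpoint"

# For EXAONE models the HF config always reports this architecture.
config_architecture = "ExaoneForCausalLM"

# A string-based check on the path fails to recognize the model...
model_name = hf_model_or_dir
detected_by_path = "exaone" in model_name.lower()  # False here

# ...while a config-based check would succeed.
detected_by_config = config_architecture == "ExaoneForCausalLM"  # True

print(detected_by_path, detected_by_config)
```

Checking `config.architecture` instead of the path string would make detection independent of how the checkpoint directory is named.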

Community want to contribute

Hi, can you check the base images in the [dockerfile](https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/dockerfile/Dockerfile.trt_llm_backend#L1-L3)? They look like internal base images.