Jue WANG

20 comments by Jue WANG

It seems that the sequence length is too long (i.e. # tokens > 512). If you are OK with limiting the max sequence length, then try truncating the input text...
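
For illustration, a minimal truncation sketch, assuming a Hugging Face tokenizer (the checkpoint name below is just a placeholder; use the tokenizer that matches your model):

```python
from transformers import AutoTokenizer

# Placeholder checkpoint; substitute the tokenizer that matches your own model.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

text = "some very long document " * 1000
# Drop everything beyond the 512-token limit.
encoded = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
print(encoded["input_ids"].shape)  # torch.Size([1, 512])
```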

If you want to reduce the memory usage and improve the training speed, I would recommend reducing the hidden size `hidden_dim` (e.g. 100) and the number of layers...
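
As a rough illustration of why this helps (the toy encoder below is just a placeholder, not the project's model), the parameter count, and hence memory use and training time, scales with both knobs:

```python
import torch.nn as nn

def param_count(hidden_dim: int, num_layers: int) -> int:
    # Toy stacked LSTM; only meant to show how hidden_dim and num_layers
    # drive the number of parameters.
    model = nn.LSTM(input_size=300, hidden_size=hidden_dim, num_layers=num_layers)
    return sum(p.numel() for p in model.parameters())

print(param_count(hidden_dim=300, num_layers=3))  # larger, slower to train
print(param_count(hidden_dim=100, num_layers=1))  # smaller, faster, less memory
```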

@satpalsr I can use multiple GPUs with the following snippet:

```python
import torch
from transformers import AutoConfig, AutoTokenizer
from transformers import AutoModelForCausalLM
from accelerate import dispatch_model, infer_auto_device_map
from accelerate.utils import...
```
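
The original snippet is cut off above; for context, a typical continuation with `accelerate` looks roughly like the following sketch (the checkpoint name and per-GPU memory limits are placeholders, not the exact code from the comment):

```python
import torch
from transformers import AutoModelForCausalLM
from accelerate import dispatch_model, infer_auto_device_map

model_name = "togethercomputer/GPT-NeoXT-Chat-Base-20B"  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# Split the layers across the visible GPUs, capping the memory used per device.
device_map = infer_auto_device_map(model, max_memory={0: "20GiB", 1: "20GiB"})
model = dispatch_model(model, device_map=device_map)
```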

@davismartens The training script saves a checkpoint every `CHECKPOINT_STEPS` steps, so usually you can just pick the latest one :)

> @LorrinWWW great thanks. Can I run the pretrained model without training too?

Sure! You can run our pretrained base model.
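
For instance, inference without any training could look like the sketch below (the checkpoint name comes from the later comment; the `<human>:`/`<bot>:` prompt format is an assumption, so check the model card, and the 20B weights need substantial GPU memory):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "togethercomputer/GPT-NeoXT-Chat-Base-20B"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16, device_map="auto")

# Assumed prompt format; verify against the model card.
prompt = "<human>: Hello, who are you?\n<bot>:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```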

@davismartens It appears that `bot.py` is unable to locate the retrieval module, which should be present in the root directory of the `OpenChatKit` repository. Could you try running the...

@davismartens Can you try this? `export PYTHONPATH=/mnt/c/Users/davis/dev-projects/OpenChatKit:$PYTHONPATH`

@davismartens That's true, we specified `use_auth_token=True`. You can either log in to HF:

```sh
pip install --upgrade huggingface_hub
huggingface-cli login
```

Or, since `togethercomputer/GPT-NeoXT-Chat-Base-20B` is publicly available now, you can simply remove...
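
As a sketch of the second option (the surrounding code depends on which script you are editing), the change is just dropping the argument:

```python
from transformers import AutoTokenizer

# Before: AutoTokenizer.from_pretrained(name, use_auth_token=True)
# After: the checkpoint is public, so no token is needed.
tokenizer = AutoTokenizer.from_pretrained("togethercomputer/GPT-NeoXT-Chat-Base-20B")
```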