
Question about loading tokenizer

Open Haodong-Lei-Ray opened this issue 10 months ago • 1 comment

Hi. I'm using your official script from the README:

#! /bin/bash

export SCRIPT_DIR="$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
export PROJECT_DIR="$( cd -- "$( dirname -- "$SCRIPT_DIR" )" &> /dev/null && pwd )"
cd $PROJECT_DIR
export PYTHONPATH="$PYTHONPATH:$PROJECT_DIR"

export llama_tokenizer_path="LargeWorldModel/LWM-Chat-1M-Jax"
export vqgan_checkpoint="/data/lei/localmodel/LargeWorldModel/LWM-Chat-1M-Jax/vqgan"
export lwm_checkpoint="/data/lei/localmodel/LargeWorldModel/LWM-Chat-1M-Jax/params"

python3 -u -m lwm.vision_generation \
    --prompt='Fireworks over the city' \
    --output_file='fireworks.mp4' \
    --temperature_image=1.0 \
    --temperature_video=1.0 \
    --top_k_image=8192 \
    --top_k_video=1000 \
    --cfg_scale_image=5.0 \
    --cfg_scale_video=1.0 \
    --vqgan_checkpoint="$vqgan_checkpoint" \
    --n_frames=8 \
    --mesh_dim='!1,1,-1,1' \
    --dtype='fp32' \
    --load_llama_config='7b' \
    --update_llama_config="dict(sample_mode='vision',theta=50000000,max_sequence_length=32768,scan_attention=False,scan_query_chunk_size=128,scan_key_chunk_size=128,scan_mlp=False,scan_mlp_chunk_size=8192,scan_layers=True)" \
    --load_checkpoint="$lwm_checkpoint" \
    --tokenizer="$llama_tokenizer_path"
read

But it still fails with:

Entry Not Found for url: https://huggingface.co/LargeWorldModel/LWM-Chat-1M-Jax/resolve/main/config.json.
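Is config.json supposed to exist in that repo? As a quick check (just a sketch on my side, assuming huggingface_hub is installed alongside transformers), listing the repo files should show what is actually hosted there:

# List the files served by the Hub repo to see whether config.json is there
# or only the Jax checkpoints (params, vqgan) and tokenizer.model.
from huggingface_hub import HfApi

print(HfApi().list_repo_files("LargeWorldModel/LWM-Chat-1M-Jax"))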

How should I modify the line "tokenizer = AutoTokenizer.from_pretrained(FLAGS.tokenizer)" in the source file?

Would it be something like this?

import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.Load("LargeWorldModel/LWM-Chat-1M-Jax/tokenizer.model")
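Or should I keep the transformers tokenizer but point it at the file on disk instead of the Hub id? A rough, untested sketch of what I mean (the path is just where I downloaded the checkpoint, and I'm assuming tokenizer.model sits next to the params and vqgan folders):

# Untested idea: build the Llama tokenizer directly from the local
# tokenizer.model file so nothing is fetched from the Hub.
from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer(
    vocab_file="/data/lei/localmodel/LargeWorldModel/LWM-Chat-1M-Jax/tokenizer.model"
)
print(tokenizer.encode("Fireworks over the city"))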

Thanks for your work.

Haodong-Lei-Ray avatar Apr 03 '25 08:04 Haodong-Lei-Ray

I'm using transformers==4.40.0.

Haodong-Lei-Ray avatar Apr 03 '25 09:04 Haodong-Lei-Ray