[Bug] LoRA finetune doesn't generate any output text
Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [ ] 2. The bug has not been fixed in the latest version.
- [ ] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
I finetuned InternVL-2B with LoRA and uploaded the model to Hugging Face after copying the missing scripts over from the original model repo: https://huggingface.co/shivavardhineedi/my_mini_internVL_lora
When I try to run inference on it, I get no output text.
As far as I can tell the checkpoint is already a merged model anyway: there is no adapter_config.json, so there should be no need to load it through the `peft` PeftModel module. Why is it not generating any response?
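For reference, one quick way to tell whether a checkpoint directory is a raw PEFT adapter or an already-merged model is to look for `adapter_config.json`, the file PEFT writes next to the adapter weights. A minimal sketch (the helper name `checkpoint_kind` is just illustrative, not part of any library):

```python
import json
import os

def checkpoint_kind(model_dir: str) -> str:
    """Heuristic check: PEFT writes an adapter_config.json next to adapter
    weights, while a merged/standalone checkpoint has none."""
    adapter_cfg = os.path.join(model_dir, "adapter_config.json")
    if os.path.exists(adapter_cfg):
        with open(adapter_cfg) as f:
            base = json.load(f).get("base_model_name_or_path")
        return f"peft adapter (base model: {base})"
    return "merged/standalone model"
```

If this reports a merged model, loading it with `AutoModel.from_pretrained(..., trust_remote_code=True)` directly (no PeftModel wrapper) should be the right path.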
Reproduction
I finetuned following the 2nd_finetune documentation and uploaded the contents of the output work_dir to Hugging Face, but the uploaded model does not work.
Environment
```dockerfile
# Use an official NVIDIA CUDA image as a base
FROM nvidia/cuda:11.8.0-devel-ubuntu20.04

# Set environment variables
ENV CONDA_ENV=internvl
ENV PATH /opt/conda/envs/$CONDA_ENV/bin:$PATH
ENV TZ=America/New_York

# Set the timezone
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone

# Install dependencies
RUN apt-get update && apt-get install -y \
    git \
    wget \
    build-essential \
    libgl1-mesa-glx \
    libglib2.0-0 \
    && apt-get clean

# Install Miniconda
RUN wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O /tmp/miniconda.sh && \
    /bin/bash /tmp/miniconda.sh -b -p /opt/conda && \
    rm /tmp/miniconda.sh && \
    /opt/conda/bin/conda clean -a && \
    ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh

# Clone the InternVL repository
RUN git clone https://github.com/OpenGVLab/InternVL.git /workspace/InternVL

# Create conda environment
RUN /opt/conda/bin/conda create -n $CONDA_ENV python=3.9 -y && \
    /opt/conda/bin/conda clean -a -y

# Set the working directory
WORKDIR /workspace/InternVL/internvl_chat

# RUN pip install flash-attn==2.3.6 --no-build-isolation
# Install Python packages using pip
RUN /bin/bash -c "source /opt/conda/etc/profile.d/conda.sh && conda activate $CONDA_ENV && \
    pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 torchaudio==2.0.2+cu118 --extra-index-url https://download.pytorch.org/whl/cu118 && \
    pip install packaging && \
    pip install flash-attn==2.3.6 --no-build-isolation && \
    pip install timm==0.9.12 && \
    pip install -U openmim && \
    pip install transformers==4.40.0 && \
    pip install -U "huggingface_hub[cli]" && \
    pip install opencv-python termcolor yacs pyyaml scipy deepspeed==0.13.5 pycocoevalcap tqdm pillow tensorboardX datasets mlfoundry truefoundry orjson peft sentencepiece"

RUN pip install -r /workspace/InternVL/requirements.txt
```
Error traceback
No response
I experienced the same issue finetuning the 40B variant: loading the resulting model produced no output.
I had to run `python tools/merge_lora.py` on the finetuned model first. The docs do not make this step clear, and I assume many other engineers familiar with LoRA/PEFT expect LoRA adapters to be small external artifacts rather than being contained (but not merged?) in the finetuning output.
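For context, merging a LoRA adapter folds the low-rank update back into the base weight, W' = W + (alpha / r) * B A; this is conceptually what a merge script (or peft's `merge_and_unload()`) does. A minimal NumPy sketch of the math only; the shapes and scaling here are illustrative, not InternVL's actual code:

```python
import numpy as np

def merge_lora_weight(W, A, B, alpha):
    """Fold a LoRA update into the base weight: W' = W + (alpha / r) * (B @ A)."""
    r = A.shape[0]            # LoRA rank (rows of the down-projection)
    scaling = alpha / r
    return W + scaling * (B @ A)

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))   # base weight, shape (out_features, in_features)
A = rng.standard_normal((4, 8))   # LoRA down-projection, shape (r, in_features)
B = np.zeros((8, 4))              # LoRA up-projection, zero-initialised as in LoRA training
# With B still zero, merging is a no-op, matching a freshly initialised adapter:
assert np.allclose(merge_lora_weight(W, A, B, alpha=8.0), W)
```

After training, B is no longer zero, so skipping the merge (or loading the checkpoint as if it were plain weights) silently discards the learned update, which matches the "no output / wrong output" symptom.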
I then copied the *.py files to the merged folder (as per the instructions); config.json was already present, but I overrode it anyway.
At that point I was able to generate captions aligned with my training data, but they were prefixed with `<s>`. I fixed that by overwriting the remaining *.json configs in the merged folder with the ones from the original 40B model.
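A side note on the stray `<s>`: it is the BOS special token leaking into the decoded text (with `transformers`, passing `skip_special_tokens=True` to `tokenizer.decode` suppresses it). If overwriting the configs is not an option, a tokenizer-independent post-processing sketch like the following can strip it; the helper name and token list are illustrative assumptions:

```python
def strip_special_tokens(text, special_tokens=("<s>", "</s>", "<unk>")):
    """Remove leading/trailing special-token markers left in decoded output."""
    text = text.strip()
    changed = True
    while changed:
        changed = False
        for tok in special_tokens:
            if text.startswith(tok):
                text = text[len(tok):].lstrip()
                changed = True
            if text.endswith(tok):
                text = text[: -len(tok)].rstrip()
                changed = True
    return text

print(strip_special_tokens("<s> A photo of a cat"))  # -> A photo of a cat
```

Fixing the tokenizer/generation configs, as described above, is still the cleaner solution.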
Have you solved it? I'm experiencing the same issue. Could you share the details of how you resolved it?