Image transformation for InternVL-1.5
I noticed that the load_image function in the example on the Hugging Face repo (transformers-based) applies an image transformation step, but I could not find any image processing in the gradio_web_server example (and the performance is still fine). Is there any research comparing performance with and without image processing? Thanks in advance.
Hello. The gradio_web_server path does transform the image as well; the code just looks a little different. There, the transformation is performed by CLIPImageProcessor, which is aligned with the preprocessing in the Hugging Face example.
See this line of code for more detail: https://github.com/OpenGVLab/InternVL/blob/main/internvl_chat/internvl/serve/model_worker.py#L80
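To illustrate what "aligned" means here, below is a minimal pure-Python sketch, not the actual InternVL code. It assumes both paths use ImageNet mean/std normalization; the function names and constants are hypothetical, so please check the preprocessor config of your checkpoint. It compares a ToTensor + Normalize style pipeline with a rescale + normalize pipeline of the kind an image processor applies, for a single RGB pixel. Resizing and cropping are left out.

```python
import math

# Assumed normalization constants (ImageNet statistics). These are an
# assumption for illustration; verify against the checkpoint's config.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def to_tensor_then_normalize(pixel, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Mimic ToTensor (divide by 255) followed by Normalize for one RGB pixel."""
    scaled = [c / 255.0 for c in pixel]
    return [(s - m) / d for s, m, d in zip(scaled, mean, std)]


def processor_style(pixel, rescale_factor=1 / 255,
                    image_mean=IMAGENET_MEAN, image_std=IMAGENET_STD):
    """Mimic the rescale and normalize steps of an image-processor config."""
    rescaled = [c * rescale_factor for c in pixel]
    return [(r - m) / d for r, m, d in zip(rescaled, image_mean, image_std)]


def pipelines_agree(pixel, tol=1e-6):
    """Return True if both pipelines give the same values within tol."""
    a = to_tensor_then_normalize(pixel)
    b = processor_style(pixel)
    return all(math.isclose(x, y, abs_tol=tol) for x, y in zip(a, b))


if __name__ == "__main__":
    for px in [(0, 0, 0), (128, 64, 255), (255, 255, 255)]:
        print(px, pipelines_agree(px))
```

If the two paths share the same target size, interpolation mode, and mean/std, the model receives the same tensor either way, so no performance gap would be expected from the preprocessing alone.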