Magma
How do I pass a video to the processor?
@ucaswindlike, you can simply add more image placeholders to the prompt:
convs = [
{"role": "system", "content": "You are agent that can see, talk and act."},
{"role": "user", "content": "<image_start><image><image_end><image_start><image><image_end><image_start><image><image_end>\nWhat is the letter on the robot?"},
]
prompt = processor.tokenizer.apply_chat_template(convs, tokenize=False, add_generation_prompt=True)
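# The images list has one entry per <image> placeholder; here the same image is repeated three times.
inputs = processor(images=[image]*3, texts=prompt, return_tensors="pt")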
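For an actual video, the same idea can be generalized by sampling a handful of frames and generating one placeholder per frame. The following is only a minimal sketch: `sample_frame_indices`, `build_video_convs`, and `IMAGE_PLACEHOLDER` are hypothetical helper names introduced here for illustration, not part of Magma, and the evenly spaced sampling is an assumption rather than a recommended setting.

```python
# Hypothetical helpers (not part of Magma) for building a multi-frame prompt.

# Placeholder string copied from the snippet above; one is needed per frame.
IMAGE_PLACEHOLDER = "<image_start><image><image_end>"


def sample_frame_indices(total_frames: int, num_samples: int) -> list:
    """Return evenly spaced frame indices (bin centers) from a clip."""
    if total_frames <= 0 or num_samples <= 0:
        raise ValueError("total_frames and num_samples must be positive")
    num_samples = min(num_samples, total_frames)
    # Integer arithmetic avoids floating-point rounding surprises.
    return [
        (2 * i + 1) * total_frames // (2 * num_samples)
        for i in range(num_samples)
    ]


def build_video_convs(num_frames: int, question: str) -> list:
    """Build the conversation with one image placeholder per frame."""
    placeholders = IMAGE_PLACEHOLDER * num_frames
    return [
        {"role": "system", "content": "You are agent that can see, talk and act."},
        {"role": "user", "content": f"{placeholders}\n{question}"},
    ]


if __name__ == "__main__":
    indices = sample_frame_indices(total_frames=10, num_samples=3)
    convs = build_video_convs(len(indices), "What is the letter on the robot?")
    print(indices)
    print(convs[1]["content"])
```

The resulting `convs` can then go through `processor.tokenizer.apply_chat_template` as in the snippet above, with the list of decoded frames (in the same order as the sampled indices) passed as `images` in place of `[image]*3`.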
Also, @jwyang, will you be releasing a robotic action example? :)
@rr3087, actually, we already included a robot action example in the agents folder, based on the LIBERO environment. Please take a look! Ideally, we would also set up a Gradio demo for robot manipulation, but we have not yet figured out how to do that.