Enshen Zhou issues

Repositories
Issues
Comments

Results 3 issues of


                                            Enshen Zhou

feat: Support multi-modal input and multi-modal output in one agent

## Description Mainly implements 4 multi-modal parts: 1. Building an agent to support multiple image inputs (see examples/vision/object_recognition.py) 2. Building an agent to call DALL-E to generate images (see examples/vision/image_crafting.py)...

New Feature

Some questions of interest regarding the details of Prior training.

In the Appendix D.2 section of the paper, the Prior Training section, I understood how Steve-1 collected text-video pairs for training the Prior. I am particularly interested in two points...

Question about Resume?

Thank you very much for providing such an excellent open-source code repository! I have a question regarding the resume functionality. Suppose I am training for just one epoch, and I...