teach
About Model Input and Output
What are the inputs and outputs of this model? My understanding is that a vision-language navigation task should take both a language instruction and visual observations as input. Does the agent then navigate to the target point on its own? If so, why is there no code for the agent's navigation in the repository?

Also, why is my success rate zero after running inference?