Visual Intelligence, Pretrained Vision-and-Language Models, Embodied AI, Collaborative Agents, Vision Tasks (Object Detection, Segmentation)
Answer questions about images with text prompts
Generate images from text descriptions