Vid2Seq: a pretrained visual language model for describing multi-event videos – Google AI Blog

Related Keywords

Antoine Yang, Research Scientist, Google Research, Student Researcher, Arsha Nagrani, Large-Scale Pretraining, Visual Language Model, Dense Video, ActivityNet Captions