Emu3.5 Collection Native Multimodal Models are World Learners • 4 items • Updated 28 days ago • 71
EVA-CLIP: Improved Training Techniques for CLIP at Scale Paper • 2303.15389 • Published Mar 27, 2023
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale Paper • 2211.07636 • Published Nov 14, 2022 • 1
You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale Paper • 2412.06699 • Published Dec 9, 2024 • 13
EVEv2: Improved Baselines for Encoder-Free Vision-Language Models Paper • 2502.06788 • Published Feb 10 • 13
Dense Contrastive Learning for Self-Supervised Visual Pre-Training Paper • 2011.09157 • Published Nov 18, 2020
OmniGen2: Exploration to Advanced Multimodal Generation Paper • 2506.18871 • Published Jun 23 • 78
Audio-Sync Video Generation with Multi-Stream Temporal Control Paper • 2506.08003 • Published Jun 9 • 3
Unveiling Chain of Step Reasoning for Vision-Language Models with Fine-grained Rewards Paper • 2509.19003 • Published Sep 23
Uniform Discrete Diffusion with Metric Path for Video Generation Paper • 2510.24717 • Published Oct 28 • 39
Emu3.5: Native Multimodal Models are World Learners Paper • 2510.26583 • Published Oct 30 • 107