Xinlong Wang

xinlongwang

https://xloong.wang/

WXinlong

AI & ML interests

computer vision, foundation models

Recent Activity

updated a collection 28 days ago

Emu3.5

Native Multimodal Models are World Learners 🌍 • 4 items • Updated 28 days ago • 71

authored 14 papers about 1 month ago

EVA-02: A Visual Representation for Neon Genesis

Paper • 2303.11331 • Published Mar 20, 2023

EVA-CLIP: Improved Training Techniques for CLIP at Scale

Paper • 2303.15389 • Published Mar 27, 2023

Uni3D: Exploring Unified 3D Representation at Scale

Paper • 2310.06773 • Published Oct 10, 2023

Tokenize Anything via Prompting

Paper • 2312.09128 • Published Dec 14, 2023 • 1

EVA: Exploring the Limits of Masked Visual Representation Learning at Scale

Paper • 2211.07636 • Published Nov 14, 2022 • 1

You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale

Paper • 2412.06699 • Published Dec 9, 2024 • 13

EVEv2: Improved Baselines for Encoder-Free Vision-Language Models

Paper • 2502.06788 • Published Feb 10 • 13

Dense Contrastive Learning for Self-Supervised Visual Pre-Training

Paper • 2011.09157 • Published Nov 18, 2020

OmniGen2: Exploration to Advanced Multimodal Generation

Paper • 2506.18871 • Published Jun 23 • 78

Unified Vision-Language-Action Model

Paper • 2506.19850 • Published Jun 24 • 27

Audio-Sync Video Generation with Multi-Stream Temporal Control

Paper • 2506.08003 • Published Jun 9 • 3

Unveiling Chain of Step Reasoning for Vision-Language Models with Fine-grained Rewards

Paper • 2509.19003 • Published Sep 23

Uniform Discrete Diffusion with Metric Path for Video Generation

Paper • 2510.24717 • Published Oct 28 • 39

Emu3.5: Native Multimodal Models are World Learners

Paper • 2510.26583 • Published Oct 30 • 107

liked 3 models about 1 month ago

BAAI/Emu3.5-Image

Text-to-Image • 34B • Updated Nov 5 • 283 • 61

BAAI/Emu3.5

Any-to-Any • 34B • Updated Nov 5 • 373 • 162

BAAI/Emu3.5-VisionTokenizer

Updated Oct 31 • 157 • 19

upvoted a collection about 1 month ago

Emu3.5

Native Multimodal Models are World Learners 🌍 • 4 items • Updated 28 days ago • 71

upvoted a paper about 1 month ago

Emu3.5: Native Multimodal Models are World Learners

Paper • 2510.26583 • Published Oct 30 • 107