EgoCross

A comprehensive benchmark across Surgery, Industry, Extreme Sports, and Animal Perspective. EgoCross comprises 798 clips and 957 QA pairs, supporting both CloseQA and OpenQA formats for fine‑grained evaluation.

Cross‑Domain Egocentric Video QA

About the Dataset

EgoCross is a cross-domain benchmark designed to evaluate how well multimodal large language models (MLLMs) generalize to egocentric video question answering (VQA). Unlike prior datasets built on daily-life footage, EgoCross focuses on four diverse and challenging domains (surgery, industrial assembly, extreme sports, and animal perspective) to assess model robustness under varying visual and semantic conditions.

The benchmark covers 15 sub-tasks grouped into four capability families: Identification, Localization, Prediction, and Counting. Each video clip is paired with multiple close-ended and open-ended questions that require fine-grained temporal and spatial understanding as well as reasoning.

In total, EgoCross contains 798 video clips and 957 QA pairs, curated through a semi-automatic pipeline combining LLM-based question generation and human verification. It provides a unified platform for measuring cross-domain generalization, highlighting the gap between everyday understanding and complex real-world egocentric perception.
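
As an illustration of how cross-domain results might be broken down, the following is a minimal sketch that computes close-ended (CloseQA) accuracy per domain and per capability family. The record fields (`domain`, `family`, `answer`, `prediction`) and the sample records are assumptions made for this example; they are not the released annotation schema or the official evaluation script.

```python
from collections import defaultdict


def closeqa_accuracy(records, key):
    """Return {group: accuracy} for close-ended records grouped by `key`.

    Each record is a dict with hypothetical fields: the grouping key,
    a ground-truth option letter (`answer`), and a model output (`prediction`).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        group = record[key]
        total[group] += 1
        # Compare option letters, ignoring case and surrounding whitespace.
        if record["prediction"].strip().upper() == record["answer"].strip().upper():
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Made-up sample records, for illustration only.
records = [
    {"domain": "Surgery", "family": "Identification", "answer": "A", "prediction": "A"},
    {"domain": "Surgery", "family": "Counting", "answer": "C", "prediction": "B"},
    {"domain": "Industry", "family": "Localization", "answer": "D", "prediction": "d "},
    {"domain": "Animal Perspective", "family": "Prediction", "answer": "B", "prediction": "B"},
]

by_domain = closeqa_accuracy(records, "domain")
by_family = closeqa_accuracy(records, "family")
print(by_domain)
print(by_family)
```

Reporting both breakdowns side by side makes it easier to see whether a model's errors concentrate in one domain or in one capability family. Open-ended (OpenQA) answers would need a different scoring method, which this sketch does not cover.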

Representative Domains

Below are four domain exemplars (Figure‑1 style).

Surgery

Fine‑grained tool recognition, phase understanding, and hand‑specific interactions.

Industry

Component identification, procedural reasoning, and tool‑usage logic.

Extreme Sports

High‑speed egocentric motion, navigation cues, and temporal anticipation.

Animal Perspective

Species cues, alternative movement patterns, and behavioral understanding.

Download

The EgoCross dataset is now available on Hugging Face. Access the complete benchmark with all domains and QA pairs.

Download from Hugging Face 🤗
Complete dataset with 798 clips and 957 QA pairs across 4 domains.

Resources

Paper
EgoCross
Getting Started
Data loaders, evaluation scripts, and examples (to be added upon release).

Team

East China Normal University • INSAIT • Fudan University

Yuqian Fu
Tianwen Qian
Yanjun Li
Yu Li
Kunyu Peng
Xu Zheng
Yongqin Xian
Alessio Tonioni
Yanwei Fu
Xiaoling Wang
Danda Paudel
Federico Tombari
Luc Van Gool

Contact

For questions and collaboration, please reach out to our team members.

Yanjun Li
51265901098@stu.ecnu.edu.cn
Yuqian Fu
yuqian.fu@insait.ai
Tianwen Qian
twqian@cs.ecnu.edu.cn