---
datasets:
- SurplusDeficit/MultiHop-EgoQA
---
# Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos
## GeLM Model
We propose a novel architecture, termed <b><u>GeLM</u></b>, for *MH-VidQA*. It leverages the world-knowledge reasoning capabilities of multi-modal large language models (LLMs) while incorporating a grounding module that retrieves temporal evidence from the video through flexible grounding tokens.
<div align="center">
<img src="./assets/architecture_v3.jpeg" style="width: 80%;">
</div>
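
To make the grounding-token idea concrete, below is a minimal, hypothetical sketch (not the released GeLM implementation) of how a grounding head might map the LLM hidden states at special grounding-token positions to normalized (start, end) timestamps. The class name `GroundingHead`, the hidden dimension, and the (center, width) parameterization are all assumptions for illustration only.

```python
import torch
import torch.nn as nn


class GroundingHead(nn.Module):
    """Hypothetical sketch: regress a temporal segment from each grounding token."""

    def __init__(self, hidden_dim: int = 4096):
        super().__init__()
        # Small MLP mapping one token embedding to (center, width) in [0, 1].
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim // 4),
            nn.GELU(),
            nn.Linear(hidden_dim // 4, 2),
        )

    def forward(self, hidden_states: torch.Tensor, ground_mask: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_dim) from the multi-modal LLM.
        # ground_mask:   (batch, seq_len) boolean mask of grounding-token positions.
        token_feats = hidden_states[ground_mask]              # (num_ground_tokens, hidden_dim)
        center, width = self.mlp(token_feats).sigmoid().unbind(-1)
        start = (center - width / 2).clamp(0.0, 1.0)
        end = (center + width / 2).clamp(0.0, 1.0)
        return torch.stack([start, end], dim=-1)              # normalized video timestamps


if __name__ == "__main__":
    head = GroundingHead(hidden_dim=4096)
    hidden = torch.randn(1, 16, 4096)                         # dummy LLM outputs
    mask = torch.zeros(1, 16, dtype=torch.bool)
    mask[0, [5, 11]] = True                                   # two grounding tokens in the answer
    print(head(hidden, mask))                                 # two (start, end) segments
```

Under this sketch, each grounding token emitted in the answer yields one temporal segment, so a multi-hop answer can point to several pieces of evidence in the long-form video.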