PolyU Unveils VideoMind: An Innovative AI Agent for Enhanced Long Video Understanding and Analysis

A research team from The Hong Kong Polytechnic University (PolyU) has introduced VideoMind, an artificial intelligence (AI) agent designed to enhance the understanding and analysis of long videos. The framework aims to improve how AI models reason about and answer questions on lengthy video content by emulating human cognitive processes. VideoMind employs a Chain-of-LoRA (Low-Rank Adaptation) strategy to optimize the use of computational resources, addressing the growing demand for efficient generative AI in video analysis. The research findings have been submitted for presentation at prominent AI conferences.
Complexity of Long Videos
Long videos, particularly those exceeding 15 minutes, often carry intricate information that unfolds over time. This complexity requires AI models to recognize changes and dependencies across the content, which in turn demands significant computing power and memory to process such extensive videos.
Leadership and Structure of VideoMind
The research team is led by Professor Changwen Chen, Interim Dean of the Faculty of Computer and Mathematical Sciences at PolyU and Chair Professor of Visual Computing. VideoMind’s design is guided by human methods of video comprehension and is structured around four key roles: the Planner, which orchestrates the other roles for each query; the Grounder, which identifies pertinent moments; the Verifier, which checks the accuracy of information drawn from those moments; and the Answerer, which formulates the answer to the query. This organized structure is intended to address the temporal reasoning challenges typically faced by AI models.
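For illustration only, the sketch below shows how such a role-based workflow could be wired together in Python. The role functions, their signatures, and the placeholder outputs are hypothetical stand-ins, not VideoMind's actual interfaces.

```python
# Illustrative sketch of a Planner -> Grounder -> Verifier -> Answerer workflow.
# All role functions below are hypothetical stand-ins, not VideoMind's real API.

def planner(query: str) -> list[str]:
    """Decide which roles are needed for this query and in what order."""
    # A temporal question ("When does X happen?") needs grounding first;
    # a global question ("What is the video about?") may go straight to answering.
    if "when" in query.lower() or "at what point" in query.lower():
        return ["grounder", "verifier", "answerer"]
    return ["answerer"]

def grounder(video, query: str) -> tuple[float, float]:
    """Localize the moment (start, end in seconds) most relevant to the query."""
    return (120.0, 135.0)  # placeholder segment

def verifier(video, segment: tuple[float, float], query: str) -> bool:
    """Inspect the candidate segment and confirm it actually supports an answer."""
    return True  # placeholder decision

def answerer(video, query: str, segment=None) -> str:
    """Generate the final answer, optionally conditioned on the verified segment."""
    return "The event occurs around the two-minute mark."  # placeholder answer

def video_mind_pipeline(video, query: str) -> str:
    roles = planner(query)
    segment = None
    if "grounder" in roles:
        segment = grounder(video, query)
        if "verifier" in roles and not verifier(video, segment, query):
            segment = None  # fall back to answering over the full video
    return answerer(video, query, segment)
```

In this kind of design, the Planner decomposes the query into sub-tasks much as a person would decide whether to skim, rewind, or re-watch a clip before answering.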
Chain-of-LoRA Strategy
A significant aspect of VideoMind is its Chain-of-LoRA strategy, a recent fine-tuning method that permits AI models to adjust to specific tasks without the need for extensive parameter retraining. This involves the integration of four lightweight LoRA adapters within a single model, enhancing both efficiency and adaptability by allowing selective activation of roles during data processing.
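As a rough sketch of how several lightweight adapters can share one base model, the snippet below uses the Hugging Face PEFT library's adapter-loading and switching calls. The base checkpoint name, adapter paths, and adapter names are hypothetical placeholders under this assumption, not VideoMind's released weights.

```python
# Sketch: several lightweight LoRA adapters sharing one base model, activated on demand.
# Model name, adapter paths, and adapter names are hypothetical placeholders.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("base-video-llm")  # placeholder checkpoint

# Attach one LoRA adapter per role; each adds only a small set of low-rank weights.
model = PeftModel.from_pretrained(base, "adapters/planner", adapter_name="planner")
model.load_adapter("adapters/grounder", adapter_name="grounder")
model.load_adapter("adapters/verifier", adapter_name="verifier")
model.load_adapter("adapters/answerer", adapter_name="answerer")

# Switch roles by activating the corresponding adapter; the base weights stay shared,
# so the memory cost remains close to that of a single model.
model.set_adapter("grounder")   # e.g. localize the relevant moment
model.set_adapter("verifier")   # then check that the moment supports the answer
```

Because only the active adapter's low-rank weights differ between roles, switching roles in this way costs far less memory than loading four separate models.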
Performance and Availability
The VideoMind framework has been made available as open source on GitHub and Hugging Face, together with results on 14 benchmarks for temporally grounded video understanding. Comparative studies with other leading AI models, such as GPT-4o and Gemini 1.5 Pro, indicate that VideoMind surpasses them in grounding accuracy on challenging tasks involving videos averaging 27 minutes in length. Two variants have been developed: a smaller model with 2 billion parameters and a larger model with 7 billion parameters, with the smaller model performing comparably to some larger models.
Human Cognition and Computational Efficiency
Professor Chen noted that human cognition often involves switching between different strategies for video processing, allowing individuals to decompose tasks and synthesize observations into coherent responses. He pointed out that the human brain operates efficiently, using approximately 25 watts of power, substantially less than the power consumed by supercomputers with equivalent processing capabilities. The role-based workflow of VideoMind, combined with the Chain-of-LoRA strategy, seeks to reduce computational demands while enhancing the model’s comprehension abilities.
Potential Impact on AI Technology
As AI continues to play a crucial role in technological developments worldwide, limitations in computing power frequently impede the advancement of AI models. The VideoMind framework presents a potentially effective solution by reducing technological costs and lowering barriers to deployment, thereby addressing challenges related to power consumption during AI processing.
Future Applications
Furthermore, Professor Chen indicated that VideoMind not only mitigates limitations in AI performance for video processing but also functions as a modular, scalable, and interpretable framework for multimodal reasoning. The research team anticipates extending the applications of generative AI to various fields, including intelligent surveillance, sports and entertainment video analysis, and video search engines.
(Source: Hong Kong Polytechnic University)