Harbin Institute of Technology

HIT team makes breakthrough in CLIP fine-grained alignment
Apr 10, 2026
en.hit.edu.cn

A research team led by Professor Zhao Yue of the School of Astronautics at Harbin Institute of Technology (HIT) has made notable progress on fine-grained alignment for CLIP.

The findings, reported in a paper titled "MSG-CLIP: Enhancing CLIP's ability to learn fine-grained structural associations through multi-modal scene graph alignment", have been published in Pattern Recognition, an international journal covering pattern recognition and artificial intelligence.

The work provides technical support for more precise image-text understanding in cross-modal artificial intelligence models.

CLIP (Contrastive Language-Image Pre-training) is a representative cross-modal pre-trained model. Thanks to its strong image-text semantic alignment, it has become a foundational technology for tasks such as image-text retrieval, visual question answering, and image-text generation.

However, fine-grained alignment, which is central to how CLIP comprehends image-text semantics, has long suffered from low alignment accuracy and insufficient ability to learn structure. These weaknesses have become a key bottleneck for applying CLIP in demanding visual understanding scenarios.

Traditional CLIP models achieve only coarse-grained semantic matching, whereas fine-grained alignment is a prerequisite for AI to accurately interpret the deeper semantics of images and texts.
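As a rough illustration of coarse-grained matching, the sketch below scores a single global image vector against a single global vector per caption, using cosine similarity followed by a softmax. It is a toy example, not the paper's code: the three-dimensional embeddings, the function names, and the `logit_scale` value are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def match_probabilities(image_vec, caption_vecs, logit_scale=10.0):
    """Softmax over scaled cosine similarities (one global vector per input)."""
    logits = [logit_scale * cosine(image_vec, c) for c in caption_vecs]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical embeddings; real encoders output far more dimensions.
image = [0.9, 0.1, 0.2]
captions = {
    "a horse in a field": [0.8, 0.2, 0.1],
    "a bowl of soup": [0.1, 0.9, 0.3],
}
probs = match_probabilities(image, list(captions.values()))
best_caption = list(captions)[probs.index(max(probs))]
print(best_caption, probs)
```

Because each image and each caption is reduced to one vector, nothing in this score says which object carries which attribute or how the objects relate to one another.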

 

To address these limitations, Professor Zhao's team carried out a systematic study and proposed the MSG-CLIP framework.

Using a multi-modal scene graph alignment mechanism, the framework performs fine-grained matching at two levels: entity-level modal alignment and triple-level relational alignment. This addresses key shortcomings of traditional CLIP in fine-grained alignment, such as missing structural information and large matching errors.
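The difference between the two levels can be sketched with a toy example. The code below is not the MSG-CLIP implementation, whose details the article does not give. It assumes that scene graphs are already available as lists of entity vectors and (subject, relation, object) triples, that a triple is embedded by simple concatenation, and that each text element is scored by its best cosine match in the image graph. All names and vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def best_match_score(text_vecs, image_vecs):
    """Mean, over text elements, of the best cosine match among image elements."""
    return sum(max(cosine(t, i) for i in image_vecs) for t in text_vecs) / len(text_vecs)

def triple_vec(subj, rel, obj):
    """Order-sensitive triple embedding: concatenate subject, relation, object."""
    return subj + rel + obj

# Hypothetical 2-dimensional embeddings for entities and one relation.
horse, grass, eating = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]

# Image scene graph: a horse eating grass.
image_entities = [horse, grass]
image_triples = [triple_vec(horse, eating, grass)]

# Two captions with the same entities but reversed roles.
correct_triples = [triple_vec(horse, eating, grass)]
swapped_triples = [triple_vec(grass, eating, horse)]

entity_correct = best_match_score([horse, grass], image_entities)
entity_swapped = best_match_score([grass, horse], image_entities)
triple_correct = best_match_score(correct_triples, image_triples)
triple_swapped = best_match_score(swapped_triples, image_triples)
print(entity_correct, entity_swapped, triple_correct, triple_swapped)
```

In this toy setup the entity-level score cannot tell the two captions apart, since both mention a horse and grass, while the triple-level score drops for the caption with the roles reversed. Relation-oriented benchmarks are designed to probe this kind of distinction.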

 

Experimental results show that, without adding model parameters, MSG-CLIP improves on the baseline model by 11.2 percent on the VG-Attribution benchmark dataset and by 2.5 percent on the VG-Relation benchmark dataset.



A schematic diagram showing the overall MSG-CLIP framework. [Photo/hit.edu.cn]

HIT is the first-listed affiliation on the paper. Lyu Xiaotian, a doctoral candidate at the School of Astronautics, is the first author, and Professor Zhao is the corresponding author.

The research was supported by the National Natural Science Foundation of China, the Key Research and Development Program of Artificial Intelligence in Heilongjiang Province, and other projects.

Paper link: https://www.sciencedirect.com/science/article/abs/pii/S0031320325014578?via%3Dihub=
