Text-to-Image Vehicle Re-Identification: Multi-Scale Multi-View Cross-Modal Alignment Network and a Unified Benchmark
Vehicle Re-IDentification (Re-ID) aims to retrieve, from a set of images captured by non-overlapping cameras, the images most similar to a given query vehicle image. It plays a crucial role in intelligent transportation systems and has advanced considerably in recent years. In real-world scenarios, a text description of the target vehicle can often be obtained from witness accounts, after which image queries for vehicle Re-ID must be searched for manually, which is time-consuming and labor-intensive. To solve this problem, this paper introduces a new fine-grained cross-modal retrieval task called text-to-image vehicle re-identification, which seeks to retrieve target vehicle images based on given text descriptions. To bridge the significant gap between the language and visual modalities, the authors propose a novel Multi-scale multi-view Cross-modal Alignment Network (MCANet). In particular, the authors incorporate view masks and multi-scale features to align image and text features progressively. In addition, the authors design the Masked Bidirectional InfoNCE (MB-InfoNCE) loss to enhance training stability and make the best use of negative samples. To provide an evaluation platform for text-to-image vehicle re-identification, the authors create a Text-to-Image Vehicle Re-Identification dataset (T2I VeRi), which contains 2465 image-text pairs from 776 vehicles, with an average sentence length of 26.8 words. Extensive experiments on T2I VeRi demonstrate that MCANet outperforms the current state-of-the-art (SOTA) method by 2.2% in rank-1 accuracy.
Availability:
- Find a library where document is available. Order URL: http://worldcat.org/oclc/41297384
Supplemental Notes:
- Copyright © 2024, IEEE.
Authors:
- Ding, Leqi
- Liu, Lei
- Huang, Yan
- Li, Chenglong
- Zhang, Cheng
- Wang, Wei
- Wang, Liang
- Publication Date: 2024-7
Language
- English
Media Info
- Media Type: Web
- Features: References;
- Pagination: pp 7673-7686
Serial:
- IEEE Transactions on Intelligent Transportation Systems
- Volume: 25
- Issue Number: 7
- Publisher: Institute of Electrical and Electronics Engineers (IEEE)
- ISSN: 1524-9050
- Serial URL: http://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6979
Subject/Index Terms
- TRT Terms: Cameras; Identification systems; Image analysis; Intelligent transportation systems; Vehicular ad hoc networks; Visualization
- Subject Areas: Data and Information Technology; Highways; Vehicles and Equipment;
Filing Info
- Accession Number: 01936078
- Record Type: Publication
- Files: TRIS
- Created Date: Nov 7 2024 9:21AM