Title: CROSS-MODALITY MATCHING BASED ON FISHER VECTOR WITH NEURAL WORD EMBEDDINGS AND DEEP IMAGE FEATURES
Authors: Han, Liang
Wang, Wenmin
Fan, Mengdi
Wang, Ronggang
Affiliation: Peking Univ, Shenzhen Grad Sch, Sch Elect & Comp Engn, Lishui Rd 2199, Shenzhen 518055, Peoples R China.
Keywords: Cross-modal retrieval
Fisher Vector
deep CNN image features
cross-modality matching
Issue Date: 2017
Publisher: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Citation: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). 2017, 2921-2925.
Abstract: Cross-modal retrieval, which addresses queries and retrieved results drawn from different modalities, is becoming increasingly important with the growth of the Internet. In this paper, we focus on learning high-level semantic representations of images and text for cross-modal matching. Deep convolutional image features and Fisher Vectors built on neural word embeddings are used as the visual and textual features, respectively. To further exploit the correlation among heterogeneous multimodal features, we apply a multiclass logistic classifier for semantic matching across modalities. Experiments on the Wikipedia and Pascal Sentence datasets demonstrate the robustness and effectiveness of the approach on both Img2Text and Text2Img retrieval tasks.
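The pipeline the abstract describes can be sketched roughly as follows. This is a minimal illustrative reconstruction, not the authors' implementation: the embedding dimension, GMM size, classifier settings, and the random stand-ins for word embeddings and CNN features are all assumptions. Text is encoded as a Fisher Vector over word embeddings (gradients of a diagonal-covariance GMM with respect to its means and variances), images by pretrained CNN features (here random placeholders), and a multiclass logistic classifier maps each modality into a shared class-probability space where matching can be done by comparing probability vectors.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import LogisticRegression

def fisher_vector(words, gmm):
    """Fisher Vector of a set of word embeddings w.r.t. a diagonal GMM:
    gradients w.r.t. means and variances, power- and L2-normalized."""
    q = gmm.predict_proba(words)                    # (T, K) soft assignments
    T, K = q.shape
    mu, var, w = gmm.means_, gmm.covariances_, gmm.weights_
    d = (words[:, None, :] - mu) / np.sqrt(var)     # (T, K, D) whitened diffs
    g_mu = (q[:, :, None] * d).sum(0) / (T * np.sqrt(w)[:, None])
    g_var = (q[:, :, None] * (d**2 - 1)).sum(0) / (T * np.sqrt(2 * w)[:, None])
    fv = np.hstack([g_mu.ravel(), g_var.ravel()])   # length 2*K*D
    fv = np.sign(fv) * np.sqrt(np.abs(fv))          # power normalization
    return fv / (np.linalg.norm(fv) + 1e-12)        # L2 normalization

rng = np.random.default_rng(0)
D, K, C, N = 8, 4, 3, 60          # embed dim, GMM comps, classes, docs (assumed)
gmm = GaussianMixture(K, covariance_type="diag", random_state=0)
gmm.fit(rng.normal(size=(500, D)))                  # pool of word embeddings

labels = rng.integers(0, C, N)
# Each "document" = Fisher Vector of 20 stand-in word embeddings.
texts = np.stack([fisher_vector(rng.normal(size=(20, D)), gmm) for _ in range(N)])
images = rng.normal(size=(N, 16))                   # stand-in for CNN features

# One multiclass logistic classifier per modality; cross-modal matching then
# compares the resulting class-probability vectors across modalities.
txt_clf = LogisticRegression(max_iter=1000).fit(texts, labels)
img_clf = LogisticRegression(max_iter=1000).fit(images, labels)
p_txt = txt_clf.predict_proba(texts)                # (N, C) semantic space
p_img = img_clf.predict_proba(images)               # (N, C) semantic space
```

For an Img2Text query, one would rank the text probability vectors `p_txt` by their distance to the query image's `p_img` row (and symmetrically for Text2Img).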
URI: http://hdl.handle.net/20.500.11897/499581
ISSN: 1520-6149
DOI: 10.1109/ICASSP.2017.7952691
Indexed: EI; CPCI-S (ISTP)
Appears in Collections: School of Information Engineering (信息工程学院)

Files in This Work
There are no files associated with this item.

License: See PKU IR operational policies.