Impact of Binary-Valued Representation on the Performance of Cross-Modal Retrieval System

Nikita Bhatt; Amit Ganatra; Nirav Bhatt; Purvi Prajapati; Mrugendra Rahevar; Martin Parmar

DOI: https://doi.org/10.33889/IJMEMS.2022.7.6.060

Nikita Bhatt
U & P U. Patel Department of Computer Engineering, CSPIT, CHARUSAT, Gujarat, India.

Amit Ganatra
Devang Patel Institute of Advance Technology and Research, CHARUSAT, Gujarat, India.

Nirav Bhatt
Smt. Kundanben Dinsha Patel Department of Information Technology, CSPIT, CHARUSAT, Gujarat, India.

Purvi Prajapati
Smt. Kundanben Dinsha Patel Department of Information Technology, CSPIT, CHARUSAT, Gujarat, India.

Mrugendra Rahevar
U & P U. Patel Department of Computer Engineering, CSPIT, CHARUSAT, Gujarat, India.

Martin Parmar
U & P U. Patel Department of Computer Engineering, CSPIT, CHARUSAT, Gujarat, India.

Received on April 09, 2022

Accepted on September 06, 2022

Abstract

The tremendous proliferation of multi-modal data and the flexible needs of users have drawn attention to the field of Cross-Modal Retrieval (CMR), which can perform image-sketch matching, text-image matching, audio-video matching and near-infrared-visual image matching. Such retrieval is useful in many applications, such as criminal investigation, recommendation systems and person re-identification. The real challenge in CMR is to preserve semantic similarities between different modalities of data. To do so, existing deep learning-based approaches use pairwise labels and generate binary-valued representations, which provide fast retrieval with low storage requirements. However, these approaches ignore the relative similarity between heterogeneous data. The objective of this work is therefore to reduce the modality gap by preserving relative semantic similarities among various modalities. To this end, a model named "Deep Cross-Modal Retrieval (DCMR)" is proposed, which takes triplet labels as input and generates binary-valued representations. The triplet labels place semantically similar data points closer together and dissimilar points farther apart in the vector space. Extensive experiments are performed and the results are compared with deep learning-based approaches. They show that DCMR improves mean average precision (mAP) by 2% to 3% for Image→Text retrieval and by 2% to 5% for Text→Image retrieval on the MSCOCO, XMedia and NUS-WIDE datasets. Hence, binary-valued representations generated from triplet labels preserve relative semantic similarities better than those generated from pairwise labels.
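The ingredients named in the abstract (a triplet objective that pulls similar cross-modal items together, binary codes compared by Hamming distance for fast retrieval, and mAP as the evaluation measure) can be illustrated with a small Python sketch. This is not the DCMR model: it does not implement any network branch (the keywords mention VGG-F, GloVe and an MLP), and several choices are assumptions made only for illustration. These include sign-based binarization, a hinge triplet loss using squared Euclidean distance on relaxed real-valued features with a margin of 1.0, and average precision computed over the returned ranked list. All function names and feature values below are hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical helpers illustrating triplet-based cross-modal hashing;
# not the DCMR implementation.


def binarize(features: Sequence[float]) -> List[int]:
    """Map a real-valued feature vector to a {-1, +1} code with the sign function (assumed scheme)."""
    return [1 if x >= 0 else -1 for x in features]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length binary codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """Hinge triplet loss on relaxed (real-valued) features, squared Euclidean distance.

    The loss is zero once the negative is at least `margin` farther from the
    anchor than the positive (assumed formulation and margin).
    """
    d_pos = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_neg = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(0.0, margin + d_pos - d_neg)


def rank_by_hamming(query_code: Sequence[int],
                    database: Dict[str, Sequence[int]]) -> List[str]:
    """Sort database keys by Hamming distance to the query; ties keep insertion order."""
    return sorted(database, key=lambda k: hamming(query_code, database[k]))


def average_precision(relevance: Sequence[int]) -> float:
    """AP of one ranked list; relevance[i] is 1 if the item at rank i + 1 is relevant."""
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[int]]) -> float:
    """mAP: the mean of per-query average precision values."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


if __name__ == "__main__":
    # Toy image query and text database features (hypothetical values).
    image_feat = [0.9, -0.3, 0.4, -0.7]
    text_feats = {
        "t_dog": [0.8, -0.1, 0.2, -0.5],
        "t_cat": [0.7, 0.2, 0.3, -0.4],
        "t_car": [-0.6, 0.5, -0.2, 0.3],
    }
    # Image anchor, matching text as positive, unrelated text as negative.
    print(triplet_loss(image_feat, text_feats["t_dog"], text_feats["t_car"]))
    codes = {k: binarize(v) for k, v in text_feats.items()}
    ranking = rank_by_hamming(binarize(image_feat), codes)
    print(ranking)
    relevant = {"t_dog", "t_cat"}
    print(mean_average_precision([[1 if k in relevant else 0 for k in ranking]]))
```

With these toy values the triplet term is already zero, and the Image→Text ranking is `t_dog`, `t_cat`, `t_car`, so this single query reaches an AP of 1.0. The loss is written on real-valued features because the sign function is not differentiable. Binarization is applied only when codes are stored and compared.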

Keywords: Information retrieval, Multi-modal data, VGG-F network, GloVe, Multi-layer perceptron (MLP), Mean average precision (mAP).

Citation

Bhatt, N., Ganatra, A., Bhatt, N., Prajapati, P., Rahevar, M., & Parmar, M. (2022). Impact of Binary-Valued Representation on the Performance of Cross-Modal Retrieval System. International Journal of Mathematical, Engineering and Management Sciences, 7(6), 964-981. https://doi.org/10.33889/IJMEMS.2022.7.6.060.

Volume 7 (2022)

Number 6 (December)

Pages 964-981

International Journal of Mathematical, Engineering and Management Sciences

ISSN: 2455-7749. Open Access

Impact of Binary-Valued Representation on the Performance of Cross-Modal Retrieval System