Cross-Modal Retrieval using Random Multimodal Deep Learning


Hemanth Somasekar, Kavya Naveen



Cross modal similarity search, Twitter dataset, class labels, strong supervised methods, NUS-WIDE, Random Multimodal Deep Learning


In the multimedia community, hashing-based cross-modal similarity search has received extensive attention because of its query effectiveness and efficiency. This research work contributes a large-scale dataset for weakly supervised cross-media retrieval, named Twitter100k. Current datasets, namely Wikipedia, NUS-WIDE and Flickr30k, have two main limitations. First, they lack content diversity: only a few predefined classes are covered. Second, their texts are written in formal language, which is inconsistent with practical applications. To overcome these disadvantages, the proposed method uses the Twitter100k dataset, which differs in two major respects. First, it contains 100,000 text-image pairs randomly crawled from Twitter, so it places no constraint on the image categories. Second, the text in Twitter100k is written in informal language by the users. Since strongly supervised methods rely on class labels that may be missing in practice, this paper concentrates on weakly supervised learning for cross-media retrieval, in which only text-image pairs are used during training. This paper proposes a Random Multimodal Deep Learning (RMDL) approach based on a Recurrent Neural Network (RNN) for cross-media retrieval. RMDL accepts a variety of input data, such as video, text and images, for cross-media retrieval on the weakly supervised dataset. In RMDL, the various inputs are classified using an RNN architecture. To improve the accuracy and robustness of the proposed method, RMDL uses a specific RNN structure, Long Short-Term Memory (LSTM). In the experimental analysis, the proposed RMDL-based method achieved a Cumulative Match Characteristic (CMC) of 78% compared with other datasets.
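The abstract reports results as a Cumulative Match Characteristic (CMC) score but does not spell out the evaluation protocol. The following is a minimal sketch of how a CMC curve is commonly computed for paired text-image retrieval. It assumes that query text i matches gallery image i and that embeddings are compared by cosine similarity; the function names and the toy vectors are hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def cmc_curve(query_embs, gallery_embs, max_rank):
    """Return [CMC@1, ..., CMC@max_rank].

    Assumption (not from the paper): query i is paired with gallery item i.
    """
    hits = [0] * max_rank
    for i, q in enumerate(query_embs):
        sims = [cosine(q, g) for g in gallery_embs]
        # Rank of the true match: 1 + number of gallery items scoring strictly higher.
        rank = 1 + sum(1 for s in sims if s > sims[i])
        # A match found at position `rank` counts as a hit for every k >= rank.
        for k in range(rank, max_rank + 1):
            hits[k - 1] += 1
    return [h / len(query_embs) for h in hits]

# Toy text and image embeddings, made up purely for illustration.
texts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
images = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.7]]
curve = cmc_curve(texts, images, max_rank=3)
print(curve)
```

On this toy data, two of the three text queries rank their paired image first, and all three find it within the top two. In a real evaluation the embeddings would come from the trained text and image encoders rather than hand-written vectors.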


