Towards Efficient Cross-Modal Visual Textual Retrieval using Transformer-Encoder Deep Features

BibTeX entry:

@article { messina:arxiv2106.00358,
    author = { Nicola Messina and Giuseppe Amato and Fabrizio Falchi and Claudio Gennaro and Stephane Marchand-Maillet },
    title = { Towards Efficient Cross-Modal Visual Textual Retrieval using Transformer-Encoder Deep Features },
    journal = { CoRR },
    volume = { abs/2106.00358 },
    year = { 2021 },
    url = { https://arxiv.org/abs/2106.00358 },
}
--

Keywords: machine learning, information geometry, data mining, Big Data, affective information retrieval, information visualisation, content-based image and video retrieval (CBIR, CBR, CBVR, CBMR, CBMIR), information mining, classification, multimedia and multimodal information management, semantic web, knowledge base (RDF, OWL, XML, metadata, auto-annotation, description), multimodal information fusion