Fine-grained Visual Textual Alignment for Cross-Modal Retrieval using Transformer Encoders

BibTeX entry:

@techreport { messina:arxiv2008.05231,
    author = { Nicola Messina and Giuseppe Amato and Andrea Esuli and Fabrizio Falchi and Claudio Gennaro and St{\'{e}}phane Marchand{-}Maillet },
    title = { Fine-grained Visual Textual Alignment for Cross-Modal Retrieval using Transformer Encoders },
    institution = { CoRR abs/2008.05231 },
    year = { 2020 },
    url = { https://arxiv.org/abs/2008.05231 },
}
--

Keywords: machine learning, information geometry, data mining, Big Data, affective information retrieval, information visualisation, content-based image and video retrieval (CBIR, CBR, CBVR, CBMR, CBMIR), information mining, classification, multimedia and multimodal information management, semantic web, knowledge base (RDF, OWL, XML, metadata, auto-annotation, description), multimodal information fusion