Fine-Grained Visual Textual Alignment for Cross-Modal Retrieval Using Transformer Encoders

BibTeX entry:

@article{messina:acmtransmcca2021,
    author = {Nicola Messina and Giuseppe Amato and Andrea Esuli and Fabrizio Falchi and Claudio Gennaro and St{\'{e}}phane Marchand{-}Maillet},
    title = {Fine-Grained Visual Textual Alignment for Cross-Modal Retrieval Using Transformer Encoders},
    journal = {{ACM} Trans. Multim. Comput. Commun. Appl.},
    volume = {17},
    number = {4},
    pages = {128:1--128:23},
    year = {2021},
    url = {https://doi.org/10.1145/3451390},
    doi = {10.1145/3451390},
}

Keywords: machine learning, information geometry, data mining, Big Data, affective information retrieval, information retrieval, information visualisation, content-based image and video retrieval (CBIR, CBR, CBVR, CBMR, CBMIR), information mining, classification, multimedia and multimodal information management, semantic web, knowledge base (RDF, OWL, XML, metadata, auto-annotation, description), multimodal information fusion