
Large-scale distributed multimedia information retrieval strategies:

Classical challenges in the design of multimedia information systems are now compounded by the growth of collections in size, in the complexity and diversity of the data, and in the complexity of the information networks that link them. Multimedia collections associated with community-based exchange networks such as Deezer, Flickr or YouTube commonly handle millions of audio, image and video documents. Because these collections are served over the web, the multimedia data is associated with textual information (e.g. metadata, tags, web pages), and this text is often the only handle for accessing the multimedia data. There is growing interest in mapping established findings in content-based multimedia information retrieval to these large-scale contexts. The present project follows precisely that line.

In this project, we aim to propose solutions that enable the indexing and retrieval of multimedia data in large-scale collections, in a context where both processing and data are distributed. We identify three distinct but inter-related problems that we wish to address:

Large-scale multi-modal information indexing: Storage of and access to the data need to be defined in a principled way. While large-scale indexing structures exist, the main challenge here is to adapt or exploit these techniques in a multi-modal environment, i.e. one where the same data must be accessed concurrently through its various facets (e.g. visual content and associated text). Here, we will investigate the use of approximate access structures such as embedding structures or metric trees;
Large-scale multi-modal information retrieval: Building on the above, retrieval strategies must be constructed and adapted to handle multimodality. We have made progress in that direction and wish to go further in defining learning strategies that are consistent with the available data access structures. In this context, retrieval procedures are constrained to be parsimonious in their access to the data. We will work on integrating the access strategies defined above into the learning algorithms that form our current developments, such as Boosting, SVMs or cluster-based representations;
Exploitation of the distributed context: Multimedia data representation requires processing the original data. While this step is relatively easy to distribute over several CPUs with a coarse-grained strategy, obtaining efficiently distributed indexing and learning procedures is more challenging. We start from our Cross-modal Search Engine (CMSE), which already achieves some degree of distribution, and wish to map its algorithms onto a fully distributed context.
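To make the indexing and distribution problems above more concrete, here is a minimal sketch, assuming nothing about how CMSE is actually built. It combines one common kind of metric tree, a vantage-point tree, with sharding. Each hypothetical node indexes part of the collection in its own tree. A query is sent to every shard, and the local top-k lists are merged into a global top-k (scatter-gather). The class and function names (`VPTree`, `sharded_knn`), the Euclidean metric and the random feature vectors are illustrative assumptions. In this form the tree search is exact; approximate variants would bound the number of nodes visited.

```python
import heapq
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]
Item = Tuple[int, Vector]   # (document id, feature vector)
Hit = Tuple[float, int]     # (distance to query, document id)
Metric = Callable[[Vector, Vector], float]


def euclidean(a: Vector, b: Vector) -> float:
    """L2 distance; any true metric (obeying the triangle inequality) works."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class _Node:
    def __init__(self, item: Item, radius: float,
                 inside: Optional["_Node"], outside: Optional["_Node"]):
        self.item, self.radius = item, radius
        self.inside, self.outside = inside, outside


class VPTree:
    """Hypothetical vantage-point tree: a metric tree supporting exact k-NN search."""

    def __init__(self, items: List[Item], metric: Metric = euclidean, seed: int = 0):
        self.metric = metric
        self._rng = random.Random(seed)
        self.root = self._build(list(items))  # copy, since _build consumes its list

    def _build(self, items: List[Item]) -> Optional[_Node]:
        if not items:
            return None
        # Pick a random vantage point and remove it from the working list.
        i = self._rng.randrange(len(items))
        items[i], items[-1] = items[-1], items[i]
        vp = items.pop()
        if not items:
            return _Node(vp, 0.0, None, None)
        dists = [self.metric(vp[1], v) for _, v in items]
        radius = sorted(dists)[len(dists) // 2]  # median distance splits the ball
        inside = [it for it, d in zip(items, dists) if d <= radius]
        outside = [it for it, d in zip(items, dists) if d > radius]
        return _Node(vp, radius, self._build(inside), self._build(outside))

    def knn(self, query: Vector, k: int) -> List[Hit]:
        if k <= 0:
            return []
        heap: List[Tuple[float, int]] = []  # max-heap of (-distance, id)

        def tau() -> float:
            # Distance of the current k-th best hit (infinite until k hits exist).
            return -heap[0][0] if len(heap) == k else math.inf

        def visit(node: Optional[_Node]) -> None:
            if node is None:
                return
            doc_id, vec = node.item
            d = self.metric(query, vec)
            if len(heap) < k:
                heapq.heappush(heap, (-d, doc_id))
            elif d < tau():
                heapq.heapreplace(heap, (-d, doc_id))
            # Triangle inequality: skip a child when none of its points can beat tau().
            if d <= node.radius:
                visit(node.inside)
                if d + tau() >= node.radius:
                    visit(node.outside)
            else:
                visit(node.outside)
                if d - tau() <= node.radius:
                    visit(node.inside)

        visit(self.root)
        return sorted((-neg_d, doc_id) for neg_d, doc_id in heap)


def sharded_knn(shards: List[VPTree], query: Vector, k: int) -> List[Hit]:
    """Scatter the query to every shard, then gather and merge the local top-k lists.

    The global top-k is always contained in the union of the per-shard top-k lists,
    so the merge is exact. Shards are queried in a loop here; in a distributed
    deployment each call would be a request to a remote node.
    """
    local_hits = [shard.knn(query, k) for shard in shards]
    return heapq.nsmallest(k, (hit for hits in local_hits for hit in hits))


if __name__ == "__main__":
    rng = random.Random(42)
    data = [(i, [rng.random() for _ in range(8)]) for i in range(1000)]
    shards = [VPTree(data[s::4]) for s in range(4)]  # 4 hypothetical nodes
    print(sharded_knn(shards, [0.5] * 8, k=5))
```

Partitioning the collection round-robin (`data[s::4]`) keeps the shards balanced. The price is that every query must reach every shard. A partition based on content would let some shards be skipped, but that sacrifices either balance or exactness.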

This project will be developed in the context of truly large-scale operations. We are involved in organizing the ImageCLEF multimedia retrieval track, which is based on a Wikipedia collection of a few hundred thousand images with associated text. We have also established contacts with the maintainers of the CoPhIR (Content-based Photo Image Retrieval) collection, which now comprises 106 million tagged images from Flickr. The former will serve as a proper platform for evaluating retrieval performance, while the latter will be a suitable testbed for the scalability of our system and procedures.