Large-scale distributed multimedia information retrieval strategies:
Classical challenges in the design of multimedia information systems are now compounded by the growth of collections in size, by the complexity and diversity of the data, and by the complexity of the information networks. Multimedia collections associated with community-based exchange networks such as Deezer, Flickr or YouTube commonly handle millions of audio, image and video documents. Hosting these collections on the web associates the multimedia data with textual information (e.g. metadata, tags, web pages). This text is often the only handle for accessing the multimedia data. There is growing interest in mapping established findings in content-based multimedia information retrieval to these large-scale contexts. The present project follows precisely that line.
In this project, we propose solutions for indexing and retrieving multimedia data in large-scale collections, in a context where both the processing and the data are distributed. We identify distinct but inter-related problems that we wish to address:
This project will be developed in the context of truly large-scale operations. We are involved in the organization of the ImageCLEF multimedia retrieval track, which is based on the Wikipedia collection comprising a few hundred thousand images with associated text. We have also established contacts with the maintainers of the CoPhIR (Content-based Photo Image Retrieval) collection, which now comprises 106 million tagged images from Flickr. While the former will serve as a proper platform for evaluating retrieval performance, the latter will be a suitable testbed for the scalability of our system and procedures.