Research Projects

The group is actively involved in several national and international projects:

Machine Learning and Information Geometry

DIP Fund - Ke Sun

In this project, we try to understand machine learning from an information-geometric perspective.


MAAYA: Multimedia Analysis and Access for Documentation and Decipherment of Maya Epigraphy (SNF grant number 144238 / May 2013 - Apr 2016)

The aim of this bi-disciplinary project is to tightly integrate the work of Maya epigraphists and computer scientists to (1) jointly design, develop, and assess new computational tools that robustly and effectively support the work of Maya hieroglyphics experts; (2) advance the state of Maya epigraphy through the combination of expert knowledge and the advanced use of these tools; and (3) make these new resources and knowledge available to the scholarly community through the creation of an online system (which to our knowledge would be one of a kind) that would allow for search, comparison, annotation, and visualization tasks as part of new investigations worldwide.


KEYSTONE: semantic KEYword-based Search on sTructured data sOurcEs (COST Action IC 1302: PI: Dr Francesco Guerra - 15 October 2013 - 14 October 2017)

The main objective of the Action is to launch and establish a cooperative network of researchers, practitioners, and application domain specialists working in fields related to semantic data management, the Semantic Web, information retrieval, artificial intelligence, machine learning, and natural language processing, and to coordinate collaboration among them to enable research activity and technology transfer in the area of keyword-based search over structured data sources. The coordination effort will promote the development of a new paradigm that gives users the same keyword-based search capabilities over structured data sources that they currently have for documents. Furthermore, it will exploit the structured nature of data sources in defining complex query execution plans by combining partial contributions from different sources.

The main objective of the Action is complemented by the following secondary objectives:

  • Promote the development of novel techniques for keyword-based search over structured data sources.
  • Facilitate the transfer of knowledge and technology to the scientific community, practitioners, and enterprises.
  • Build a critical mass of research activities and outcomes that sustains the research themes beyond the Action.

Former Projects


MUMIA: Multilingual and multifaceted interactive information access (COST Action 1002: PI: Dr Michail Salampasis - 30 November 2010 - 29 November 2014)

The tremendous power and speed with which current search engines respond, almost instantaneously, to millions of user queries on a daily basis is one of the greatest successes of the past decade. While this technology empowers users to extract relevant information from the hundreds of thousands of terabytes of data available on the web, the next decade presents many new grand challenges. The next wave of search technology faces even greater demands, not only in terms of the volume of requests, but also in terms of changes to the available content and the dynamics of the Web 2.0+ data being produced. These new and increased demands mean that search technology must be able to search, filter, extract, combine, integrate, and process multiple and distributed sources of multilingual content, delivered to an even wider and more varied global audience. Inevitably, Multilingual and Multifaceted Interactive Information Access (MUMIA) research and development will be a key part of the next generation of search technology. Machine Translation (MT), Information Retrieval (IR), and Multifaceted Interactive Information Access (MIIA) are three disciplines that address the main components of MUMIA. However, the relevant research, which is vitally important for the development of next-generation search systems, is fragmented. This Action will coordinate the collaboration between these disciplines, fostering research and technology transfer in these areas, and will play an important role in defining the future of search. To form a common basis for collaboration, the domain of patent retrieval has been selected as a use case, as it provides highly sophisticated and information-intensive search tasks that have significant economic ramifications. This Action will explore innovative frameworks to strengthen the synergies between the disparate research fields of MT, IR, and MIIA within the specific context of patent search and other next-generation Web applications.


(IM)2: Interactive Multimodal Information Management (Swiss NCCR / Phase I: Jan 2002 - Dec 2005 / Phase II: Jan 2006 - Dec 2009 / Phase III: Jan 2010 - Dec 2013)

Interactive Multimodal Information Management (IM)2 aims at advancing research and developing prototypes in the field of advanced multimodal man-machine interaction. More specifically (IM)2 addresses the technologies that co-ordinate natural input modes (such as speech, pen, touch, hand gestures, head and body movements, and eventually physiological sensors) with multimedia system output, such as speech, sound, images, 3D graphics, and animation. The field of multimodal interaction covers a wide range of critical activities and applications, including recognition and interpretation of spoken, written and gestural language, computer vision, as well as indexing and management of multimedia documents. Other key sub-themes include the protection of information content, limiting information access, and the structuring, retrieval and presentation of multimedia information. These multimodal interfaces represent a new highly strategic direction in the development of advanced information technologies. Thanks to these interfaces, man-machine interactions will be more flexible and easier to use, hence more productive. Ultimately, these multimodal interfaces should flexibly accommodate a wide range of users, tasks, and environments for which any single mode may not suffice. The ideal interface should primarily be able to deal with more comprehensive and realistic forms of data, including mixed data types (i.e., data from different input modalities such as image and audio).

Project main website:


PetaMedia (EU-FP7-NoE / Mar 2008 - Feb 2012)

PetaMedia is a Network of Excellence funded by the European Commission's Seventh Framework Programme. FP7 is a key tool to respond to Europe's needs in terms of jobs and competitiveness, and one of its goals is to maintain leadership in the global knowledge economy.

Four partners form the core of PetaMedia, each representing a national network: Delft University of Technology from the Netherlands, EPFL from Switzerland, Queen Mary University of London from the UK, and the Technical University of Berlin from Germany. Delft University of Technology coordinates PetaMedia.

The goal of the NoE PetaMedia is to bring together the research of four national networks in the Netherlands, Switzerland, UK and Germany in the area of multimedia content analysis (MCA) and social peer-to-peer networks and eventually to establish a European virtual centre of excellence.

The collective research that will integrate the four national networks will be directed towards the synergetic combination of user-based collaborative tagging, peer-to-peer networks, and multimedia content analysis, and towards the identification and exploration of the potential and limitations of combined tagging, MCA, and social peer-to-peer (SP2P) concepts. Solutions and collaborative research field trials will be built on the coordinating partner's open source P2P software Tribler.

Project main website:

SNF Grant number 200021-119906/ Apr 2009 - Mar 2011

Information search systems offer a way to mine and retrieve information from large data collections. In the case of exploration, information networks are generally organized manually by moderated user communities (e.g. Wikipedia or YouTube). Fully automated solutions generally come from the domain of Adaptive Hypermedia, creating information networks from the information content. In this project, we wish to design a system that, starting from a rather classical and pragmatic content-based analysis of the media collection, is able to set up an initial, useful browsing environment. At this stage, experience from the design of multimedia content-based search systems is exploited for the representation and organization of the data along this network. Since data is often partly annotated, an initial propagation of the semantic knowledge along this network will enhance its quality. Then, users are invited to actually use the browsing system. A short-term adaptation of the navigation environment according to estimates of the user's mental state will increase browsing efficiency and user satisfaction. From a long-term perspective, collaborative user interaction is collected and used as an extra source of semantic information to perform semantic information propagation and incrementally enhance the initial bootstrapping content-based navigation system (project page).


SNF Grant number 200021-122036/ Nov 2008 - Aug 2010

Starting from an initial design of a feature selection methodology, this project aims to build a framework for the efficient calculation of N-way feature information interaction and its systematic exploitation in the context of multimodal information processing, such as classification or information retrieval. We will do so by reimplementing the calculation of the feature information interaction within the theoretical framework of combinatorics that underlies its main formula. To speed up the calculation further, an intelligent search and subsampling algorithm seems promising. To properly exploit feature information interactions, the simple feature selection we have applied so far turns out not to be sufficient. We plan to develop an information fusion approach based on feature selection and construction, such that the underlying relevant attribute relationships can be automatically learnt from training examples and exploited. We think that improved information fusion will significantly enhance the performance of multimedia data classification and retrieval.
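As an illustration only, N-way interaction information over discrete features can be sketched as an alternating sum of empirical joint entropies over all feature subsets. The sketch below is not the project's implementation: the sign convention (positive values indicate synergy), the function names `joint_entropy` and `interaction_information`, and the toy XOR data are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations
from math import log2


def joint_entropy(rows, columns):
    """Empirical Shannon entropy (in bits) of the selected columns."""
    if not columns:
        return 0.0
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    total = len(rows)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def interaction_information(rows, columns):
    """Interaction information of the given columns.

    Assumed convention: I(S) = -sum over non-empty T subset of S of
    (-1)^(|S|-|T|) * H(T), which reduces to mutual information for
    two columns and is positive for synergistic triples.
    """
    k = len(columns)
    total = 0.0
    for size in range(1, k + 1):
        sign = (-1) ** (k - size)
        for subset in combinations(columns, size):
            total += sign * joint_entropy(rows, subset)
    return -total


# Hypothetical toy data: two independent bits and their XOR.
xor_rows = [(x, y, x ^ y) for x in (0, 1) for y in (0, 1)]
pairwise = interaction_information(xor_rows, (0, 1))
three_way = interaction_information(xor_rows, (0, 1, 2))
```

In this toy data no pair of columns shares information, yet the three columns together carry one bit of interaction, which is the kind of relationship a purely pairwise feature selection would miss. The number of subsets grows exponentially with N, which is consistent with the need for search and subsampling strategies mentioned above.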

Collection Guide

(SNF grant number 200021-109377 / Oct 2006 - Sep 2010)

We look at multimedia information management in a 'queryless' context. We assume the presence of a (large) number of multimedia items and wish to construct a framework that should provide a comprehensive view of the content of these collections effectively. The baseline is a simple draw that shows items in random order. At the center of the project, we focus on estimating the properties and structures of the population of the feature space that represents the multimedia collection. The background we wish to exploit is that of discrete optimization, justified by our view of the collection as discrete points within a high-dimensional space whose topology is unknown and requires modeling. This departs from the traditional stochastic view where items in the collection are seen as being related to samples of a given distribution (e.g., on features). We show that from solutions to that problem, we may derive efficient and adaptive solutions to several problems such as collection sampling, clustering and visualization.

Emotion-driven personalized content-based multimedia management

(SNF grant number 200021-113274 / Mar 2007 - Feb 2009) [Principal Investigator: Prof. T Pun / project page]

A central issue in the multimedia and human-computer interaction domains is to create systems that react in a nearly “human” manner. Imagine yourself coming home. You missed a movie on the TV. You have never seen it and would like to obtain a summary of that movie, more precisely a summary that you are going to like. Alternatively, you would like to retrieve movie segments that give you a particular feeling, such as joy. The project aims at developing a theoretical framework for accessing multimedia data, in particular videos, that will take into account user emotional preferences and behavior. This framework will be validated by a concrete platform for emotion-based access and summarization of videos.

Individual emotional reactions will first be learnt from a selected set of movies presented to a user, by means of several non-invasive physiological sensors. These reactions will constitute individual user profiles describing emotional responses, positive or negative, towards various elements that appear in a movie. A mapping between the video characteristics and the user responses will thus be established. This mapping will constitute the individual user emotional profile. Knowing this user emotional profile, that is what s/he prefers, dislikes, strongly reacts to, etc., it will be possible to develop a truly personalized platform for interacting with videos. Two application examples are planned. The first one is a personalized retrieval tool that will allow searching in the movie database on the basis of emotional features and criteria. The second application is an emotion-based video summarization tool, which would allow a user to ask for a summary emphasizing some particular emotions, for instance joy. To ensure privacy the emotional profiles will only be known by the user, who will always retain the ultimate control over the system.

This project lies at the frontier of human-computer interaction (HCI), affective computing, emotion research, and content-based processing of multimedia data. It aims at obtaining novel and fundamental results: in HCI, regarding user modeling and emotion-based interaction customization; in emotion research, regarding the determination of higher-level emotions and the coupling between temporal emotion patterns and dynamic movie characteristics; in content-based processing of multimedia data, by significantly enhancing existing techniques through the use of emotional information. The targeted applications will exploit emotion recognition in an innovative way. The work will also permit various further developments in the domain of interaction with multimedia data.


MultiMatch: Multimedia/Multilingual Access to Cultural Heritage (EU-FP6-STREP 033104 / May 2006 - Oct 2008)

The MultiMatch search engine will be able to:

  • identify relevant material via an in-depth crawling of selected cultural heritage institutions, accepting and processing any semantic web encoding of the information retrieved;
  • crawl the Internet to identify websites with cultural heritage information, locating relevant texts, images and videos, regardless of the source and target languages used to write the query and/or describe the results;
  • automatically classify the results in a semantic-web compliant fashion, based on document content, its metadata, its context, and on the occurrence of relevant CH concepts in the document, and automatically extract relevant information which will then be used to create cross-links between related material, such as the biography of an artist, exhibitions of his/her work, critical analyses, etc.;
  • organize and further analyse the material crawled to serve focused queries generated from user-formulated information needs;
  • interact with the user to obtain a more specific definition of initial information requirements, and finally;
  • organize and display search results in an integrated, user-friendly manner, allowing users to access and exploit the information retrieved regardless of language barriers.

The project’s R&D work is organized around three activities:

  • User-oriented research activities will primarily investigate the user requirements and consequent definition of the required functionality of the system, content selection and preparation, studies on the ontologies adopted by cultural heritage institutions and the semantic encoding to be adopted by the system.
  • System-oriented research activities include the study and development of software components for the acquisition, indexing, classification, retrieval and presentation of multilingual cultural heritage information in diverse and mixed media and their integration in the system prototypes.
  • Validation activities will include testing of the system and its integrated components.


(SNF grants number 2100-066648/200020-105282 / Oct 2002 - Oct 2006)

In the first phase of this project, we proposed DEVA as an annotation model, designed as an extension of the Dublin Core in the context of RDF. During this phase, we soon identified the need for an intelligent environment capable of handling knowledge arising from differing sources and perspectives. In this respect, we aligned our developments with those of the Semantic Web community and proposed the Semantic Web Knowledge Base (SWKB) as a reasoning engine capable of retracting previous conclusions as new contradicting facts are entered. SWKB has been integrated into a practical framework and early results show the appropriateness of the approach.

The objective of this proposed second phase is to continue in the direction of reinforcing the smooth acquisition and inference of knowledge from users. The main focus is to enable the semi-automatic annotation of multimedia collections for accurate high-level semantic retrieval. We mostly address two aspects in this phase of the project:

  • We wish to enhance interactivity during the annotation process by removing situations where the user may be overwhelmed by a surge of information. This is done at the collection level, where sampling-based and group-based annotation techniques are defined. In this part, we will build a close relationship with our efforts on multimedia collection visualization and navigation. The general line that we follow is to define an incremental annotation process whereby a collection item incrementally receives semantic information from disparate sources. Such an incremental annotation process is facilitated and reinforced by a knowledge propagation strategy, based on inter-document relationships created either by content-based techniques or by approaches inspired by collaborative filtering. We also want to enhance interactivity for the manipulation of ontologies, whose size may easily become unmanageable. By exploiting our reasoning engine, we propose to highlight relevant parts of the ontology and filter out irrelevant ones online. Also, by exploiting any co-occurrence between terms, or between terms and low-level features (in relation to our auto-annotation effort), we wish to construct a recommendation system for the multimedia description.
  • We also propose to enable a collaborative annotation context in which the annotation task is distributed amongst human and software operators by allowing multiple descriptions to co-exist and complement each other. Conflict resolution should be addressed either by making hard decisions or by allowing different points of view to co-exist. We see collaborative annotation as beneficial for improving reliability and objectivity. It also forms a richer base for knowledge expansion and propagation. By exploiting strategies inspired by collaborative filtering, one may define user profiles (or rather, determine each user's domain of expertise) to attach reliability measures to the respective annotations. We consider this development important in a context where classical multimedia document collections are of unmanageable size.


SIMILAR: The European Taskforce creating Interfaces SIMILAR to Human-Human Communications (EU-NoE)


M4: Multimodal meeting manager

2002/03 - 2005/02 IST Program, Nr. 2001-34485.


WebKit: Intuitive Physical Interfaces to the Web


Viper SNF grants number 2100-045581/2000-059152/2000-052426 / Apr 1996 - Mar 2002


The support of funding agencies is gratefully acknowledged.

research/projects.txt · Last modified: 2015/06/03 20:16 by sun

Keywords: machine learning, information geometry, data mining, Big Data, affective information retrieval (recherche d'information), information visualisation, content-based image and video retrieval (CBIR, CBR, CBVR, CBMR, CBMIR), information mining, classification, multimedia and multimodal information management, semantic web, knowledge base (RDF, OWL, XML, metadata, auto-annotation, description), multimodal information fusion