
Master's Project: Sampling high-dimensional data for visualization

Machine Learning for Machine Learning

Supervision: Stephane Marchand-Maillet

Date of proposal: May 2022

Applying Machine Learning to data analysis requires annotated data for training. Interactive annotation, in turn, requires that the visualization of large volumes of data can be manipulated easily. Sampling is therefore required, but it must be designed carefully so that the displayed data remains representative of the full dataset.

Further interactive tools that exploit data density to assist labeling may also be offered to simplify the interaction. Here again, care must be taken not to introduce biases.

Underlying these operations are fundamental processes such as outlier characterization and density estimation. This project proposes to study sampling strategies based on either generative modeling or decimation. A scalable implementation of the process should be developed in C++/Qt and tested on the manual or assisted gating of Flow Cytometry data.
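
As a point of reference for the decimation route, the simplest baseline is uniform decimation. The sketch below is only an illustration under stated assumptions, not the method the project is expected to deliver: the `Point` type and the `decimateUniform` function are hypothetical names, and the technique shown is standard reservoir sampling (Algorithm R), which keeps k of n points with equal probability in a single pass.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical type: one event as a point in d-dimensional measurement space.
using Point = std::vector<double>;

// Uniform decimation by reservoir sampling (Algorithm R).
// Keeps at most k points from `data`; when data.size() > k, each point
// is retained with equal probability k / data.size().
// One pass over the data, O(k) extra memory.
std::vector<Point> decimateUniform(const std::vector<Point>& data,
                                   std::size_t k,
                                   std::mt19937_64& rng) {
    std::vector<Point> reservoir;
    reservoir.reserve(k);
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (i < k) {
            // Fill the reservoir with the first k points.
            reservoir.push_back(data[i]);
        } else {
            // Draw j uniformly in [0, i]; the new point replaces slot j
            // only if j falls inside the reservoir.
            std::uniform_int_distribution<std::size_t> pick(0, i);
            const std::size_t j = pick(rng);
            if (j < k) reservoir[j] = data[i];
        }
    }
    // Postcondition: never more than k points are kept.
    assert(reservoir.size() <= k);
    return reservoir;
}
```

Calling `decimateUniform(events, 50000, rng)` with a seeded `std::mt19937_64` would return a display subset whose density follows that of the input in expectation. The drawback is that sparse regions, such as outliers or rare cell populations, may end up with very few points or none at all, which is one reason density-aware or generative strategies are worth studying.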

If interested, please contact me.

teaching/22-sampling.txt · Last modified: 2022/05/23 10:25 by marchand
