Dept of Computer Science
Centre Universitaire d'Informatique (CUI)
Université de Genève
Supervision: Stephane Marchand-Maillet
Date of proposal: May 2022
Applying Machine Learning to data analysis requires annotated data for training. Interactive annotation, by definition, requires that the user can easily manipulate the visualization of large volumes of data. Sampling is therefore needed, but it must be carefully designed so that the displayed data remain representative of the full dataset.
Further interactive tools that exploit data density to assist the labeling may also be offered to simplify the interaction. Again, care must be taken not to introduce biases at this stage.
Underlying these operations are fundamental processes such as outlier characterization or density estimation. This project proposes to study sampling strategies based on either generative modeling or decimation. A scalable implementation of the process should be developed in C++/Qt and tested on the manual or assisted gating of Flow Cytometry data.
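To give a flavour of what a decimation strategy could look like, here is a minimal sketch in C++. It is only one possible approach, not the method the project prescribes. It assumes 2D points, a uniform grid as a crude density estimate, and a per-cell cap enforced by reservoir sampling. The names `Point2D`, `Sample`, `decimate`, `cell` and `cap` are hypothetical. Each kept point carries a weight equal to the number of original points it stands for, so the local density can still be recovered from the reduced set, and sparse cells (potential outliers) are kept in full.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <map>
#include <random>
#include <utility>
#include <vector>

// Hypothetical 2D event, e.g. two fluorescence channels of one measurement.
struct Point2D { double x, y; };

// A retained point and the number of original points it represents.
struct Sample {
    std::size_t index;  // index into the original data
    double weight;      // points seen in the cell / points kept in the cell
};

// Grid-based decimation (illustrative assumption, not the proposal's method):
// keep at most `cap` points per grid cell of side `cell`, chosen uniformly
// at random within the cell by reservoir sampling (Algorithm R).
std::vector<Sample> decimate(const std::vector<Point2D>& pts,
                             double cell, std::size_t cap,
                             std::uint64_t seed)
{
    assert(cell > 0.0 && cap > 0);

    struct Cell {
        std::size_t seen = 0;           // per-cell count = crude density estimate
        std::vector<std::size_t> kept;  // reservoir of retained indices
    };
    std::map<std::pair<std::int64_t, std::int64_t>, Cell> grid;
    std::mt19937_64 rng(seed);

    for (std::size_t i = 0; i < pts.size(); ++i) {
        const auto cx = static_cast<std::int64_t>(std::floor(pts[i].x / cell));
        const auto cy = static_cast<std::int64_t>(std::floor(pts[i].y / cell));
        Cell& c = grid[{cx, cy}];
        ++c.seen;
        if (c.kept.size() < cap) {
            c.kept.push_back(i);        // cell still under the cap: keep the point
        } else {
            // Replace a random slot with probability cap / seen.
            std::uniform_int_distribution<std::size_t> pick(0, c.seen - 1);
            const std::size_t j = pick(rng);
            if (j < cap) c.kept[j] = i;
        }
    }

    std::vector<Sample> out;
    for (const auto& entry : grid) {
        const Cell& c = entry.second;
        const double w = static_cast<double>(c.seen)
                       / static_cast<double>(c.kept.size());
        for (std::size_t idx : c.kept) out.push_back({idx, w});
    }
    return out;
}
```

In this sketch the weights always sum to the number of input points, which allows a display to render dense regions more strongly even though fewer points are drawn. Whether such a cap distorts the visual impression used for gating is exactly the kind of bias the project would need to assess.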
If interested, please contact me.