Master's Project: Learning Large-scale Indexes

Machine Learning for Machine Learning

Supervision: Stephane Marchand-Maillet

Date of proposal: Mar. 2022

Machine Learning operations are typically data-greedy. Finding the k nearest neighbors of a given query point underlies virtually every learning operation and should therefore be made extremely efficient. This is the role of indexes. However, for datasets with billions of high-dimensional items, indexing and search become costly operations for which tree- or graph-based structures are no longer affordable. Recent proposals use distance-sensitive Bloom filters to implement group testing operations [1].

This project proposes to explore the capabilities of (Deep) Neural Nets, as Universal Approximators and in the line of Learning-to-Hash [2], to construct geometric Bloom filters and to learn group splits that maximize the effectiveness of group testing.
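The group-testing idea behind [1] can be illustrated with a minimal sketch. It assumes sign-random-projection (SimHash) functions as the distance-sensitive hashes, a single random split of the data into groups, and a group test that fires when at least `threshold` of the filter's bit arrays are set for the query. This is a simplified illustration, not the algorithm of [1] (which combines several random splits and decodes the test outcomes). All class, function, and parameter names are hypothetical. The random hyperplanes and the random split are the two components this project proposes to replace with learned ones.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def random_hyperplanes(dim: int, num_bits: int, rng: random.Random) -> List[List[float]]:
    """Draw num_bits Gaussian directions defining one SimHash function (assumed LSH family)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def simhash(planes: List[List[float]], x: Vector) -> int:
    """One bit per hyperplane, set when x lies on its positive side; nearby points tend to share codes."""
    code = 0
    for p in planes:
        dot = sum(pi * xi for pi, xi in zip(p, x))
        code = (code << 1) | (1 if dot >= 0.0 else 0)
    return code


class DistanceSensitiveBloomFilter:
    """Hypothetical sketch: one bit array per hash function, indexed by the hash code."""

    def __init__(self, num_tables: int, num_bits: int):
        self.arrays = [[False] * (2 ** num_bits) for _ in range(num_tables)]

    def add(self, codes: List[int]) -> None:
        for bits, c in zip(self.arrays, codes):
            bits[c] = True

    def hits(self, codes: List[int]) -> int:
        """Number of bit arrays that fire for the query's codes."""
        return sum(bits[c] for bits, c in zip(self.arrays, codes))


class GroupTestingIndex:
    """Random split into groups, one filter per group; only positive groups are scanned exactly."""

    def __init__(self, data: List[Vector], num_groups: int,
                 num_tables: int = 8, num_bits: int = 6, seed: int = 0):
        rng = random.Random(seed)
        dim = len(data[0])
        # Random hash functions, shared by all groups (a learned hash would replace these).
        self.planes = [random_hyperplanes(dim, num_bits, rng) for _ in range(num_tables)]
        ids = list(range(len(data)))
        rng.shuffle(ids)
        # Random split into groups (a learned split would replace this).
        self.groups = [ids[g::num_groups] for g in range(num_groups)]
        self.data = data
        self.filters = []
        for members in self.groups:
            f = DistanceSensitiveBloomFilter(num_tables, num_bits)
            for i in members:
                f.add(self._codes(data[i]))
            self.filters.append(f)

    def _codes(self, x: Vector) -> List[int]:
        return [simhash(p, x) for p in self.planes]

    def search(self, q: Vector, k: int, threshold: int) -> List[int]:
        codes = self._codes(q)  # hash the query once, then test every group
        candidates: List[int] = []
        for members, f in zip(self.groups, self.filters):
            if f.hits(codes) >= threshold:  # group test passed: keep the group's members
                candidates.extend(members)
        # Exact distances are computed only for members of positive groups.
        candidates.sort(key=lambda i: math.dist(q, self.data[i]))
        return candidates[:k]
```

Lowering `threshold` lets more groups pass. This raises recall at the cost of more exact distance computations. A better split or better hashes would make the positive groups both fewer and more likely to contain the true neighbors.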

If interested, please contact me.

[1] J. Engels et al. Practical Near Neighbor Search via Group Testing. NeurIPS, 2021.

[2] Learning to Hash

teaching/22-bloom.txt · Last modified: 2022/02/25 14:37 by marchand
