teaching:23-knngraph [Viper :: Machine Learning, Data Mining and Information Retrieval]

Master, Bachelor or App Info project

Approximate KNN Graph construction

Supervision: Stephane Marchand-Maillet

Date of proposal: Dec. 2023

The construction of the k-nearest neighbor graph is (implicitly or explicitly) at the foundation of most Data Analysis and Machine Learning algorithms. Retaining only the local interactions between data points creates a sparse information index.

Blunt KNN graph construction takes O(N^2) distance value computations. NN-descent [1,2] and pyNNDescent [3] propose an approximate solution based on the assumption that “the neighbor of my neighbor may be my neighbor”. However, as predicted by the curse of dimensionality, these algorithms show low recall (neighborhood preservation) in high dimensions.

The goal of the work is to implement a different strategy. Since any approximate algorithm will break the streamlining of computation, it will become de facto slower. The implementation must therefore be very careful so as not to be penalized against hardware acceleration. The languages are C, C++, CUDA and Python. If interested please contact me.

[1] Wei Dong, Charikar Moses, and Kai Li. 2011. Efficient k-nearest neighbor graph construction for generic similarity measures. In Proceedings of the 20th international conference on World wide web. 577–586.

[2] Relative NN-Descent Naoki Ono, Yusuke Matsui, Relative NN-Descent: A Fast Index Construction for Graph-Based Approximate Nearest Neighbor Search. ArXiv 2310.20419.

[3] py-NNDescent

teaching/23-knngraph.txt · Last modified: 2023/12/13 11:32 by marchand

Keywords: machine learning, information geometry, data mining, Big Data, affective information retrieval (recherche d'information), information visualisation, content-based image and video retrieval (CBIR, CBR, CBVR, CBMR, CBMIR), information mining, classification, multimedia and multimodal information management, semantic web, knowledge base (RDF, OWL, XML, metadata, auto-annotation, description), multimodal information fusion