Supervision: Stephane Marchand-Maillet
Date of proposal: Dec. 2023
The construction of the k-nearest neighbor graph is (implicitly or explicitly) at the foundation of most Data Analysis and Machine Learning algorithms. Retaining only the local interactions between data points creates a sparse information index.
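To make the object concrete, here is a minimal, hypothetical sketch of exact kNN graph construction in pure Python (the helper name `knn_graph` and the toy points are illustrative, not part of the proposal): each point keeps only the indices of its k closest neighbors, which is the sparse index described above.

```python
import math

def knn_graph(points, k):
    """Return, for each point, the indices of its k nearest neighbors
    (exact, brute-force: O(N^2) distance computations)."""
    n = len(points)
    graph = []
    for i in range(n):
        # Distance from point i to every other point: O(N) per row.
        dists = [(math.dist(points[i], points[j]), j)
                 for j in range(n) if j != i]
        dists.sort()
        graph.append([j for _, j in dists[:k]])
    return graph

# Two well-separated pairs of points; each point's nearest neighbor
# is its partner in the same pair.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
print(knn_graph(points, 1))  # → [[1], [0], [3], [2]]
```

The output lists, per point, only its single nearest neighbor: all non-local interactions are discarded.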
Brute-force kNN graph construction requires O(N^2) distance computations. NN-Descent [1,2] and PyNNDescent [3] propose approximate solutions based on the assumption that “the neighbor of my neighbor is likely my neighbor”. However, as predicted by the curse of dimensionality, these algorithms show low recall (poor neighborhood preservation) in high dimensions.
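The “neighbor of my neighbor” heuristic can be sketched as follows. This is a heavily simplified, hypothetical illustration, not the full algorithm of [1]: it explores only forward neighbors, whereas real NN-Descent also uses reverse-neighbor lists and candidate sampling. The function name `nn_descent` and the parameters are assumptions for the sketch.

```python
import math
import random

def nn_descent(points, k, iters=10):
    """Approximate kNN graph: start from random neighbor lists, then
    repeatedly try neighbors-of-neighbors as better candidates."""
    n = len(points)
    dist = lambda i, j: math.dist(points[i], points[j])
    # Random initial neighbor list for each point.
    graph = [random.sample([j for j in range(n) if j != i], k)
             for i in range(n)]
    for _ in range(iters):
        updated = False
        for i in range(n):
            # Candidates: neighbors of i's current neighbors,
            # excluding i itself and its current neighbors.
            cands = ({c for j in graph[i] for c in graph[j]}
                     - {i} - set(graph[i]))
            for c in cands:
                # Replace the current worst neighbor if c is closer.
                worst = max(graph[i], key=lambda j: dist(i, j))
                if dist(i, c) < dist(i, worst):
                    graph[i][graph[i].index(worst)] = c
                    updated = True
        if not updated:
            break  # no neighbor list changed: local convergence
    return graph
```

Note that this simplified version can get trapped: if a point's random initial neighbors all lie in a distant cluster whose neighbor lists stay within that cluster, the correct neighbors are never proposed as candidates. This is exactly the kind of failure mode that worsens in high dimensions.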
The goal of the work is to implement a different strategy. Since any approximate algorithm breaks the regular, streamlined computation pattern of the brute-force approach, it is de facto slower per operation; the implementation must therefore be very careful not to lose this advantage against hardware-accelerated brute force. The languages are C, C++, CUDA and Python. If interested, please contact me.
[1] Wei Dong, Charikar Moses, and Kai Li. 2011. Efficient k-nearest neighbor graph construction for generic similarity measures. In Proceedings of the 20th international conference on World wide web. 577–586.
[2] Naoki Ono and Yusuke Matsui. 2023. Relative NN-Descent: A Fast Index Construction for Graph-Based Approximate Nearest Neighbor Search. arXiv:2310.20419.
[3] Leland McInnes. PyNNDescent: A Python nearest neighbor descent for approximate nearest neighbors. https://github.com/lmcinnes/pynndescent