Miles Mathis' Charge Field

Accelerating Vector Search: Using GPU-Powered Indexes with RAPIDS RAFT



Post by Chromium6 Thu Oct 12, 2023 9:18 pm

https://developer.nvidia.com/blog/accelerating-vector-search-using-gpu-powered-indexes-with-rapids-raft/

more at link:

Accelerating Vector Search: Using GPU-Powered Indexes with RAPIDS RAFT

Sep 11, 2023
By Mickael Ide and Corey Nolet


In the AI landscape of 2023, vector search is one of the hottest topics due to its applications in large language models (LLMs) and generative AI. Semantic vector search enables a broad range of important tasks like detecting fraudulent transactions, recommending products to users, using contextual information to augment full-text searches, and finding actors that pose potential security risks.

Data volumes continue to soar, and traditional methods that compare items one by one have become computationally infeasible. Vector search methods use approximate lookups, which are more scalable and can handle massive amounts of data more efficiently. As we show in this post, accelerating vector search on the GPU provides not only faster search times but also substantially faster index build times.

This post provides:

An introduction to vector search with a brief review of popular applications
An overview of the RAFT library for accelerating vector search on the GPU
A performance comparison of GPU-accelerated vector search indexes against the state of the art on the CPU
The second post in this series dives deeper into each of the GPU-accelerated indexes mentioned in this post and gives a brief explanation of how the algorithms work, along with a summary of important parameters to fine-tune their behavior. For more information, see Accelerating Vector Search: Fine-Tuning GPU Index Algorithms.

The third post in this series focuses on IVF-Flat, an ANN algorithm found in RAPIDS RAFT. We discuss how the algorithm works, and demonstrate the usage of both the Python and C++ APIs in RAFT. We cover setting parameters for index building and give tips on how to configure GPU-accelerated IVF-Flat search. For more information, see Accelerated Vector Search: Approximating with RAPIDS RAFT IVF-Flat.

What is vector search?

Figure 1. The vector search process: given a set of vectors encoded from sources like images, documents, or videos, find the vectors in the index that are closest to a query vector
Figure 1 shows that vector search entails creating an index of vectors and performing lookups to find some number of vectors in the index that are closest to a query vector. The vectors could be as small as three-dimensional points from a lidar point cloud or larger embeddings from text documents, images, or videos.

Vector search is the process of querying a database to find the most similar vectors. This similarity search is done on numerical vectors that can represent any type of object (Figure 2). These vectors are often embeddings created from multimedia like images, video, and text fragments or entire documents that went through a deep learning model to encode their semantic characteristics into a vector form.
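
As a minimal illustration of what "finding the most similar vectors" means, here is an exact brute-force search sketched in NumPy over made-up data; this is the exhaustive comparison that the approximate methods described below avoid:

# Minimal sketch: exact brute-force vector search over random data.
# Every database vector is compared against the query.
import numpy as np

rng = np.random.default_rng(0)
database = rng.standard_normal((10_000, 128)).astype(np.float32)  # 10k vectors, 128 dims
query = rng.standard_normal(128).astype(np.float32)

# Euclidean (L2) distance from the query to every vector in the database
distances = np.linalg.norm(database - query, axis=1)

# Indices of the k closest vectors
k = 5
nearest = np.argsort(distances)[:k]
print(nearest, distances[nearest])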

Embedding vectors typically have the advantage of being a smaller object than the original document (lower dimensionality), while maintaining as much information about the source as possible. Therefore, two documents that are similar often have similar embeddings.

Figure 2. Vectors represent data points in higher dimensions (illustrated with a 3D point cloud, such as one created from lidar)
The points in Figure 2 are 3D, but they could have 500 dimensions or even more.

This makes it easier to compare objects, as the embedding vectors are smaller and retain most of the information. When two documents share similar characteristics, their embedding vectors are often spatially close, or similar.
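
As a toy illustration of "spatially close" (the numbers below are invented, not from the post), cosine similarity is one common way to score how close two embedding vectors are:

# Toy sketch: cosine similarity between hypothetical embedding vectors.
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

doc_a = np.array([0.9, 0.1, 0.3])    # hypothetical embedding of document A
doc_b = np.array([0.8, 0.2, 0.25])   # a document with similar content
doc_c = np.array([-0.4, 0.9, -0.7])  # an unrelated document

print(cosine_similarity(doc_a, doc_b))  # close to 1: similar documents
print(cosine_similarity(doc_a, doc_c))  # negative here: dissimilar documents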

Approximate methods for vector search
To handle larger datasets efficiently, approximate nearest neighbor (ANN) methods are often used for vector search. ANN methods speed up the search by approximating the closest vectors, avoiding the exhaustive distance computation of an exact brute-force approach, which compares the query against every single vector in the database.
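
To make the idea concrete, here is a rough sketch (our own illustration, not RAFT code) of the inverted-file (IVF) style of approximation used by several ANN indexes: the database is clustered once up front, and each query is compared only against the vectors in the few closest clusters instead of the whole database:

# Rough sketch of IVF-style approximate search (illustration only).
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(0)
database = rng.standard_normal((10_000, 128))
query = rng.standard_normal(128)

n_lists, n_probes, k = 64, 4, 5

# Index build step: partition the database into n_lists clusters
centroids, labels = kmeans2(database, n_lists, minit='++')

# Search step: probe only the n_probes clusters closest to the query
probe = np.argsort(np.linalg.norm(centroids - query, axis=1))[:n_probes]
candidates = np.flatnonzero(np.isin(labels, probe))

# Exact distances, but only within the candidate subset
d = np.linalg.norm(database[candidates] - query, axis=1)
approx_neighbors = candidates[np.argsort(d)[:k]]
print(approx_neighbors)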

In addition to the search compute cost, storing many vectors can also consume a large amount of memory. To ensure both fast searches and low memory usage, you must index vectors in an efficient way. As we outline a bit later, this can sometimes benefit from compression. A vector index is a space-efficient data structure built on mathematical models that is used for efficiently querying several vectors at a time.
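
To give a sense of scale (back-of-the-envelope numbers of our own, not figures from the post), the memory cost of storing raw vectors grows quickly, which is why compressed indexes matter:

# Back-of-the-envelope memory estimate for raw float32 embeddings.
n_vectors, dim, bytes_per_float = 100_000_000, 768, 4
raw_bytes = n_vectors * dim * bytes_per_float
print(f"{raw_bytes / 1e9:.0f} GB uncompressed")         # ~307 GB
# A compressed index (for example, quantizing to ~1 byte per dimension)
# would cut the per-vector storage roughly 4x in this example.
print(f"{n_vectors * dim / 1e9:.0f} GB at 1 byte/dim")  # ~77 GB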

Updating an index, such as by inserting or deleting vectors, can be problematic when the index takes hours or even days to build. It turns out that these indexes can often be built much faster on the GPU. We showcase this performance later in the post.
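
For readers who want to see what this looks like in practice, the following is a hedged sketch of building and searching a GPU-accelerated IVF-Flat index with RAFT's Python package, pylibraft. The function and parameter names (ivf_flat.build, IndexParams, n_lists, SearchParams, n_probes) follow the pylibraft documentation around RAPIDS 23.x; treat this as an outline and check the current RAFT docs for the exact API.

# Sketch (assumed pylibraft API, RAPIDS ~23.x): build and query an
# IVF-Flat index entirely on the GPU.
import cupy as cp
from pylibraft.neighbors import ivf_flat

dataset = cp.random.random_sample((100_000, 128), dtype=cp.float32)
queries = cp.random.random_sample((1_000, 128), dtype=cp.float32)

# Build the index on the GPU, partitioning the dataset into 1024 clusters
index = ivf_flat.build(ivf_flat.IndexParams(n_lists=1024), dataset)

# Search: probe 20 of the 1024 clusters per query, return 10 neighbors each
distances, neighbors = ivf_flat.search(
    ivf_flat.SearchParams(n_probes=20), index, queries, k=10
)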

