Powering AI With Vector Databases: A Benchmark - Part I

- Provide high-end clientele with a best-in-class online shopping experience, serving the massive paradigm shift towards online conversational assistants while meeting the expectations set by high-end physical stores;
- Augment the customer experience with accurate and reliable AI-enabled multimodal tools that advise and influence customer shopping journeys, leading to increased conversion rates.

Benchmarking VSE: An Analysis
To select VSE candidates, we considered several criteria: diversity of index types, metric types, model serving, open-source community adoption, and quality of documentation. Milvus is the most actively developed engine in the vector database ecosystem, backed by a rich documentation collection, and we were also thrilled by its diversity of metrics and index types. Besides Weaviate and Vespa, no other engine offers model-serving functionality. Given our time constraints, we favour the more recent of the two, Weaviate, which also allows graph data models, and therefore do not consider Vespa. Qdrant posed several difficulties in our benchmark setup, and Pinecone has a proprietary index type, which hinders a reliable comparison.
In this regard, the rest of the article will focus on both Milvus and Weaviate, as they appear to be the VSEs that entirely meet our criteria.
Setup
Without further ado, let us first describe the setup for the experiment we ran for this blog. To evaluate the selected engines, we used an Azure machine with the following hardware and software:
Hardware
- CPU: Intel Xeon E5-2690 v4
- RAM: 112 GB
- Disk: 1024 GB HDD
Software
- Operating System: Ubuntu 16.04 LTS
- Environment: Anaconda 4.8.3 with Python 3.8.12
Dataset
In the interest of reproducible research, we used a public dataset composed of crawled data from startups-list.com. The raw parsed data can be found in this Google Startup Dataset, which comprises 40,474 startup records with the following attributes (this is the same dataset as explored in the Qdrant tutorial):
- name;
- company description;
- location;
- picture URL.
To stress-test these engines, we built different scenarios to accommodate varying sizes of the created index. For one, we want to see the impact of an increase in the number of records (referred to as entities from this point onwards), and for another, the effect of an increase in the number of columns (assuming we want to append each entity with more than one encoding). Do note that none of the currently tested versions of the engines supports multiple encodings for a single entity. Thus, the scenarios considering more than one encoding per entity (i.e., a startup object encompassing the information mentioned above) have been adapted: we create a replica of the index for each different entity representation, as sketched below.
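A sketch of this workaround (the representation names here are hypothetical):

```python
# One index per entity representation: a scenario with k representations
# creates k replicas of the same index layout, one per encoding.
REPRESENTATIONS = ["description", "name", "location"]  # hypothetical encodings

for rep in REPRESENTATIONS:
    index_name = f"startups_{rep}"
    # ...create the collection/class named `index_name` and insert
    # the vectors for this representation...
```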
The final set of scenarios is listed below.
Indexation Algorithm
The algorithm used to build an index has implications for the quality of the results, not only in terms of data quality (accuracy) but also system performance (memory usage and speed). More information on the different approaches can be found in this Pinecone blog article. An up-to-date ANN benchmark can also be found in the well-known GitHub repository by Erik Bernhardsson, with graphical quality/speed comparisons for popular public datasets.
Qdrant and Weaviate natively implement only HNSW. Thus, the experiment uses HNSW solely as the de facto indexation algorithm [2]. The configuration parameters for HNSW have also been fixed for all engines:
- building parameters:
- M: maximum degree of a node in the graph = 4
- efConstruction: size of the dynamic candidate list during index construction = 8
- search parameters:
- ef: size of the search scope at query time; should be larger than the number of results (top_k) = 100
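For concreteness, these fixed parameters map onto each engine's configuration roughly as follows (a sketch: the Milvus keys follow pymilvus 1.x, and the Weaviate keys follow its HNSW vectorIndexConfig, where maxConnections plays the role of M):

```python
# Fixed HNSW parameters, shared by all engines and scenarios.

# Milvus 1.1.1 (pymilvus 1.x): passed when building the index / searching.
MILVUS_INDEX_PARAMS = {"M": 4, "efConstruction": 8}
MILVUS_SEARCH_PARAMS = {"ef": 100}  # search scope; must exceed top_k

# Weaviate (Python client v3): set once in the class schema.
WEAVIATE_VECTOR_INDEX_CONFIG = {
    "maxConnections": 4,   # Weaviate's name for HNSW's M
    "efConstruction": 8,
    "ef": 100,
}
```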
Queries
Following the principles of statistical analysis, we want each scenario to execute a minimum of 30 times. During index querying, it is vital to use different queries to prevent the engines from employing implicit result caches, which would flatter the querying speed. Thus, we feed the following queries sequentially, rotating through them over the 30 runs:
- 'Berlin';
- 'Chicago';
- 'Los Angeles';
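To make the rotation concrete, below is a minimal sketch of the measurement loop; `encode_fn` and `search_fn` are hypothetical placeholders for the text encoder and the engine-specific search call:

```python
import itertools
import time

QUERIES = ["Berlin", "Chicago", "Los Angeles"]
N_RUNS = 30

def run_scenario(encode_fn, search_fn):
    """Time N_RUNS searches, rotating through QUERIES to dodge result caches."""
    timings = []
    for query in itertools.islice(itertools.cycle(QUERIES), N_RUNS):
        vector = encode_fn(query)          # encode outside the timed section
        start = time.perf_counter()
        search_fn(vector)
        timings.append(time.perf_counter() - start)
    return timings
```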
Milvus (1.1.1)
Milvus is an open-source vector database built to manage vectorial data and power embedding search. It originated in October 2019 and is a graduate project of the LF AI & Data Foundation. The latest version is Milvus 2.0.0, which is under steady development, with release candidate 8 having shipped on 5 November 2021 (at the time of writing of this technical blog). However, upon trying to set up Milvus, the team encountered multiple challenges:
- Indexing spikes lead to increases of 2 to 3 times the average time (see this GitHub issue);
- Errors when setting up scenarios S2 through S9, related to networking problems that are being hashed out as the Milvus team works towards the Milvus 2.0.0 GA.
While this release shows promise regarding the revamped and additional features, the following comparative analysis uses Milvus 1.1.1.
Indexing
In Milvus, users index their data in a collection that encompasses several entities. An entity is a record composed of several fields or attributes. For efficient retrieval, a collection is split into partitions, each of which holds several segments. More information can be found in the Milvus glossary.
We have developed a script that handles both Collection and Index creation in Milvus; a simplified sketch of its core follows. The results for each run are presented below.
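The sketch uses pymilvus 1.x; the collection name, vector dimension and placeholder vectors are illustrative assumptions:

```python
from milvus import Milvus, IndexType, MetricType

client = Milvus(host="localhost", port="19530")

# Create the collection that will hold one encoding per entity.
client.create_collection({
    "collection_name": "startups",
    "dimension": 768,          # depends on the text encoder used
    "index_file_size": 1024,   # segment file size in MB
    "metric_type": MetricType.L2,
})

# Insert the pre-computed vectors, then build the HNSW index.
vectors = [[0.0] * 768]  # placeholder: real encodings come from the encoder
client.insert("startups", records=vectors)
client.create_index("startups", IndexType.HNSW, params={"M": 4, "efConstruction": 8})
```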
Milvus' indexing times appear mostly consistent, but, as the number of entities increases, the system struggles to maintain a constant execution time. This pattern is particularly evident in the scenarios indexing the total number of entities (S3, S6 and S9).

Milvus 1.1.1 Average Indexing Time for Scenarios S1 through S9.
In any case, we see growth in the average indexing time. This behaviour comes as no surprise, given that the indexing time for scenarios S4-S9 sums the indexing times of each representation's index. For example, we expected roughly a two-fold growth from scenario S3 to S6 (S6 has double the representations of S3), and the average execution time shows as much. The same logic applies to scenarios S7-S9, where representations increase five-fold relative to scenarios S1-S3. Another expected behaviour is the increase in execution time as the number of entities increases; this pattern resurfaces each time the number of representations changes (in scenarios S1, S4 and S7).
Querying
The indexes created during indexation had to be explicitly loaded before querying the system, as we detected a warm-up effect during the first run of each scenario. In Milvus, we don't search an index explicitly: the pymilvus package readily provides a search call that queries the indexes associated with the given collection name, as sketched below. Querying time results for each run are presented after.
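Continuing the sketch above, a query then looks roughly as follows (`preload_collection` handles the explicit load; the query vector is an assumed pre-computed encoding of the query text):

```python
# Explicitly load the collection before timing queries, to avoid the
# first-run warm-up effect we observed.
client.preload_collection("startups")

query_vector = [0.0] * 768  # placeholder encoding of, e.g., 'Berlin'

# pymilvus searches the index behind the collection implicitly.
status, results = client.search(
    collection_name="startups",
    query_records=[query_vector],
    top_k=10,
    params={"ef": 100},
)
```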
Milvus 1.1.1 Querying Time for Scenarios S1 through S9.

Milvus 1.1.1 Average Querying Time for Scenarios S1 through S9.
Weaviate
Weaviate is another VSE candidate, given its promise within the Vector Search paradigm. As is the case with Milvus, it intends to provide several key features that benefit a distributed and performant system (horizontal scaling was slated for Fall 2021).
Indexing
When indexing data in Weaviate, the first step is to create a Class object, which holds the objects' schema and additional configuration parameters (such as the index algorithm parameters), expressed as JSON. Weaviate provides no explicit information on the necessity of creating an index from an object collection; implicitly, it appears to index all provided vectors using HNSW. A minimal sketch of class creation follows, and indexing results are shown after.
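In this sketch, which uses the Weaviate Python client, the class name, properties and placeholder object are our own assumptions; note that objects are indexed implicitly as they are created:

```python
import weaviate

client = weaviate.Client("http://localhost:8080")

# The class schema carries the HNSW parameters fixed in the setup above.
startup_class = {
    "class": "Startup",
    "vectorizer": "none",  # we supply our own vectors
    "vectorIndexType": "hnsw",
    "vectorIndexConfig": {"maxConnections": 4, "efConstruction": 8, "ef": 100},
    "properties": [
        {"name": "name", "dataType": ["text"]},
        {"name": "description", "dataType": ["text"]},
        {"name": "location", "dataType": ["text"]},
    ],
}
client.schema.create_class(startup_class)

# Indexing happens implicitly as objects are added.
client.data_object.create(
    {"name": "...", "description": "...", "location": "..."},
    "Startup",
    vector=[0.0] * 768,  # placeholder: the entity's pre-computed encoding
)
```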
Weaviate Indexing Time for Scenarios S1 through S9.

As we’ve seen previously, the average indexing time sees an increase, first while increasing the number of entities and later when increasing the number of representations per entity. Here, more than in Milvus, we see that the values for scenarios of increasing representations (S1, S4, S7 and S2, S5, S8 and S3, S6, S9, respectively) follow a linear increase.
Querying
Analogously to Milvus, query searching in Weaviate is carried out by accessing the specified collection, with the index handled implicitly; a sketch follows. The results are listed in the charts below.
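A sketch of such a query with the Weaviate Python client, continuing from the indexing sketch (`query_vector` is again an assumed encoding of the query text):

```python
query_vector = [0.0] * 768  # placeholder encoding of, e.g., 'Berlin'

result = (
    client.query
    .get("Startup", ["name", "description"])
    .with_near_vector({"vector": query_vector})
    .with_limit(10)
    .do()
)
```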
This is the first part of a 2-part article.
You can read about the results analysis and the conclusions here.