At Assembled, our issue resolution engine is designed to assist customer support by suggesting potential answers to support queries. We use Retrieval Augmented Generation (RAG) for much of this pipeline because it's quicker to iterate on than fine-tuning, doesn't require training on customer data (something many companies prefer to avoid), and generally provides high-quality results.
However, we encountered a significant challenge with RAG: relying solely on vector search (even using both dense and sparse vectors) doesn’t always deliver satisfactory results for certain queries. This issue was particularly evident when users entered specific keywords that didn’t accurately match stored knowledge articles.
Customer support teams often have multiple articles on similar topics and lack a tightly curated knowledge base, leading vector search to sometimes return irrelevant results to our RAG engine and reduce response accuracy. Users familiar with traditional keyword searches were puzzled when our system couldn't find the right documents, especially for short queries with prominent but ambiguous keywords.
For example, if a user asked "what features are included in a premium plan?", vector search would reliably surface semantically related documents: overviews of different plans, customer testimonials, or marketing materials. However, it would often miss the articles that specifically covered premium plans.
To address this issue, we integrated a new keyword search infrastructure that combines its results with vector search for optimal performance. In the above example, keyword search would home in on "features", "premium", and "plan", and narrow search results to documents specifically matching these keywords. A hybrid approach with both vector and keyword search allows us to effectively return articles with semantic matches while also providing users with the familiar feel of traditional keyword search. Our intuition was based on our experience with other machine learning systems, where ensemble models generally outperform single models.
To enable a hybrid store solution, we developed a document store abstraction in our code, allowing us to integrate multiple search algorithms. The abstraction is simple but captures all the essential functionalities of a document store and search system. It handles document management and querying and is agnostic to the actual implementation (vector search, keyword search, etc.). Here’s what it looks like:
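The original interface isn't reproduced here, but a minimal sketch of this kind of abstraction might look like the following (the class and method names are illustrative, not Assembled's actual code):

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Document:
    """A unit of searchable content, independent of any backend."""
    id: str
    content: str
    metadata: dict = field(default_factory=dict)


class DocumentStore(ABC):
    """Implementation-agnostic interface over a search backend
    (vector search, keyword search, etc.)."""

    @abstractmethod
    def upsert(self, docs: list[Document]) -> None:
        """Add or update documents in the underlying store."""

    @abstractmethod
    def delete(self, doc_ids: list[str]) -> None:
        """Remove documents by id."""

    @abstractmethod
    def search(self, query: str, limit: int = 10) -> list[Document]:
        """Return the top documents matching a standardized query."""
```

Because every backend satisfies the same interface, callers never need to know which engine is answering a query.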
With this abstraction, we had the primitives we needed to swap different search systems in and out easily. Uploading a document could be done once and then sync across multiple document stores. Similarly, searching for a document could be done in parallel across multiple document stores using a standardized query.
The interesting part is that our hybrid search store itself implements the DocumentStore interface. This means that, from the perspective of the caller, it doesn't matter whether they are interacting with a single store or our complex hybrid store: they use the same interface and methods. This design ensures that all of the logic for determining which documents are retrieved is hidden from the caller and can be tested separately. To implement the hybrid store, we passed in multiple child document stores and parallelized the search across all of the child stores.
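As a rough sketch of that composition (hypothetical names; the merge step here is a naive interleave, whereas the rank-fusion approach we actually settled on is described below):

```python
from concurrent.futures import ThreadPoolExecutor


class HybridStore:
    """Satisfies the same document-store interface as its children:
    writes fan out to every child, searches run in parallel."""

    def __init__(self, children):
        self.children = children

    def upsert(self, docs):
        # Every write goes to every child so the stores stay in sync.
        for child in self.children:
            child.upsert(docs)

    def search(self, query, limit=10):
        # Issue the standardized query to all children concurrently.
        with ThreadPoolExecutor(max_workers=len(self.children)) as pool:
            rankings = list(pool.map(lambda c: c.search(query, limit), self.children))
        # Placeholder merge: interleave results, dropping duplicates.
        merged, seen = [], set()
        for ranking in rankings:
            for doc_id in ranking:
                if doc_id not in seen:
                    seen.add(doc_id)
                    merged.append(doc_id)
        return merged[:limit]
```

Because `HybridStore` looks like any other store, it can be nested, swapped out, or unit-tested in isolation from its children.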
Enabling multiple document stores introduced technical challenges, especially around ensuring synchronization. Out-of-sync document stores could lead to subtle bugs, such as a document being present in one store but not another. To address this, we implemented the following:
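The specifics of those safeguards aren't shown here, but the core idea of surfacing partial write failures (rather than letting stores silently drift apart) can be sketched as follows; the function and exception names are hypothetical:

```python
class SyncError(Exception):
    """Raised when a write did not reach every document store."""


def upsert_all(stores, docs):
    """Apply a write to every store, collecting failures so that a
    partially applied write is reported loudly instead of leaving
    stores silently out of sync."""
    failures = []
    for store in stores:
        try:
            store.upsert(docs)
        except Exception as exc:
            failures.append((store, exc))
    if failures:
        raise SyncError(f"{len(failures)} store(s) failed to apply the write")
```

A real system would also need retries or a reconciliation job for the failed stores; this sketch only shows the detection step.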
To optimize search performance, we explored several algorithms for merging the results from our different document stores.
Our initial approach involved experimenting with various weighting mechanisms for sparse/dense vectors and keyword search. The goal was to find optimal weightings that leverage the strengths of each search method. However, identifying the correct weightings proved challenging because the distribution of vector search scores is unknown, which made it difficult to judge the relative importance any given weighting was actually assigning.
What’s more, empirical data showed that similarity scores (dot product and Euclidean distance) varied widely across our customer base. The differential performance across these metrics made it impractical to develop a universal weighting scheme for combining vector and keyword searches. Tuning these weights on a per-customer basis was not scalable.
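To make the problem concrete, here is a naive weighted merge of the kind we were tuning (illustrative code, not our production implementation). Because the raw scores on each side have no fixed distribution, a `w_vec`/`w_kw` pair calibrated for one customer's corpus can be badly mis-calibrated for another's:

```python
def weighted_merge(vector_hits, keyword_hits, w_vec=0.7, w_kw=0.3):
    """Combine raw per-document scores from two search systems with
    fixed weights, returning doc ids sorted by blended score.
    vector_hits / keyword_hits map doc_id -> raw score."""
    combined = {}
    for doc_id, score in vector_hits.items():
        combined[doc_id] = combined.get(doc_id, 0.0) + w_vec * score
    for doc_id, score in keyword_hits.items():
        combined[doc_id] = combined.get(doc_id, 0.0) + w_kw * score
    return sorted(combined, key=combined.get, reverse=True)
```

The fragility is the point: if one backend's scores happen to sit in [0, 1] and another's in [0, 100], the weights silently stop meaning what you tuned them to mean.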
Next, we turned to rank fusion algorithms, inspired by literature reviews and their demonstrated effectiveness in search optimization (see [0] and [1]). Rank fusion algorithms, particularly Reciprocal Rank Fusion (RRF), provided a promising alternative. Here’s how most rank fusion algorithms work:
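In outline: each search system contributes only the *rank order* of its results, not its raw scores, and a document's fused score is the sum of reciprocal ranks across all systems. A minimal RRF sketch (using the commonly cited constant k = 60):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one.

    Each document scores 1 / (k + rank) in every list it appears in,
    summed across lists; k damps the advantage of the very top ranks.
    Raw backend scores are never used, so no cross-system calibration
    is needed."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Because only ranks matter, RRF sidesteps the score-distribution problem described above entirely.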
After extensive testing, Reciprocal Rank Fusion (RRF) consistently outperformed many of the more complex methods we evaluated. Several factors contributed to this:
By implementing RRF, we achieved a flexible and scalable method for combining search results. Using RRF, we not only enhanced the accuracy and relevance of search outcomes but also simplified the overall search infrastructure, ensuring a robust solution for our diverse customer set.
Finally, a note on our search engine choices. At Assembled, we use Pinecone for vector search and Algolia for keyword search. After limited testing with other providers, we concluded that the marginal benefits of further optimization didn't justify the effort. Consequently, we decided against hosting our own open-source vector database, such as Milvus, or managing our own keyword search on Elasticsearch.
Using B2B solutions like Pinecone and Algolia offers several advantages:
We're seeing exciting results, but there's always a lot more to do. Since implementing this framework, there have been many developments in RAG-based techniques, such as fine-tuning of embedding models, applying matrix transformations on vector results, HyDE, etc. If you're interested in helping us solve these problems, check out our open roles.