Similarity search vs. Reranking

October 28, 2023

“I don’t get it. If I’m taking the top similar vectors from the vector database, then why would I need to re-rank them again?”, asked Jean. “Isn’t that exactly what the vector database just did?”

He had a valid question.

Jean was building a backend for a question answering system. We already had a huge corpus of questions and answers. He was trying to fetch the right answers for any new question a user asked.

The general idea was: take the embedding of the question, search for the top 5 closest answer embeddings in the vector database, and surface them.
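A minimal sketch of that idea, with a brute-force cosine search over a numpy matrix standing in for the vector database (embed is the embedding helper shown later in this post; answers and answer_embs are assumed to hold the corpus and its precomputed embeddings):

import numpy as np

def fetch_candidates(question: str,
                     answers: list[str],
                     answer_embs: np.ndarray,  # shape: (n_answers, dim)
                     k: int = 5) -> list[str]:
    q = np.array(embed(question))
    # Cosine similarity between the question and every stored answer.
    sims = answer_embs @ q / (np.linalg.norm(answer_embs, axis=1) * np.linalg.norm(q))
    # Hand back the k most similar answers, best first.
    return [answers[i] for i in np.argsort(sims)[::-1][:k]]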

But there was a problem: while the most relevant answer would be somewhere in the top 5, it was rarely the top match for our queries.

>>> diagram: expectation vs reality

Why was the vector database returning not-quite-correct answers?

Vector databases are built to fetch similar vectors: given a query vector, the database finds the stored vectors closest to it. But what does “closest” mean?

Generally, the database computes the dot product or cosine similarity between the query vector and candidate vectors and returns the ones with the highest scores.
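Concretely, for two numpy vectors the two scores look like this (a minimal sketch, not tied to any particular database):

import numpy as np

def dot_score(a: np.ndarray, b: np.ndarray) -> float:
    # Plain dot product: grows with vector magnitude as well as direction.
    return float(a @ b)

def cosine_score(a: np.ndarray, b: np.ndarray) -> float:
    # Dot product of unit-normalized vectors: direction only, in [-1, 1].
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))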

Think about this for a second: questions generally look like other questions. A question and its answer often share very little wording, so in embedding space a question’s nearest neighbors tend to be question-shaped text, not necessarily the text that answers it.
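You can see this with a quick experiment using the sentence-transformers library (the model choice and example strings below are mine, and exact numbers will vary, but the question–question score typically comes out higher than the question–answer score):

from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

question   = "What is the capital of France?"
similar_q  = "Which city is the capital of France?"
the_answer = "Paris has been the seat of the French government for centuries."

q, sq, a = model.encode([question, similar_q, the_answer])

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cos(q, sq))  # question vs. a similar question
print(cos(q, a))   # question vs. its answer: typically lower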

So let’s break down what’s happening here:

  • Jean is trying to fetch answers to a question.

  • But the database is returning the answers that look like the question.

It’s a subtle difference, but it’s exactly why everything the database returns looks reasonable while the top result is rarely the best answer.

If Jean were instead trying to show “questions similar to your question”, then similarity search would have been the only thing he needed.

But similarity search does land us in the right vicinity of the correct answer.

It gets us most of the way there, and fast. And since we now have only a small list of candidate answers, we can afford to run something heavier on top of them.

Re-ranking results from the vector database

Since we only have a few candidate answer vectors at this stage, we can do some heavier computation on them and find the best candidate.

Typically, we use a model that takes in (question, candidate answer) pairs and returns a score between 0 and 1 for each, representing how well the candidate answers the question.

Then we sort the candidate answers based on the score. This is all there is to re-ranking.

Remember that when we create embeddings, we only need one input: the string we’re trying to embed.

def embed(q: str) -> list[float]:
    ...

Here, we need to look at both the question and the candidate answer together, so a re-ranking model takes two inputs.

def score(q: str, c: str) -> float:
    ...
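As a concrete example, score could be backed by a cross-encoder, e.g. via the sentence-transformers library (the specific model below is one popular off-the-shelf choice, not something Jean’s setup dictates):

from sentence_transformers import CrossEncoder

# ms-marco cross-encoders are trained to score (query, passage) relevance.
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def score(q: str, c: str) -> float:
    # The model reads the question and candidate together, so it can judge
    # "does this answer the question?", not just "does this resemble it?".
    # Depending on the model, this may be a raw logit rather than a 0-1
    # probability; for ranking, only the relative order matters.
    return float(model.predict([(q, c)])[0])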

Once you run this on all your candidate answers, you just sort them by score.

import numpy as np

def rerank(q: str, cs: list[str]) -> list[str]:
    # Score every (question, candidate) pair, then sort best-first.
    scores = np.array([score(q, c) for c in cs])
    sorted_order = np.argsort(scores)[::-1]  # descending
    return [cs[i] for i in sorted_order]
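Putting the two stages together (fetch_candidates, answers, and answer_embs are from the hypothetical retrieval sketch earlier in this post; the question string is just an example):

question = "What is the capital of France?"

candidates = fetch_candidates(question, answers, answer_embs)  # cheap, coarse
ranked = rerank(question, candidates)                          # heavy, precise

print(ranked[0])  # the candidate that best answers the question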

When to use each

Use plain similarity search when “looks like the query” is exactly what you want:

  • “Products like this”
  • “Questions similar to your question”

Use re-ranking on top of similarity search when:

  • The result you want is related to the query but doesn’t look like it (answers to a question).
  • The candidate list is already small, so the heavier pairwise scoring is affordable.