Welcome back to my 75 Days of Generative AI series. We've come a long way since we started exploring the basics of large language models (LLMs) and Retrieval-Augmented Generation (RAG). In our previous articles, we learned how to run a basic LLM and built an end-to-end LLMTwin app that can write like us. Today, we will take our RAG skills to the next level by diving into some advanced techniques that will make our LLMs even more powerful and accurate.
Advanced RAG Techniques
In this article, we'll explore two advanced RAG techniques that will help us improve the performance of our LLMs. More will be discussed in upcoming articles. These techniques are crucial in building more sophisticated AI models that can understand and respond to complex queries.
Technique 1: Context Enrichment Window
Overview
This technique improves your ability to find the right data in the vector database to pass on as context to your LLM. Instead of returning small, isolated pieces of text, it also includes the surrounding text, giving the model more context and making the information easier to understand.
Why is it useful?
Traditional vector search often returns isolated chunks of text, which may lack the necessary context for full understanding. This approach aims to provide a more comprehensive view of the retrieved information by including neighboring text chunks.
Example
This is one of the most basic techniques used when you build a retriever in Langchain. It’s something we’ve already covered in our previous article on basic RAG, but for anyone who wants a refresher, the code is below:
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load the PDF into a list of documents (one per page)
docs = []
loader = PyPDFLoader("/path/to/file")
docs.extend(loader.load())

# Split into 1000-character chunks with a 200-character overlap between neighbours
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)
splits = text_splitter.split_documents(docs)
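To see what the splitter produced, you can simply print the first couple of chunks:

# Inspect the first two chunks to see the overlap between them
print(splits[0].page_content)
print("\n--- next chunk ---\n")
print(splits[1].page_content)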
For my document, the first two chunks look like this:
Chunk 1:

Instructor (Andrew Ng): Okay. Good morning. Welcome to CS229, the machine learning class. So what I wanna do today is just spend a little time going over the logistics of the class, and then we'll start to talk a bit about machine learning.

By way of introduction, my name's Andrew Ng and I'll be instructor for this class. And so I personally work in machine learning, and I've worked on it for about 15 years now, and I actually think that machine learning is the most exciting field of all the computer sciences. So I'm actually always excited about teaching this class. Sometimes I actually think that machine learning is not only the most exciting thing in computer science, but the most exciting thing in all of human endeavor, so maybe a little bias there.

I also want to introduce the TAs, who are all graduate students doing research in or related to the machine learning and all aspects of machine learning. Paul Baumstarck

Chunk 2:

I also want to introduce the TAs, who are all graduate students doing research in or related to the machine learning and all aspects of machine learning. Paul Baumstarck works in machine learning and computer vision. Catie Chang is actually a neuroscientist who applies machine learning algorithms to try to understand the human brain. Tom Do is another PhD student, works in computational biology and in sort of the basic fundamentals of human learning. Zico Kolter is the head TA — he's head TA two years in a row now — works in machine learning and applies them to a bunch of robots. And Daniel Ramage is — I guess he's not here — Daniel applies learning algorithms to problems in natural language processing.

So you'll get to know the TAs and me much better throughout this quarter, but just from the sorts of things the TA's do, I hope you can already tell that machine learning is
As you can see, there is an overlap of 200 characters between the two chunks, which is why the sentence introducing the TAs appears at the end of the first chunk and again at the start of the second. It’s a simple yet effective technique.
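Chunk overlap is the most basic form of context enrichment. If you want to go a step further and hand the LLM the chunks surrounding each retrieved chunk, a minimal sketch is below. It assumes you record each chunk's position in its metadata before indexing and that your vector store preserves metadata (Chroma does); the chunk_index key and the enrich_with_neighbors helper are my own naming for illustration, not part of Langchain.

# Record each chunk's position so its neighbours can be looked up later
for i, doc in enumerate(splits):
    doc.metadata["chunk_index"] = i

def enrich_with_neighbors(retrieved_docs, all_chunks, window=1):
    # For every retrieved chunk, also return `window` chunks on each side
    enriched = []
    for doc in retrieved_docs:
        idx = doc.metadata["chunk_index"]
        start = max(0, idx - window)
        end = min(len(all_chunks), idx + window + 1)
        enriched.extend(all_chunks[start:end])
    return enriched

The enriched list can then be concatenated into the prompt context instead of the raw retrieved chunks.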
Technique 2: Fusion Retrieval
Overview
This system integrates two powerful document retrieval techniques: vector-based similarity search and keyword-based BM25 retrieval. By leveraging the strengths of both approaches, the Fusion Retrieval system aims to enhance the quality and relevance of the retrieved documents.
Why is it useful?
Traditional retrieval methods often rely on either semantic understanding through vector-based representations or keyword matching using algorithms like BM25. Each approach has its strengths and weaknesses, leading to limitations in handling diverse query types effectively.
What is BM25?
BM25 is an algorithm that ranks documents by their relevance to a search query. It considers how frequently the query terms appear in each document, as well as the length of the documents. Documents with query terms appearing more frequently, especially shorter documents, are considered more relevant and given a higher ranking score. You can think of it as an extension of the TF-IDF algorithm commonly used in search engines. Details of ranking functions are beyond the scope of this article.
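If you'd like to see BM25 scoring in isolation, the rank_bm25 package (which Langchain's BM25Retriever uses under the hood) exposes it directly. A minimal sketch with a made-up toy corpus:

from rank_bm25 import BM25Okapi

# Toy corpus: each document is tokenized with a simple lower-cased split
corpus = [
    "machine learning is the most exciting field of computer science",
    "the instructor introduces the TAs at the start of the class",
    "this class covers the basic fundamentals of machine learning",
]
tokenized_corpus = [doc.lower().split() for doc in corpus]

bm25 = BM25Okapi(tokenized_corpus)
query = "machine learning class".lower().split()

# One relevance score per document; higher means more relevant to the query
print(bm25.get_scores(query))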
Ensemble Retrievers in Langchain
The EnsembleRetriever takes a list of retrievers as input, combines the results of their get_relevant_documents() methods, and reranks the combined results using the Reciprocal Rank Fusion algorithm. This makes it possible to combine the strengths of multiple types of retrievers.
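To give some intuition for the reranking step: Reciprocal Rank Fusion scores each document by summing 1 / (k + rank) over its rank in each retriever's result list (k is a smoothing constant, commonly 60) and then sorts by the fused score. The toy function below is only an illustration of the idea, not Langchain's internal implementation, and it ignores the per-retriever weights for simplicity:

def reciprocal_rank_fusion(ranked_lists, k=60):
    # ranked_lists: one ordered list of document ids per retriever
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Documents that rank highly in several lists get the largest fused score
    return sorted(scores, key=scores.get, reverse=True)

# d3 ranks 2nd and 1st across the two lists, so it comes out on top: ['d3', 'd1', 'd2']
print(reciprocal_rank_fusion([["d1", "d3", "d2"], ["d3", "d2", "d1"]]))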
Code
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

def split_docs(chunk_size=1000, chunk_overlap=200):
    # Load the PDF and split it into overlapping chunks
    docs = []
    loader = PyPDFLoader("/path/to/file")
    docs.extend(loader.load())
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
    )
    splits = text_splitter.split_documents(docs)
    return splits
from langchain_community.embeddings.fastembed import FastEmbedEmbeddings
from langchain.vectorstores import Chroma

persist_directory = 'docs/chroma_fusion'

def get_vector_store(splits):
    # Embed the chunks and persist them in a local Chroma vector store
    embed_model = FastEmbedEmbeddings(model_name="BAAI/bge-small-en-v1.5")
    vectorstore = Chroma.from_documents(
        documents=splits,
        embedding=embed_model,
        persist_directory=persist_directory)
    return vectorstore

splits = split_docs()
vector_store = get_vector_store(splits)

# Semantic (vector similarity) retriever that returns the top 2 chunks
vector_retriever = vector_store.as_retriever(search_kwargs={"k": 2})
from langchain_community.retrievers import BM25Retriever

def create_BM25_index(docs):
    # Keyword-based retriever built over the same chunks
    bm25_retriever = BM25Retriever.from_documents(documents=docs)
    bm25_retriever.k = 2
    return bm25_retriever

bm25_retriever = create_BM25_index(splits)
from langchain.retrievers import EnsembleRetriever

# Fuse the keyword and vector retrievers, weighting both equally
ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, vector_retriever], weights=[0.5, 0.5]
)
docs = ensemble_retriever.invoke("instructor")
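You can then inspect what the fused retriever pulled back, for example:

# Print a short preview of each document returned by the ensemble retriever
for doc in docs:
    print(doc.page_content[:200])
    print("-" * 40)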
Conclusion
Congratulations on taking one more step towards getting better at RAG techniques. The two techniques covered today are the simpler ones, and we’ll be covering more in the coming days.
If you are new to this series, check out the index of my previous articles.
Also, follow or connect with me on my LinkedIn here: https://www.linkedin.com/in/varunbhanot/