In the previous article we introduced vector databases and walked through the basic process of loading, splitting, and retrieving data with a well-known vector database called Chroma DB. This is one of the prerequisites for Retrieval-Augmented Generation (RAG). What RAG is and why it matters is covered in the Day 6 article of my 75 Days Of Generative AI series.
Introduction
Retrieval-augmented generation (RAG) is a technique in natural language processing that seamlessly combines the power of information retrieval with advanced text generation. This innovative approach addresses one of the key challenges in AI-driven language models: generating accurate, context-aware responses based on vast amounts of information. By leveraging RAG, AI systems can dynamically access and utilize relevant information from large datasets, significantly enhancing their ability to provide informed, up-to-date, and contextually appropriate responses.
At its core, RAG operates on a two-step process. First, it employs sophisticated retrieval mechanisms to identify and extract pertinent information from a corpus of documents or knowledge bases. This step ensures that the most relevant context is available for the subsequent generation phase. Second, it utilizes this retrieved information to guide and enrich the text generation process, resulting in outputs that are grounded in relevant facts.
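To make the two-step flow concrete, here is a toy Python sketch (my own illustration, not from any library): retrieval is mocked with naive word overlap instead of real embeddings, and llm_generate stands in for whatever text-generation call you use.
# A toy illustration of the two RAG steps
def retrieve(question, corpus):
    # Step 1: Retrieval - pick the chunk sharing the most words with the question
    # (real systems use embedding similarity, as we do with Chroma below)
    return max(corpus, key=lambda chunk: len(set(chunk.split()) & set(question.split())))

def rag_answer(question, corpus, llm_generate):
    # Step 2: Generation - build a context-grounded prompt and let the LLM answer
    context = retrieve(question, corpus)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm_generate(prompt)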
Setting up the environment
We will be enhancing the example from yesterday's article, in which we loaded data from a machine learning lecture transcript into a vector DB.
Loading the data
Here are the steps, repeated from the previous article, for loading the data into ChromaDB:
!pip install langchain langchain_community
!pip install -q pypdf # library for reading PDF
!pip install fastembed # embedding library from Qdrant. It's small and fast!
!pip install chromadb
from langchain.document_loaders import PyPDFLoader

# Load the lecture PDF into a list of LangChain documents
docs = []
loader = PyPDFLoader("path_to_file/MachineLearning-Lecture01.pdf")
docs.extend(loader.load())
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split the documents into overlapping chunks for embedding
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1500,
    chunk_overlap=150
)
splits = text_splitter.split_documents(docs)
from langchain_community.embeddings.fastembed import FastEmbedEmbeddings
# using smallest embedding model so that it fits in our free GPU and RAM instance
embed_model = FastEmbedEmbeddings(model_name="BAAI/bge-small-en-v1.5")
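As an optional sanity check (not part of the original steps), you can embed a sample string and inspect the result; bge-small-en-v1.5 produces 384-dimensional vectors.
# Optional: embed a sample string to verify the model downloaded correctly
sample_vector = embed_model.embed_query("What is machine learning?")
print(len(sample_vector))  # expect 384 dimensions for bge-small-en-v1.5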
from langchain.vectorstores import Chroma

# Embed the chunks and persist them to a local Chroma database
persist_directory = 'docs/chroma/'
vectordb = Chroma.from_documents(
    documents=splits,
    embedding=embed_model,
    persist_directory=persist_directory
)
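Before wiring in the LLM, it's worth verifying retrieval on its own. This optional check (my addition, not in the original walkthrough) runs a plain similarity search against the vector store:
# Optional: confirm that relevant chunks come back for a test query
results = vectordb.similarity_search("What is this lecture about?", k=3)
for doc in results:
    print(doc.page_content[:100])  # preview the first 100 characters of each chunk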
Now, the next step is to combine our contextual data from ChromaDB with our LLM (refer to this article for the basics of running an LLM using LangChain).
We will be using Llama 3 from Meta, which is an open-source LLM.
Connect to HuggingFaceHub
from langchain_community.llms import HuggingFaceHub

# Connect to the hosted Llama 3 model on Hugging Face Hub
llm = HuggingFaceHub(
    repo_id='meta-llama/Meta-Llama-3-8B-Instruct',
    huggingfacehub_api_token='your huggingface token'
)
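If you'd rather not pass the token inline, HuggingFaceHub also reads it from the HUGGINGFACEHUB_API_TOKEN environment variable, so this equivalent setup should work as well:
import os

# Alternative: supply the token via an environment variable
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "your huggingface token"
llm = HuggingFaceHub(repo_id='meta-llama/Meta-Llama-3-8B-Instruct')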
Now we use the power of LangChain to create a Q&A chain that combines the LLM with our contextual data. There are multiple types of chains, which we will be covering in upcoming articles!
from langchain.chains import RetrievalQA

# Build a retrieval Q&A chain: the retriever fetches context, the LLM answers
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectordb.as_retriever(search_kwargs={"k": 1}),
    return_source_documents=False
)
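As a small taste of what's configurable before we cover chain types properly, here is a sketch of passing a custom prompt to the same "stuff" chain via chain_type_kwargs. The template text is my own illustrative example (it must expose the {context} and {question} variables); the rest of the article continues with the plain qa_chain defined above.
from langchain.prompts import PromptTemplate

# Optional: a custom prompt template for the "stuff" chain
template = """Use the following context to answer the question.
If you don't know the answer, say so instead of guessing.

Context: {context}
Question: {question}
Helpful Answer:"""
qa_prompt = PromptTemplate(template=template, input_variables=["context", "question"])

qa_chain_custom = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectordb.as_retriever(search_kwargs={"k": 1}),
    chain_type_kwargs={"prompt": qa_prompt}
)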
# Function to ask questions
def ask_question(question):
    result = qa_chain({"query": question})
    answer = result['result']
    return answer
# Example usage
questions = [
    "What is the main topic of this lecture?"
]
Now we loop through our questions and let the chain answer them:
for question in questions:
    print(f"\nQuestion: {question}")
    answer = ask_question(question)
    print(f"Answer: {answer}")
Here is the output that we get:
Question: What is the main topic of this lecture?
Helpful Answer: The main topic of this lecture is the introduction to the machine learning class, including the instructor's background and the introduction of the teaching assistants. The lecture does not yet dive into the actual topic of machine learning.
Magical, isn't it? Not only does the answer state the topic, it also adds the detail that the lecture doesn't yet dive deep into the actual subject of machine learning!
This is a very basic example of how powerful the RAG technique can be. Soon we'll dive deeper into advanced techniques and also build some real-world apps using them. So stay tuned!
If you like this series, subscribe to my newsletter and follow me on LinkedIn!