In the previous article we introduced vector databases and walked through the basic process of loading, splitting, and retrieving data with a well-known vector database called Chroma DB. This is one of the prerequisites for Retrieval-Augmented Generation (RAG). What RAG is and why it matters is covered in the Day 6 article of my 75 Days Of Generative AI series.
Introduction
Retrieval-augmented generation (RAG) is a technique in natural language processing that seamlessly combines the power of information retrieval with advanced text generation. This innovative approach addresses one of the key challenges in AI-driven language models: generating accurate, context-aware responses based on vast amounts of information. By leveraging RAG, AI systems can dynamically access and utilize relevant information from large datasets, significantly enhancing their ability to provide informed, up-to-date, and contextually appropriate responses.
At its core, RAG operates on a two-step process. First, it employs sophisticated retrieval mechanisms to identify and extract pertinent information from a corpus of documents or knowledge bases. This step ensures that the most relevant context is available for the subsequent generation phase. Second, it utilizes this retrieved information to guide and enrich the text generation process, resulting in outputs that are grounded in relevant facts.
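To make the two-step flow concrete, here is a toy Python sketch (my own illustration, not from any library): retrieval is mocked with naive word overlap instead of real embeddings, and llm_generate stands in for whatever text-generation call you use.
# A toy illustration of the two RAG steps
def retrieve(question, corpus):
    # Step 1: Retrieval - pick the chunk sharing the most words with the question
    # (real systems use embedding similarity, as we do with Chroma below)
    return max(corpus, key=lambda chunk: len(set(chunk.split()) & set(question.split())))

def rag_answer(question, corpus, llm_generate):
    # Step 2: Generation - build a context-grounded prompt and let the LLM answer
    context = retrieve(question, corpus)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm_generate(prompt)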
Setting up the environment
We will be enhancing the example from yesterday's article, in which we loaded data from a machine learning lecture transcript into a vector DB.
Loading the data
Here are the steps, repeated from the previous article, for loading the data into ChromaDB:
!pip install langchain langchain_community
!pip install -q pypdf # library for reading PDF
!pip install fastembed # embedding library from Qdrant. It's small and fast!
!pip install chromadb
from langchain.document_loaders import PyPDFLoader

# Load the lecture PDF into a list of LangChain documents
docs = []
loader = PyPDFLoader("path_to_file/MachineLearning-Lecture01.pdf")
docs.extend(loader.load())
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split the documents into overlapping chunks for embedding
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1500,
    chunk_overlap=150
)
splits = text_splitter.split_documents(docs)
from langchain_community.embeddings.fastembed import FastEmbedEmbeddings
# using smallest embedding model so that it fits in our free GPU and RAM instance
embed_model = FastEmbedEmbeddings(model_name="BAAI/bge-small-en-v1.5")
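As an optional sanity check (not part of the original steps), you can embed a sample string and inspect the result; bge-small-en-v1.5 produces 384-dimensional vectors.
# Optional: embed a sample string to verify the model downloaded correctly
sample_vector = embed_model.embed_query("What is machine learning?")
print(len(sample_vector))  # expect 384 dimensions for bge-small-en-v1.5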
from langchain.vectorstores import Chroma

# Embed the chunks and persist them to a local Chroma database
persist_directory = 'docs/chroma/'
vectordb = Chroma.from_documents(
    documents=splits,
    embedding=embed_model,
    persist_directory=persist_directory
)
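Before wiring in the LLM, it's worth verifying retrieval on its own. This optional check (my addition, not in the original walkthrough) runs a plain similarity search against the vector store:
# Optional: confirm that relevant chunks come back for a test query
results = vectordb.similarity_search("What is this lecture about?", k=3)
for doc in results:
    print(doc.page_content[:100])  # preview the first 100 characters of each chunk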
Now, the next step is to combine our contextual data from ChromaDB with our LLM (refer to this article for the basics of running an LLM using LangChain).
We will be using Llama 3 from Meta, which is an open-source LLM.
Connect to HuggingFaceHub
from langchain_community.llms import HuggingFaceHub

# Connect to the hosted Llama 3 model on Hugging Face Hub
llm = HuggingFaceHub(
    repo_id='meta-llama/Meta-Llama-3-8B-Instruct',
    huggingfacehub_api_token='your huggingface token'
)
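If you'd rather not pass the token inline, HuggingFaceHub also reads it from the HUGGINGFACEHUB_API_TOKEN environment variable, so this equivalent setup should work as well:
import os

# Alternative: supply the token via an environment variable
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "your huggingface token"
llm = HuggingFaceHub(repo_id='meta-llama/Meta-Llama-3-8B-Instruct')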
Now we use the power of LangChain to create a Q&A chain that combines the LLM with our contextual data. There are multiple types of chains, which we will be covering in upcoming articles!
from langchain.chains import RetrievalQA

# Build a retrieval Q&A chain: the retriever fetches context, the LLM answers
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectordb.as_retriever(search_kwargs={"k": 1}),
    return_source_documents=False
)
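As a small taste of what's configurable before we cover chain types properly, here is a sketch of passing a custom prompt to the same "stuff" chain via chain_type_kwargs. The template text is my own illustrative example (it must expose the {context} and {question} variables); the rest of the article continues with the plain qa_chain defined above.
from langchain.prompts import PromptTemplate

# Optional: a custom prompt template for the "stuff" chain
template = """Use the following context to answer the question.
If you don't know the answer, say so instead of guessing.

Context: {context}
Question: {question}
Helpful Answer:"""
qa_prompt = PromptTemplate(template=template, input_variables=["context", "question"])

qa_chain_custom = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectordb.as_retriever(search_kwargs={"k": 1}),
    chain_type_kwargs={"prompt": qa_prompt}
)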
# Function to ask questions
def ask_question(question):
    result = qa_chain({"query": question})
    answer = result['result']
    return answer
# Example usage
questions = [
    "What is the main topic of this lecture?"
]
Now we loop through our questions and let the chain answer them:
for question in questions:
    print(f"\nQuestion: {question}")
    answer = ask_question(question)
    print(f"Answer: {answer}")
Here is the output that we get:
Question: What is the main topic of this lecture?
Helpful Answer: The main topic of this lecture is the introduction to the machine learning class, including the instructor's background and the introduction of the teaching assistants. The lecture does not yet dive into the actual topic of machine learning.
Magical, isn't it? Not only does the answer state the topic, it also adds the detail that the lecture doesn't yet dive deep into the actual subject of machine learning!
This is a very basic example of how powerful the RAG technique can be. Soon we'll dive deeper into advanced techniques and also build some real-world apps using them. So stay tuned!
If you like this series, subscribe to my newsletter and follow me on LinkedIn!