Welcome back to my 75 Days of Generative AI series. In our previous article, we explored advanced RAG techniques like Context Enrichment Window and Fusion Retrieval. Today, we'll take it to the next level by diving into reranking and query transformation techniques that further refine a RAG pipeline's performance. These techniques are crucial for building more sophisticated AI systems that can understand and respond to complex queries. Let's get started!
Technique 3: Query Transformation
RAG systems often struggle to retrieve the most relevant information, especially when dealing with complex or ambiguous queries. These query transformation techniques address this issue by reformulating queries to match relevant documents better or to retrieve more comprehensive information.
Three Techniques to Improve Retrieval
1. Query Rewriting: Reformulates queries to be more specific and detailed, improving the likelihood of retrieving relevant information.
2. Step-back Prompting: Generates broader queries for better context retrieval, allowing for the retrieval of relevant background information.
3. Sub-query Decomposition: Breaks down complex queries into simpler sub-queries, enabling the retrieval of information covering different aspects of a complex query.
How do they work?
All three techniques use an LLM with custom prompt templates that guide the model in generating the appropriate transformation.
Example Use Case
Query Rewriting
import os
from langchain.prompts import PromptTemplate
from langchain_groq import ChatGroq
os.environ["GROQ_API_KEY"] = "api_key"  # replace with your Groq API key
re_write_llm = ChatGroq(temperature=0, model_name="llama-3.1-70b-versatile")
query_rewrite_template = """You are an AI assistant tasked with reformulating user queries to improve retrieval in a RAG system.
Given the original query, rewrite it to be more specific, detailed, and likely to retrieve relevant information.
Original query: {original_query}
Rewritten query:"""
query_rewrite_prompt = PromptTemplate(
input_variables=["original_query"],
template=query_rewrite_template
)
query_rewriter = query_rewrite_prompt | re_write_llm
def rewrite_query(original_query):
    response = query_rewriter.invoke({"original_query": original_query})
    return response.content
# example query about Gen AI and content generation
original_query = "What is the impact of Gen AI on content generation?"
rewritten_query = rewrite_query(original_query)
print("Original query:", original_query)
print("\nRewritten query:", rewritten_query)
Original query: What is the impact of Gen AI on content generation?
Rewritten query: What are the current and potential effects of Generative Artificial Intelligence (Gen AI) on the quality, efficiency, and authenticity of content generation across various industries, including media, marketing, and education?
Step-back Prompting
step_back_llm = ChatGroq(temperature=0, model_name="llama-3.1-70b-versatile")
# Create a prompt template for step-back prompting
step_back_template = """You are an AI assistant tasked with generating broader, more general queries to improve context retrieval in a RAG system.
Given the original query, generate a step-back query that is more general and can help retrieve relevant background information.
Original query: {original_query}
Step-back query:"""
step_back_prompt = PromptTemplate(
input_variables=["original_query"],
template=step_back_template
)
# Create an LLMChain for step-back prompting
step_back_chain = step_back_prompt | step_back_llm
def generate_step_back_query(original_query):
    """
    Generate a step-back query to retrieve broader context.

    Args:
        original_query (str): The original user query

    Returns:
        str: The step-back query
    """
    response = step_back_chain.invoke({"original_query": original_query})
    return response.content
# example query about Gen AI and content generation
original_query = "What is the impact of Gen AI on content generation?"
step_back_query = generate_step_back_query(original_query)
print("Original query:", original_query)
print("\nStep-back query:", step_back_query)
Original query: What is the impact of Gen AI on content generation?
Step-back query: What is the role of artificial intelligence in content creation?
Sub-query Decomposition
sub_query_llm = ChatGroq(temperature=0, model_name="llama-3.1-70b-versatile")
# Create a prompt template for sub-query decomposition
subquery_decomposition_template = """You are an AI assistant tasked with breaking down complex queries into simpler sub-queries for a RAG system.
Given the original query, decompose it into 2-4 simpler sub-queries that, when answered together, would provide a comprehensive response to the original query.

Example: What are the impacts of climate change on the environment?

Sub-queries:
1. What are the impacts of climate change on biodiversity?
2. How does climate change affect the oceans?
3. What are the effects of climate change on agriculture?
4. What are the impacts of climate change on human health?

Original query: {original_query}"""
subquery_decomposition_prompt = PromptTemplate(
input_variables=["original_query"],
template=subquery_decomposition_template
)
# Create an LLMChain for sub-query decomposition
subquery_decomposer_chain = subquery_decomposition_prompt | sub_query_llm
def decompose_query(original_query: str):
    """
    Decompose the original query into simpler sub-queries.

    Args:
        original_query (str): The original complex query

    Returns:
        List[str]: A list of simpler sub-queries
    """
    response = subquery_decomposer_chain.invoke({"original_query": original_query}).content
    sub_queries = [q.strip() for q in response.split('\n') if q.strip() and not q.strip().startswith('Sub-queries:')]
    return sub_queries
original_query = "What is the impact of Gen AI on content generation?"
sub_queries = decompose_query(original_query)
print("\nSub-queries:")
for sub_query in sub_queries:
    print(sub_query)
Sub-queries:
To break down the original query into simpler sub-queries, I'll consider the various aspects of content generation and the potential impacts of Gen AI on these areas.
Here are 3-4 sub-queries that can help provide a comprehensive response to the original query:
Original query: What is the impact of Gen AI on content generation?
1. How does Gen AI affect the quality and authenticity of generated content? (This sub-query explores the potential benefits and drawbacks of Gen AI-generated content, including issues related to accuracy, bias, and trustworthiness.)
2. What are the implications of Gen AI on the role of human content creators and writers? (This sub-query examines the potential impact of Gen AI on the job market, the skills required for content creation, and the relationship between human creators and AI-generated content.)
3. How does Gen AI change the way content is consumed and interacted with by audiences? (This sub-query investigates the potential effects of Gen AI on user experience, engagement, and the way people interact with AI-generated content, including issues related to personalization and recommendation algorithms.)
4. What are the potential risks and challenges associated with the widespread adoption of Gen AI in content generation? (This sub-query delves into the potential risks and challenges of relying on Gen AI for content generation, including issues related to misinformation, intellectual property, and regulatory frameworks.)
By answering these sub-queries, we can gain a deeper understanding of the impact of Gen AI on content generation and the various aspects of this complex topic.
Technique 4: Reranking
Reranking is all about reassessing and reordering initially retrieved documents to ensure that the most pertinent information is prioritized for subsequent processing or presentation. But why do we need reranking in the first place?
The primary motivation for reranking in RAG systems is to overcome the limitations of initial retrieval methods, which often rely on simpler similarity metrics. Reranking allows for more sophisticated relevance assessment, taking into account nuanced relationships between queries and documents that might be missed by traditional retrieval techniques.
Key components of a reranking system
1. Initial Retriever: Often a vector store using embedding-based similarity search.
2. Reranking Model: This can be either a Large Language Model (LLM) for scoring relevance or a Cross-Encoder model specifically trained for relevance assessment.
3. Scoring Mechanism: A method to assign relevance scores to documents.
4. Sorting and Selection Logic: To reorder documents based on new scores.
Reranking process
1. Initial Retrieval: Fetch an initial set of potentially relevant documents.
2. Pair Creation: Form query-document pairs for each retrieved document.
3. Scoring: Use either LLM or Cross-Encoder method to score document relevance.
4. Score Interpretation: Parse and normalize the relevance scores.
5. Reordering: Sort documents based on their new relevance scores.
6. Selection: Choose the top K documents from the reordered list.
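The six steps above can be sketched as a small, self-contained pipeline. In this hedged sketch, the initial retrieval (step 1) is simulated with a toy candidate list, and the scoring model is stubbed with a simple keyword-overlap function; in a real system, score_fn would wrap an LLM relevance prompt or a cross-encoder model instead. The names rerank, keyword_overlap_score, and the sample documents are all illustrative, not part of any library:

```python
import re

def rerank(query, documents, score_fn, top_k=3):
    """Rerank documents for a query and return the top_k most relevant.

    score_fn is any callable (query, doc) -> float; in practice it would
    be backed by an LLM or a cross-encoder rather than keyword overlap.
    """
    # Step 2: form query-document pairs
    pairs = [(query, doc) for doc in documents]
    # Steps 3-4: score each pair and interpret the score as a float
    scored = [(doc, score_fn(q, doc)) for q, doc in pairs]
    # Step 5: reorder by descending relevance score
    scored.sort(key=lambda item: item[1], reverse=True)
    # Step 6: select the top K documents
    return [doc for doc, _ in scored[:top_k]]

def keyword_overlap_score(query, doc):
    """Toy scorer: fraction of query words that appear in the document."""
    q_words = set(re.findall(r"\w+", query.lower()))
    d_words = set(re.findall(r"\w+", doc.lower()))
    return len(q_words & d_words) / len(q_words)

# Step 1 (initial retrieval) is simulated with a small candidate set
candidates = [
    "Gen AI tools can draft marketing copy in seconds.",
    "Climate change affects ocean temperatures worldwide.",
    "Generative AI raises questions about content authenticity.",
]
top_docs = rerank("Gen AI content authenticity", candidates,
                  keyword_overlap_score, top_k=2)
print(top_docs)
```

Because the scorer is injected as a plain callable, swapping keyword overlap for an LLM-based or cross-encoder scorer changes nothing else in the pipeline.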
The choice between LLM-based and Cross-Encoder reranking methods depends on factors such as required accuracy, available computational resources, and specific application needs. Both approaches offer substantial improvements over basic retrieval methods and contribute to the overall effectiveness of RAG systems.
LangChain provides two well-known reranking implementations, a Cohere Rerank-based document compressor and a Cross-Encoder reranker, both documented in the LangChain retrievers documentation.
Conclusion
That's it for today's article on reranking and query transformation techniques. These advanced RAG techniques will take your pipelines to the next level. Stay tuned for more insights on Generative AI. Follow me on LinkedIn for daily updates on my 75 Days of Generative AI series.