Advanced Use Cases

Use Case 1: Building a Question-Answering System with Existing Collections using LangChain

In this example, we'll demonstrate how to build a question-answering system that uses NeoAthena for document retrieval and LangChain for orchestration. This application will:

  • Take a user question as input

  • Retrieve relevant documents from a NeoAthena collection

  • Process those documents into appropriate context

  • Generate a comprehensive answer using an LLM

Step 1: Installation and Setup

First, install the required packages:

pip install langchain langchain-openai langchain-core neoathena

Step 2: Import Dependencies

from langchain_openai import ChatOpenAI
from neoathena import NeoAthenaClient
from langchain import hub 
from typing import List
from langchain_core.documents import Document

This imports:

  • ChatOpenAI: LangChain's wrapper for OpenAI's chat models

  • NeoAthenaClient: The client library to interact with NeoAthena's API

  • hub: LangChain's prompt hub for accessing community-built prompts

Step 3: Configure API Keys

OPENAI_API_KEY = "your-openai-api-key"
llm = ChatOpenAI(model="gpt-4", api_key=OPENAI_API_KEY)
client = NeoAthenaClient(api_key="your-neoathena-api-key")

Replace the placeholder API keys with your actual keys. This step establishes connections to:

  • OpenAI's API for the language model

  • NeoAthena's API for document retrieval

Step 4: Define the Retrieval Function

# Note: This example uses a collection that contains documents about climate change
def retrieve_docs(question: str) -> List[Document]:
    try:
        results = client.retrieve_from_collection(
            collection_name="your-collection-name",
            query=question,
            top_k=4
        )
        return results
    except Exception as e:
        print(f"Retrieval failed: {e}")
        return []  # Fall back to an empty context rather than None

This function:

  • Takes a user question as input

  • Queries NeoAthena's API to find relevant documents

  • Returns the top 4 most relevant documents (adjust as needed)

  • Includes error handling for reliability

Step 5: Format Retrieved Documents

def documents_to_string(documents: List[Document]) -> str:
    """Joins the page_content of all retrieved Document objects into a single string."""
    return "\n\n".join(doc.page_content for doc in documents)

This helper function converts the retrieved documents into a format suitable for the LLM prompt, combining all document contents with clear separation.
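To see the output format in isolation, here is a small self-contained sketch. The `Doc` class below is a hypothetical stand-in for LangChain's `Document` (the real class carries metadata and more), used only so the example runs without external dependencies:

```python
from dataclasses import dataclass

# Minimal stand-in for langchain_core.documents.Document,
# used here only to illustrate the joined output format.
@dataclass
class Doc:
    page_content: str

def documents_to_string(documents) -> str:
    """Joins the page_content of all documents into a single string."""
    return "\n\n".join(doc.page_content for doc in documents)

docs = [Doc("First passage."), Doc("Second passage.")]
print(documents_to_string(docs))
# First passage.
#
# Second passage.
```

The blank line between passages gives the LLM a clear visual boundary between retrieved chunks, which tends to help it attribute facts to the right source passage.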

Step 6: Create the LangChain Pipeline

# Pull a RAG prompt template from LangChain's hub
prompt = hub.pull("rlm/rag-prompt")

# Build the chain
chain = (
    {   
        "context": lambda x: documents_to_string(retrieve_docs(x["question"])), 
        "question": lambda x: x["question"]
    }
    | prompt
    | llm
)

This step:

  • Uses LangChain's hub to access a pre-built RAG prompt template

  • Creates a processing chain that:

    • Takes a question

    • Retrieves documents from NeoAthena

    • Formats them as context

    • Combines them with the prompt template

    • Sends everything to the LLM for response generation
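The first element of the chain, the dict of lambdas, can be pictured in plain Python. The sketch below stubs out `retrieve_docs` with canned passages (the real function calls NeoAthena) to show how the input question fans out into the `context` and `question` keys the prompt template expects:

```python
# Plain-Python sketch of the runnable-dict step at the top of the chain.
# retrieve_docs is a stub standing in for the NeoAthena call.

def retrieve_docs(question):
    # Stub: pretend two passages came back from the collection.
    return ["Passage about warming.", "Passage about emissions."]

def documents_to_string(docs):
    return "\n\n".join(docs)

def build_prompt_inputs(x):
    # Mirrors {"context": ..., "question": ...} in the chain definition.
    return {
        "context": documents_to_string(retrieve_docs(x["question"])),
        "question": x["question"],
    }

inputs = build_prompt_inputs({"question": "What is climate change?"})
print(inputs["question"])  # What is climate change?
```

In the real chain, LangChain runs each value of the dict against the same input and hands the resulting mapping to the prompt template.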

Step 7: Run the Question-Answering System

# Execute the chain with a question
response = chain.invoke({"question": "What is climate change?"})

# Print the response
print(response.content)

The system outputs a comprehensive answer that combines information retrieved from your NeoAthena collection with the reasoning capabilities of the LLM:

Response:

Climate change refers to significant alterations in global climates, typically characterized by a rise in global temperatures. This phenomenon is driven by human activities such as burning fossil fuels, deforestation, and industrial emissions, which release greenhouse gases into the atmosphere, trapping heat. The effects of climate change include extreme weather events, rising sea levels, disruptions to ecosystems, threats to biodiversity, and substantial economic impacts.

Key Benefits of This Integration

  • Simplicity: Just a few lines of code to create a powerful RAG system

  • Flexibility: Easily change LLMs, prompt templates, or retrieval parameters

  • Scalability: Works with collections of any size in NeoAthena

  • Accuracy: Combines NeoAthena's precise retrieval with LLM reasoning

Use Case 2: Building a Document Summarization Bot with LangGraph & NeoAthena

Step 1: Installation and Setup

Install the required packages:

pip install langchain langchain-core langchain-openai langgraph neoathena

Step 2: Import Dependencies

Import all the modules needed for the application:

from langchain import hub
from langchain_core.documents import Document
from neoathena import NeoAthenaClient
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

Step 3: Configure API Keys

OPENAI_API_KEY = "your-openai-api-key"
llm = ChatOpenAI(model="gpt-4", api_key=OPENAI_API_KEY)
client = NeoAthenaClient(api_key="your-neoathena-api-key")

Replace the placeholder API keys with your actual keys.

Step 4: Define Summarization Prompt

prompt = ChatPromptTemplate.from_messages(
    [("system", "Write a concise summary of the following:\n\n{context}")]
)

This creates a simple instruction that tells the LLM to summarize whatever content is passed to it.

Step 5: Create State Structure

class State(TypedDict):
    question: str
    context: List[Document]
    answer: str

This defines what information the system needs to track during processing: the user's question, the retrieved documents, and the final answer.
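Conceptually, the state starts with only the question and is filled in as each node runs. The snippet below uses the standard-library `TypedDict` and a minimal stand-in for `Document` so it runs on its own:

```python
from typing import List, TypedDict

class Document:
    # Stand-in for langchain_core.documents.Document, for illustration only.
    def __init__(self, page_content: str):
        self.page_content = page_content

class State(TypedDict):
    question: str
    context: List[Document]
    answer: str

# Initially only the question is known; the retrieve node will fill
# in "context" and the generate node will fill in "answer".
state: State = {
    "question": "What is climate change?",
    "context": [],
    "answer": "",
}
```

Because each node returns only the keys it produces, the `TypedDict` acts as a contract describing the full set of keys the workflow may carry at any point.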

Step 6: Implement Retrieval Function

def retrieve(state: State):
    try:
        retrieved_docs = client.retrieve_from_collection(
            collection_name="your-collection-name",
            query=state["question"],
            top_k=4
        )
        return {"context": retrieved_docs}
    except Exception as e:
        print(f"Retrieval failed: {e}")
        return {"context": []}  # Keep the state valid so generate() can still run

This function searches for relevant documents in the NeoAthena collection based on the user's question, retrieving the top 4 matches.

Step 7: Implement Generation Function

def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}

This function combines all the retrieved documents, sends them to the AI model along with our prompt, and gets back a concise summary.

Step 8: Build the LangGraph Workflow

graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()

Connect the functions into a workflow: first retrieve relevant documents, then generate a summary from them.
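The way LangGraph threads state through this sequence can be mimicked with plain functions: each node returns a partial update that is merged into the running state. Both nodes below are stubs (the real ones call NeoAthena and the LLM); the merge loop is an assumption-free illustration of the pattern, not LangGraph's actual internals:

```python
# Sketch of the retrieve -> generate sequence: each node returns a
# partial state update that is merged into the accumulated state.

def retrieve(state):
    # Stub standing in for the NeoAthena retrieval call.
    return {"context": ["Doc about emissions.", "Doc about sea levels."]}

def generate(state):
    # Stub standing in for the LLM summarization call.
    summary = "Summary of %d documents." % len(state["context"])
    return {"answer": summary}

def run_sequence(state, nodes):
    for node in nodes:
        state = {**state, **node(state)}  # merge the node's partial update
    return state

final = run_sequence({"question": "What is climate change?"}, [retrieve, generate])
print(final["answer"])  # Summary of 2 documents.
```

This merge-partial-updates pattern is why each node in the real workflow only returns the keys it computes: LangGraph folds them into the shared `State` for the next node.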

Step 9: Execute and Test

response = graph.invoke({"question": "What is climate change?"})
print(response["answer"])

Response:

Climate change, primarily caused by human activities such as burning fossil fuels, deforestation, and industrial emissions, is leading to rising temperatures, extreme weather events, and disruptions to ecosystems. This is putting biodiversity and human lives at risk, particularly in vulnerable communities. The economic impact on agriculture, tourism, and infrastructure is severe. If no immediate action is taken to reduce emissions, the situation will worsen. Solutions include investing in renewable energy, promoting energy efficiency, protecting forests, and advancing public awareness and education. Immediate global action and cooperation can combat climate change and ensure a sustainable future.

The summarization bot generates a concise overview by extracting key information from documents in your NeoAthena collection and using the LLM to synthesize this content into a coherent summary that captures the essential points of the original material.

Next Steps

By combining NeoAthena's effortless document management and retrieval capabilities with LangChain and LangGraph's flexible orchestration, you can quickly build sophisticated AI applications that leverage your organization's knowledge. Whether you need question-answering systems or document summarization tools, this integration provides a powerful foundation for creating intelligent document processing solutions tailored to your specific needs.
