RAG-powered conversational research assistants address the limitations of conventional language models by pairing them with information retrieval systems. The system searches a specific knowledge base, retrieves relevant information, and responds conversationally with appropriate citations. This approach reduces hallucinations, handles domain-specific knowledge, and grounds responses in the retrieved text. In this tutorial, we demonstrate how to build such an assistant that answers questions about scientific papers, using the open-source model TinyLlama-1.1B-Chat-v1.0 from Hugging Face together with the LangChain framework.
First, install the required libraries.
!pip install langchain-community langchain pypdf sentence-transformers faiss-cpu transformers accelerate einops
Next, import the libraries you need.
import os
import torch
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.chains import ConversationalRetrievalChain
from langchain_community.llms import HuggingFacePipeline
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import pandas as pd
from IPython.display import display, Markdown
Mount Google Drive so the papers can be loaded and saved in the following steps.
from google.colab import drive
drive.mount('/content/drive')
print("Google Drive mounted")
The knowledge base uses PDF documents of scientific papers. Let's create a function that loads and processes these documents.
def load_documents(pdf_folder_path):
    documents = []
    if not pdf_folder_path:
        print("Downloading a sample paper...")
        !wget -q https://arxiv.org/pdf/1706.03762.pdf -O attention.pdf
        pdf_docs = ["attention.pdf"]
    else:
        pdf_docs = [os.path.join(pdf_folder_path, f) for f in os.listdir(pdf_folder_path)
                    if f.endswith('.pdf')]
    print(f"Found {len(pdf_docs)} PDF documents")
    for pdf_path in pdf_docs:
        try:
            loader = PyPDFLoader(pdf_path)
            documents.extend(loader.load())
            print(f"Loaded: {pdf_path}")
        except Exception as e:
            print(f"Error loading {pdf_path}: {e}")
    return documents
documents = load_documents("")
Next, you need to split these documents into smaller chunks for efficient retrieval.
def split_documents(documents):
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200,
        length_function=len,
    )
    chunks = text_splitter.split_documents(documents)
    print(f"Split {len(documents)} documents into {len(chunks)} chunks")
    return chunks
chunks = split_documents(documents)
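Each chunk is a LangChain Document object, so you can quickly inspect what the splitter produced; the preview length below is arbitrary.

# Preview the first chunk's text and its source metadata
print(chunks[0].page_content[:300])
print(chunks[0].metadata)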
Create vector embeddings of the document chunks using a sentence-transformer model.
def create_vector_store(chunks):
    print("Loading embedding model...")
    embedding_model = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2",
        model_kwargs={'device': 'cuda' if torch.cuda.is_available() else 'cpu'}
    )
    print("Creating vector store...")
    vector_store = FAISS.from_documents(chunks, embedding_model)
    print("Vector store created successfully!")
    return vector_store
vector_store = create_vector_store(chunks)
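Before wiring in the language model, it is worth a quick retrieval sanity check. The snippet below (the query string is just an example) runs a FAISS similarity search and prints previews of the closest chunks.

# Retrieve the two chunks most similar to an example query and preview them
results = vector_store.similarity_search("What is self-attention?", k=2)
for doc in results:
    print(doc.page_content[:200])
    print("-" * 40)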
Next, let's load an open-source language model to generate responses. We use TinyLlama: it is small enough to run in Colab, yet capable enough for this task.
def load_language_model():
    print("Loading language model...")
    model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
    try:
        import subprocess
        print("Installing/updating bitsandbytes...")
        subprocess.check_call(["pip", "install", "-U", "bitsandbytes"])
        print("Successfully installed/updated bitsandbytes")
    except:
        print("Could not update bitsandbytes, will proceed without 8-bit quantization")
    from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, pipeline
    import torch
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    if torch.cuda.is_available():
        try:
            quantization_config = BitsAndBytesConfig(
                load_in_8bit=True,
                llm_int8_threshold=6.0,
                llm_int8_has_fp16_weight=False
            )
            model = AutoModelForCausalLM.from_pretrained(
                model_id,
                torch_dtype=torch.bfloat16,
                device_map="auto",
                quantization_config=quantization_config
            )
            print("Model loaded with 8-bit quantization")
        except Exception as e:
            print(f"Error with quantization: {e}")
            print("Falling back to standard model loading without quantization")
            model = AutoModelForCausalLM.from_pretrained(
                model_id,
                torch_dtype=torch.bfloat16,
                device_map="auto"
            )
    else:
        model = AutoModelForCausalLM.from_pretrained(
            model_id,
            torch_dtype=torch.float32,
            device_map="auto"
        )
    pipe = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_length=2048,
        temperature=0.2,
        top_p=0.95,
        repetition_penalty=1.2,
        return_full_text=False
    )
    from langchain_community.llms import HuggingFacePipeline
    llm = HuggingFacePipeline(pipeline=pipe)
    print("Language model loaded successfully!")
    return llm
llm = load_language_model()
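As a quick smoke test before building the chain, you can call the wrapped pipeline directly; the prompt here is arbitrary (on older LangChain releases, calling llm(prompt) behaves the same way).

# Generate a short completion straight from the LangChain-wrapped pipeline
print(llm.invoke("Briefly explain what a transformer model is."))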
Now let's combine the vector store with the language model to build the assistant.
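The assistant is produced by a create_research_assistant helper. A minimal sketch is shown below, assuming it wraps the ConversationalRetrievalChain imported earlier around the FAISS retriever, keeps a running chat history, and optionally returns the retrieved source documents alongside the answer; the retrieval depth k=3 is an illustrative choice.

def create_research_assistant(vector_store, llm):
    # Assumed implementation: retrieval-augmented chain over the FAISS index
    retriever = vector_store.as_retriever(search_kwargs={"k": 3})
    qa_chain = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=retriever,
        return_source_documents=True
    )
    chat_history = []
    def ask(query, return_sources=False):
        # Run retrieval + generation; the chain returns the answer and the chunks it used
        result = qa_chain.invoke({"question": query, "chat_history": chat_history})
        answer = result["answer"]
        chat_history.append((query, answer))
        if return_sources:
            return answer, result["source_documents"]
        return answer
    return ask

The helper below then formats each query, the generated answer, and short previews of the cited source chunks for readable console output.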
import textwrap

def format_research_assistant_output(query, response, sources):
    output = f"\n{'=' * 50}\n"
    output += f"USER QUERY: {query}\n"
    output += f"{'-' * 50}\n\n"
    output += f"ASSISTANT RESPONSE:\n{response}\n\n"
    output += f"{'-' * 50}\n"
    output += f"SOURCES REFERENCED:\n\n"
    for i, doc in enumerate(sources):
        output += f"Source #{i+1}:\n"
        content_preview = doc.page_content[:200] + "..." if len(doc.page_content) > 200 else doc.page_content
        wrapped_content = textwrap.fill(content_preview, width=80)
        output += f"{wrapped_content}\n\n"
    output += f"{'=' * 50}\n"
    return output
research_assistant = create_research_assistant(vector_store, llm)
test_queries = [
"What is the key idea behind the Transformer model?",
"Explain self-attention mechanism in simple terms.",
"Who are the authors of the paper?",
"What are the main advantages of using attention mechanisms?"
]
for query in test_queries:
    response, sources = research_assistant(query, return_sources=True)
    formatted_output = format_research_assistant_output(query, response, sources)
print(formatted_output)
In this tutorial, we built a conversational research assistant using retrieval-augmented generation with open-source models. RAG enhances a language model by integrating document retrieval, reducing hallucinations, and grounding answers in domain-specific sources. The guide walked through setting up the environment, processing scientific papers, creating vector embeddings with Hugging Face sentence transformers and a FAISS index, and integrating an open-source language model, TinyLlama. The assistant retrieves the relevant document chunks and generates responses with citations. This implementation lets users query a knowledge base, making AI-powered research more reliable and efficient for answering domain-specific questions.

