Tip 2: Use structured output
Using structured output means forcing the LLM to output valid JSON or YAML text. It cuts down on unnecessary rambling and gets you straight-to-the-point answers from your LLM. It also makes LLM responses easier to validate, which the following tips build on.
Here is how to do it using Gemini's API:
import json

import google.generativeai as genai
from pydantic import BaseModel, Field

from document_ai_agents.schema_utils import prepare_schema_for_gemini


class Answer(BaseModel):
    answer: str = Field(..., description="Your Answer.")


model = genai.GenerativeModel("gemini-1.5-flash-002")
answer_schema = prepare_schema_for_gemini(Answer)

question = "List all the reasons why LLM hallucinate"

context = (
    "LLM hallucination refers to the phenomenon where large language models generate plausible-sounding but"
    " factually incorrect or nonsensical information. This can occur due to various factors, including biases"
    " in the training data, the inherent limitations of the model's understanding of the real world, and the "
    "model's tendency to prioritize fluency and coherence over accuracy."
)

messages = (
    [context]
    + [
        f"Answer this question: {question}",
    ]
    + [
        f"Use this schema for your answer: {answer_schema}",
    ]
)

response = model.generate_content(
    messages,
    generation_config={
        "response_mime_type": "application/json",
        "response_schema": answer_schema,
        "temperature": 0.0,
    },
)

response = Answer(**json.loads(response.text))
print(f"{response.answer=}")
Here, “prepare_schema_for_gemini” is a utility function that prepares the Pydantic schema to match Gemini's somewhat peculiar requirements. Its definition can be found in the repository linked at the end of this article.
This code defines a Pydantic schema and passes it both in the prompt and in the “response_schema” field of the generation config. This forces the LLM to follow the schema in its response, which makes the output much easier to parse.
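For illustration only, here is a minimal sketch of what such a utility might do, under the assumption that Gemini's response_schema rejects some JSON Schema keys that Pydantic emits (such as “title”); the actual implementation in the repository may differ:

# Hypothetical sketch, not the repository's actual implementation.
from pydantic import BaseModel


def prepare_schema_for_gemini(pydantic_model: type[BaseModel]) -> dict:
    schema = pydantic_model.model_json_schema()

    def strip_unsupported_keys(node):
        # Recursively drop keys (e.g. "title") that Gemini's response_schema
        # is assumed to reject.
        if isinstance(node, dict):
            node.pop("title", None)
            for value in node.values():
                strip_unsupported_keys(value)
        elif isinstance(node, list):
            for item in node:
                strip_unsupported_keys(item)

    strip_unsupported_keys(schema)
    return schema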
Tip 3: Use chain of thought and better prompts
In some cases, giving the LLM room to reason through a response before committing to a final answer can produce a higher-quality result. This technique is known as chain of thought and is widely used because it is effective and very easy to implement.
You can also explicitly ask the LLM to answer “N/A” if it cannot find enough context to generate a quality answer. This gives it an easy way out instead of forcing it to answer unanswerable questions.
For example, consider the following simple question and context:
context
Thomas Jefferson (April 13, 1743 [O.S. April 2, 1743] – July 4, 1826) was an American statesman, planter, diplomat, lawyer, architect, philosopher, and Founding Father who served as the third President of the United States from 1801 to 1809.[6] He was the primary author of the Declaration of Independence. Following the American Revolutionary War and before becoming president in 1801, Jefferson served as the first U.S. Secretary of State under George Washington and then as the second vice president under John Adams. Jefferson was a leading proponent of democracy, republicanism, and natural rights, producing formative documents and decisions at the state, national, and international levels. (Source: Wikipedia)
question
What year did Davis Jefferson die?
A naive approach yields the following result:
response
answer='1826'
This is clearly false, as Jefferson Davis is not mentioned in the context at all. It was Thomas Jefferson who died in 1826.
If we change the response schema to use chain of thought, it looks like this:
class AnswerChainOfThoughts(BaseModel):
    rationale: str = Field(
        ...,
        description="Justification of your answer.",
    )
    answer: str = Field(
        ..., description="Your Answer. Answer with 'N/A' if answer is not found"
    )
We also add more detail about what to expect as output if the question cannot be answered, through the field description: “Answer with 'N/A' if answer is not found”.
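Plugging this schema into the same Gemini call as in Tip 2 might look like the following sketch (reusing the model, question, and context defined there):

answer_cot_schema = prepare_schema_for_gemini(AnswerChainOfThoughts)

response = model.generate_content(
    [context]
    + [f"Answer this question: {question}"]
    + [f"Use this schema for your answer: {answer_cot_schema}"],
    generation_config={
        "response_mime_type": "application/json",
        "response_schema": answer_cot_schema,
        "temperature": 0.0,
    },
)
answer_cot = AnswerChainOfThoughts(**json.loads(response.text))
print(answer_cot.rationale)
print(answer_cot.answer)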
This new approach gives the following rationale (remember the chain of thought):
The provided text discusses Thomas Jefferson, not Jefferson Davis. No information about the death of Jefferson Davis is given.
and the final answer:
answer='N/A'
Great! But could a more general approach be used to detect hallucinations?
An agent can do that!
Tip 4: Use an agentic approach
Let's build a simple agent that implements a three-step process:
- The first step is to ask the LLM the question along with the context, retrieving a first candidate answer and the relevant context it used to produce it.
- The second step is to reformulate the question and the candidate answer as a declarative statement.
- The third step is to ask the LLM to verify whether the relevant context entails the candidate answer. This is called “self-verification”: https://arxiv.org/pdf/2212.09561
To implement this, we define three nodes in LangGraph. The first node asks the question while including the context, the second node reformulates the answer using the LLM, and the third node checks whether the assertion is entailed by the input context.
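The node code below also relies on an extended chain-of-thought schema and two more schemas. Their exact definitions live in the linked repository; the following is a hedged sketch whose field names are inferred from the attributes the nodes access (relevant_context, declarative_answer, entailment):

from pydantic import BaseModel, Field


class AnswerChainOfThoughts(BaseModel):
    rationale: str = Field(..., description="Justification of your answer.")
    relevant_context: str = Field(
        ..., description="The context passages used to produce the answer."
    )
    answer: str = Field(
        ..., description="Your Answer. Answer with 'N/A' if answer is not found"
    )


class AnswerReformulation(BaseModel):
    declarative_answer: str = Field(
        ..., description="The question and its answer restated as a single assertion."
    )


class VerificationChainOfThoughts(BaseModel):
    rationale: str = Field(
        ..., description="Why the context does or does not entail the assertion."
    )
    entailment: str = Field(
        ..., description="'Yes' if the context entails the assertion, 'No' otherwise."
    )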
The first node can be defined like this:
def answer_question(self, state: DocumentQAState):
    logger.info(f"Responding to question '{state.question}'")
    assert (
        state.pages_as_base64_jpeg_images or state.pages_as_text
    ), "Input text or images"
    messages = (
        [
            {"mime_type": "image/jpeg", "data": base64_jpeg}
            for base64_jpeg in state.pages_as_base64_jpeg_images
        ]
        + state.pages_as_text
        + [
            f"Answer this question: {state.question}",
        ]
        + [
            f"Use this schema for your answer: {self.answer_cot_schema}",
        ]
    )
    response = self.model.generate_content(
        messages,
        generation_config={
            "response_mime_type": "application/json",
            "response_schema": self.answer_cot_schema,
            "temperature": 0.0,
        },
    )
    answer_cot = AnswerChainOfThoughts(**json.loads(response.text))
    return {"answer_cot": answer_cot}
And the second one looks like this:
def reformulate_answer(self, state: DocumentQAState):
    logger.info("Reformulating answer")
    if state.answer_cot.answer == "N/A":
        return
    messages = [
        {
            "role": "user",
            "parts": [
                {
                    "text": "Reformulate this question and its answer as a single assertion."
                },
                {"text": f"Question: {state.question}"},
                {"text": f"Answer: {state.answer_cot.answer}"},
            ]
            + [
                {
                    "text": f"Use this schema for your answer: {self.declarative_answer_schema}"
                }
            ],
        }
    ]
    response = self.model.generate_content(
        messages,
        generation_config={
            "response_mime_type": "application/json",
            "response_schema": self.declarative_answer_schema,
            "temperature": 0.0,
        },
    )
    answer_reformulation = AnswerReformulation(**json.loads(response.text))
    return {"answer_reformulation": answer_reformulation}
The third one looks like this:
def verify_answer(self, state: DocumentQAState):
    logger.info(f"Verifying answer '{state.answer_cot.answer}'")
    if state.answer_cot.answer == "N/A":
        return
    messages = [
        {
            "role": "user",
            "parts": [
                {
                    "text": "Analyse the following context and the assertion and decide whether the context "
                    "entails the assertion or not."
                },
                {"text": f"Context: {state.answer_cot.relevant_context}"},
                {
                    "text": f"Assertion: {state.answer_reformulation.declarative_answer}"
                },
                {
                    "text": f"Use this schema for your answer: {self.verification_cot_schema}. Be Factual."
                },
            ],
        }
    ]
    response = self.model.generate_content(
        messages,
        generation_config={
            "response_mime_type": "application/json",
            "response_schema": self.verification_cot_schema,
            "temperature": 0.0,
        },
    )
    verification_cot = VerificationChainOfThoughts(**json.loads(response.text))
    return {"verification_cot": verification_cot}
The full code is available at https://github.com/CVxTz/document_ai_agents
Notice how each node uses its own schema for structured output and its own prompts. This is made possible by the flexibility of both Gemini's API and LangGraph.
Let's run this code using the same example as above ➡️
(Note: the first prompt does not use chain of thought, so that the verification step actually gets triggered.)
context
Thomas Jefferson (April 13, 1743 [O.S. April 2, 1743] – July 4, 1826) was an American statesman, planter, diplomat, lawyer, architect, philosopher, and Founding Father who served as the third President of the United States from 1801 to 1809.[6] He was the primary author of the Declaration of Independence. Following the American Revolutionary War and before becoming president in 1801, Jefferson served as the first U.S. Secretary of State under George Washington and then as the second vice president under John Adams. Jefferson was a leading proponent of democracy, republicanism, and natural rights, producing formative documents and decisions at the state, national, and international levels. (Source: Wikipedia)
question
What year did Davis Jefferson die?
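A hypothetical invocation of the compiled graph, assuming DocumentQAState exposes the fields used by the nodes above:

result = graph.invoke(
    DocumentQAState(
        question="What year did Davis Jefferson die?",
        pages_as_text=[context],
        pages_as_base64_jpeg_images=[],
    )
)
print(result["answer_cot"])
print(result["answer_reformulation"])
print(result["verification_cot"])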
Result of the first node (first answer):
relevant_context='Thomas Jefferson (April 13, 1743 [O.S. April 2, 1743] – July 4, 1826) was an American statesman, planter, diplomat, lawyer, architect, philosopher, and Founding Father who served as the third President of the United States.'
answer='1826'
Result of the second node (answer reformulation):
declarative_answer='Davis Jefferson died in 1826'
Result of the third node (verification):
rationale='The context states that Thomas Jefferson died in 1826. The assertion states that Davis Jefferson died in 1826. The context does not mention Davis Jefferson at all; it only mentions Thomas Jefferson.'
entailment='No'
The verification step therefore rejects the first answer, since there is no entailment between the context and the assertion. We can now avoid returning a hallucination to the user.
Bonus tip: Use more powerful models
It is not always easy to apply this tip due to budget and latency constraints, but you should know that stronger LLMs are less prone to hallucination. So, when possible, choose a more powerful LLM for your most sensitive use cases. You can check a hallucination benchmark here: https://github.com/vectara/hallucination-leaderboard. We can see that the top models on this benchmark (those with the fewest hallucinations) also rank at the top of conventional NLP leaderboards.
In this tutorial, we looked at techniques to improve the reliability of LLM outputs by reducing the hallucination rate. Key recommendations include careful formatting and prompting to guide LLM calls, and a workflow-based, agentic approach in which the LLM reviews its own answers.
This involves multiple steps:
- Retrieving the exact context elements the LLM used to generate its answer.
- Reformulating the answer in declarative form so that it is easier to verify.
- Instructing the LLM to check the consistency between the context and the reformulated answer.
All these tips can greatly improve accuracy, but remember that no method is foolproof. There is always a risk that valid answers are rejected if the LLM is overly conservative during verification, or that real hallucination cases are missed. Rigorous evaluation of your specific LLM workflows therefore remains essential.
The full code is available at https://github.com/CVxTz/document_ai_agents

