Native LLM. Good.
However after your first few chats, you might be questioning, “What else can I do with this?”
So why not use some instrument to agentize your native LLM?
On this put up, we’ll discover learn how to flip your native LLM into an agent that makes use of your instruments. particularly,
- gemma 4 mannequin (edge-friendly variant) as native LLM
- orama To serve native LLMs
- OpenAI agent SDK for agent runtime
- Tavily Net Search MCP for instance of an exterior instrument
Construct a mini deep analysis agent that may search the net, accumulate proof, and synthesize solutions, together with citations, in response to consumer questions.
By the top of this text, you’ll have a working native deep analysis agent and a reusable implementation sample for changing an area mannequin into an area AI agent.
For those who’re interested by organising an area coding agent, we beforehand coated Gemma 4 + OpenCode. This put up focuses on extra basic patterns for connecting native fashions to agent runtimes and exterior instruments.
1. Establishing an area agent stack
Earlier than writing any code, we have to put together 4 components: Ollama, Gemma 4 (particularly the Gemma 4 E4B mannequin), OpenAI Brokers SDK, and Tavily MCP.
First, let’s set up Ollama.
On Home windows, you possibly can obtain the installer from Ollama’s official web site.
https://ollama.com/obtain
or use winget In PowerShell:
winget set up Ollama.Ollama
On Linux, Ollama may be put in as follows:
"curl -fsSL https://ollama.com/set up.sh | sh"
After set up, please verify the next:
ollama --version
On Home windows, all the time[スタート]Begin Ollama from the menu. As soon as executed, the native API endpoint shall be obtainable.
Subsequent, pull the native mannequin. Right here we use the Gemma 4 E4B variant.
ollama pull gemma4:e4b
Gemma 4 is available in a number of variations. The E4B mannequin is appropriate for our functions as a result of it’s designed with edge/native agent workflows in thoughts. My machine has an NVIDIA RTX 2000 Ada laptop computer GPU with about 8 GB VRAM. In case your machine is extra constrained, you possibly can attempt the lighter E2B variant.
ollama pull gemma4:e2b
Subsequent, you want an agent runtime library. To take action, we’ll use the OpenAI Brokers SDK.
pip set up openai-agents
An OpenAI appropriate shopper can also be required.
pip set up openai
One factor to notice right here: This doesn’t imply sending the mannequin name to OpenAI, as in a while we are going to level the shopper to Ollama’s native endpoint.
Lastly, we want a Tavily MCP endpoint. For those who’ve by no means used it earlier than, Tavily is a search API designed for LLM functions. This put up makes use of an MCP server to permit brokers to look the net.
First you want to create a Tavily account and get an API key. The Tavily platform can straight generate MCP hyperlinks of the next shapes:
https://mcp.tavily.com/mcp/?tavilyApiKey=<your-api-key>
The preparation is now full.
The usage of Tavily right here will not be the selection of the sponsor. Used right here as one handy MCP The identical sample works for different MCP appropriate instruments as effectively.
In reality, the complete stack right here will not be your solely possibility. As a substitute for utilizing Ollama, you should use LM Studio or llama.cpp to serve native fashions. As a substitute of the Gemma 4 mannequin, you can too attempt different fashions such because the Qwen household. For agent framework, There are additionally choices from Google or Anthropic. It’s also possible to join varied MCP instruments as an alternative of Tavily. I exploit this mixture just because I am used to that stack. Nevertheless, the important thing takeaway from this case research is the frequent native agent sample.
2. Configure native analysis agent
For OpenAI Brokers SDK, that is the ultimate Agent Objects you want to create:
from brokers import Agent
agent = Agent(
title="Native Analysis Agent",
directions=RESEARCH_AGENT_INSTRUCTIONS,
mannequin=mannequin,
mcp_servers=[tavily_server],
mcp_config={"include_server_in_tool_names": True},
)
Let’s unpack every half.
2.1 Mannequin
First, the mannequin.
from openai import AsyncOpenAI
from brokers import OpenAIChatCompletionsModel
MODEL_NAME = "gemma4:e4b"
OLLAMA_BASE_URL = "http://localhost:11434/v1"
shopper = AsyncOpenAI(
api_key="ollama",
base_url=OLLAMA_BASE_URL,
)
mannequin = OpenAIChatCompletionsModel(
mannequin=MODEL_NAME,
openai_client=shopper,
)
First, create a shopper that factors to Ollama’s native OpenAI-compatible endpoint.
Then use OpenAIChatCompletionsModel Wrap the Gemma mannequin right into a mannequin object. This enables the agent SDK to make use of that mannequin inside the agent loop.
Please watch out. api_key="ollama" worth is only a placeholder. Ollama does not really require an actual OpenAI API key. Use this as a result of the shopper expects this area.
2.2 Directions
Subsequent, outline directions for the agent with the specified investigative conduct.
from datetime import datetime
CURRENT_DATE = datetime.now().strftime("%B %d, %Y")
# Word that this instruction is iterated with AI
RESEARCH_AGENT_INSTRUCTIONS = f"""
[Role]
You're a concise analysis assistant.
[Task]
Reply the consumer's query by turning it right into a small net analysis process.
Use the present date when deciphering time-sensitive questions: {CURRENT_DATE}.
[Research behavior]
Begin with one focused search question.
For suggestion or comparability questions, full this analysis loop earlier than answering:
first establish the principle choices, then seek for comparability context, then synthesize a suggestion.
Use follow-up searches when the primary outcomes are inadequate, conflicting, or solely cowl a part of the query.
Choose related and credible sources, and observe which supply helps every essential declare.
Earlier than answering, verify whether or not the gathered proof is sufficient to assist the conclusion.
[Expected output]
Give a direct reply first, then briefly clarify the proof behind it.
Embrace supply hyperlinks for key factual claims.
[Rules]
Don't depend on reminiscence for details which will have modified.
Don't invent lacking particulars.
Hold the reply concise.
""".strip()
2.3 Instruments
Subsequent, equip your brokers with net search instruments. On this case, we use the Tavily search engine by MCP.
from brokers import Agent, Runner
from brokers.mcp import MCPServerStreamableHttp
TAVILY_MCP_URL = "YOUR_TAVILY_MCP_URL"
async with MCPServerStreamableHttp(
title="tavily",
params={"url": TAVILY_MCP_URL},
) as tavily_server:
instruments = await tavily_server.list_tools()
print("Accessible Tavily instruments:")
for instrument in instruments:
description = (instrument.description or "").exchange("n", " ")
print(f"- {instrument.title}: {description[:120]}")
agent = Agent(
title="Native Analysis Agent",
directions=RESEARCH_AGENT_INSTRUCTIONS,
mannequin=mannequin,
mcp_servers=[tavily_server],
mcp_config={"include_server_in_tool_names": True},
)
end result = await Runner.run(agent, RESEARCH_QUESTION, max_turns=MAX_TURNS)
This code block does three issues:
- Open a connection to Tavily’s MCP server.
async with MCPServerStreamableHttp(...) as tavily_server:As soon as related, Tavily exposes the obtainable instruments to the Agent SDK. - Create an Agent object inside the MCP context. Please watch out.
mcp_servers=[tavily_server]join Tavily’s MCP instrument to the agent. - Lastly run the agent
end result = await Runner.run(agent, RESEARCH_QUESTION, max_turns=MAX_TURNS). The context supervisor is essential right here as a result of the MCP connection is lively solely internally.async withblock.
mcp_config={"include_server_in_tool_names": True}That is primarily to make the hint simpler to learn. With out this, the instrument title would simply seem as:tavily_search. It will trigger the instrument title to seem as follows:mcp_tavily__tavily_search. This makes it extra apparent that the instrument name was made by the Tavily MCP server.
3. Ask your analysis query
Now that we have configured the agent, let’s take a look at it with one particular query:
“Which sport had the very best stakes within the group stage of the June 23, 2026 World Cup and why?”
Print a compact hint to search out out what occurred.
def compact(worth: object, restrict: int = 220) -> str:
textual content = str(worth).exchange("n", " ")
return textual content if len(textual content) <= restrict else textual content[:limit] + "..."
for step, merchandise in enumerate(end result.new_items, begin=1):
raw_item = getattr(merchandise, "raw_item", None)
raw_type = getattr(raw_item, "sort", "")
raw_name = getattr(raw_item, "title", "")
raw_output = getattr(raw_item, "output", "")
print(
f"{step:02d} | {sort(merchandise).__name__} | "
f"{raw_type or raw_name} | {compact(raw_output or raw_item)}"
)
In my run, the hint seemed like this:
01 | ToolCallItem | function_call | ResponseFunctionToolCall(arguments='{"question":"World Cup 2026 group stage matches June 23, 2026 stakes"}', title='mcp_tavily__tavily_search', ...)
02 | ToolCallOutputItem | | {'call_id': ..., 'output': ...}
03 | MessageOutputItem | message | ResponseOutputMessage(... ultimate reply ...)
This lets you see the agent’s conduct straight. On this run, the native Gemma mannequin determined to name the Tavily search instrument, and the Brokers SDK executed that instrument name and returned the outcomes to the mannequin. The mannequin then got here up with a ultimate reply.
To see the ultimate response, print:
print(end result.final_output)
Here is what the agent created:
The match with the most important group-stage stakes on June 23, 2026, was Colombia vs. DR Congo.
Why:
In keeping with FIFA reporting, this particular match was highlighted as a crucial second the place Colombia superior into the knockout part of the match.
The article notes that Daniel Munoz scored the primary objective for Colombia throughout this Group Ok fixture, which straight contributed to their development within the competitors.
Proof
- FIFA: An article titled "Colombia v Congo DR Group Ok FIFA World Cup 2026" particularly reviews on a key second from this match, stating that Munoz's objective helped hearth Colombia into the knockout part.
Supply: https://digitalhub.fifa.com/remodel/450614d3-72d7-4c1f-85ff-ea0fbee6f28d/Colombia-v-Congo-DR-Group-Ok-FIFA-World-Cup-2026?focuspoint=0.51
- Yahoo Sports activities: Confirms the fixture and end result for that date: Colombia defeated DR Congo.
Supply: https://sports activities.yahoo.com/soccer/article/2026-world-cup-results-standings-and-schedule-live-scores-group-stage-updates-and-how-to-watch-050724193.html
Word that the agent solely did one search spherical on this run as a result of the search outcomes already include sufficient proof for the mannequin to reply. Extra advanced questions require a number of searches and inferences, which present frameworks naturally assist.
4. Abstract
Native LLMs don’t have to remain as a chat mannequin.
On this put up, you deployed a Gemma 4 E4B mannequin domestically by Ollama, positioned it contained in the agent runtime offered by the OpenAI Brokers SDK, and offered an online search instrument for brokers to search out data on-line and reply consumer questions.
From right here, you possibly can simply lengthen this sample with stronger analysis directions or construct a clearer planning and reflection workflow (if you wish to proceed working within the course of deep analysis). Or you possibly can join the agent to extra MCP instruments for a lot of different use instances.
Completely satisfied constructing!
reference
Ollama: https://ollama.com/
Gemma mannequin household: https://ai.google.dev/gemma
OpenAI Brokers SDK: https://openai.github.io/openai-agents-python/
Agent SDK MCP documentation: https://openai.github.io/openai-agents-python/mcp/
Tavily MCP documentation: https://docs.tavily.com/documentation/mcp

