On this article, you’ll discover ways to remodel a primary tool-calling script right into a resilient agent that gracefully handles failures from misbehaving instruments, malformed mannequin outputs, and unavailable providers.
Matters we’ll cowl embody:
- The right way to construction an iterative agent loop with a security cap on iteration depend.
- The 4 distinct classes of failure an agent encounters when calling instruments, and deal with each.
- The right way to design device error messages that educate the mannequin get better, lowering wasted iterations.
Constructing a Multi-Instrument Gemma 4 Agent with Error Restoration
Introduction
In a earlier article, we wired up Gemma 4 to a handful of Python features utilizing Ollama’s tool-calling API. That gave us a working single-turn dispatcher: the mannequin picks a device, our code runs it, the mannequin solutions. It’s a helpful start line, nevertheless it’s a great distance from an agent.
One of many issues that turns a tool-calling demo into an precise agent is the way it handles issues going incorrect. Instruments fail. The mannequin hallucinates a perform identify, or passes a string the place you needed a quantity, or asks a few metropolis your lookup desk has by no means heard of. An upstream API occasions out. A required argument is lacking. Within the earlier tutorial, any of those would both crash the script or get swallowed by a attempt/besides that prints a message and offers up. That’s tremendous for a single path demo. It’s not tremendous for something you’d need to depart operating.
This text rebuilds the agent across the assumption that issues will go incorrect, and reveals get better gracefully once they do. The sample is easy: catch errors on the boundary, convert them into messages the mannequin can learn, ship them again to the mannequin, and let the mannequin resolve whether or not to retry, route round the issue, or clarify the failure to the consumer. We’ll additionally wrap all the pieces in a correct iterative agent loop with a security cap on iteration depend.
The full script can be found here. This text walks via the components that matter.
Rethinking the Instrument Loop
The unique dispatcher ran a single spherical: ship the consumer question, acquire device calls, run them, ship the outcomes again, print the mannequin’s reply. That’s a one-shot interplay. It really works tremendous when the mannequin’s first response accurately solutions the consumer’s query, nevertheless it has nowhere to go when one thing goes incorrect. If a device fails, the mannequin will get one likelihood to react after which we’re accomplished. If the mannequin desires to name one other device after seeing the primary outcome, too dangerous; we already exited.
A correct agent loop is iterative. The construction is simple:
- Ship the present message historical past to the mannequin.
- If the mannequin produces device calls, execute each, append each outcome to the historical past, and loop once more.
- If the mannequin produces a plain textual content response, that’s the ultimate reply. Return.
- Cap the loop at
MAX_ITERATIONSso a confused mannequin can’t burn via your CPU perpetually.
That final level is non-negotiable. Small fashions often get caught calling the identical device repeatedly, or oscillating between two instruments, and there’s nothing extra demoralizing than strolling again to your terminal to seek out your laptop computer’s followers screaming as a result of Gemma determined to lookup the climate in London thirty occasions in a row.
Right here’s the loop:
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 |
def run_agent(user_query): messages = [{“role”: “user”, “content”: user_query}]
for iteration in vary(1, MAX_ITERATIONS + 1): payload = { “mannequin”: MODEL_NAME, “messages”: messages, “instruments”: available_tools, “stream”: False, }
print(f“[EXECUTION — iteration {iteration}]”) print(” ● Querying mannequin…n”)
attempt: response_data = call_ollama(payload) besides Exception as e: print(f” └─ [ERROR] Error calling Ollama API: {e}”) print(f” └─ Ensure Ollama is operating and {MODEL_NAME} is pulled.”) return
message = response_data.get(“message”, {}) tool_calls = message.get(“tool_calls”) or []
# Department A: the mannequin desires to make use of instruments if tool_calls: print(f“[TOOL EXECUTION — {len(tool_calls)} call(s)]”) messages.append(message) tool_messages = print_tool_calls(tool_calls) messages.lengthen(tool_messages) print() proceed
# Department B: the mannequin produced a last reply print(“[RESPONSE]”) print(message.get(“content material”, “”) + “n”) return
# Security rail: we exhausted MAX_ITERATIONS and not using a last reply print(“[RESPONSE]”) print( f“Hit the {MAX_ITERATIONS}-iteration cap and not using a last reply. “ “This often means the mannequin is caught in a tool-calling loop. “ “Strive simplifying the question.n” ) |
The sample is price committing to reminiscence as a result of it reveals up in each agent framework you’ll ever learn: the message historical past is the state. For every iteration we ship your complete dialog (the unique consumer question, the mannequin’s tool-call request, our device outcomes, any follow-up mannequin messages) again to the mannequin. The mannequin is stateless; the listing is the agent’s reminiscence.
This iterative construction can be what makes error restoration potential. When a device fails and we ship the error again as a device message, the mannequin will get to see that error and react to it on the subsequent iteration. With out the loop, there’s nothing to react into.
Constructing the Instrument Registry
Right here we construct our 4 instruments, all deterministic, all offline. No API keys, no community calls, no flaky exterior providers to debug. The purpose of this text is the error-handling structure, not the instruments themselves, so we wish the instruments to behave predictably so we will give attention to the framework round them, and so we will intentionally set off each failure mode at will.
The instruments are:
get_weather(metropolis): appears to be like up a metropolis in a small dict of canned climate knowledgeget_local_time(metropolis): computes the true present time in that metropolis’s timezone utilizingzoneinfoconvert_currency(quantity, from_currency, to_currency): does the maths towards a hardcoded USD-anchored charge deskget_city_population(metropolis): one other lookup towards a small dict
The static knowledge lives on the prime of the file:
|
CITY_DATA = { “london”: {“timezone”: “Europe/London”, “inhabitants”: 8_982_000}, “tokyo”: {“timezone”: “Asia/Tokyo”, “inhabitants”: 13_960_000}, “sao paulo”: {“timezone”: “America/Sao_Paulo”, “inhabitants”: 12_330_000}, “paris”: {“timezone”: “Europe/Paris”, “inhabitants”: 2_161_000}, “the big apple”: {“timezone”: “America/New_York”, “inhabitants”: 8_336_000}, “sydney”: {“timezone”: “Australia/Sydney”, “inhabitants”: 5_312_000}, “mumbai”: {“timezone”: “Asia/Kolkata”, “inhabitants”: 20_410_000}, }
EXCHANGE_RATES = { “USD”: 1.00, “EUR”: 0.92, “GBP”: 0.79, “JPY”: 156.40, “BRL”: 5.12, “CAD”: 1.37, “AUD”: 1.51, “INR”: 83.20, } |
The features are intentionally easy, however they elevate on dangerous enter reasonably than returning error strings. Right here’s get_weather:
|
def get_weather(metropolis: str) -> str: “”“Returns present climate circumstances for a identified metropolis.”“” key = metropolis.decrease().strip() if key not in WEATHER_DATA: elevate ValueError( f“Unknown metropolis: ‘{metropolis}’. Recognized cities: {‘, ‘.be a part of(sorted(WEATHER_DATA.keys()))}.” ) knowledge = WEATHER_DATA[key] return f“The climate in {metropolis.title()} is {knowledge[‘conditions’]} with a temperature of {knowledge[‘temp_c’]}°C.” |
Two issues to name out about that error message. First, it’s particular: it tells the caller what went incorrect and what the legitimate choices are. Second, the device elevates a ValueError reasonably than returning the error as a string. Don’t catch and string-format errors contained in the device; as an alternative, allow them to propagate. We wish the dispatcher to deal with each form of failure in a single place, and we wish the message the mannequin sees on a nasty enter to be informative sufficient that the mannequin can appropriate itself.
get_local_time does the one actual work — precise timezone-aware datetime arithmetic — and that’s additionally the device we’ll later use to show swish degradation towards a simulated upstream failure:
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 |
def get_local_time(metropolis: str) -> str: “”“Returns the present native time for a metropolis, with a cached fallback.”“” key = metropolis.decrease().strip()
# Simulate an upstream geocoding service which will fail unpredictably if SIMULATE_GEOCODING_OUTAGE and random.random() < 0.6: if key in TIMEZONE_FALLBACK_CACHE: tz_name = TIMEZONE_FALLBACK_CACHE[key] now = datetime.datetime.now(ZoneInfo(tz_name)) return ( f“[cached] The present native time in {metropolis.title()} is “ f“{now.strftime(‘%H:%M on %A, %B %d, %Y’)} ({tz_name}). “ “Be aware: geocoding service is at present unavailable; this worth is from the native cache.” ) elevate ToolUnavailableError( f“Geocoding service is unavailable and ‘{metropolis}’ isn’t within the native cache. “ “Please attempt once more later or use a metropolis from the cache: “ f“{‘, ‘.be a part of(sorted(TIMEZONE_FALLBACK_CACHE.keys()))}.” )
if key not in CITY_DATA: elevate ValueError(f“Unknown metropolis: ‘{metropolis}’. Recognized cities: {‘, ‘.be a part of(sorted(CITY_DATA.keys()))}.”) tz_name = CITY_DATA[key][“timezone”] now = datetime.datetime.now(ZoneInfo(tz_name)) return f“The present native time in {metropolis.title()} is {now.strftime(‘%H:%M on %A, %B %d, %Y’)} ({tz_name}).” That <code>SIMULATE_GEOCODING_OUTAGE</code> flag lets us reproduce a actual–world failure mode with out needing actual infrastructure to fail. We‘ll come again to it.
The device schemas are unchanged from <a href=”https://machinelearningmastery.com/how-to-implement-tool-calling-with-gemma-4-and-python/” goal=”_blank”>the earlier tutorial’s</a> model: commonplace Ollama perform–calling format, with clear descriptions of what every device does and what arguments it expects.
<h2>The 4 Error Restoration Patterns</h2> Time to get severe. There are 4 distinct failure modes you‘ll encounter when an agent talks to instruments, and each wants its personal technique. They’re dealt with in a single dispatcher perform, however it‘s price understanding them as separate ideas.
<h3>Sample 1: Instrument Execution Errors</h3> The primary protection is the dispatcher itself. It wraps each device name in a structured <code>attempt</code>/<code>besides</code> block and converts each form of failure right into a <code>(standing, content material)</code> pair the agent loop can go again to the mannequin:
<pre class=”lang:default decode:true”>def dispatch_tool_call(tool_call): function_name = tool_call[“function”][“name”] arguments = tool_call[“function”][“arguments”] or {}
# Protection 1: validate the device identify towards the registry if function_name not in TOOL_FUNCTIONS: return “error”, ( f”Unknown device ‘{function_name}‘. “ f”Legitimate instruments are: {‘, ‘.be a part of(TOOL_FUNCTIONS.keys())}.“ )
func = TOOL_FUNCTIONS[function_name]
# Protection 2: catch argument errors (incorrect sorts, lacking or further args) attempt: outcome = func(**arguments) return “okay“, str(outcome) besides TypeError as e: return “error“, f”Dangerous arguments for {function_name}: {e}“ besides ValueError as e: return “error“, str(e) besides ToolUnavailableError as e: return “error“, f”Instrument quickly unavailable: {e}“ besides Exception as e: return “error“, f”Surprising error in {function_name}: {sort(e).__name__}: {e}“ |
The important thing perception: return the error to the mannequin as a device outcome as an alternative of elevating it again to the agent loop. The mannequin can learn the error, see that it requested for “Atlantis” and Atlantis isn’t a identified metropolis, and pivot to a special metropolis, or apologize to the consumer. Should you elevate as an alternative, you’ve stripped the mannequin of the power to get better.
Discover the 4 totally different exception sorts and the catch-all on the backside. Every one corresponds to an actual class of failure: area errors (ValueError), signature mismatches (TypeError), infrastructure outages (ToolUnavailableError), and the Don Rumsfeld unknown unknowns (Exception). Separating them offers you cleaner error messages, which give the mannequin higher indicators for restoration.
The catch-all is essential and maybe controversial. Some model guides will inform you by no means to catch a naked Exception. In an agent dispatcher, the choice — letting an surprising exception kill the loop — is worse. The mannequin loses the prospect to get better, the consumer loses the response, and also you lose the dialog historical past you would have used to debug what occurred. Higher to catch, log, and hand the message to the mannequin.
Sample 2: Malformed Instrument Calls From the Mannequin
The mannequin often hallucinates a device identify that doesn’t exist, or sends arguments beneath the incorrect keys (city as an alternative of metropolis, for instance). The primary protection within the snippet above handles the primary case: earlier than we even attempt to dispatch, we verify the identify towards the registry and return a corrective message itemizing the legitimate names.
The incorrect-argument case is dealt with by the second protection. Python’s **arguments unpacking raises TypeError if the mannequin sends a key phrase the perform doesn’t settle for, or omits a required one. We catch the TypeError, format it cleanly, and the mannequin will get a helpful error on the subsequent iteration:
|
[ERROR]: Dangerous arguments for get_weather: get_weather() obtained an surprising key phrase argument ‘city’ |
That message accommodates all the pieces the mannequin must appropriate itself: the device identify, the offending argument, and an implicit sign that the appropriate identify is one thing else. In follow the mannequin often fixes the decision on its subsequent flip.
There’s additionally a extra delicate argument-related failure: sort drift. The mannequin is aware of quantity ought to be a quantity, however in longer conversations it often begins sending "100" as a string. Letting convert_currency elevate on that may drive an additional flip for the mannequin to appropriate itself. A greater strategy is defensive coercion within the device itself:
|
def convert_currency(quantity: float, from_currency: str, to_currency: str) -> str: # Defensive sort coercion: the mannequin typically sends numbers as strings attempt: quantity = float(quantity) besides (TypeError, ValueError): elevate ValueError(f“‘quantity’ have to be a quantity, obtained: {quantity!r}”) # … remainder of the perform |
This silently fixes the frequent case ("100" turns into 100.0) whereas nonetheless elevating a clear error for the genuinely damaged case ("fifty"). The precept: be liberal in what you settle for from the mannequin, and strict in what you complain about.
Sample 3: Area-Stage Errors
These are the errors the device itself raises when the inputs are well-formed however the request can’t be glad, corresponding to asking for the climate in Atlantis, or changing from a foreign money that isn’t within the charge desk. These ought to produce error messages that educate the mannequin get better, not simply say “failed.”
Examine these two error messages:
|
Good: “Unknown metropolis: ‘Atlantis’. Recognized cities: london, mumbai, the big apple, paris, sao paulo, sydney, tokyo.” |
The great model offers the mannequin all the pieces it must both retry with a legitimate enter or clarify the limitation to the consumer. The dangerous model forces the mannequin to guess. Each error message within the device features follows this sample: say what went incorrect, and the place potential, listing the legitimate alternate options.
This isn’t only a UX nicety. It straight impacts what number of iterations the agent loop will burn earlier than attending to a great reply. A obscure error can value you a full further spherical journey whereas the mannequin gropes for a repair. A selected error often will get corrected on the very subsequent flip or, when the enter is genuinely unrecoverable, lets the mannequin produce a clear rationalization with out making an attempt once more in any respect.
Sample 4: Sleek Degradation for Unavailable Instruments
The final sample is for the state of affairs the place a device isn’t damaged, simply gone — a geocoding service is down, an API quota is exhausted, a database is having a nasty day. You will have three choices right here, roughly so as of how a lot you belief the mannequin to deal with the state of affairs:
- Return a cached or default worth and flag it within the outcome. Greatest when the device’s freshness isn’t important.
- Skip the device totally and return a transparent message about what couldn’t be offered. Let the mannequin resolve whether or not to retry or work round it.
- Floor the outage to the consumer by having the agent cease and ask for steering.
get_local_time demonstrates possibility 1. When SIMULATE_GEOCODING_OUTAGE is on and the random verify journeys, the device first tries the native cache:
|
if SIMULATE_GEOCODING_OUTAGE and random.random() < 0.6: if key in TIMEZONE_FALLBACK_CACHE: tz_name = TIMEZONE_FALLBACK_CACHE[key] now = datetime.datetime.now(ZoneInfo(tz_name)) return ( f“[cached] The present native time in {metropolis.title()} is “ f“{now.strftime(‘%H:%M on %A, %B %d, %Y’)} ({tz_name}). “ “Be aware: geocoding service is at present unavailable; this worth is from the native cache.” ) elevate ToolUnavailableError( f“Geocoding service is unavailable and ‘{metropolis}’ isn’t within the native cache. “ “Please attempt once more later or use a metropolis from the cache: “ f“{‘, ‘.be a part of(sorted(TIMEZONE_FALLBACK_CACHE.keys()))}.” ) |
If the town is within the cache, the device returns a profitable outcome tagged with [cached] and a word explaining that the reside service is unavailable. The mannequin sees a wonderfully usable reply and a small caveat it could possibly select to say to the consumer. If the town isn’t within the cache, the device falls via to possibility 2: it raises ToolUnavailableError with a message itemizing what is cached.
That ToolUnavailableError is deliberately a separate exception sort reasonably than a ValueError. The dispatcher offers it its personal catch arm with a definite error prefix (“Instrument quickly unavailable”) so the mannequin can inform the distinction between “you requested for one thing I don’t have” and “the service is down proper now.” These two failures have very totally different applicable responses — retry later versus choose a special enter — and giving the mannequin a transparent sign helps it choose the appropriate one.
In manufacturing, you’d lengthen this sample with a retry-with-backoff coverage earlier than falling via to the fallback. The construction stays the identical: the dispatcher distinguishes recoverable from unrecoverable failures, and the mannequin is advised sufficient about each to make a wise subsequent transfer.
Placing It All Collectively
Time to truly run the factor. Right here’s a question that workouts all the pieces — a number of cities, a number of instruments, and an intentional dangerous enter to set off error restoration in flight:
|
python principal.py “What is the climate in London, Tokyo, and Atlantis proper now? And convert 50 GBP to JPY.” |
The precise iteration depend and tool-call ordering will differ from run to run relying on how Gemma decides to sequence the work, however right here’s a consultant hint, barely trimmed:

Take a look at what occurred in iteration 3. The mannequin requested about Atlantis, the device raised ValueError, the dispatcher transformed it into an error message itemizing the legitimate cities, and the mannequin — on iteration 5 — folded that data right into a clear response. It didn’t retry Atlantis. It didn’t crash. It observed the failure, built-in it with the profitable outcomes, and produced a solution that acknowledged the limitation. That’s your complete payoff of the error-recovery structure in a single hint.
To see swish degradation in motion, flip SIMULATE_GEOCODING_OUTAGE to True and run a question that asks for native time:
|
python principal.py “What is the native time in London and Paris?” |
About 60% of the time you’ll see the [cached] prefix within the device outcome and the mannequin will point out the cached supply in its last response. The remainder of the time the device will return efficiently and the cached path gained’t set off. Both means, the loop completes and the consumer will get a solution.
Conclusion
We constructed three issues on prime of the inspiration from the primary tutorial: an iterative agent loop with a tough iteration cap, a layered dispatcher that catches each class of device failure, and power features whose error messages educate the mannequin get better. Collectively they’re the distinction between a tool-calling demo and an agent you’d truly need to depart operating unsupervised.
A couple of pure subsequent steps embody:
- Persistent reminiscence throughout periods, so the agent can bear in mind what it realized about you final week
- Retry-with-backoff insurance policies for transient upstream failures
- Reincorporating the exterior APIs rather than the static lookup tables, which largely simply means accepting that timeouts and charge limits change into a part of the traditional failure floor
The full script is on GitHub. Clone it, run it, break it intentionally to look at the restoration in motion, and incorporate the subsequent steps above.

