The usual “text-in, text-out” paradigm only takes you so far. A real application that delivers real value should be able to reason through visual, complex problems and produce results that other systems can actually use.
In this post, we will build exactly this stack by bringing together three powerful capabilities: multimodal input, reasoning, and structured output.
To illustrate, we will walk through a practical example: a time-series anomaly detection system for e-commerce order data, built on OpenAI’s o3 model. Specifically, we will show how to pair o3’s reasoning capability with image input and emit validated JSON that downstream systems can easily consume.
In the end, our app will:
- Look: analyze time-series charts of e-commerce order volumes
- Think: identify anomalous patterns
- Integrate: output structured anomaly reports
Along the way, we will produce working code that can be reused for a variety of use cases beyond anomaly detection.
Let’s dive in.
Interested in the broader landscape of how LLMs are applied to anomaly detection? Check out my earlier post, LLMs Enhance Anomaly Detection, where I summarized seven emerging application patterns. Don’t miss it.
1. Case Study
In this post, we aim to build an anomaly detection solution that identifies anomalous patterns in e-commerce order time-series data.
For this case study, we generated three sets of synthetic daily order data. Each dataset represents a different profile of daily orders over roughly one month. We shaded the weekends to reveal the weekly seasonality, and the x-axis indicates the day of the week.
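The post does not show the data generation code, but as a rough, hypothetical sketch (the function name and parameters below are illustrative assumptions, not the author’s actual code), one such profile with a planted single-day spike could be synthesized like this:

```python
import numpy as np
import pandas as pd

def make_daily_orders(start="2025-08-10", days=30, base=125, weekend_dip=10,
                      spike_day=9, spike_size=40, seed=42):
    """Synthesize one daily-order profile: weekday baseline, weekend dip, one spike."""
    rng = np.random.default_rng(seed)
    dates = pd.date_range(start, periods=days, freq="D")
    orders = base + rng.normal(0, 3, size=days)       # noisy weekday baseline
    orders -= weekend_dip * (dates.weekday >= 5)      # weekends run lower
    orders[spike_day] += spike_size                   # plant a single-day spike
    return pd.DataFrame({"date": dates, "orders": orders.round().astype(int)})

df = make_daily_orders()
```

The drop and level-shift profiles follow the same recipe, swapping the single-day spike for a one-day dip or a sustained baseline offset.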


Each figure contains one specific type of anomaly (can you spot them?). Later, we will feed these figures to the anomaly detection solution and see whether it can accurately recover the anomalies.
2. Our Solution
2.1 Overview
Unlike traditional machine learning approaches that require tedious feature engineering and model training, our approach is much simpler. It works in the following steps:
- Prepare figures visualizing the e-commerce order time-series data.
- Prompt the reasoning model o3, asking it to examine the supplied time-series images and determine whether an unusual pattern exists.
- Have the o3 model output the results in a predefined JSON format.
And that’s all. Simple.
Of course, for this solution to work, the o3 model must be able to accept image input and emit structured output. You will soon see how to do that.
2.2 Setting up the reasoning model
As mentioned before, we will use the o3 model. It is OpenAI’s flagship reasoning model, capable of tackling complex multi-step problems with state-of-the-art performance. Specifically, we will use an Azure OpenAI endpoint to invoke the model.
Make sure you place the endpoint, API key, and deployment name in a .env file. You can then proceed to the LLM client setup.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from openai import AzureOpenAI
from dotenv import load_dotenv
import os

load_dotenv()

# Set up LLM client
endpoint = os.getenv("api_base")
api_key = os.getenv("o3_API_KEY")
api_version = "2025-04-01-preview"
model_name = "o3"
deployment = os.getenv("deployment_name")

LLM_client = AzureOpenAI(
    api_key=api_key,
    api_version=api_version,
    azure_endpoint=endpoint
)
We use the following instructions as the system message for the o3 model (polished with the help of GPT-5):
instruction = """
[Role]
You are a meticulous data analyst.
[Task]
You will be given a line chart image of daily e-commerce orders.
Your job is to identify prominent anomalies in the data.
[Rules]
The anomaly types can be spike, drop, level_shift, or seasonal_outlier.
A level_shift is a sustained baseline change (≥ 5 consecutive days), not a single point.
A seasonal_outlier happens when a weekend/weekday behaves unlike its peers in the same class.
For example, weekend orders are usually lower than weekday orders.
Read dates/values from the axes; if you cannot read them exactly, snap to the nearest tick and note the uncertainty in your explanation.
The weekends are shaded in the figure.
"""
In the instructions above, we clearly defined the role of the LLM, the task it should complete, and the rules it should follow.
To limit the complexity of the case study, we deliberately specified only four anomaly types that the LLM must identify. We also provided clear definitions of these anomaly types to eliminate ambiguity.
Finally, we injected a bit of domain knowledge about e-commerce patterns, namely that we expect orders to drop over the weekend compared to weekdays. Incorporating domain knowledge is generally considered good practice to guide the model’s analysis.
Now that the model is set up, let’s see how to prepare an image for the o3 model to consume.
2.3 Image preparation
To use o3’s multimodal capability, you must provide the figure in a specific format: either a publicly hosted web URL or a Base64-encoded data URL. Since our figures are generated locally, we use the second approach.
So what is Base64 encoding? Base64 is a way to represent binary data (such as image files) using only text characters that can be sent safely over the Internet. It converts binary image data into a sequence of letters, numbers, and a handful of symbols.
And what about the data URL? A data URL is a type of URL that embeds file content directly into the URL string, rather than pointing to a file location.
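To make both ideas concrete, here is a tiny, standalone illustration (not part of the pipeline) using Python’s built-in base64 module:

```python
import base64

# Binary payload: the first bytes of a real PNG file (part of the PNG signature)
raw = b"\x89PNG\r\n"

# Base64 encoding: binary -> ASCII-safe text
encoded = base64.b64encode(raw).decode("utf-8")
print(encoded)  # -> iVBORw0K

# Decoding recovers the original bytes losslessly
assert base64.b64decode(encoded) == raw

# A data URL simply embeds that text, plus a MIME type, in the URL itself
data_url = f"data:image/png;base64,{encoded}"
```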
You can use the following function to handle this conversion automatically:
import io
import base64

def fig_to_data_url(fig, fmt="png"):
    """
    Converts a Matplotlib figure to a base64 data URL without saving to disk.

    Args:
    -----
    fig (matplotlib.figure.Figure): The figure to convert.
    fmt (str): The format of the image ("png", "jpeg", etc.)

    Returns:
    --------
    str: The data URL representing the figure.
    """
    buf = io.BytesIO()
    fig.savefig(buf, format=fmt, bbox_inches="tight")
    buf.seek(0)
    base64_encoded_data = base64.b64encode(buf.read()).decode("utf-8")
    mime_type = f"image/{fmt.lower()}"
    return f"data:{mime_type};base64,{base64_encoded_data}"
Essentially, our function first writes the Matplotlib figure to an in-memory buffer. Next, the binary PNG data is encoded as Base64 text and wrapped in the desired data URL format.
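As a quick sanity check, the snippet below exercises the conversion end to end (it repeats a compacted version of the same helper so it runs standalone; the Agg backend avoids needing a display):

```python
import io, base64
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display required
import matplotlib.pyplot as plt

def fig_to_data_url(fig, fmt="png"):
    # Compacted version of the helper above
    buf = io.BytesIO()
    fig.savefig(buf, format=fmt, bbox_inches="tight")
    buf.seek(0)
    return f"data:image/{fmt.lower()};base64," + base64.b64encode(buf.read()).decode("utf-8")

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [10, 20, 15])
url = fig_to_data_url(fig)

# Every PNG data URL starts with this header, and its payload decodes back to PNG bytes
assert url.startswith("data:image/png;base64,")
assert base64.b64decode(url.split(",", 1)[1])[:8] == b"\x89PNG\r\n\x1a\n"  # PNG magic number
```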
Assuming you have access to the synthetic daily order data, you can use the following function to generate a plot and convert it to the appropriate data URL format in one step:
def create_fig(df):
    """
    Create a Matplotlib figure and convert it to a base64 data URL.
    Weekends (Sat–Sun) are shaded.

    Args:
    -----
    df: dataframe containing one profile of the daily order time series.
        The dataframe has "date" and "orders" columns.

    Returns:
    --------
    image_url: The data URL representing the figure.
    """
    df = df.copy()
    df['date'] = pd.to_datetime(df['date'])

    fig, ax = plt.subplots(figsize=(8, 4.5))
    ax.plot(df["date"], df["orders"], linewidth=2)
    ax.set_xlabel('Date', fontsize=14)
    ax.set_ylabel('Daily Orders', fontsize=14)

    # Weekend shading
    start = df["date"].min().normalize()
    end = df["date"].max().normalize()
    cur = start
    while cur <= end:
        if cur.weekday() == 5:  # Saturday 00:00
            span_start = cur                       # Sat 00:00
            span_end = cur + pd.Timedelta(days=2)  # Mon 00:00, covers Sat & Sun
            ax.axvspan(span_start, span_end, alpha=0.12, zorder=0)
            cur += pd.Timedelta(days=2)  # skip past Sunday
        else:
            cur += pd.Timedelta(days=1)

    # Title
    title = f'Daily Orders: {df["date"].min():%b %d, %Y} - {df["date"].max():%b %d, %Y}'
    ax.set_title(title, fontsize=16)

    # Format x-axis dates
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %d'))
    ax.xaxis.set_major_locator(mdates.WeekdayLocator(interval=1))
    plt.tight_layout()

    # Obtain the data URL
    image_url = fig_to_data_url(fig)
    return image_url
Figures 1–3 were generated by the plotting routine above.
2.4 Structured Output
In this section, we explain how to ensure that the o3 model outputs a consistent JSON format instead of free-form text. This is known as “structured output” and is one of the key enablers for integrating LLMs into existing automated workflows.
To achieve this, we start by defining a schema that governs the expected output structure, using a Pydantic model.
from pydantic import BaseModel, Field
from typing import Literal
from datetime import date

AnomalyKind = Literal["spike", "drop", "level_shift", "seasonal_outlier"]

class DateWindow(BaseModel):
    start: date = Field(description="Earliest plausible date the anomaly starts (ISO YYYY-MM-DD)")
    end: date = Field(description="Latest plausible date the anomaly ends, inclusive (ISO YYYY-MM-DD)")

class AnomalyReport(BaseModel):
    when: DateWindow = Field(
        description=(
            "Minimal window that contains the anomaly. "
            "For single-point anomalies, use the interval that covers reading uncertainty, if the tick labels are unclear"
        )
    )
    y: int = Field(description="Approx value at the anomaly's most representative day (peak/lowest), rounded")
    kind: AnomalyKind = Field(description="The type of the anomaly")
    why: str = Field(description="One-sentence reason why this window is anomalous")
    date_confidence: Literal["low","medium","high"] = Field(
        default="medium", description="Confidence that the window localization is correct"
    )
Our Pydantic schema attempts to capture both the quantitative and qualitative aspects of the detected anomalies. For each field, we specify its data type (for example, int for numbers, or Literal for a fixed set of choices).
We also use the Field class to attach a detailed description to each key. These descriptions are particularly important, as they effectively act as inline instructions that help o3 understand the semantic meaning of each component.
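To see what the schema buys us, here is a small self-contained check using a trimmed-down copy of the schema (Field descriptions omitted for brevity): a well-formed payload parses with type coercion and defaults applied, while an out-of-vocabulary anomaly type is rejected.

```python
from pydantic import BaseModel, ValidationError
from typing import Literal
from datetime import date

AnomalyKind = Literal["spike", "drop", "level_shift", "seasonal_outlier"]

class DateWindow(BaseModel):
    start: date
    end: date

class AnomalyReport(BaseModel):
    when: DateWindow
    y: int
    kind: AnomalyKind
    why: str
    date_confidence: Literal["low", "medium", "high"] = "medium"

# A well-formed payload parses; ISO date strings are coerced to datetime.date
ok = AnomalyReport.model_validate({
    "when": {"start": "2025-08-19", "end": "2025-08-21"},
    "y": 166, "kind": "spike", "why": "single-day jump",
})
assert ok.when.start == date(2025, 8, 19)
assert ok.date_confidence == "medium"   # default applied

# An anomaly type outside the Literal vocabulary is rejected
rejected = False
try:
    AnomalyReport.model_validate({**ok.model_dump(), "kind": "weird"})
except ValidationError:
    rejected = True
assert rejected
```

This same rejection behavior is what keeps the model’s output within the four anomaly types we defined.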
This covers multimodal input and structured output. Now it’s time to put them together in a single LLM call.
2.5 Calling the o3 model
To interact with o3 using multimodal input and structured output, we use the LLM_client.beta.chat.completions.parse() API. The important arguments include:
- model: the deployment name.
- messages: the message objects sent to the o3 model.
- max_completion_tokens: an upper bound on the number of tokens the model can generate. Note that reasoning models like o3 internally produce reasoning tokens to “think through” the problem; this limit covers both those reasoning tokens and the visible output tokens the user receives.
- response_format: the Pydantic model that defines the expected JSON schema.
- reasoning_effort: a control knob for how much computational effort o3 spends on reasoning. Available options are low, medium, and high.
You can define a helper function to interact with the o3 model:
def anomaly_detection(instruction, fig_path,
                      response_format, prompt=None,
                      deployment="o3", reasoning_effort="high"):
    # Compose messages
    messages = [
        {"role": "system", "content": instruction},
        {"role": "user", "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": fig_path,
                    "detail": "high"
                }
            },
        ]}
    ]

    # Add the text prompt if it is given
    if prompt is not None:
        messages[1]["content"].append({"type": "text", "text": prompt})

    # Invoke the LLM API
    response = LLM_client.beta.chat.completions.parse(
        model=deployment,
        messages=messages,
        max_completion_tokens=4000,
        reasoning_effort=reasoning_effort,
        response_format=response_format
    )
    return response.choices[0].message.parsed.model_dump()
Note that the messages object accepts both text and image content. The text prompt is optional here, since we prompt the model with the figure alone.
We set "detail": "high" to enable high-resolution image processing. This matters in our case study, because o3 needs to read fine details such as axis tick labels, data point values, and subtle visual patterns. Note, however, that high-detail processing consumes more tokens and incurs higher API costs.
Finally, .parsed.model_dump() converts the parsed output into a regular Python dictionary.
That’s it for the implementation. Let’s look at some results next.
3. Results
In this section, we feed the previously generated figures into the o3 model and ask it to identify potential anomalies.
3.1 Spike anomaly
# df_spike_anomaly is the dataframe of the first set of synthetic data (Figure 1)
spike_anomaly_url = create_fig(df_spike_anomaly)

# Anomaly detection
result = anomaly_detection(instruction,
                           spike_anomaly_url,
                           response_format=AnomalyReport,
                           reasoning_effort="medium")
print(result)
In the call above, spike_anomaly_url is the data URL for Figure 1. The resulting output is shown below.
{
    'when': {'start': datetime.date(2025, 8, 19), 'end': datetime.date(2025, 8, 21)},
    'y': 166,
    'kind': 'spike',
    'why': 'Single-day orders jump to ~166, far above adjacent days that sit near 120–130.',
    'date_confidence': 'medium'
}
You can see that the o3 model returns the output exactly in the designed format. You can now take this result and generate a visualization programmatically.
# Create figure
fig, ax = plt.subplots(figsize=(8, 4.5))
df_spike_anomaly['date'] = pd.to_datetime(df_spike_anomaly['date'])
ax.plot(df_spike_anomaly["date"], df_spike_anomaly["orders"], linewidth=2)
ax.set_xlabel('Date', fontsize=14)
ax.set_ylabel('Daily Orders', fontsize=14)

# Format x-axis dates
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %d'))
ax.xaxis.set_major_locator(mdates.WeekdayLocator(interval=1))

# Add anomaly overlay
start_date = pd.to_datetime(result['when']['start'])
end_date = pd.to_datetime(result['when']['end'])

# Add shaded region
ax.axvspan(start_date, end_date, alpha=0.3, color='red', label=f"Anomaly ({result['kind']})")

# Add text annotation
mid_date = start_date + (end_date - start_date) / 2  # Middle of anomaly window
ax.annotate(
    result['why'],
    xy=(mid_date, result['y']),
    xytext=(10, 20),  # Offset from the point
    textcoords='offset points',
    bbox=dict(boxstyle='round,pad=0.5', fc='yellow', alpha=0.7),
    arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0.1'),
    fontsize=10,
    wrap=True
)

# Add legend
ax.legend()
plt.xticks(rotation=0)
plt.tight_layout()
The generated visualization looks like this:

We can see that the o3 model correctly identified the spike anomaly planted in the first synthetic dataset.
Not bad at all, especially considering that we only prompted the LLM and performed no traditional model training.
3.2 Level shift anomaly
# df_level_shift_anomaly is the dataframe of the 2nd set of synthetic data (Figure 2)
level_shift_anomaly_url = create_fig(df_level_shift_anomaly)

# Anomaly detection
result = anomaly_detection(instruction,
                           level_shift_anomaly_url,
                           response_format=AnomalyReport,
                           reasoning_effort="medium")
print(result)
The resulting output is shown below.
{
    'when': {'start': datetime.date(2025, 8, 26), 'end': datetime.date(2025, 9, 2)},
    'y': 150,
    'kind': 'level_shift',
    'why': 'Orders abruptly jump from the 120-135 range to ~150 on Aug 26 and remain elevated for all subsequent days, indicating a sustained baseline change.',
    'date_confidence': 'high'
}
Again, we can see that the model accurately identified the presence of a “level_shift” anomaly in the plot.

3.3 Seasonal anomaly
# df_seasonality_anomaly is the dataframe of the 3rd set of synthetic data (Figure 3)
seasonality_anomaly_url = create_fig(df_seasonality_anomaly)

# Anomaly detection
result = anomaly_detection(instruction,
                           seasonality_anomaly_url,
                           response_format=AnomalyReport,
                           reasoning_effort="medium")
print(result)
The resulting output is shown below.
{
    'when': {'start': datetime.date(2025, 8, 23), 'end': datetime.date(2025, 8, 24)},
    'y': 132,
    'kind': 'seasonal_outlier',
    'why': 'Weekend of Aug 23-24 shows order volumes (~130+) on par with surrounding weekdays, while other weekends consistently drop to ~115, making it an out-of-season spike.',
    'date_confidence': 'high'
}
This is a trickier case. Nonetheless, our o3 model handled it properly, with precise localization and a clear rationale. Quite impressive:

4. Recap
Congratulations! We have successfully built a fully functional anomaly detection solution for time-series data, using nothing more than visualizations and prompts.
By feeding daily order plots to the o3 reasoning model and constraining its output to a JSON schema, the LLM was able to identify three different anomaly types with precise localization. All of this was achieved without training an ML model. Impressive!
Taking a step back, we can see that the solution we built reflects a broader pattern that combines three capabilities:
- Look: multimodal input lets the model consume figures directly.
- Think: step-by-step reasoning for tackling complex problems.
- Integrate: structured output that can be easily consumed by downstream systems (e.g., for programmatically generating visualizations).
The combination of multimodal input + reasoning + structured output creates a versatile foundation for practically useful LLM applications.
Now your building blocks are ready. What do you want to build next?

