In this tutorial, you'll build a workflow using Outlines to produce structured, type-safe output from language models. You'll use typed constraints such as Literal, int, and bool, design prompt templates with outlines.Template, and enforce strict schema validation with Pydantic models. The tutorial also implements robust JSON recovery and a function-calling pattern that generates validated arguments to safely execute Python functions. Throughout, we'll focus on reliability, constraint enforcement, and production-grade structured generation.
import os, sys, subprocess, json, textwrap, re

subprocess.check_call([sys.executable, "-m", "pip", "install", "-q",
                       "outlines", "transformers", "accelerate", "sentencepiece", "pydantic"])

import torch
import outlines
from transformers import AutoTokenizer, AutoModelForCausalLM
from typing import Literal, List, Union, Annotated
from pydantic import BaseModel, Field
from enum import Enum

print("Torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("Outlines:", getattr(outlines, "__version__", "unknown"))

device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using device:", device)

MODEL_NAME = "HuggingFaceTB/SmolLM2-135M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=True)
hf_model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
    device_map="auto" if device == "cuda" else None,
)
if device == "cpu":
    hf_model = hf_model.to(device)

model = outlines.from_transformers(hf_model, tokenizer)

def build_chat(user_text: str, system_text: str = "You are a precise assistant. Follow instructions exactly.") -> str:
    try:
        msgs = [{"role": "system", "content": system_text}, {"role": "user", "content": user_text}]
        return tokenizer.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
    except Exception:
        return f"{system_text}\n\nUser: {user_text}\nAssistant:"

def banner(title: str):
    print("\n" + "=" * 90)
    print(title)
    print("=" * 90)
We install all required dependencies and initialize the Outlines pipeline with a lightweight instruction model. Device handling is configured so that the system automatically switches between CPU and GPU based on availability. We also build reusable helper functions for chat formatting and clean section banners to structure the workflow.
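To see what the plain-text fallback in build_chat produces when a tokenizer has no chat template, here is a standalone restatement of that f-string (the function name and sample question are restated here so the snippet runs on its own):

```python
# Standalone restatement of build_chat's fallback branch: when
# apply_chat_template is unavailable, the system and user turns are
# joined into a simple "User:/Assistant:" prompt.
def fallback_prompt(user_text: str,
                    system_text: str = "You are a precise assistant. Follow instructions exactly.") -> str:
    return f"{system_text}\n\nUser: {user_text}\nAssistant:"

print(fallback_prompt("Is 29 a prime number? Return true or false only."))
```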
def extract_json_object(s: str) -> str:
    s = s.strip()
    start = s.find("{")
    if start == -1:
        return s
    depth = 0
    in_str = False
    esc = False
    for i in range(start, len(s)):
        ch = s[i]
        if in_str:
            if esc:
                esc = False
            elif ch == "\\":
                esc = True
            elif ch == '"':
                in_str = False
        else:
            if ch == '"':
                in_str = True
            elif ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    return s[start:i + 1]
    return s[start:]

def json_repair_minimal(bad: str) -> str:
    bad = bad.strip()
    last = bad.rfind("}")
    if last != -1:
        return bad[:last + 1]
    return bad

def safe_validate(model_cls, raw_text: str):
    raw = extract_json_object(raw_text)
    try:
        return model_cls.model_validate_json(raw)
    except Exception:
        raw2 = json_repair_minimal(raw)
        return model_cls.model_validate_json(raw2)
banner("2) Typed outputs (Literal / int / bool)")
sentiment = model(
    build_chat("Analyze the sentiment: 'This product completely changed my life!'. Return one label only."),
    Literal["Positive", "Negative", "Neutral"],
    max_new_tokens=8,
)
print("Sentiment:", sentiment)

bp = model(build_chat("What is the boiling point of water in Celsius? Return the integer only."), int, max_new_tokens=8)
print("Boiling point (int):", bp)

prime = model(build_chat("Is 29 a prime number? Return true or false only."), bool, max_new_tokens=6)
print("Is prime (bool):", prime)
We implement robust JSON extraction and minimal repair utilities to safely recover structured output from incomplete generations. Next, we demonstrate strongly typed generation with Literal, int, and bool so that the model returns tightly constrained values. This shows how Outlines can directly enforce deterministic, type-safe output at generation time.
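To make the brace-matching extraction easy to verify in isolation, here is a compact restatement of the same scan under a different name, exercised on a hypothetical noisy model output (the input string is illustrative, not from the tutorial):

```python
import json

def first_json_object(s: str) -> str:
    """Compact restatement of the brace-matching scan: return the first
    balanced {...} span, ignoring braces that appear inside string literals."""
    start = s.find("{")
    if start == -1:
        return s
    depth, in_str, esc = 0, False, False
    for i in range(start, len(s)):
        ch = s[i]
        if in_str:
            if esc:
                esc = False
            elif ch == "\\":
                esc = True
            elif ch == '"':
                in_str = False
        else:
            if ch == '"':
                in_str = True
            elif ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    return s[start:i + 1]
    return s[start:]

# A "}" inside the nested string value must not close the object early.
noisy = 'Model output: {"a": 1, "nested": {"b": "}"}} -- trailing chatter'
clean = first_json_object(noisy)
print(json.loads(clean))
```

Note how the in_str/esc flags keep braces inside JSON string values from affecting the depth counter, which is what makes this safer than a naive regex.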
banner("3) Prompt templating (outlines.Template)")
tmpl = outlines.Template.from_string(textwrap.dedent("""
    <|system|>
    You are a strict classifier. Return ONLY one label.
    <|user|>
    Classify sentiment of this text:
    {{ text }}
    Labels: Positive, Negative, Neutral
    <|assistant|>
""").strip())
templated = model(tmpl(text="The food was cold but the staff were kind."), Literal["Positive", "Negative", "Neutral"], max_new_tokens=8)
print("Template sentiment:", templated)
We use outlines.Template to build structured prompt templates with strict output control, dynamically inserting user input while preserving role formatting and classification constraints. Templates improve reusability and ensure consistent, constrained responses.
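Outlines templates use Jinja-style {{ name }} placeholders. As a dependency-free sketch of the substitution idea (a hypothetical minimal renderer, not the library's implementation), the mechanism boils down to:

```python
import re
import textwrap

def render(template: str, **values) -> str:
    """Substitute {{ name }} placeholders with keyword values,
    mimicking the Jinja-style syntax used by outlines.Template."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: str(values[m.group(1)]), template)

prompt = textwrap.dedent("""
    <|system|>
    You are a strict classifier. Return ONLY one label.
    <|user|>
    Classify sentiment of this text:
    {{ text }}
    Labels: Positive, Negative, Neutral
    <|assistant|>
""").strip()

print(render(prompt, text="The food was cold but the staff were kind."))
```

The real library also compiles templates from files and supports full Jinja logic; this sketch only shows why role markers survive substitution untouched.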
banner("4) Pydantic structured output (advanced constraints)")

class TicketPriority(str, Enum):
    low = "low"
    medium = "medium"
    high = "high"
    urgent = "urgent"

IPv4 = Annotated[str, Field(pattern=r"^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$")]
ISODate = Annotated[str, Field(pattern=r"^\d{4}-\d{2}-\d{2}$")]

class ServiceTicket(BaseModel):
    priority: TicketPriority
    category: Literal["billing", "login", "bug", "feature_request", "other"]
    requires_manager: bool
    summary: str = Field(min_length=10, max_length=220)
    action_items: List[str] = Field(min_length=1, max_length=6)

class NetworkIncident(BaseModel):
    affected_service: Literal["dns", "vpn", "api", "website", "database"]
    severity: Literal["sev1", "sev2", "sev3"]
    public_ip: IPv4
    start_date: ISODate
    mitigation: List[str] = Field(min_length=2, max_length=6)

email = """
Subject: URGENT - Cannot access my account after payment
I paid for the premium plan 3 hours ago and still can't access any features.
I have a client presentation in an hour and need the analytics dashboard.
Please fix this immediately or refund my payment.
""".strip()
ticket_text = model(
    build_chat(
        "Extract a ServiceTicket from this message.\n"
        "Return JSON ONLY matching the ServiceTicket schema.\n"
        "Action items must be distinct.\n\nMESSAGE:\n" + email
    ),
    ServiceTicket,
    max_new_tokens=240,
)
ticket = safe_validate(ServiceTicket, ticket_text) if isinstance(ticket_text, str) else ticket_text
print("ServiceTicket JSON:\n", ticket.model_dump_json(indent=2))
We define advanced Pydantic schemas using enums, regular-expression constraints, field restrictions, and structured lists. We then extract a complex ServiceTicket object from raw email text and validate it with schema-driven decoding, applying the safe-validation logic to handle edge cases and ensure robustness at production scale.
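The regular expressions behind the IPv4 and ISODate annotated types can be sanity-checked directly with the standard library, independent of Pydantic. This small check (the sample addresses and dates are illustrative) confirms what the constraints accept and reject:

```python
import re

# The same patterns used in the IPv4 and ISODate annotated types above,
# exercised with re.match so the constraint behavior is easy to eyeball.
IPV4_RE = r"^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$"
ISO_DATE_RE = r"^\d{4}-\d{2}-\d{2}$"

# Each octet alternative caps values at 255, so out-of-range octets fail.
assert re.match(IPV4_RE, "192.168.0.1")
assert not re.match(IPV4_RE, "999.1.1.1")

# Dates must be zero-padded YYYY-MM-DD; other formats fail.
assert re.match(ISO_DATE_RE, "2024-05-01")
assert not re.match(ISO_DATE_RE, "05/01/2024")
print("patterns behave as expected")
```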
banner("5) Function-calling style (schema -> args -> call)")

class AddArgs(BaseModel):
    a: int = Field(ge=-1000, le=1000)
    b: int = Field(ge=-1000, le=1000)

def add(a: int, b: int) -> int:
    return a + b

args_text = model(
    build_chat("Return JSON ONLY with two integers a and b. Make a odd and b even."),
    AddArgs,
    max_new_tokens=80,
)
args = safe_validate(AddArgs, args_text) if isinstance(args_text, str) else args_text
print("Args:", args.model_dump())
print("add(a,b) =", add(args.a, args.b))
print("Tip: For best speed and fewer truncations, switch the Colab runtime to GPU.")
We implement a function-calling style workflow by generating structured arguments that conform to a defined schema. We validate the generated arguments and safely run a Python function with the validated inputs, demonstrating how schema-first generation enables controlled tool invocation and reliable LLM-driven computation.
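The same validate-then-call discipline can be sketched without Pydantic, using plain range checks. This is a hypothetical, dependency-free illustration of the pattern, not the tutorial's implementation:

```python
# Schema -> args -> call, sketched with manual checks: reject any payload
# whose values fall outside the bounds before the function ever runs.
def validate_add_args(payload: dict) -> dict:
    a, b = payload["a"], payload["b"]
    for name, val in (("a", a), ("b", b)):
        if not isinstance(val, int) or not -1000 <= val <= 1000:
            raise ValueError(f"{name} must be an int in [-1000, 1000], got {val!r}")
    return {"a": a, "b": b}

def add(a: int, b: int) -> int:
    return a + b

args = validate_add_args({"a": 3, "b": 8})
print("add(a, b) =", add(**args))  # prints: add(a, b) = 11
```

The point of the pattern is that the callable only ever sees inputs that already passed validation, so tool execution cannot be driven by malformed model output.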
In conclusion, we've implemented a fully structured production pipeline using Outlines with strong typing, schema validation, and controlled decoding. We demonstrated how to move from simple typed output to advanced Pydantic-based extraction and function-calling execution patterns. We also built resilience through JSON salvage and validation mechanisms to make the system robust to incomplete model outputs. Overall, we've created a practical, production-oriented framework for deterministic, safe, schema-driven LLM applications.

