In this tutorial, we explore advanced image generation with the Hugging Face Diffusers library. We first set up the environment and use Stable Diffusion with an optimized scheduler to generate high-quality images from text prompts. We then speed up inference with a LoRA-based latent consistency approach, guide synthesis with ControlNet under edge conditioning, and finally perform localized edits with inpainting. Throughout, we emphasize practical techniques that balance image quality, speed, and controllability.
!pip -q uninstall -y pillow Pillow || true
!pip -q install --upgrade --force-reinstall "pillow<12.0"
!pip -q install --upgrade diffusers transformers accelerate safetensors huggingface_hub opencv-python
import os, math, random
import torch
import numpy as np
import cv2
from PIL import Image, ImageDraw, ImageFilter
from diffusers import (
    StableDiffusionPipeline,
    StableDiffusionInpaintPipeline,
    ControlNetModel,
    StableDiffusionControlNetPipeline,
    UniPCMultistepScheduler,
)
We prepare a clean, compatible runtime by resolving dependency conflicts and installing all required libraries. Pinning a compatible Pillow version and loading the Diffusers ecosystem ensures that image processing works reliably. We also import all the core modules needed for the generation, control, and inpainting workflows.
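If you want to confirm that the Pillow pin actually took effect before running anything heavy, a quick version check like the sketch below works (the exact versions printed will vary with your runtime):

import PIL, diffusers, transformers
# Confirm the Pillow pin (<12.0) and that the Diffusers stack imported cleanly.
print("Pillow:", PIL.__version__)
print("diffusers:", diffusers.__version__)
print("transformers:", transformers.__version__)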
def seed_everything(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

def to_grid(images, cols=2, bg=255):
    if isinstance(images, Image.Image):
        images = [images]
    w, h = images[0].size
    rows = math.ceil(len(images) / cols)
    grid = Image.new("RGB", (cols*w, rows*h), (bg, bg, bg))
    for i, im in enumerate(images):
        grid.paste(im, ((i % cols)*w, (i // cols)*h))
    return grid
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
print("device:", device, "| dtype:", dtype)
We define utility functions to ensure reproducibility and to organize visual output efficiently. Setting a global random seed keeps generation consistent across runs. We also detect the available hardware and choose the precision accordingly to get the best GPU or CPU performance.
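As a quick sanity check (a throwaway sketch, not part of the tutorial outputs), to_grid can be exercised with solid-color tiles before any model is loaded:

# Four 64x64 tiles just to verify the grid layout logic.
demo_tiles = [Image.new("RGB", (64, 64), c) for c in ["red", "green", "blue", "gray"]]
demo_grid = to_grid(demo_tiles, cols=2)  # a 128x128 image, filled row by row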
seed_everything(7)

BASE_MODEL = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(
    BASE_MODEL,
    torch_dtype=dtype,
    safety_checker=None,
).to(device)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
if device == "cuda":
    pipe.enable_attention_slicing()
    pipe.enable_vae_slicing()
prompt = "a cinematic photo of a futuristic street market at dusk, ultra-detailed, 35mm, volumetric lighting"
negative_prompt = "blurry, low quality, deformed, watermark, text"
img_text = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=25,
    guidance_scale=6.5,
    width=768,
    height=512,
).images[0]
We initialize the base Stable Diffusion pipeline and switch to the more efficient UniPC scheduler. We generate a high-quality image directly from a text prompt using carefully chosen guidance and resolution settings. This establishes a strong baseline for the subsequent improvements in speed and control.
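To see how guidance strength affects the result at this stage, one option (a sketch with illustrative values, not part of the original run) is to fix a per-call torch.Generator so only guidance_scale varies between images:

# Sweep guidance_scale with a fixed generator: same latent noise each time,
# so differences come only from the guidance strength.
sweep = []
for gs in [3.0, 6.5, 10.0]:
    gen = torch.Generator(device=device).manual_seed(7)
    sweep.append(pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        num_inference_steps=25,
        guidance_scale=gs,
        width=768,
        height=512,
        generator=gen,
    ).images[0])
grid_sweep = to_grid(sweep, cols=3)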
LCM_LORA = "latent-consistency/lcm-lora-sdv1-5"
pipe.load_lora_weights(LCM_LORA)
try:
    pipe.fuse_lora()
    lora_fused = True
except Exception as e:
    lora_fused = False
    print("LoRA fuse skipped:", e)

fast_prompt = "a clean product photo of a minimal smartwatch on a reflective surface, studio lighting"
fast_images = []
for steps in [4, 6, 8]:
    fast_images.append(
        pipe(
            prompt=fast_prompt,
            negative_prompt=negative_prompt,
            num_inference_steps=steps,
            guidance_scale=1.5,
            width=768,
            height=512,
        ).images[0]
    )
grid_fast = to_grid(fast_images, cols=3)
print("LoRA fused:", lora_fused)
W, H = 768, 512
layout = Image.new("RGB", (W, H), "white")
draw = ImageDraw.Draw(layout)
draw.rectangle([40, 80, 340, 460], outline="black", width=6)
draw.ellipse([430, 110, 720, 400], outline="black", width=6)
draw.line([0, 420, W, 420], fill="black", width=5)
edges = cv2.Canny(np.array(layout), 80, 160)
edges = np.stack([edges]*3, axis=-1)
canny_image = Image.fromarray(edges)
CONTROLNET = "lllyasviel/sd-controlnet-canny"
controlnet = ControlNetModel.from_pretrained(
    CONTROLNET,
    torch_dtype=dtype,
).to(device)
cn_pipe = StableDiffusionControlNetPipeline.from_pretrained(
    BASE_MODEL,
    controlnet=controlnet,
    torch_dtype=dtype,
    safety_checker=None,
).to(device)
cn_pipe.scheduler = UniPCMultistepScheduler.from_config(cn_pipe.scheduler.config)
if device == "cuda":
    cn_pipe.enable_attention_slicing()
    cn_pipe.enable_vae_slicing()

cn_prompt = "a modern cafe interior, architectural render, soft daylight, high detail"
img_controlnet = cn_pipe(
    prompt=cn_prompt,
    negative_prompt=negative_prompt,
    image=canny_image,
    num_inference_steps=25,
    guidance_scale=6.5,
    controlnet_conditioning_scale=1.0,
).images[0]
We accelerate inference by loading and fusing an LCM LoRA adapter and demonstrate fast sampling with just a few diffusion steps. Next, we apply ControlNet: we build a structural conditioning image from Canny edges and use it to guide the layout of the generated scene. This lets us preserve structure while still benefiting from creative text guidance.
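If the edge constraint feels too rigid, one knob worth experimenting with (the 0.5 below is illustrative, not from the original run) is controlnet_conditioning_scale, which scales how strongly the Canny map steers generation:

# Lower conditioning scale = looser adherence to the drawn layout,
# letting the text prompt dominate the composition.
img_loose = cn_pipe(
    prompt=cn_prompt,
    negative_prompt=negative_prompt,
    image=canny_image,
    num_inference_steps=25,
    guidance_scale=6.5,
    controlnet_conditioning_scale=0.5,
).images[0]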
mask = Image.new("L", img_controlnet.size, 0)
mask_draw = ImageDraw.Draw(mask)
mask_draw.rectangle([60, 90, 320, 170], fill=255)
mask = mask.filter(ImageFilter.GaussianBlur(2))

inpaint_pipe = StableDiffusionInpaintPipeline.from_pretrained(
    BASE_MODEL,
    torch_dtype=dtype,
    safety_checker=None,
).to(device)
inpaint_pipe.scheduler = UniPCMultistepScheduler.from_config(inpaint_pipe.scheduler.config)
if device == "cuda":
    inpaint_pipe.enable_attention_slicing()
    inpaint_pipe.enable_vae_slicing()

inpaint_prompt = "a glowing neon sign that says 'CAFÉ', cyberpunk style, realistic lighting"
img_inpaint = inpaint_pipe(
    prompt=inpaint_prompt,
    negative_prompt=negative_prompt,
    image=img_controlnet,
    mask_image=mask,
    num_inference_steps=30,
    guidance_scale=7.0,
).images[0]
os.makedirs("outputs", exist_ok=True)
img_text.save("outputs/text2img.png")
grid_fast.save("outputs/lora_fast_grid.png")
layout.save("outputs/layout.png")
canny_image.save("outputs/canny.png")
img_controlnet.save("outputs/controlnet.png")
mask.save("outputs/mask.png")
img_inpaint.save("outputs/inpaint.png")
print("Saved outputs:", sorted(os.listdir("outputs")))
print("Done.")
We create a mask to isolate a specific region and apply inpainting so that only that part of the image changes. The selected area is regenerated from a targeted prompt while the rest of the image is left untouched. Finally, we save all intermediate and final outputs to disk for inspection and reuse.
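To verify how localized the edit really is, a small sketch like the following (the crop box mirrors the mask rectangle drawn above; the filename is illustrative) places the before/after region side by side:

# Crop the masked rectangle from both images and compare them directly.
box = (60, 90, 320, 170)  # same coordinates used when drawing the mask
compare = to_grid([img_controlnet.crop(box), img_inpaint.crop(box)], cols=2)
compare.save("outputs/inpaint_region_compare.png")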
In conclusion, we have demonstrated how a single Diffusers pipeline can evolve into a flexible, production-ready image generation system. We moved from pure text-to-image generation to fast sampling, structural control, and targeted image editing without switching frameworks or tools. Combining an efficient scheduler, a LoRA adapter, ControlNet, and inpainting yields a controllable, efficient pipeline that can easily be extended to more advanced creative or applied use cases.