In this tutorial, we explore advanced image generation with the Hugging Face Diffusers library. We first set up the environment and use Stable Diffusion with an optimized scheduler to generate high-quality images from text prompts. We then speed up inference with a LoRA-based latent consistency approach, guide synthesis with ControlNet under edge conditioning, and finally perform localized edits with inpainting. Throughout, we emphasize practical techniques that balance image quality, speed, and controllability.
!pip -q uninstall -y pillow Pillow || true
!pip -q install --upgrade --force-reinstall "pillow<12.0"
!pip -q install --upgrade diffusers transformers accelerate safetensors huggingface_hub opencv-python
import os, math, random
import torch
import numpy as np
import cv2
from PIL import Image, ImageDraw, ImageFilter
from diffusers import (
    StableDiffusionPipeline,
    StableDiffusionInpaintPipeline,
    ControlNetModel,
    StableDiffusionControlNetPipeline,
    UniPCMultistepScheduler,
)
We prepare a clean, compatible runtime by resolving dependency conflicts and installing all required libraries. Pinning a compatible Pillow version and loading the Diffusers ecosystem ensures that image processing works reliably. We also import all the core modules needed for the generation, control, and inpainting workflows.
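If you want to confirm that the Pillow pin actually took effect before running anything heavy, a quick version check like the sketch below works (the exact versions printed will vary with your runtime):

import PIL, diffusers, transformers
# Confirm the Pillow pin (<12.0) and that the Diffusers stack imported cleanly.
print("Pillow:", PIL.__version__)
print("diffusers:", diffusers.__version__)
print("transformers:", transformers.__version__)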
def seed_everything(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

def to_grid(images, cols=2, bg=255):
    if isinstance(images, Image.Image):
        images = [images]
    w, h = images[0].size
    rows = math.ceil(len(images) / cols)
    grid = Image.new("RGB", (cols*w, rows*h), (bg, bg, bg))
    for i, im in enumerate(images):
        grid.paste(im, ((i % cols)*w, (i // cols)*h))
    return grid
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
print("device:", device, "| dtype:", dtype)
We define utility functions to ensure reproducibility and to organize visual output efficiently. Setting a global random seed keeps generation consistent across runs. We also detect the available hardware and choose the precision accordingly to get the best GPU or CPU performance.
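As a quick sanity check (a throwaway sketch, not part of the tutorial outputs), to_grid can be exercised with solid-color tiles before any model is loaded:

# Four 64x64 tiles just to verify the grid layout logic.
demo_tiles = [Image.new("RGB", (64, 64), c) for c in ["red", "green", "blue", "gray"]]
demo_grid = to_grid(demo_tiles, cols=2)  # a 128x128 image, filled row by row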
seed_everything(7)

BASE_MODEL = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(
    BASE_MODEL,
    torch_dtype=dtype,
    safety_checker=None,
).to(device)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
if device == "cuda":
    pipe.enable_attention_slicing()
    pipe.enable_vae_slicing()
prompt = "a cinematic photo of a futuristic street market at dusk, ultra-detailed, 35mm, volumetric lighting"
negative_prompt = "blurry, low quality, deformed, watermark, text"
img_text = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=25,
    guidance_scale=6.5,
    width=768,
    height=512,
).images[0]
We initialize the base Stable Diffusion pipeline and switch to the more efficient UniPC scheduler. We generate a high-quality image directly from a text prompt using carefully chosen guidance and resolution settings. This establishes a strong baseline for the subsequent improvements in speed and control.
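To see how guidance strength affects the result at this stage, one option (a sketch with illustrative values, not part of the original run) is to fix a per-call torch.Generator so only guidance_scale varies between images:

# Sweep guidance_scale with a fixed generator: same latent noise each time,
# so differences come only from the guidance strength.
sweep = []
for gs in [3.0, 6.5, 10.0]:
    gen = torch.Generator(device=device).manual_seed(7)
    sweep.append(pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        num_inference_steps=25,
        guidance_scale=gs,
        width=768,
        height=512,
        generator=gen,
    ).images[0])
grid_sweep = to_grid(sweep, cols=3)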
LCM_LORA = "latent-consistency/lcm-lora-sdv1-5"
pipe.load_lora_weights(LCM_LORA)
try:
    pipe.fuse_lora()
    lora_fused = True
except Exception as e:
    lora_fused = False
    print("LoRA fuse skipped:", e)

fast_prompt = "a clean product photo of a minimal smartwatch on a reflective surface, studio lighting"
fast_images = []
for steps in [4, 6, 8]:
    fast_images.append(
        pipe(
            prompt=fast_prompt,
            negative_prompt=negative_prompt,
            num_inference_steps=steps,
            guidance_scale=1.5,
            width=768,
            height=512,
        ).images[0]
    )
grid_fast = to_grid(fast_images, cols=3)
print("LoRA fused:", lora_fused)
W, H = 768, 512
layout = Image.new("RGB", (W, H), "white")
draw = ImageDraw.Draw(layout)
draw.rectangle([40, 80, 340, 460], outline="black", width=6)
draw.ellipse([430, 110, 720, 400], outline="black", width=6)
draw.line([0, 420, W, 420], fill="black", width=5)
edges = cv2.Canny(np.array(layout), 80, 160)
edges = np.stack([edges]*3, axis=-1)
canny_image = Image.fromarray(edges)
CONTROLNET = "lllyasviel/sd-controlnet-canny"
controlnet = ControlNetModel.from_pretrained(
    CONTROLNET,
    torch_dtype=dtype,
).to(device)
cn_pipe = StableDiffusionControlNetPipeline.from_pretrained(
    BASE_MODEL,
    controlnet=controlnet,
    torch_dtype=dtype,
    safety_checker=None,
).to(device)
cn_pipe.scheduler = UniPCMultistepScheduler.from_config(cn_pipe.scheduler.config)
if device == "cuda":
    cn_pipe.enable_attention_slicing()
    cn_pipe.enable_vae_slicing()

cn_prompt = "a modern cafe interior, architectural render, soft daylight, high detail"
img_controlnet = cn_pipe(
    prompt=cn_prompt,
    negative_prompt=negative_prompt,
    image=canny_image,
    num_inference_steps=25,
    guidance_scale=6.5,
    controlnet_conditioning_scale=1.0,
).images[0]
We accelerate inference by loading and fusing an LCM LoRA adapter and demonstrate fast sampling with just a few diffusion steps. Next, we apply ControlNet: we build a structural conditioning image from Canny edges and use it to guide the layout of the generated scene. This lets us preserve structure while still benefiting from creative text guidance.
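If the edge constraint feels too rigid, one knob worth experimenting with (the 0.5 below is illustrative, not from the original run) is controlnet_conditioning_scale, which scales how strongly the Canny map steers generation:

# Lower conditioning scale = looser adherence to the drawn layout,
# letting the text prompt dominate the composition.
img_loose = cn_pipe(
    prompt=cn_prompt,
    negative_prompt=negative_prompt,
    image=canny_image,
    num_inference_steps=25,
    guidance_scale=6.5,
    controlnet_conditioning_scale=0.5,
).images[0]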
mask = Image.new("L", img_controlnet.size, 0)
mask_draw = ImageDraw.Draw(mask)
mask_draw.rectangle([60, 90, 320, 170], fill=255)
mask = mask.filter(ImageFilter.GaussianBlur(2))

inpaint_pipe = StableDiffusionInpaintPipeline.from_pretrained(
    BASE_MODEL,
    torch_dtype=dtype,
    safety_checker=None,
).to(device)
inpaint_pipe.scheduler = UniPCMultistepScheduler.from_config(inpaint_pipe.scheduler.config)
if device == "cuda":
    inpaint_pipe.enable_attention_slicing()
    inpaint_pipe.enable_vae_slicing()

inpaint_prompt = "a glowing neon sign that says 'CAFÉ', cyberpunk style, realistic lighting"
img_inpaint = inpaint_pipe(
    prompt=inpaint_prompt,
    negative_prompt=negative_prompt,
    image=img_controlnet,
    mask_image=mask,
    num_inference_steps=30,
    guidance_scale=7.0,
).images[0]
os.makedirs("outputs", exist_ok=True)
img_text.save("outputs/text2img.png")
grid_fast.save("outputs/lora_fast_grid.png")
layout.save("outputs/layout.png")
canny_image.save("outputs/canny.png")
img_controlnet.save("outputs/controlnet.png")
mask.save("outputs/mask.png")
img_inpaint.save("outputs/inpaint.png")
print("Saved outputs:", sorted(os.listdir("outputs")))
print("Done.")
We create a mask to isolate a specific region and apply inpainting so that only that part of the image changes. The selected area is regenerated from a targeted prompt while the rest of the image is left untouched. Finally, we save all intermediate and final outputs to disk for inspection and reuse.
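To verify how localized the edit really is, a small sketch like the following (the crop box mirrors the mask rectangle drawn above; the filename is illustrative) places the before/after region side by side:

# Crop the masked rectangle from both images and compare them directly.
box = (60, 90, 320, 170)  # same coordinates used when drawing the mask
compare = to_grid([img_controlnet.crop(box), img_inpaint.crop(box)], cols=2)
compare.save("outputs/inpaint_region_compare.png")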
In conclusion, we have demonstrated how a single Diffusers pipeline can evolve into a flexible, production-ready image generation system. We moved from pure text-to-image generation to fast sampling, structural control, and targeted image editing without switching frameworks or tools. Combining an efficient scheduler, a LoRA adapter, ControlNet, and inpainting yields a controllable, efficient pipeline that can easily be extended to more advanced creative or applied use cases.