Master Z-Image-Turbo: ControlNet, Inpainting & 4K Upscale Guide
Z-Image Advanced Features: What You Can Do Beyond Text Prompts
A new image generator called Z-Image was recently released, and it’s by far the best open-source model you can use right now. I already covered the installation in a full tutorial when it first came out, but there’s much more you can do than generate images from a text prompt.
You can control the composition of your generation with a reference image. You can also control the pose of your character with another reference photo. You can even edit existing photos, although the dedicated editing model isn’t out yet. And you can generate images above 4K resolution with incredible detail.
If you’re new to Z-Image, I highly recommend going through the full installation tutorial first. I’ll keep everything here as easy to follow as possible.
Z-Image Advanced Features: ControlNet for Composition and Pose
The first thing I’m going to show you is how you can control the composition of your generation or the pose of your character with a reference image. This is called ControlNet, and we already have a working ControlNet for Z-Image.
Download the Workflow and Required Models
Download the Z-Image Turbo ControlNet workflow as a JSON file. If it opens in a new tab, right-click and choose Save link as. Save the workflow anywhere, such as your ComfyUI root folder.

Open ComfyUI. Before you do anything, update ComfyUI:
- Click Manager
- Click Update ComfyUI
- Click Restart, then Confirm

Drag and drop the downloaded workflow onto the interface.

If you haven’t downloaded Z-Image before, there are a few models to grab:
- Qwen 3 text encoder - put it in ComfyUI/models/text_encoders - about 7.8 GB
- Z-Image Turbo model - put it in ComfyUI/models/diffusion_models - about 11.4 GB
- ae.safetensors VAE - put it in ComfyUI/models/vae - about 327 MB

You should already have these three if you installed Z-Image earlier. The only new model you need is the ControlNet Union safetensors file:
- Put it in ComfyUI/models/model_patches - about 2.9 GB

Press R to refresh your models list.

Using Canny Edges to Match Composition
Upload an image to use as a reference. It feeds into a scaler node, which is off by default. Turn it on if you want to resize the longest edge to a specific value; for example, if the original longest edge is 1365 px and you set the target to 1024, the image scales down to 1024 on that edge while keeping its aspect ratio.
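If you’re curious what that resize actually does, it’s simple arithmetic. Here’s a small Python sketch with Pillow that mirrors the node’s behavior (my own illustration, not the node’s actual code):

```python
from PIL import Image

def resize_longest_edge(img: Image.Image, target: int = 1024) -> Image.Image:
    """Scale an image so its longest edge equals `target`, keeping aspect ratio."""
    w, h = img.size
    scale = target / max(w, h)            # e.g. 1024 / 1365 ≈ 0.75
    new_size = (round(w * scale), round(h * scale))
    return img.resize(new_size, Image.LANCZOS)

img = Image.open("reference.jpg")         # hypothetical reference photo
print(resize_longest_edge(img).size)      # longest edge is now 1024
```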

The image then passes through a Canny node, which detects edges and previews the resulting edge map. That edge map, together with your prompt, feeds into a group of collapsed nodes you can expand; these are the standard Z-Image generation nodes.
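To get a feel for what the Canny node outputs, you can reproduce an edge map outside ComfyUI with OpenCV. The thresholds below are illustrative, not necessarily the node’s defaults:

```python
import cv2

img = cv2.imread("reference.jpg", cv2.IMREAD_GRAYSCALE)
# Pixels with gradient above the high threshold become edges;
# those between low and high are kept only if connected to a strong edge.
edges = cv2.Canny(img, threshold1=100, threshold2=200)
cv2.imwrite("edge_map.png", edges)  # white edges on black, like the node's preview
```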

Load your models:
- Press R to refresh the model list
- For CLIP, select the Qwen 3 encoder
- For VAE, select ae.safetensors
- For Model Patch Loader, select the ControlNet Union file

There is a node that controls how much influence the reference image has. At 100 percent, the generation follows the edge map fully; lower values relax it. For this example, set it to 80 percent.
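How the strength is applied is internal to the Model Patch nodes, but in diffusers-style ControlNets the idea is just a scalar multiplier on the control features before they’re added to the base model’s. A minimal sketch, assuming that convention carries over (Z-Image’s implementation may differ):

```python
import torch

def apply_control(base_features: torch.Tensor,
                  control_features: torch.Tensor,
                  strength: float = 0.8) -> torch.Tensor:
    # At strength=1.0 the edge map's guidance is added in full;
    # at 0.8 its contribution is scaled down, loosening the composition.
    return base_features + strength * control_features
```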

The workflow runs through Shift and KSampler. If you’re not familiar with these settings, the defaults work well. You can also set the number of images to generate at once.

Click Run. On a 16 GB VRAM GPU, 9 steps took under 10 seconds with the full Z-Image Turbo model. I uploaded a photo of a room, resized the longest edge to 1024, generated a Canny edge map, set the edge influence to 80 percent, and prompted: a cozy bedroom at night with Christmas lights and decorations. The result followed the composition of the reference while matching the prompt.
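If you’d rather queue runs from a script than click Run, ComfyUI exposes an HTTP API: export the workflow in API format from the UI, then POST it to the /prompt endpoint. A minimal sketch, assuming the default port 8188 and a file named workflow_api.json:

```python
import json
import urllib.request

# Assumes ComfyUI is running locally on its default port and
# workflow_api.json was exported in API format from the UI.
with open("workflow_api.json") as f:
    workflow = json.load(f)

payload = json.dumps({"prompt": workflow}).encode("utf-8")
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))  # returns a prompt_id once the job is queued
```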

You can bypass the scaler to keep the original dimensions by selecting it and pressing Ctrl+B.

Using OpenPose to Match Character Pose
Instead of edges, you can estimate the pose of a reference image and apply it to a new character.
Install a custom node to get more ControlNet detectors:
- Click Manager
- Click Custom Nodes Manager
- Search for ControlNet and install ComfyUI’s ControlNet Auxiliary Preprocessors by Fannovel16

Replace the Canny node with an OpenPose node:
- Double-click the canvas and search for pose
- Choose OpenPose
- Connect the input image to the OpenPose node, then route its output to the preview node and to the input the Canny edge map previously fed

Enable body, hand, and face detection. Prompt something like: an evil sorceress in a black robe in a dark forest. Set the pose influence to 70 percent and the step count to 8, then run it. The output keeps the pose while matching your new subject and prompt.
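Under the hood, the Fannovel16 nodes wrap the controlnet_aux Python package, so you can run the same pose detector in a plain script too. A sketch of that; note the include_hand/include_face parameter names vary between controlnet_aux versions, so treat them as assumptions:

```python
from PIL import Image
from controlnet_aux import OpenposeDetector

# Downloads the annotator weights from the lllyasviel/Annotators repo on first run.
detector = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
image = Image.open("reference.jpg")

# Body is detected by default; hands and face are opt-in,
# matching the toggles on the ComfyUI node.
pose_map = detector(image, include_hand=True, include_face=True)
pose_map.save("pose_map.png")
```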

Using Depth Maps for Stronger Composition Control
You can also use depth maps. Replace the detector node with a depth estimator:
- Double-click and search for depth
- Choose Depth Anything v2 from the detectors list
This may download an additional model automatically the first time you run it. Use a prompt like: a cosplayer girl at a convention. Run it. The workflow extracts a depth map, which guides the composition of the result.
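Outside ComfyUI, Depth Anything v2 is also available through Hugging Face’s transformers depth-estimation pipeline if you want to inspect the maps yourself. A minimal sketch; the small model variant here is my assumption, use whichever size you prefer:

```python
from PIL import Image
from transformers import pipeline

# Downloads the model on first use, much like the ComfyUI detector node does.
depth = pipeline("depth-estimation",
                 model="depth-anything/Depth-Anything-V2-Small-hf")
result = depth(Image.open("reference.jpg"))
result["depth"].save("depth_map.png")  # result["depth"] is a grayscale PIL depth map
```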
ControlNet makes it easy to drive composition from a reference image: you can extract edges, pose, or depth. Other detectors exist too, such as line art, anime line art, and scribble, but edge, pose, and depth are the ones I use most.

Z-Image Advanced Features: Inpainting - Edit Existing Photos With Z-Image Turbo
A separate model called Z-Image Edit, which will let you edit an image using natural language, is planned but not out yet. In the meantime, you can still edit existing images with the current Z-Image Turbo model. Here’s a quick hack.
Load and Encode the Image
Start with the standard Z-Image workflow. Upload the image you want to edit:
- Double-click and type Load Image
- Load your photo
- Replace the previous image node with this Load Image node
You need to convert pixels to latent space:
- Double-click and add VAE Encode
- Connect the image pixels to VAE Encode
- Connect your VAE model to the VAE input
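VAE Encode is what turns your photo’s pixels into the compressed latent tensor the sampler works on. Here’s a conceptual sketch using the diffusers AutoencoderKL API; Z-Image ships its own ae.safetensors VAE, so treat this stand-in model as illustrative only:

```python
import numpy as np
import torch
from PIL import Image
from diffusers import AutoencoderKL

# Illustrative VAE; in ComfyUI the VAE node loads Z-Image's ae.safetensors instead.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

img = Image.open("photo.png").convert("RGB")
pixels = torch.from_numpy(np.array(img)).float() / 127.5 - 1.0  # scale to [-1, 1]
pixels = pixels.permute(2, 0, 1).unsqueeze(0)                   # HWC -> NCHW

with torch.no_grad():
    latent = vae.encode(pixels).latent_dist.sample()
print(latent.shape)  # 1 x 4 x H/8 x W/8 for this VAE; Z-Image's layout may differ
```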
Paint a Mask That Blends Naturally
Right-click the Load Image node and open the Mask Editor. Draw over the area you want to replace, adjusting brush size and hardness as needed. A harder brush makes sharp edges; a softer brush makes feathered edges that blend more smoothly.
For example, replace a diary with a sleeping cat. Draw over the diary. Add softer brush strokes around the edges so the change blends with the photo. Click Save to apply the mask.
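Feathering is effectively just blurring the mask. If you ever build masks programmatically instead of painting them, a Gaussian blur gives the same soft edge; here’s a small Pillow sketch with a made-up region:

```python
from PIL import Image, ImageDraw, ImageFilter

# Build a hard-edged mask (white = area to replace) at the photo's size.
mask = Image.new("L", (1024, 768), 0)
draw = ImageDraw.Draw(mask)
draw.ellipse((400, 300, 700, 550), fill=255)  # hypothetical region over the diary

# Blurring softens the boundary so generated pixels blend into the photo.
feathered = mask.filter(ImageFilter.GaussianBlur(radius=12))
feathered.save("mask.png")
```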
Set the Latent Noise Mask and Denoise Strength
Transfer the mask to the latent image before KSampler:
- Double-click and add Set Latent Noise Mask
- Connect the mask output to its mask input
- Connect the latent from VAE Encode into Set Latent Noise Mask
- Connect the result to KSampler
- Bypass the now-redundant empty latent node (select it and press Ctrl+B)
Prompt something like: sleeping cat.
Adjust Denoise. This controls how much of the masked area is regenerated:
- 1.0 regenerates the masked region completely
- Lower values keep more of the original
Run it. You’ll get a replacement in the masked area.
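Conceptually, Set Latent Noise Mask gates where the sampler is allowed to introduce change, and Denoise sets how far it regenerates that area. Here’s a toy sketch of the blend; this is not ComfyUI’s actual sampler code, which uses a proper noise schedule:

```python
import torch

def masked_renoise(latent: torch.Tensor, mask: torch.Tensor,
                   denoise: float = 1.0) -> torch.Tensor:
    """Toy illustration: inject noise only in the masked region, scaled by denoise."""
    noise = torch.randn_like(latent)
    # mask is 1 inside the painted area, 0 outside; feathered edges fall in between,
    # which is why soft brush strokes blend more smoothly.
    effective = mask * denoise
    return (1 - effective) * latent + effective * noise
```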
Try another example. Open the Mask Editor, clear the mask, and paint over a different object, feathering the edges so the new content blends with the original image. Prompt: a vase of flowers. Run it. This works, but it’s not as capable as a dedicated image editor like Nano Banana or the upcoming Z-Image Edit. If you can’t wait, though, it’s a quick way to edit images with the existing model.
Z-Image Advanced Features: Upscaling to 4K and Beyond
I’m going to go over two main ways to upscale.
- Method 1: A multi-pass workflow that only uses Z-Image. No additional models required.
- Method 2: Use an external upscaler model called SeedVR2. The file is about 16 GB, but it adds a lot of detail.
Method 1 - Multi-pass Upscale Inside Z-Image
Start from the standard Z-Image workflow. Normally, it generates an image in one go. Instead, generate a smaller image first, then pass it through Z-Image again at a higher resolution.
Steps:
- Select the KSampler and related nodes, then Ctrl+C and Ctrl+V to duplicate them
- Convert the first pass image to latent space with VAE Encode
- Add Upscale Latent By and set scale to 2x
- Connect the upscaled latent to the second KSampler
- Connect the model, positive, and negative prompt nodes to the second pass
Settings:
- Initial resolution: 768 x 1024
- Seed 42, Fixed
- First pass step count: 6
- Second pass Denoise: 0.5 to retain about half of the original details while adding more
Run it. The first pass creates the base image; the second pass regenerates it at 2x size and adds detail. Compare this with generating a single 1536 x 2048 image in one pass at 9 steps: even with the same seed, changing the resolution alters the image, but in general the two-pass method renders faces and hair more sharply. I prefer generating a smaller image first, then upscaling it if I like the result.
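If you want to picture what Upscale Latent By does between the passes, it’s a spatial interpolation on the latent tensor. A hedged torch sketch; the 16-channel latent and the 8x compression factor are my assumptions for Z-Image, and nearest-exact mirrors one of the node’s interpolation options:

```python
import torch
import torch.nn.functional as F

# First-pass latent for a 768 x 1024 image, assuming 8x spatial compression.
latent = torch.randn(1, 16, 128, 96)  # channel count is an assumption for Z-Image

# 2x spatial upscale, roughly what Upscale Latent By does before the second pass.
upscaled = F.interpolate(latent, scale_factor=2, mode="nearest-exact")
print(upscaled.shape)  # (1, 16, 256, 192): the second pass samples at this size,
                       # decoding to 1536 x 2048

# The second KSampler then runs at denoise=0.5: it re-noises the upscaled latent
# about halfway and samples back down, keeping composition while adding detail.
```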
Method 2 - SeedVR2 Model Upscaler
Standalone Upscaling Workflow
Load the SeedVR2 4K image upscale workflow JSON. If you get missing nodes:
- Click Manager
- Click Install Missing Custom Nodes
- Install what it detects
- Restart
Torch compile can improve speed after the first run, but it isn’t needed for single images. Upload your image, keep the default 4K settings, and run. The first time, ComfyUI downloads the SeedVR2 upscaler - about 16 GB - and its VAE - about 478 MB. The output is saved to your output folder.
Side-by-side with the original, details in the shirt, face, and hair are much sharper. You can zoom in a lot, and it still looks good.
Link It Directly After Z-Image Generation
If you want Z-Image to create an image and immediately upscale it:
- Load the standard Z-Image workflow
- Copy all nodes from the SeedVR2 workflow into the same canvas
- Delete the Load Image node in the upscaler flow
- Add Split Image With Alpha between Z-Image’s output and the upscaler inputs
- Connect image to Split Image With Alpha, then route image and alpha to the SeedVR2 inputs
Prompt example: Latina female with thick wavy hair. Width x Height: 768 x 1280. Step count: 7. Run it. Z-Image first generates the base image, then SeedVR2 upscales it to 4K. Compared side-by-side, the upscaled version has much clearer eyes, eyelashes, eyebrows, skin, hair, and clothing details.
Final Thoughts
Z-Image is not just a text-to-image model. With ControlNet, you can drive composition with edges, pose, or depth. You can inpaint by masking, encoding to latent, and controlling denoise to replace objects cleanly. For resolution, a two-pass Z-Image upscale adds noticeable sharpness, and the SeedVR2 upscaler pushes detail much further into 4K and beyond. These Z-Image Advanced Features let you guide structure, make edits, and scale quality in a practical workflow.