WAN 2.2 Text to Image ComfyUI workflow for Videcool

The WAN 2.2 Text-to-Image workflow in Videcool provides a powerful, flexible way to generate high-quality images directly from text prompts. Designed for speed, clarity, and creative control, this workflow runs on ComfyUI and uses the WAN 2.2 text-to-image model distributed in the Comfy-Org ComfyUI repack on Hugging Face.

What can this ComfyUI workflow do?

In short: Text to image conversion.

This workflow converts written text prompts into fully generated images using diffusion technology. It interprets your prompt and outputs detailed, coherent visuals with high fidelity by combining WAN 2.2 diffusion weights, a dedicated VAE, and a large UMT5 text encoder. The base WAN 2.2 text-to-image model is derived from a 14B text-to-video backbone but is repackaged for image generation in ComfyUI.

Example usage in Videcool

Figure 1 - WAN 2.2 Text to Image ComfyUI workflow in Videcool

Download the ComfyUI workflow

Download ComfyUI Workflow file: Wan2.2_Text-To-Image_api.json

Image of the ComfyUI workflow

This figure provides a visual overview of the WAN 2.2 text-to-image workflow layout inside ComfyUI. Each node is placed in logical order to establish a clean and efficient generation pipeline, starting from the latent image creation through model loading, text encoding, sampling, and VAE decoding. Users can modify or expand parts of the workflow to create custom variations or integrate WAN 2.2 into other pipelines.

Figure 2 - WAN 2.2 Text to Image ComfyUI workflow

Installation steps

Step 1: Download umt5_xxl_fp8_e4m3fn_scaled.safetensors into /ComfyUI/models/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors.
Step 2: Download wan_2.1_vae.safetensors into /ComfyUI/models/vae/wan_2.1_vae.safetensors.
Step 3: Download wan2.2_t2v_high_noise_14B_fp8_scaled.safetensors into /ComfyUI/models/diffusion_models/.
Step 4: Download wan2.2_t2v_low_noise_14B_fp8_scaled.safetensors into /ComfyUI/models/diffusion_models/.
Step 5: Download the Wan2.2_Text-To-Image_api.json workflow file into your home directory.
Step 6: Install the RES4LYF custom node in ComfyUI Manager: Manage custom nodes → search “RES4LYF” → Install.
Step 7: Restart ComfyUI.
Step 8: Open the ComfyUI graphical user interface (ComfyUI GUI).
Step 9: Load Wan2.2_Text-To-Image_api.json in the ComfyUI GUI.
Step 10: Enter a text prompt into the "CLIP Text Encode (Positive Prompt)" node and click Run to generate an image.
Step 11: Open Videcool in your browser, select text to image, and choose the WAN 2.2 model preset to generate an image.

Installation video

The workflow requires only a text prompt and a few basic parameter adjustments to begin generating images. After loading the JSON file, users can select guidance parameters, sampling steps, resolution, and prompt text, while the underlying WAN 2.2 diffusion weights and VAE handle the heavy lifting. Once executed, the sampler processes the latent representation and produces a final decoded image that can be saved and reused across other Videcool tools.

Prerequisites

To run the workflow correctly, download the following model files and place them into your ComfyUI directory. These files ensure the model can interpret language, convert prompts into latent embeddings, and decode the final images. Proper installation into the following location is essential before running the workflow: {your ComfyUI directory}/models.

ComfyUI\models\text_encoders\umt5_xxl_fp8_e4m3fn_scaled.safetensors
https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors

ComfyUI\models\vae\wan_2.1_vae.safetensors
https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors

ComfyUI\models\diffusion_models\wan2.2_t2v_high_noise_14B_fp8_scaled.safetensors
https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/diffusion_models/wan2.2_t2v_high_noise_14B_fp8_scaled.safetensors

ComfyUI\models\diffusion_models\wan2.2_t2v_low_noise_14B_fp8_scaled.safetensors
https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/diffusion_models/wan2.2_t2v_low_noise_14B_fp8_scaled.safetensors
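The four files above can also be fetched in one pass with a small download script. This is a stdlib-only sketch; the URLs are taken from the list above, and the ComfyUI root path passed to `fetch_all` is an assumption to adjust for your system:

```python
import os
import urllib.request

# Repack base URLs, as listed in the prerequisites above.
BASE_21 = "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files"
BASE_22 = "https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files"

# Destination (relative to the ComfyUI directory) -> source URL.
DOWNLOADS = {
    "models/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors":
        BASE_21 + "/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors",
    "models/vae/wan_2.1_vae.safetensors":
        BASE_22 + "/vae/wan_2.1_vae.safetensors",
    "models/diffusion_models/wan2.2_t2v_high_noise_14B_fp8_scaled.safetensors":
        BASE_22 + "/diffusion_models/wan2.2_t2v_high_noise_14B_fp8_scaled.safetensors",
    "models/diffusion_models/wan2.2_t2v_low_noise_14B_fp8_scaled.safetensors":
        BASE_22 + "/diffusion_models/wan2.2_t2v_low_noise_14B_fp8_scaled.safetensors",
}

def fetch_all(comfyui_dir):
    """Download each model file into its place under the ComfyUI directory."""
    for rel, url in DOWNLOADS.items():
        dest = os.path.join(comfyui_dir, rel)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        if not os.path.exists(dest):  # skip files already downloaded
            urllib.request.urlretrieve(url, dest)

if __name__ == "__main__":
    fetch_all(os.path.expanduser("~/ComfyUI"))  # adjust path as needed
```

Note that these are multi-gigabyte files; a download manager or `huggingface-cli` may be preferable on slow connections.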

For use in Videcool, the WAN 2.2 high and low noise diffusion weights may be renamed to wan2.2_t2v_high_noise_model.safetensors and wan2.2_t2v_low_noise_model.safetensors in the diffusion_models folder, which aligns with the referenced model names in some Videcool presets.

How to use this workflow in Videcool

Videcool integrates seamlessly with ComfyUI, allowing users to load WAN 2.2 workflows directly and generate images without external complexity. After importing the workflow file into ComfyUI and confirming the model paths, simply enter your prompt and click generate; Videcool handles the backend interactions. This makes WAN 2.2 image creation intuitive and accessible, even for users unfamiliar with ComfyUI's internals.

ComfyUI nodes used

Each node in this workflow performs a specific role, such as loading models, encoding text, sampling, and finally decoding the output. Together they create a reliable, modular pipeline that can be easily extended or customized.

Base AI model

This workflow is built on the WAN 2.2 family of models, which originate from a large 14B-parameter text-to-video backbone repackaged for ComfyUI. The Comfy-Org Wan 2.2 ComfyUI Repackaged weights expose diffusion, text encoder, and VAE components that can be used for both image and video generation. The combination of high-capacity diffusion weights and a powerful UMT5 XXL text encoder provides strong prompt adherence and rich visual detail for many styles and subjects.

Hugging Face repository:

https://huggingface.co/Wan-AI/Wan2.2-T2V-A14B

Official GitHub repository:

https://github.com/Wan-Video/Wan2.2

WAN 2.x text encoder pack:

https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged

Image resolution

WAN 2.2 diffusion models in ComfyUI repacks are derived from text-to-video models that typically operate around a 720p-like resolution, but the ComfyUI image workflow can generate a wide range of resolutions depending on the EmptySD3LatentImage settings. For best results, use resolutions that are multiples of 32 pixels in width and height, and stay near standard aspect ratios such as 1:1, 16:9, or 9:16 for stable compositions.
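A small helper can snap an arbitrary target size to the nearest multiple of 32 before it is entered into the EmptySD3LatentImage node. This is a sketch; the nearest-multiple rounding choice is an assumption, and clamping to a single multiple avoids zero-sized dimensions:

```python
def snap_to_multiple(value, multiple=32):
    """Round a dimension to the nearest multiple (minimum one multiple)."""
    return max(multiple, int(round(value / multiple)) * multiple)

def snap_resolution(width, height, multiple=32):
    """Snap both dimensions so the latent size is valid for the workflow."""
    return snap_to_multiple(width, multiple), snap_to_multiple(height, multiple)
```

For example, a 16:9 target of 1280x720 snaps to 1280x704, and a square 1024x1024 is already valid and passes through unchanged.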

Conclusion

The WAN 2.2 Text-to-Image workflow is a robust, powerful, and user-friendly solution for generating AI-driven visuals in Videcool. With its combination of high-capacity WAN 2.2 models, a modular ComfyUI pipeline, and seamless platform integration, it enables beginners and professionals alike to produce creative and production-ready images with ease. By understanding the workflow components and installation steps, users can unlock the full potential of WAN 2.2 based image generation inside Videcool.

More information