Qwen Text to Image ComfyUI workflow for Videcool
The Qwen Text-to-Image workflow in Videcool provides a powerful and flexible way to generate high-quality images directly from text prompts. Designed for speed, clarity, and creative control, this workflow is served by ComfyUI and uses the Qwen-Image text-to-image model developed by Alibaba.
What can this ComfyUI workflow do?
In short: Text to image conversion.
This workflow converts written text prompts into fully generated images using diffusion technology. It interprets your prompt and outputs detailed, coherent visuals with high fidelity. The base AI model it uses is optimized for multiple resolutions with flexible aspect ratios, making it suitable for various creative applications.
Example usage in Videcool
Download the ComfyUI workflow
Download ComfyUI Workflow file: qwen_text2image_api.json
Image of the ComfyUI workflow
This figure provides a visual overview of the workflow layout inside ComfyUI. Each node is placed in logical order to establish a clean and efficient generation pipeline. The structure makes it easy to understand how the text encoders, model loader, sampler, and VAE interact. Users can modify or expand parts of the workflow to create custom variations.
Installation steps
Step 1: Download qwen_image_fp8_e4m3fn.safetensors into /ComfyUI/models/diffusion_models/qwen_image_fp8_e4m3fn.safetensors
Step 2: Rename the qwen_image_fp8_e4m3fn.safetensors file to qwen_text2image_model.safetensors
Step 3: Download qwen_image_vae.safetensors into /ComfyUI/models/vae/qwen_image_vae.safetensors
Step 4: Download the text encoder (published as qwen_2.5_vl_7b_fp8_scaled.safetensors) and save it as /ComfyUI/models/text_encoders/qwen_text_encoder.safetensors
Step 5: Download the qwen_text2image_api.json workflow file into your home directory
Step 6: Restart ComfyUI
Step 7: Open the ComfyUI graphical user interface (ComfyUI GUI)
Step 8: Load qwen_text2image_api.json in the ComfyUI GUI
Step 9: Enter a text prompt into the "Clip Text Encode (Positive Prompt)" node and click Run to generate an image
Step 10: Open Videcool in your browser, select text to image, and choose the Qwen text to image workflow to generate an image
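Because the workflow ships in ComfyUI's API format (`qwen_text2image_api.json`), it can also be queued programmatically against a running ComfyUI server instead of through the GUI. The sketch below is a minimal, illustrative example: the server address is ComfyUI's default, and the prompt-node matching assumes the positive prompt node carries a title containing "Positive", as in this workflow; adjust both to your setup.

```python
import json
import urllib.request

COMFYUI_URL = "http://127.0.0.1:8188"  # default ComfyUI address; adjust if needed

def set_positive_prompt(workflow: dict, prompt_text: str) -> dict:
    """Write the prompt into every CLIPTextEncode node titled as the positive prompt."""
    for node in workflow.values():
        if node.get("class_type") == "CLIPTextEncode" and \
           "Positive" in node.get("_meta", {}).get("title", ""):
            node["inputs"]["text"] = prompt_text
    return workflow

def queue_prompt(workflow: dict) -> dict:
    """Submit the workflow graph to ComfyUI's /prompt endpoint."""
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(f"{COMFYUI_URL}/prompt", data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Example usage (requires a running ComfyUI server):
# with open("qwen_text2image_api.json") as f:
#     graph = json.load(f)
# set_positive_prompt(graph, "a cozy cabin in a snowy forest, golden hour")
# queue_prompt(graph)
```

This is the same request the Videcool backend issues on your behalf when you click generate.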
Installation video
The workflow requires only a text prompt and a few basic parameter adjustments to begin generating images. After loading the JSON file, users can select guidance scale, sampling steps, resolution, and prompt text. Once executed, the sampler processes the latent representation and produces a final decoded image. The result can be saved and reused across other Videcool tools. Check out the following video to see the model in action:
Prerequisites
To run the workflow correctly, download the following model files and place them into your ComfyUI directory. These files ensure the model can interpret language, convert prompts into latent embeddings, and decode the final images. Proper installation into the following location is essential before running the workflow: {your ComfyUI directory}/models.
ComfyUI/models/diffusion_models/qwen_text2image_model.safetensors (download qwen_image_fp8_e4m3fn.safetensors from the link below and rename it)
https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/split_files/diffusion_models/qwen_image_fp8_e4m3fn.safetensors
ComfyUI/models/vae/qwen_image_vae.safetensors
https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/split_files/vae/qwen_image_vae.safetensors
ComfyUI/models/text_encoders/qwen_text_encoder.safetensors (download qwen_2.5_vl_7b_fp8_scaled.safetensors from the link below and rename it)
https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/split_files/text_encoders/qwen_2.5_vl_7b_fp8_scaled.safetensors
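The manual downloads above can also be scripted. The sketch below maps each Hugging Face source file to its target path inside the ComfyUI directory, including the renames implied by the list above; the ComfyUI directory path is a placeholder you must supply yourself.

```python
import urllib.request
from pathlib import Path

BASE = "https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/split_files"

# Source file on Hugging Face -> target path relative to the ComfyUI directory.
# Two files are saved under a different name than they carry upstream.
MODEL_FILES = {
    f"{BASE}/diffusion_models/qwen_image_fp8_e4m3fn.safetensors":
        "models/diffusion_models/qwen_text2image_model.safetensors",
    f"{BASE}/vae/qwen_image_vae.safetensors":
        "models/vae/qwen_image_vae.safetensors",
    f"{BASE}/text_encoders/qwen_2.5_vl_7b_fp8_scaled.safetensors":
        "models/text_encoders/qwen_text_encoder.safetensors",
}

def fetch_models(comfyui_dir: str) -> None:
    """Download each model file to its target location, skipping existing files."""
    for url, rel_path in MODEL_FILES.items():
        dest = Path(comfyui_dir) / rel_path
        dest.parent.mkdir(parents=True, exist_ok=True)
        if not dest.exists():
            print(f"Downloading {url} -> {dest}")
            urllib.request.urlretrieve(url, dest)

# Example usage:
# fetch_models("/path/to/ComfyUI")  # pass your actual ComfyUI directory
```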
How to use this workflow in Videcool
Videcool integrates seamlessly with ComfyUI, allowing users to load workflows directly and generate images without external complexity. After importing the workflow file, simply enter your prompt and click generate. The system handles all backend interactions with ComfyUI. This makes image creation intuitive and accessible, even for users who would rather not learn the inner workings of ComfyUI. The following video shows how this model can be used in Videcool:
ComfyUI nodes used
This workflow uses the following nodes. Each node performs a specific role, such as loading models, encoding text, sampling, and finally decoding the output. Together they create a reliable and modular pipeline that can be easily extended or customized.
- EmptySD3LatentImage
- Load Diffusion Model
- DualCLIPLoader
- Load VAE
- Clip Text Encode
- FluxGuidance
- KSampler
- VAE Decode
- Save Image
- ModelSamplingAuraFlow
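In the API-format JSON, each of these nodes appears as an entry keyed by a node ID, with inputs that reference other nodes as `[node_id, output_index]` pairs. The fragment below is hypothetical: the IDs and parameter values are illustrative rather than copied from the actual workflow file, but it shows the wiring pattern that connects the sampler to the latent, prompt, and model nodes.

```python
# Hypothetical fragment of an API-format ComfyUI graph. Each input is either a
# literal value or an [source_node_id, output_index] reference to another node.
graph_fragment = {
    "3": {"class_type": "KSampler",
          "inputs": {"model": ["10", 0],        # from ModelSamplingAuraFlow
                     "positive": ["6", 0],      # from CLIPTextEncode (positive)
                     "negative": ["7", 0],      # from CLIPTextEncode (negative)
                     "latent_image": ["5", 0],  # from EmptySD3LatentImage
                     "steps": 20, "cfg": 2.5, "seed": 42,
                     "sampler_name": "euler", "scheduler": "simple",
                     "denoise": 1.0}},
    "5": {"class_type": "EmptySD3LatentImage",
          "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    "8": {"class_type": "VAEDecode",
          "inputs": {"samples": ["3", 0], "vae": ["39", 0]}},
}

def upstream_nodes(graph: dict, node_id: str) -> set:
    """Collect the IDs of nodes that a given node reads its inputs from."""
    return {v[0] for v in graph[node_id]["inputs"].values() if isinstance(v, list)}
```

Walking these references is how ComfyUI (and any custom tooling) determines the execution order of the pipeline.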
Base AI model
This workflow is built on Alibaba's Qwen-Image model, a modern and highly capable diffusion-based text-to-image generator. Qwen-Image provides excellent image quality, versatile style support, and efficient generation with flexible resolution handling. The model is optimized for both artistic and commercial applications.
Hugging Face repository:
https://huggingface.co/Qwen/Qwen-Image
Developer: Alibaba
https://qwen.ai/home
Image resolution
AI text-to-image models perform best when they generate images at their native resolution, i.e. the resolution they were trained at. The recommended resolution for this model is listed below:
Native image size: 1024x1024px
The model also supports other resolutions; for best results, use widths and heights that are multiples of 64px.
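When targeting a different aspect ratio, a common approach is to keep the total pixel count close to the native 1024x1024 area and round both dimensions to multiples of 64. The helper below is an illustrative sketch of that calculation, not part of the workflow itself:

```python
import math

NATIVE_AREA = 1024 * 1024  # pixel area of the model's native 1024x1024 resolution

def snap_resolution(aspect_w: int, aspect_h: int, multiple: int = 64) -> tuple:
    """Pick a width/height near the native pixel area, rounded to multiples of 64."""
    scale = math.sqrt(NATIVE_AREA / (aspect_w * aspect_h))
    width = round(aspect_w * scale / multiple) * multiple
    height = round(aspect_h * scale / multiple) * multiple
    return width, height

# Example: a 16:9 image near the native area
# snap_resolution(16, 9)  # -> (1344, 768)
```

The resulting values can be entered directly into the workflow's latent image node as width and height.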
Conclusion
The Qwen Text-to-Image workflow is a robust, powerful, and user-friendly solution for generating AI-driven visuals in Videcool. With its combination of high-quality models, a modular ComfyUI pipeline, and seamless platform integration, it enables beginners and professionals alike to produce creative and commercial-grade images with ease. By understanding the workflow components and advantages, users can unlock the full potential of AI-assisted image generation in Videcool.