Results of running ZhipuAI/CogView4-6B

https://modelscope.cn/models/ZhipuAI/CogView4-6B


docker run -it --gpus '"device=1,2,3,4,5,6,7"' --shm-size=64g -v /data/xiedong:/data/xiedong --net host kevinchina/deeplearning:2.5.1-cuda12.4-cudnn9-devel-vlmr1 bash

cd /data/xiedong

pip install git+https://github.com/huggingface/diffusers.git

Run the following in Python:

import time

from diffusers import CogView4Pipeline
# from modelscope import snapshot_download
import torch

# model_dir = snapshot_download("ZhipuAI/CogView4-6B")
pipe = CogView4Pipeline.from_pretrained("./ZhipuAI/CogView4-6B", torch_dtype=torch.bfloat16)

# Uncomment these to reduce GPU memory usage
# (with model CPU offload enabled, skip the pipe.to(device) call below)
# pipe.enable_model_cpu_offload()
# pipe.vae.enable_slicing()
# pipe.vae.enable_tiling()

# Use the first GPU if available, otherwise fall back to CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
pipe = pipe.to(device)

prompts = [
    "千树万树梨花开",
    "老虎",
    "老鼠",
    "cat"
]

# Pass 1: 50 inference steps, saved as <prompt>_1.png
for prompt in prompts:
    time1 = time.time()
    image = pipe(
        prompt=prompt,
        negative_prompt="paintings, sketches, (worst quality, low quality, normal quality:1.7), ",
        guidance_scale=3.5,
        num_images_per_prompt=1,
        num_inference_steps=50,
        width=1024,
        height=1024,
    ).images[0]
    time2 = time.time()
    print(f"Time taken: {time2 - time1} seconds")
    image.save(f"{prompt}_1.png")

# Pass 2: 25 inference steps, saved as <prompt>_2.png
for prompt in prompts:
    time1 = time.time()
    image = pipe(
        prompt=prompt,
        negative_prompt="paintings, sketches, (worst quality, low quality, normal quality:1.7), ",
        guidance_scale=3.5,
        num_images_per_prompt=1,
        num_inference_steps=25,
        width=1024,
        height=1024,
    ).images[0]
    time2 = time.time()
    print(f"Time taken: {time2 - time1} seconds")
    image.save(f"{prompt}_2.png")
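
The timing in the loops above can also be collected with a small helper, and peak GPU memory can be read from PyTorch rather than from an external monitor. This is a minimal sketch, not part of the original script: `timed` and `peak_gpu_memory_gib` are hypothetical helper names, and `time.sleep` stands in for a `pipe(...)` call. The value PyTorch reports may also differ from what `nvidia-smi` shows, since the latter includes the CUDA context.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock duration of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def peak_gpu_memory_gib():
    """Peak memory reserved by PyTorch on the current GPU, in GiB.

    Returns None when torch is not installed or no CUDA device is available.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.max_memory_reserved() / 1024**3


results = {}
with timed("demo", results):
    time.sleep(0.01)  # stand-in for image = pipe(prompt=..., ...).images[0]

print(f"demo: {results['demo']:.3f} s, peak GPU memory (GiB): {peak_gpu_memory_gib()}")
```

In the real script, the body of each `for prompt in prompts:` loop would go inside the `with timed(prompt, results):` block.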


docker run -it --gpus '"device=2"' --shm-size=64g -v /data/xiedong:/data/xiedong --net host kevinchina/deeplearning:CogView4-6B bash


Running 1024*1024 inference with all ZhipuAI/CogView4-6B weights on the GPU, resource usage:
NVIDIA A800-SXM4-80GB, 38 GB of GPU memory
Each image takes 17 seconds.
guidance_scale=3.5,
num_images_per_prompt=1,
num_inference_steps=25,
width=1024,
height=1024,


Running 512*512 inference with all ZhipuAI/CogView4-6B weights on the GPU, resource usage:
NVIDIA A800-SXM4-80GB, 32 GB of GPU memory
Each image takes 5 seconds.
guidance_scale=3.5,
num_images_per_prompt=1,
num_inference_steps=25,
width=512,
height=512,
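
To compare the two measurements, the reported times can be turned into rough per-step costs. The sketch below only does arithmetic on the numbers reported above. The per-image times also include text encoding and VAE decoding, so the per-step figures are approximate upper bounds, and the `runs` dictionary is just an illustrative way to hold the data.

```python
# Measurements reported above (A800-SXM4-80GB, 25 inference steps each).
runs = {
    "1024x1024": {"width": 1024, "height": 1024, "steps": 25, "seconds": 17.0, "mem_gb": 38},
    "512x512": {"width": 512, "height": 512, "steps": 25, "seconds": 5.0, "mem_gb": 32},
}


def seconds_per_step(run):
    """Average wall-clock time per inference step (includes fixed overhead)."""
    return run["seconds"] / run["steps"]


def pixel_count(run):
    """Number of output pixels per image."""
    return run["width"] * run["height"]


big, small = runs["1024x1024"], runs["512x512"]
time_ratio = big["seconds"] / small["seconds"]
pixel_ratio = pixel_count(big) / pixel_count(small)

for name, run in runs.items():
    print(f"{name}: {seconds_per_step(run):.2f} s/step, {run['mem_gb']} GB")
print(f"time ratio: {time_ratio:.1f}x for {pixel_ratio:.0f}x the pixels")
```

Going from 512*512 to 1024*1024 quadruples the pixel count but takes about 3.4 times as long, while memory grows only from 32 GB to 38 GB, which suggests most of the memory is taken by the model weights rather than by activations.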