https://modelscope.cn/models/ZhipuAI/CogView4-6B
Start the container, then install diffusers from the GitHub main branch:

```bash
docker run -it --gpus '"device=1,2,3,4,5,6,7"' --shm-size=64g -v /data/xiedong:/data/xiedong --net host kevinchina/deeplearning:2.5.1-cuda12.4-cudnn9-devel-vlmr1 bash
cd /data/xiedong
pip install git+https://github.com/huggingface/diffusers.git
```
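The nested quoting in `--gpus '"device=1,2,3,4,5,6,7"'` is deliberate. Docker parses the `--gpus` value as comma-separated (CSV) fields. The inner double quotes keep the whole device list together as one `device=` field, and the outer single quotes stop the shell from stripping those double quotes. The sketch below uses Python's `shlex` to show what the shell passes to Docker. It then uses Python's `csv` module to approximate Docker's field splitting. It only tokenizes strings and does not run Docker.

```python
import csv
import shlex

# The --gpus argument exactly as typed in the docker run command above.
typed = """--gpus '"device=1,2,3,4,5,6,7"'"""

# What the shell hands to docker after removing the outer single quotes.
argv = shlex.split(typed)
print(argv)  # ['--gpus', '"device=1,2,3,4,5,6,7"']

# Docker splits the --gpus value as CSV; Python's csv module approximates that here.
quoted_fields = next(csv.reader([argv[1]]))
print(quoted_fields)  # ['device=1,2,3,4,5,6,7'] -> one field listing all seven GPUs

# Without the inner double quotes, the commas would break the device list apart.
unquoted_fields = next(csv.reader(["device=1,2,3,4,5,6,7"]))
print(unquoted_fields)  # ['device=1', '2', '3', '4', '5', '6', '7']
```

For a single GPU, as in `--gpus '"device=2"'`, there are no commas, so the quoting is harmless but not strictly required.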
Run the following Python script:
```python
import time

import torch
from diffusers import CogView4Pipeline
# from modelscope import snapshot_download

# model_dir = snapshot_download("ZhipuAI/CogView4-6B")  # download the weights from ModelScope
pipe = CogView4Pipeline.from_pretrained("./ZhipuAI/CogView4-6B", torch_dtype=torch.bfloat16)

# Uncomment these to reduce GPU memory usage.
# If you enable CPU offload, skip pipe.to(device) below: offload manages device placement itself.
# pipe.enable_model_cpu_offload()
# pipe.vae.enable_slicing()
# pipe.vae.enable_tiling()

# Run on CUDA if available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
pipe = pipe.to(device)

prompts = [
    "千树万树梨花开",  # "thousands upon thousands of pear trees in bloom" (line from a Tang poem)
    "老虎",  # tiger
    "老鼠",  # mouse
    "cat",
]

# diffusers passes the "(...:1.7)" weighting syntax through as plain text; it is not parsed as a weight.
negative_prompt = "paintings, sketches, (worst quality, low quality, normal quality:1.7), "

# Pass 1: 50 steps, saved as <prompt>_1.png; pass 2: 25 steps, saved as <prompt>_2.png
for steps, suffix in [(50, 1), (25, 2)]:
    for prompt in prompts:
        time1 = time.time()
        image = pipe(
            prompt=prompt,
            negative_prompt=negative_prompt,
            guidance_scale=3.5,
            num_images_per_prompt=1,
            num_inference_steps=steps,
            width=1024,
            height=1024,
        ).images[0]
        time2 = time.time()
        print(f"[{steps} steps] Time taken: {time2 - time1:.2f} seconds")
        image.save(f"{prompt}_{suffix}.png")
```
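The script uses each prompt directly as the output file name. It runs the prompt list twice, once at 50 steps and once at 25. That works for these four prompts, but a prompt containing `/` or similar path characters would make `image.save` fail or write to an unexpected location. Below is a minimal sketch of two hypothetical helpers, `safe_stem` and `plan_jobs`, which are not part of diffusers. `plan_jobs` expands the two passes into (prompt, steps, file name) jobs with sanitized names. The sanitizing rule (keep word characters, including CJK, plus `-`) is an assumption, so adjust it to your needs.

```python
import re


# Hypothetical helper, not part of diffusers: keep word characters (letters, digits,
# CJK ideographs, "_") and "-", collapse everything else into "_", cap the length.
def safe_stem(prompt: str, max_len: int = 64) -> str:
    stem = re.sub(r"[^\w\-]+", "_", prompt).strip("_")
    return (stem or "image")[:max_len]


# Hypothetical helper: expand the (steps, suffix) passes into a flat job list.
def plan_jobs(prompts, passes=((50, 1), (25, 2))):
    jobs = []
    for steps, suffix in passes:
        for prompt in prompts:
            jobs.append((prompt, steps, f"{safe_stem(prompt)}_{suffix}.png"))
    return jobs


for prompt, steps, filename in plan_jobs(["千树万树梨花开", "cat", "a cat / a dog"]):
    print(steps, filename)
```

Inside the generation loop, each job would then map to `pipe(prompt=prompt, num_inference_steps=steps, ...)` followed by `image.save(filename)`.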
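Alternatively, start a container from the `kevinchina/deeplearning:CogView4-6B` image on a single GPU (device 2):

```bash
docker run -it --gpus '"device=2"' --shm-size=64g -v /data/xiedong:/data/xiedong --net host kevinchina/deeplearning:CogView4-6B bash
```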
Measurements on an NVIDIA A800-SXM4-80GB. All ZhipuAI/CogView4-6B weights were kept on the GPU (no CPU offload), with guidance_scale=3.5, num_images_per_prompt=1, num_inference_steps=25:

| Resolution (width × height) | GPU memory used | Time per image |
|---|---|---|
| 1024 × 1024 | 38 GB | 17 s |
| 512 × 512 | 32 GB | 5 s |
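The two measurements can be turned into average per-step latency and sequential throughput with simple arithmetic. This sketch only reuses the numbers reported above (17 s and 5 s per image at 25 steps). Text encoding and VAE decoding are folded into those per-image times, so the "per step" figure is an average over the whole call, not pure denoising time.

```python
# Per-image timings reported above (A800-SXM4-80GB, 25 inference steps).
measurements = {
    "1024x1024": {"pixels": 1024 * 1024, "seconds": 17.0},
    "512x512": {"pixels": 512 * 512, "seconds": 5.0},
}
STEPS = 25

for name, m in measurements.items():
    per_step = m["seconds"] / STEPS  # average seconds per step, including encode/decode overhead
    per_hour = 3600 / m["seconds"]   # images per hour on one GPU, generated one at a time
    print(f"{name}: {per_step:.2f} s/step, {per_hour:.0f} images/hour")
# 1024x1024: 0.68 s/step, 212 images/hour
# 512x512: 0.20 s/step, 720 images/hour

pixel_ratio = measurements["1024x1024"]["pixels"] / measurements["512x512"]["pixels"]
time_ratio = measurements["1024x1024"]["seconds"] / measurements["512x512"]["seconds"]
print(f"{pixel_ratio:.0f}x the pixels costs {time_ratio:.1f}x the time")
# 4x the pixels costs 3.4x the time
```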


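Author: Dong
Post link:
Copyright notice: Unless otherwise stated, all posts on this blog are licensed under CC BY-NC. This work is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License. You may freely repost and modify it for non-commercial purposes, but you must credit the source and link to the original author. Please cite the source when reposting!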