[Training] InternVL2_8B VLM-R1 GRPO Fine-tuning

Configuration

Build a new image:


docker build --network=host --build-arg http_proxy=http://10.136.19.26:10828 --build-arg https_proxy=http://10.136.19.26:10828 -f Dockerfile -t kevinchina/deeplearning:vlmr1-0501 .

# Enter the container and install the environment:
apt-get update
apt-get install -y libibverbs1

pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
pip install babel python-Levenshtein matplotlib pycocotools timm==1.0.15

# Additional modules
pip install wandb==0.18.3
pip install tensorboardx
pip install qwen_vl_utils torchvision
pip install flash-attn --no-build-isolation
# babel, python-Levenshtein, matplotlib and pycocotools are already installed above
pip install openai
pip install "httpx[socks]"

pip install json_repair
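Before committing the container, it is worth confirming that every package actually imports. Below is a minimal sketch: check_imports is a hypothetical bash helper that tries to import each module with python3 (override via the PYTHON variable) and lists the ones that fail. The import names are assumptions mapped from the pip package names above, e.g. python-Levenshtein imports as Levenshtein, tensorboardx as tensorboardX, flash-attn as flash_attn.

```shell
#!/usr/bin/env bash
# check_imports MODULE...
# Hypothetical helper: tries "import MODULE" for each argument and
# reports the modules that fail to import. Returns 1 if any fail.
check_imports() {
  local py="${PYTHON:-python3}"
  local missing=()
  local mod
  for mod in "$@"; do
    # Silence tracebacks; only the exit status matters here.
    if ! "$py" -c "import ${mod}" >/dev/null 2>&1; then
      missing+=("$mod")
    fi
  done
  if ((${#missing[@]})); then
    echo "missing: ${missing[*]}"
    return 1
  fi
  echo "all imports OK"
}
```

Usage inside the container: check_imports babel Levenshtein matplotlib pycocotools timm wandb tensorboardX qwen_vl_utils torchvision flash_attn openai httpx json_repair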


Then, on the host, commit the container as a new image:
docker commit 4411ba9deb19 kevinchina/deeplearning:vlmr1-0501-1

Launch command:


cd src/open-r1-multimodal

export DEBUG_MODE="true"
is_reward_customized_from_vlm_module=False


RUN_NAME="internvl2_5_4b"
export LOG_PATH="./debug_log_$RUN_NAME.txt"

export OPENAI_API_BASE="http://10.136.19.27:7869/v1"
export OPENAI_API_KEY="nsyabBKSDBgiwqd123134xx.."

torchrun --nproc_per_node=8 \
    --nnodes=1 \
    --node_rank="${RANK}" \
    --master_addr="${MASTER_ADDR}" \
    --master_port="${MASTER_PORT}" \
    src/open_r1/grpo_jsonl.py \
    --use_vllm False \
    --output_dir /output_xd/$RUN_NAME \
    --resume_from_checkpoint True \
    --model_name_or_path /InternVL3b \
    --image_folders /imagesdatasets/tasks-json-ui-doctor-smallsize-datasets-28m \
    --data_file_paths /jsondatasets/ui_doctor_dataset0425_28m.jsonl \
    --is_reward_customized_from_vlm_module $is_reward_customized_from_vlm_module \
    --max_anyres_num 3 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 2 \
    --gradient_checkpointing true \
    --logging_steps 1 \
    --num_train_epochs 2 \
    --bf16 \
    --attn_implementation flash_attention_2 \
    --run_name haha \
    --data_seed 42 \
    --save_steps 100 \
    --num_generations 4 \
    --max_prompt_length 4096 \
    --max_completion_length 2048 \
    --temperature 0.2 \
    --vllm_gpu_memory_utilization 0.8 \
    --reward_funcs accuracy format \
    --beta 0.04 \
    --report_to "tensorboard" \
    --logging_dir "/mnt/cluster1" \
    --deepspeed /workspace/src/open-r1-multimodal/local_scripts/zero2.json

echo "Training completed for ${RUN_NAME}"


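Optional flag. As placed here it sits after the command and is never passed to torchrun, so append it to the torchrun arguments above if needed (presumably it selects the LLM-judged reward that the OPENAI_API_BASE / OPENAI_API_KEY exports point to):

    --reward_method "llm"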
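The batch settings in the launch command interact with num_generations. A minimal sketch of the arithmetic follows. It assumes that VLM-R1's trainer follows the TRL GRPOTrainer convention from TRL 0.15 onward, where per_device_train_batch_size counts completions and num_processes × per_device_train_batch_size must be divisible by num_generations. Earlier TRL versions used a different convention, so check the trainer in your checkout. The variable names are hypothetical and simply mirror the flags used above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-launch check; values mirror the torchrun flags above.
NPROC_PER_NODE=8
NNODES=1
PER_DEVICE_BS=2      # --per_device_train_batch_size
GRAD_ACCUM=2         # --gradient_accumulation_steps
NUM_GENERATIONS=4    # --num_generations

# Completions processed per micro-step across all GPUs.
global_bs=$((NPROC_PER_NODE * NNODES * PER_DEVICE_BS))

# Assumption (TRL-style GRPO): each prompt's group of completions must fit
# evenly into the global batch.
if ((global_bs % NUM_GENERATIONS != 0)); then
  echo "error: global batch ${global_bs} not divisible by num_generations ${NUM_GENERATIONS}" >&2
  exit 1
fi

# Distinct prompts per micro-step, and completions per optimizer update.
prompts_per_step=$((global_bs / NUM_GENERATIONS))
completions_per_update=$((global_bs * GRAD_ACCUM))
echo "global_batch=${global_bs} prompts_per_micro_step=${prompts_per_step} completions_per_optimizer_step=${completions_per_update}"
```

With the values above this prints global_batch=16 prompts_per_micro_step=4 completions_per_optimizer_step=32, so each optimizer step sees 8 distinct prompts under these assumptions.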

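Data and model mounts:

Model mount: /InternVL2_8B_GRPO

Data mount: /jsondatasets, which contains ui_doctor_dataset0424.jsonl

Image mount: /imagesdatasets, which contains /imagesdatasets/tasks-json-ui-doctor-smallsize-datasets. Image paths in the JSONL are relative to this directory.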
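Because image paths in the JSONL are resolved relative to the folder passed as --image_folders, a wrong mount may only surface once training reaches an affected sample. A quick pre-flight check can catch this earlier. Below is a minimal sketch. The check_images helper is hypothetical, and it assumes each JSONL line stores one image as a string field "image": "relative/path", which as far as I know matches VLM-R1's example data. Records that hold a list of images would need a different pattern.

```shell
#!/usr/bin/env bash
# check_images JSONL IMAGE_ROOT
# Hypothetical helper: extracts every "image": "..." value with grep/sed,
# checks that IMAGE_ROOT/<value> exists, and prints a summary.
# Returns 1 if any image is missing.
check_images() {
  local jsonl="$1" root="$2"
  local total=0 missing=0 rel
  while IFS= read -r rel; do
    total=$((total + 1))
    if [[ ! -f "${root}/${rel}" ]]; then
      missing=$((missing + 1))
      echo "missing: ${root}/${rel}" >&2
    fi
  done < <(grep -o '"image": *"[^"]*"' "$jsonl" | sed 's/^"image": *"//; s/"$//')
  echo "checked=${total} missing=${missing}"
  ((missing == 0))
}
```

Usage, with the paths from the launch command: check_images /jsondatasets/ui_doctor_dataset0425_28m.jsonl /imagesdatasets/tasks-json-ui-doctor-smallsize-datasets-28m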

There were many bugs, and training could not be run to completion.
