Dataset
This competition uses the arXiv dataset as its primary data source. We first analyzed the dataset systematically and selected the 10 best-represented academic categories: astro-ph, cond-mat.mes-hall, cond-mat.mtrl-sci, cs.CL, cs.CV, cs.LG, gr-qc, hep-ph, hep-th, and quant-ph. On top of that, we collected the latest papers in each of these categories and combined everything into a mixed dataset.
The resulting dataset covers both pre-training data and supervised fine-tuning (SFT) data, giving model training multi-dimensional support. The validation and test sets are both sampled from this same dataset, which keeps the data homogeneous and makes the evaluation reliable.
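The category-selection step above can be sketched in a few lines of Python. This assumes the standard arXiv metadata JSONL dump with a `categories` field (the field name matches the sample record shown below); the helper itself is illustrative, not the competition's actual pipeline:

```python
import json
from collections import Counter

def top_categories(path, k=10):
    """Return the k most frequent primary categories in an arXiv metadata JSONL file."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            # "categories" is a space-separated string; the first entry is the primary category
            counts[record["categories"].split()[0]] += 1
    return [cat for cat, _ in counts.most_common(k)]
```

Running this over the full metadata dump yields a ranked list from which the 10 richest categories can be taken.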
arXiv dataset sample
{"id": "0711.3691",
"submitter": "Eric Laporte",
"authors": "Olivier Blanc (IGM-LabInfo), Matthieu Constant (IGM-LabInfo), Eric Laporte (IGM-LabInfo)",
"title": "Outilex, plate-forme logicielle de traitement de textes 'ecrits",
"comments": null,
"journal-ref": "Dans Verbum ex machina. Proceedings of TALN - Outilex, plate-forme logicielle de traitement de textes 'ecrits, Louvain : Belgique (2006)",
"doi": null,
"report-no": null,
"categories": "cs.CL",
"license": null,
"abstract": "The Outilex software platform, which will be made available to research, development and industry, comprises software components implementing all the fundamental operations of written text processing: processing without lexicons, exploitation of lexicons and grammars, language resource management. All data are structured in XML formats, and also in more compact formats, either readable or binary, whenever necessary; the required format converters are included in the platform; the grammar formats allow for combining statistical approaches with resource-based approaches. Manually constructed lexicons for French and English, originating from the LADL, and of substantial coverage, will be distributed with the platform under LGPL-LR license.",
"versions": [
{"version": "v1", "created": "Fri, 23 Nov 2007 09:45:13 GMT"},
{"version": "v2", "created": "Tue, 27 Nov 2007 10:22:14 GMT"}
],
"update_date": "2007-11-27",
"authors_parsed": [
["Blanc", "Olivier", "", "IGM-LabInfo"],
["Constant", "Matthieu", "", "IGM-LabInfo"],
["Laporte", "Eric", "", "IGM-LabInfo"]
]}
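A record like the one above can be mapped into a chat-style SFT sample in the `messages` format that ms-swift accepts. The prompt wording below is only an illustration, not the competition dataset's actual template:

```python
import json

def to_sft_sample(record):
    """Map one arXiv metadata record to a chat-style SFT example
    (ms-swift's `messages` format)."""
    return {
        "messages": [
            {"role": "user",
             "content": ("Classify this paper abstract into an arXiv category:\n"
                         + record["abstract"].strip())},
            {"role": "assistant",
             # use only the primary category as the label
             "content": record["categories"].split()[0]},
        ]
    }

# One record per line, ready to write out as a JSONL training file
line = json.dumps(to_sft_sample({"abstract": " A sample abstract. ",
                                 "categories": "cs.CL"}))
```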
1. Getting the training set
Training set download link: www.modelscope.cn/datasets/Ji…
modelscope download --dataset JimmyMa99/smartflow-arxiv-dataset swift* --local_dir ./datasets/train
2. Getting the validation set
www.modelscope.cn/datasets/Ji…
modelscope download --dataset JimmyMa99/smartflow-arxiv-dataset eval_oc_data.csv --local_dir ./datasets/eval
Baseline examples
L1G4 - InternLM paper-classification fine-tuning walkthrough (Swift version)
L1G4 - InternLM paper-classification fine-tuning walkthrough (XTuner version)
L1G4 - InternLM paper-classification fine-tuning walkthrough (Ascend + XTuner version)
Notes
If GPU memory is limited, lower batch_size to 1 to avoid CUDA out-of-memory errors.
Model address
Model submission
swift export \
--model output/vx-xxx/checkpoint-xxx \
--push_to_hub true \
--hub_model_id '<model-id>' \
--hub_token '<sdk-token>' \
--use_hf false
Upload to ModelScope and submit the link in the questionnaire; after submission, the system automatically pulls the weights and runs the test. Document submission entry: aicarrier.feishu.cn/share/base/…
Reproduction workflow 1 (Swift version)
1. Environment setup
When creating the dev machine, choose the Cuda12.2-conda image and a 30% A100 GPU (10% is not enough).
Install ms-swift
conda create -n ms-swift python=3.10 -y
conda activate ms-swift
pip install 'ms-swift[all]' -U
pip install wandb
Note: all subsequent steps must run inside the ms-swift virtual environment; run conda activate ms-swift whenever you open a new terminal.
Errors encountered
The step pip install 'ms-swift[all]' -U failed and had to be changed to
pip install ms-swift -U
which then installed successfully.
2. Training
The data is already preprocessed. Training set download link: www.modelscope.cn/datasets/Ji…
conda activate ms-swift
pip install modelscope
modelscope download --dataset JimmyMa99/smartflow-arxiv-dataset --local_dir ./datasets/train
2.1 Pre-training
Pre-train with LoRA
conda activate ms-swift
bash config/internlm3-8b-lora.sh
Script walkthrough:
- Before running, delete the inline comments and update the model and dataset paths
#!/bin/bash
# Create the log directory
LOG_DIR="logs"
mkdir -p $LOG_DIR # make sure the log directory exists, creating it if needed
# Current timestamp, used for a unique log file name
TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
LOG_FILE="$LOG_DIR/internlm3-8b_lora_sft_${TIMESTAMP}.log" # log file path
# CUDA environment variables
export NPROC_PER_NODE=1 # one process per node
export OMP_NUM_THREADS=1 # cap OpenMP at one thread to avoid contention
export CUDA_VISIBLE_DEVICES=0 # use GPU 0
# Run training in the background with nohup so it survives terminal close
nohup swift sft \
--model /root/models/Shanghai_AI_Laboratory/internlm3-8b-instruct/Shanghai_AI_Laboratory/internlm3-8b-instruct \ # base model path
--train_type lora \ # train with LoRA
--dataset 'data/swift_formatted_data.jsonl' \ # training dataset
--torch_dtype bfloat16 \ # bfloat16 to save GPU memory
--num_train_epochs 1 \ # train for 1 epoch
--per_device_train_batch_size 2 \ # per-device batch size of 2
--learning_rate 5e-5 \ # learning rate 5e-5
--warmup_ratio 0.1 \ # 10% of steps used for warmup
--split_dataset_ratio 0 \ # don't split off a validation set
--report_to wandb \ # log to Weights & Biases
--lora_rank 8 \ # LoRA rank 8
--lora_alpha 32 \ # LoRA alpha 32
--use_chat_template false \ # no chat template
--target_modules all-linear \ # apply LoRA to all linear layers
--gradient_accumulation_steps 2 \ # accumulate gradients to raise the effective batch size
--save_steps 2000 \ # checkpoint every 2000 steps
--save_total_limit 5 \ # keep at most 5 checkpoints
--gradient_checkpointing_kwargs '{"use_reentrant": false}' \ # non-reentrant gradient checkpointing
--logging_steps 5 \ # log every 5 steps
--max_length 2048 \ # max sequence length 2048
--output_dir ./swift_output/InternLM3-8B-Lora \ # output directory
--dataloader_num_workers 256 \ # 256 dataloader workers
--model_author JimmyMa99 \ # model author metadata
--model_name InternLM3-8B-Lora \ # model name metadata
> "$LOG_FILE" 2>&1 & # redirect stdout/stderr to the log and run in the background
# Print the PID and log location for tracking
echo "Training started with PID $!" # PID of the background process
echo "Log file: $LOG_FILE" # log file location
# Show how to tail the log
echo "To view logs in real-time, use:"
echo "tail -f $LOG_FILE"
Note: you need to create a config folder yourself and put the script file there.
What if the machine shuts down while the script is running?
- Check the logs folder; in my case the internlm3-8b_lora_sft_20250516_225540.log file was empty.
Running the script raised an error.
Asked in the group chat; the fix is to remove the inline comments from the script.
After that change it errored again:
ValueError: path: '/root/models/Shanghai_AI_Laboratory/internlm3-8b-instruct/Shanghai_AI_Laboratory/internlm3-8b-instruct' not found
- The fix is to point --model at the real model path: /root/share/new_models/internlm3/internlm3-8b-instruct
After fixing that, yet another error:
[ERROR:modelscope] Repo data/swift_formatted_data.jsonl not exists on either www.modelscope.cn or www.modelscope.ai
- The training dataset path was wrong, so swift tried to resolve it as a ModelScope repo.
You can tail the log to inspect the error output:
tail -f logs/internlm3-8b_lora_sft_20250517_134419.log
The final working script:
LOG_DIR="logs"
mkdir -p $LOG_DIR
TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
LOG_FILE="$LOG_DIR/internlm3-8b_lora_sft_${TIMESTAMP}.log"
export NPROC_PER_NODE=1
export OMP_NUM_THREADS=1
export CUDA_VISIBLE_DEVICES=0
nohup swift sft \
--model /root/share/new_models/internlm3/internlm3-8b-instruct \
--train_type lora \
--dataset '/root/datasets/train/swift_formatted_pretrain_data.jsonl' \
--torch_dtype bfloat16 \
--num_train_epochs 1 \
--per_device_train_batch_size 2 \
--learning_rate 5e-5 \
--warmup_ratio 0.1 \
--split_dataset_ratio 0 \
--lora_rank 8 \
--lora_alpha 32 \
--use_chat_template false \
--target_modules all-linear \
--gradient_accumulation_steps 2 \
--save_steps 2000 \
--save_total_limit 5 \
--gradient_checkpointing_kwargs '{"use_reentrant": false}' \
--logging_steps 5 \
--max_length 2048 \
--output_dir ./swift_output/InternLM3-8B-Lora \
--dataloader_num_workers 256 \
--model_author JimmyMa99 \
--model_name InternLM3-8B-Lora \
> "$LOG_FILE" 2>&1 &
echo "Training started with PID $!"
echo "Log file: $LOG_FILE"
echo "To view logs in real-time, use:"
echo "tail -f $LOG_FILE"
Merge weights
After training finishes, look in the output_dir folder and pick the checkpoint that actually contains weights.
- Command:
swift export --adapters /root/swift_output/InternLM3-8B-Lora/v4-20250517-145158/checkpoint-144 --merge_lora true
- Merge succeeded
The merged model is at /root/swift_output/InternLM3-8B-Lora/v4-20250517-145158/checkpoint-144-merged
2.2 SFT
2.2.1 SFT with LoRA
conda activate ms-swift
bash config/internlm3-8b-sft-lora.sh
Script walkthrough:
- Before running, delete the comments and update the model and dataset paths
#!/bin/bash
# Use bash to run this script
# Create the log directory
LOG_DIR="logs"
# Directory where logs are stored
mkdir -p $LOG_DIR
# -p makes this a no-op if the directory already exists
# Get the current timestamp
TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
# Formatted as year-month-day_hour-minute-second
LOG_FILE="$LOG_DIR/internlm3-8b_lora_sft_${TIMESTAMP}.log"
# Timestamped log file path, so each run gets a unique file
# CUDA settings
export NPROC_PER_NODE=1
# One process per node
export OMP_NUM_THREADS=1
# Cap OpenMP at one thread
export CUDA_VISIBLE_DEVICES=0
# Use GPU 0
nohup swift sft \
# Run swift sft in the background with nohup so it survives terminal close
--model /root/code/camp5_course/swift_output/InternLM3-8B-Lora/v1-20250416-140542/checkpoint-74-merged \
# Base model path: the merged checkpoint-74 from the pre-training stage
--train_type lora \
# Train with LoRA (low-rank adaptation)
--dataset '/root/code/camp5_course/data/swift_formatted_sft_train_data.jsonl' \
# Training dataset path
--torch_dtype bfloat16 \
# bfloat16 parameters to reduce memory use
--num_train_epochs 1 \
# Train for 1 epoch
--per_device_train_batch_size 22 \
# Per-device batch size of 22
--learning_rate 1e-4 \
# Learning rate 0.0001
--warmup_ratio 0.1 \
# 10% of training steps used to warm the learning rate up
--split_dataset_ratio 0 \
# No train/validation split
--report_to wandb \
# Log training to Weights & Biases
--lora_rank 8 \
# LoRA rank 8, controls the number of trainable parameters
--lora_alpha 32 \
# LoRA alpha 32, scales the LoRA update
--target_modules all-linear \
# Apply LoRA to all linear layers
--gradient_accumulation_steps 2 \
# Accumulate gradients over 2 steps, effectively doubling the batch size
--save_steps 2000 \
# Checkpoint every 2000 steps
--save_total_limit 5 \
# Keep at most 5 checkpoints; older ones are deleted
--gradient_checkpointing_kwargs '{"use_reentrant": false}' \
# Non-reentrant gradient checkpointing for better stability
--logging_steps 5 \
# Log every 5 steps
--max_length 2048 \
# Max sequence length 2048
--output_dir ./swift_output/InternLM3-8B-Lora \
# Output directory
--dataloader_num_workers 256 \
# 256 dataloader workers to speed up loading
--model_author JimmyMa99 \
# Model author metadata
--model_name InternLM3-8B-Lora \
# Model name metadata
> "$LOG_FILE" 2>&1 &
# Redirect stdout and stderr to the log file and run in the background
# Print the PID and log location
echo "Training started with PID $!"
# $! is the PID of the most recent background process
echo "Log file: $LOG_FILE"
# Log file path
# Show how to watch the log
echo "To view logs in real-time, use:"
echo "tail -f $LOG_FILE"
# Tail the log to follow training in real time
Modified script
Delete the comments, point --model at the merged LoRA checkpoint, update the dataset path, and remember to remove --report_to wandb.
LOG_DIR="logs"
mkdir -p $LOG_DIR
TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
LOG_FILE="$LOG_DIR/internlm3-8b_lora_sft_${TIMESTAMP}.log"
export NPROC_PER_NODE=1
export OMP_NUM_THREADS=1
export CUDA_VISIBLE_DEVICES=0
nohup swift sft \
--model /root/swift_output/InternLM3-8B-Lora/v4-20250517-145158/checkpoint-144-merged \
--train_type lora \
--dataset '/root/datasets/train/swift_formatted_sft_train_data.jsonl' \
--torch_dtype bfloat16 \
--num_train_epochs 1 \
--per_device_train_batch_size 22 \
--learning_rate 1e-4 \
--warmup_ratio 0.1 \
--split_dataset_ratio 0 \
--lora_rank 8 \
--lora_alpha 32 \
--target_modules all-linear \
--gradient_accumulation_steps 2 \
--save_steps 2000 \
--save_total_limit 5 \
--gradient_checkpointing_kwargs '{"use_reentrant": false}' \
--logging_steps 5 \
--max_length 2048 \
--output_dir ./swift_output/InternLM3-8B-SFT-Lora \
--dataloader_num_workers 256 \
--model_author JimmyMa99 \
--model_name InternLM3-8B-Lora \
> "$LOG_FILE" 2>&1 &
echo "Training started with PID $!"
echo "Log file: $LOG_FILE"
echo "To view logs in real-time, use:"
echo "tail -f $LOG_FILE"
Out of GPU memory
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 590.00 MiB. GPU 0 has a total capacity of 23.99 GiB of which 280.00 MiB is free. Process 971648 has 23.72 GiB memory in use. Of the allocated memory 20.29 GiB is allocated by PyTorch, and 2.48 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (pytorch.org/docs/stable…)
Parameter tuning suggestions
Key optimizations
1. Lower the batch size and raise gradient accumulation: drop per_device_train_batch_size from 22 to 1 and raise gradient_accumulation_steps from 2 to 4, giving an effective batch size of 4 (1×4) at a far lower memory cost.
2. Adjust the LoRA parameters: lower lora_rank from 8 to 4 to cut the number of trainable parameters, and lora_alpha from 32 to 16 to shrink the scaling factor.
3. Use more memory-efficient gradient checkpointing: set use_reentrant to true.
4. Reduce the sequence length: lower max_length from 2048 to 1024 to cut per-sample memory.
5. Tune data loading: drop dataloader_num_workers from 256 to a sensible value (1-2× the number of CPU cores) so workers don't hog memory.
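As a sanity check on the first point, the effective batch size is just the per-device batch size times the gradient accumulation steps (times the number of GPUs, which is 1 here):

```python
def effective_batch_size(per_device, grad_accum, n_gpus=1):
    """Effective batch size seen by the optimizer per update."""
    return per_device * grad_accum * n_gpus

assert effective_batch_size(22, 2) == 44  # the config that ran out of memory
assert effective_batch_size(1, 4) == 4    # the suggested low-memory config
assert effective_batch_size(2, 4) == 8    # what the final script below actually uses
```

Memory usage scales roughly with the per-device batch size, while gradient accumulation trades wall-clock time for memory without changing the optimizer's effective batch.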
Parameters actually changed
per_device_train_batch_size 22 -> 2
lora_rank 8 -> 16
lora_alpha 32 -> 64
LOG_DIR="logs"
mkdir -p $LOG_DIR
TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
LOG_FILE="$LOG_DIR/internlm3-8b_lora_sft_${TIMESTAMP}.log"
CPU_CORES=$(nproc)
WORKER_NUM=$((CPU_CORES > 8 ? 8 : CPU_CORES))
export NPROC_PER_NODE=1
export OMP_NUM_THREADS=1
export CUDA_VISIBLE_DEVICES=0
nohup swift sft \
--model /root/swift_output/InternLM3-8B-Lora/v4-20250517-145158/checkpoint-144-merged \
--train_type lora \
--dataset '/root/datasets/train/swift_formatted_sft_train_data.jsonl' \
--torch_dtype bfloat16 \
--num_train_epochs 1 \
--per_device_train_batch_size 2 \
--learning_rate 1e-4 \
--warmup_ratio 0.1 \
--split_dataset_ratio 0 \
--lora_rank 16 \
--lora_alpha 64 \
--target_modules all-linear \
--gradient_accumulation_steps 4 \
--save_steps 1000 \
--save_total_limit 5 \
--gradient_checkpointing_kwargs '{"use_reentrant": false}' \
--logging_steps 5 \
--max_length 2048 \
--output_dir ./swift_output/InternLM3-8B-SFT-Lora \
--dataloader_num_workers $WORKER_NUM \
--model_author JimmyMa99 \
--model_name InternLM3-8B-Lora \
> "$LOG_FILE" 2>&1 &
echo "Training started with PID $!"
echo "Log file: $LOG_FILE"
echo "To view logs in real-time, use:"
echo "tail -f $LOG_FILE"
2.2.2 Merge weights
After the run finishes
SFT checkpoint path from the log: last_model_checkpoint: /root/swift_output/InternLM3-8B-SFT-Lora/v5-20250517-164316/checkpoint-62
conda activate ms-swift
swift export --adapters /root/swift_output/InternLM3-8B-SFT-Lora/v5-20250517-164316/checkpoint-62 --merge_lora true
Merge completed: Successfully merged LoRA and saved in /root/swift_output/InternLM3-8B-SFT-Lora/v5-20250517-164316/checkpoint-62-merged.
3. Inference test
3.1 Create the streamlit app
Create a file named infer.py
import copy
import warnings
from dataclasses import asdict, dataclass
from typing import Callable, List, Optional
import streamlit as st
import torch
from torch import nn
from transformers.generation.utils import (LogitsProcessorList,
StoppingCriteriaList)
from transformers.utils import logging
from transformers import AutoTokenizer, AutoModelForCausalLM # isort: skip
# Load the model and tokenizer
logger = logging.get_logger(__name__)
model_name_or_path = "/root/swift_output/InternLM3-8B-SFT-Lora/v5-20250517-164316/checkpoint-62-merged"
print(f"Loading model: {model_name_or_path}")
@dataclass
class GenerationConfig:
# this config is used for chat to provide more diversity
max_length: int = 32768
top_p: float = 0.8
temperature: float = 0.8
do_sample: bool = True
repetition_penalty: float = 1.005
@torch.inference_mode()
def generate_interactive(
model,
tokenizer,
prompt,
generation_config: Optional[GenerationConfig] = None,
logits_processor: Optional[LogitsProcessorList] = None,
stopping_criteria: Optional[StoppingCriteriaList] = None,
prefix_allowed_tokens_fn: Optional[Callable[[int, torch.Tensor],
List[int]]] = None,
additional_eos_token_id: Optional[int] = None,
**kwargs,
):
inputs = tokenizer([prompt], padding=True, return_tensors='pt')
input_length = len(inputs['input_ids'][0])
for k, v in inputs.items():
inputs[k] = v.cuda()
input_ids = inputs['input_ids']
_, input_ids_seq_length = input_ids.shape[0], input_ids.shape[-1]
if generation_config is None:
generation_config = model.generation_config
generation_config = copy.deepcopy(generation_config)
model_kwargs = generation_config.update(**kwargs)
bos_token_id, eos_token_id = ( # noqa: F841 # pylint: disable=W0612
generation_config.bos_token_id,
generation_config.eos_token_id,
)
if isinstance(eos_token_id, int):
eos_token_id = [eos_token_id]
if additional_eos_token_id is not None:
eos_token_id.append(additional_eos_token_id)
has_default_max_length = kwargs.get(
'max_length') is None and generation_config.max_length is not None
if has_default_max_length and generation_config.max_new_tokens is None:
warnings.warn(
f"Using 'max_length''s default \
({repr(generation_config.max_length)}) \
to control the generation length. "
'This behaviour is deprecated and will be removed from the \
config in v5 of Transformers -- we'
' recommend using `max_new_tokens` to control the maximum \
length of the generation.',
UserWarning,
)
elif generation_config.max_new_tokens is not None:
generation_config.max_length = generation_config.max_new_tokens + \
input_ids_seq_length
if not has_default_max_length:
logger.warn( # pylint: disable=W4902
f"Both 'max_new_tokens' (={generation_config.max_new_tokens}) "
f"and 'max_length'(={generation_config.max_length}) seem to "
"have been set. 'max_new_tokens' will take precedence. "
'Please refer to the documentation for more information. '
'(https://huggingface.co/docs/transformers/main/'
'en/main_classes/text_generation)',
UserWarning,
)
if input_ids_seq_length >= generation_config.max_length:
input_ids_string = 'input_ids'
logger.warning(
f'Input length of {input_ids_string} is {input_ids_seq_length}, '
f"but 'max_length' is set to {generation_config.max_length}. "
'This can lead to unexpected behavior. You should consider'
" increasing 'max_new_tokens'.")
# 2. Set generation parameters if not already defined
logits_processor = logits_processor if logits_processor is not None \
else LogitsProcessorList()
stopping_criteria = stopping_criteria if stopping_criteria is not None \
else StoppingCriteriaList()
logits_processor = model._get_logits_processor(
generation_config=generation_config,
input_ids_seq_length=input_ids_seq_length,
encoder_input_ids=input_ids,
prefix_allowed_tokens_fn=prefix_allowed_tokens_fn,
logits_processor=logits_processor,
)
stopping_criteria = model._get_stopping_criteria(
generation_config=generation_config,
stopping_criteria=stopping_criteria)
logits_warper = model._get_logits_warper(generation_config)
unfinished_sequences = input_ids.new(input_ids.shape[0]).fill_(1)
scores = None
while True:
model_inputs = model.prepare_inputs_for_generation(
input_ids, **model_kwargs)
# forward pass to get next token
outputs = model(
**model_inputs,
return_dict=True,
output_attentions=False,
output_hidden_states=False,
)
next_token_logits = outputs.logits[:, -1, :]
# pre-process distribution
next_token_scores = logits_processor(input_ids, next_token_logits)
next_token_scores = logits_warper(input_ids, next_token_scores)
# sample
probs = nn.functional.softmax(next_token_scores, dim=-1)
if generation_config.do_sample:
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
else:
next_tokens = torch.argmax(probs, dim=-1)
# update generated ids, model inputs, and length for next step
input_ids = torch.cat([input_ids, next_tokens[:, None]], dim=-1)
model_kwargs = model._update_model_kwargs_for_generation(
outputs, model_kwargs, is_encoder_decoder=False)
unfinished_sequences = unfinished_sequences.mul(
(min(next_tokens != i for i in eos_token_id)).long())
output_token_ids = input_ids[0].cpu().tolist()
output_token_ids = output_token_ids[input_length:]
for each_eos_token_id in eos_token_id:
if output_token_ids[-1] == each_eos_token_id:
output_token_ids = output_token_ids[:-1]
response = tokenizer.decode(output_token_ids)
yield response
# stop when each sentence is finished
# or if we exceed the maximum length
if unfinished_sequences.max() == 0 or stopping_criteria(
input_ids, scores):
break
def on_btn_click():
del st.session_state.messages
@st.cache_resource
def load_model():
model = (AutoModelForCausalLM.from_pretrained(
model_name_or_path,
trust_remote_code=True).to(torch.bfloat16).cuda())
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path,
trust_remote_code=True)
return model, tokenizer
def prepare_generation_config():
with st.sidebar:
max_length = st.slider('Max Length',
min_value=8,
max_value=32768,
value=32768)
top_p = st.slider('Top P', 0.0, 1.0, 0.8, step=0.01)
temperature = st.slider('Temperature', 0.0, 1.0, 0.7, step=0.01)
st.button('Clear Chat History', on_click=on_btn_click)
generation_config = GenerationConfig(max_length=max_length,
top_p=top_p,
temperature=temperature)
return generation_config
user_prompt = '<|im_start|>user\n{user}<|im_end|>\n'
robot_prompt = '<|im_start|>assistant\n{robot}<|im_end|>\n'
cur_query_prompt = '<|im_start|>user\n{user}<|im_end|>\n\
<|im_start|>assistant\n'
def combine_history(prompt):
messages = st.session_state.messages
meta_instruction = ('You are a helpful, honest, '
'and harmless AI assistant.')
total_prompt = f'<s><|im_start|>system\n{meta_instruction}<|im_end|>\n'
for message in messages:
cur_content = message['content']
if message['role'] == 'user':
cur_prompt = user_prompt.format(user=cur_content)
elif message['role'] == 'robot':
cur_prompt = robot_prompt.format(robot=cur_content)
else:
raise RuntimeError
total_prompt += cur_prompt
total_prompt = total_prompt + cur_query_prompt.format(user=prompt)
return total_prompt
def main():
st.title('internlm3-8b-sft-assistant')
# torch.cuda.empty_cache()
print('load model begin.')
model, tokenizer = load_model()
print('load model end.')
generation_config = prepare_generation_config()
# Initialize chat history
if 'messages' not in st.session_state:
st.session_state.messages = []
# Display chat messages from history on app rerun
for message in st.session_state.messages:
with st.chat_message(message['role'], avatar=message.get('avatar')):
st.markdown(message['content'])
# Accept user input
if prompt := st.chat_input('What is up?'):
# Display user message in chat message container
with st.chat_message('user', avatar='user'):
st.markdown(prompt)
real_prompt = combine_history(prompt)
# Add user message to chat history
st.session_state.messages.append({
'role': 'user',
'content': prompt,
'avatar': 'user'
})
with st.chat_message('robot', avatar='assistant'):
message_placeholder = st.empty()
for cur_response in generate_interactive(
model=model,
tokenizer=tokenizer,
prompt=real_prompt,
additional_eos_token_id=92542,
device='cuda:0',
**asdict(generation_config),
):
# Display robot response in chat message container
message_placeholder.markdown(cur_response + '▌')
message_placeholder.markdown(cur_response)
# Add robot response to chat history
st.session_state.messages.append({
'role': 'robot',
'content': cur_response, # pylint: disable=undefined-loop-variable
'avatar': 'assistant',
})
torch.cuda.empty_cache()
if __name__ == '__main__':
main()
3.2 Run it
Port forwarding
ssh -p xxxxx root@ssh.intern-ai.org.cn -CNg -L {LOCAL_PORT}:127.0.0.1:{DEV_MACHINE_PORT} -o StrictHostKeyChecking=no
On Windows, run this in PowerShell:
ssh -p xxxxxx root@ssh.intern-ai.org.cn -CNg -L 8502:127.0.0.1:8503 -o StrictHostKeyChecking=no
Replace xxxxxx with your own dev machine's SSH port.
-L 8502:127.0.0.1:8503 forwards 127.0.0.1:8503 on the dev machine to port 8502 on your local Windows machine.
Then open 127.0.0.1:8502 locally.
Run
conda activate ms-swift
pip install streamlit==1.31.0
streamlit run /root/infer.py
streamlit run /root/infer.py --browser.gatherUsageStats false
4. Model evaluation (optional)
Environment setup
git clone https://github.moeyy.xyz/https://github.com/open-compass/opencompass opencompass
cd opencompass
conda create -n opencompass python=3.10 -y
conda activate opencompass
pip install -e .
pip install sentencepiece
5. Model submission
Submission entry
aicarrier.feishu.cn/share/base/…
swift export \
--model output/vx-xxx/checkpoint-xxx \
--push_to_hub true \
--hub_model_id '<model-id>' \
--hub_token '<sdk-token>' \
--use_hf false
Upload configuration
- Create a repository
- My repo URL: www.modelscope.cn/models/Rave…
- model_id
Raven10086/gmz-camp5
- Access token
5.1 Upload to ModelScope
Install git-lfs
apt-get install git-lfs
git lfs install
Submit with swift
swift export \
--model /root/paper/config/swift_output/InternLM3-8B-Lora-SFT/v3-20250510-231854/checkpoint-21-merged \
--push_to_hub true \
--hub_model_id 'zhangfc12345678/zfc-camp5' \
--hub_token '03fb4fxx' \
--use_hf false
# Replace hub_model_id with your own repo id and hub_token with your own access token.
model upload
www.modelscope.cn/docs/models…
Reference code
Run
conda activate ms-swift
python /root/upload.py
from modelscope.hub.api import HubApi
YOUR_ACCESS_TOKEN='your-access-token'  # replace with your own token
api=HubApi()
api.login(YOUR_ACCESS_TOKEN)
from modelscope.hub.constants import Licenses, ModelVisibility
owner_name='Raven10086'
model_name='InternLM-gmz-camp5'
model_id=f"{owner_name}/{model_name}"
api.create_model(
model_id,
visibility=ModelVisibility.PUBLIC,
license=Licenses.APACHE_V2,
chinese_name="gmz文本分类微调端侧小模型"
)
api.upload_folder(
repo_id=f"{owner_name}/{model_name}",
folder_path='/root/swift_output/InternLM3-8B-SFT-Lora/v5-20250517-164316/checkpoint-62-merged',
commit_message='fast commit',
)
Submit the questionnaire
Submission form: aicarrier.feishu.cn/share/base/…
Results sheet: aicarrier.feishu.cn/share/base/…