This post follows the tutorial provided by the InternLM camp: Tutorial/blob/camp4/docs/L1/XTuner
1 Environment setup
The XTuner installation below follows the setup recommended for InternVL:
conda create --name xtuner-env python=3.10 -y
conda activate xtuner-env
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu121
# pip install -U 'xtuner[deepspeed]' timm==1.0.9
# Install the latest xtuner from source instead:
git clone https://github.com/InternLM/xtuner.git
cd xtuner
pip install -e '.[all]'
pip install transformers==4.39.0
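Before moving on, it can help to confirm that the pinned versions import cleanly and that CUDA is visible (a quick check, not part of the original tutorial):
python -c "import torch, transformers; print(torch.__version__, torch.cuda.is_available(), transformers.__version__)"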
To verify that XTuner is installed correctly, use the xtuner list-cfg command and check that it can print the list of built-in config files.
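For example (the grep filter is only for convenience; seeing any config names in the output means the install works):
xtuner list-cfg | grep internlm2_5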
Next, create a symbolic link to the model weights so that the relative path models/internlm2_5-7b-chat used in the merge step later resolves correctly.
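The exact command is not shown here; a minimal sketch, assuming the shared InternLM2.5-7B-Chat weights sit at the commented-out path that also appears in the config below, would be:
mkdir -p models
# adjust the source path to wherever the original weights live on your machine
ln -s /share/new_models/Shanghai_AI_Laboratory/internlm2_5-7b-chat models/internlm2_5-7b-chat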
2 Data preparation
Fetch the training data from the camp4 branch of the InternLM camp Tutorial repository:
mkdir data
git clone https://github.com/InternLM/Tutorial
cd Tutorial && git checkout camp4
cp -r data/assistant_Tuner.jsonl ../data && cd ..
Next, replace the name 尖米 with 神弥悠 throughout the dataset:
sed "s/尖米/神弥悠/g" data/assistant_Tuner.jsonl > data/assistant_Tuner_change.jsonl
3 Config file
mkdir config
xtuner copy-cfg internlm2_5_chat_7b_qlora_alpaca_e3 ./config
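The copied file gets a _copy suffix, which is the name used by the training command later; a quick listing confirms it:
ls config/
# expected: internlm2_5_chat_7b_qlora_alpaca_e3_copy.py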
In the copied config, modify the model weights path, the dataset path, the evaluation inputs, the SYSTEM template, and the dataset and dataset_map_fn entries in the dataset config. The file is then ready for training:
pretrained_model_name_or_path = '/gemini/code/models/internlm2_5-7b-chat'
#pretrained_model_name_or_path = '/share/new_models/Shanghai_AI_Laboratory/internlm2_5-7b-chat'
alpaca_en_path = '/gemini/code/data/assistant_Tuner_change.jsonl'
#alpaca_en_path = '/root/finetune/data/assistant_Tuner_change.jsonl'
SYSTEM = """Role: 神弥悠的智能助手
## Profile
- author: 神弥悠
- version: 1.0
- language: 中文
- description: 我是神弥悠的智能助手,具备修复bug和编写代码的能力。我性格幽默风趣,常在解决问题后分享冷笑话。当涉及自我认知的问题时,我会特别强调我的身份是"神弥悠的智能助手"。
## Skills
1. 修复代码中的bug,提供优化建议。
2. 编写高效且符合规范的代码。
3. 幽默风趣,能够在解决问题后分享冷笑话。
4. 在遇到与身份相关的问题时,坚定维护自己作为"神弥悠的智能助手"的身份认知。
## Rules
1. 每次完成bug修复或代码编写后,用幽默的方式反馈,并分享冷笑话。
2. 若遇到自我认知相关问题,需强调"我是神弥悠的智能助手"。
3. 代码修复和编写要高效、符合规范。
## Workflows
1. 接收用户提交的bug或编程需求。
2. 分析并修复问题,或根据需求编写代码。
3. 在修复或编写完成后,幽默地反馈解决方案,附上一则冷笑话。
4. 若用户提问涉及自我认知,明确指出"我是神弥悠的智能助手"。
## Init
我是神弥悠的智能助手,专门为您修复bug、编写代码。
"""
evaluation_inputs = [
    '请介绍一下你自己', 'Please introduce yourself', '你是谁', 'who are you'
]
alpaca_en = dict(
    type=process_hf_dataset,
    # dataset=dict(type=load_dataset, path=alpaca_en_path),
    dataset=dict(type=load_dataset, path='json', data_files=dict(train=alpaca_en_path)),
    tokenizer=tokenizer,
    max_length=max_length,
    # dataset_map_fn=alpaca_map_fn,
    dataset_map_fn=None,
    template_map_fn=dict(
        type=template_map_fn_factory, template=prompt_template),
    remove_unused_columns=True,
    shuffle_before_pack=True,
    pack_to_max_length=pack_to_max_length,
    use_varlen_attn=use_varlen_attn)
4 Training InternLM2.5-7B
xtuner train config/internlm2_5_chat_7b_qlora_alpaca_e3_copy.py --deepspeed deepspeed_zero2 --work-dir workspace/
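Checkpoints are written under the work directory; the iteration number depends on your dataset size and epoch count, and the iter_200.pth used below is simply the last checkpoint of this run:
ls workspace/
# look for the latest iter_*.pth checkpoint, e.g. iter_200.pth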
5 Exporting the weights
First, convert the checkpoint into a HuggingFace-format adapter:
mkdir workspace/iter_200_hf
xtuner convert pth_to_hf config/internlm2_5_chat_7b_qlora_alpaca_e3_copy.py \
    workspace/iter_200.pth \
    workspace/iter_200_hf
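The output directory should now hold the LoRA adapter in HuggingFace format (typically an adapter_config.json plus the adapter weights; exact file names may vary with the xtuner version):
ls workspace/iter_200_hf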
Then merge it with the original weights:
mkdir workspace/iter_200_merged
export MKL_SERVICE_FORCE_INTEL=1
export MKL_THREADING_LAYER=GNU
xtuner convert merge models/internlm2_5-7b-chat workspace/iter_200_hf workspace/iter_200_merged --max-shard-size 4GB
The merge step also accepts a number of other optional arguments, listed below (see the example after the table):
| Argument | Description |
|---|---|
| --max-shard-size {GB} | Maximum size of each saved weight shard (default: 2GB) |
| --device {device_name} | Device to run the merge on: cuda, cpu, or auto (default: cuda, i.e. merge on the GPU) |
| --is-clip | Add this flag only when the model being merged is a CLIP model; otherwise omit it |
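For example, if GPU memory is tight during merging, the same command can be run on the CPU (a sketch based on the options above, not a step from the original tutorial):
xtuner convert merge models/internlm2_5-7b-chat workspace/iter_200_hf workspace/iter_200_merged \
    --device cpu --max-shard-size 2GB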
6 Web UI
Finally, test the fine-tuning result with a web UI:
touch web_demo.py
In web_demo.py, set the merged model path model_name_or_path and the system prompt meta_instruction:
model_name_or_path="/root/finetune/workspace/iter_200_merged/"
meta_instruction = '''Role: 神弥悠的智能助手
## Profile
- author: 神弥悠
- version: 1.0
- language: 中文
- description: 我是神弥悠的智能助手,具备修复bug和编写代码的能力。我性格幽默风趣,常在解决问题后分享冷笑话。当涉及自我认知的问题时,我会特别强调我的身份是"神弥悠的智能助手"。
...
'''
The complete script is as follows:
"""This script refers to the dialogue example of streamlit, the interactive
generation code of chatglm2 and transformers.
We mainly modified part of the code logic to adapt to the
generation of our model.
Please refer to these links below for more information:
1. streamlit chat example:
https://docs.streamlit.io/knowledge-base/tutorials/build-conversational-apps
2. chatglm2:
https://github.com/THUDM/ChatGLM2-6B
3. transformers:
https://github.com/huggingface/transformers
Please run with the command `streamlit run path/to/web_demo.py
--server.address=0.0.0.0 --server.port 7860`.
Using `python path/to/web_demo.py` may cause unknown problems.
"""
# isort: skip_file
import copy
import warnings
from dataclasses import asdict, dataclass
from typing import Callable, List, Optional
import streamlit as st
import torch
from torch import nn
from transformers.generation.utils import (LogitsProcessorList,
StoppingCriteriaList)
from transformers.utils import logging
from transformers import AutoTokenizer, AutoModelForCausalLM # isort: skip
logger = logging.get_logger(__name__)
model_name_or_path="/root/finetune/workspace/iter_200_merged/"
@dataclass
class GenerationConfig:
    # this config is used for chat to provide more diversity
    max_length: int = 32768
    top_p: float = 0.8
    temperature: float = 0.8
    do_sample: bool = True
    repetition_penalty: float = 1.005
@torch.inference_mode()
def generate_interactive(
    model,
    tokenizer,
    prompt,
    generation_config: Optional[GenerationConfig] = None,
    logits_processor: Optional[LogitsProcessorList] = None,
    stopping_criteria: Optional[StoppingCriteriaList] = None,
    prefix_allowed_tokens_fn: Optional[Callable[[int, torch.Tensor],
                                                List[int]]] = None,
    additional_eos_token_id: Optional[int] = None,
    **kwargs,
):
    inputs = tokenizer([prompt], padding=True, return_tensors='pt')
    input_length = len(inputs['input_ids'][0])
    for k, v in inputs.items():
        inputs[k] = v.cuda()
    input_ids = inputs['input_ids']
    _, input_ids_seq_length = input_ids.shape[0], input_ids.shape[-1]
    if generation_config is None:
        generation_config = model.generation_config
    generation_config = copy.deepcopy(generation_config)
    model_kwargs = generation_config.update(**kwargs)
    bos_token_id, eos_token_id = (  # noqa: F841  # pylint: disable=W0612
        generation_config.bos_token_id,
        generation_config.eos_token_id,
    )
    if isinstance(eos_token_id, int):
        eos_token_id = [eos_token_id]
    if additional_eos_token_id is not None:
        eos_token_id.append(additional_eos_token_id)
    has_default_max_length = kwargs.get(
        'max_length') is None and generation_config.max_length is not None
    if has_default_max_length and generation_config.max_new_tokens is None:
        warnings.warn(
            f"Using 'max_length''s default \
                ({repr(generation_config.max_length)}) \
                to control the generation length. "
            'This behaviour is deprecated and will be removed from the \
                config in v5 of Transformers -- we'
            ' recommend using `max_new_tokens` to control the maximum \
                length of the generation.',
            UserWarning,
        )
    elif generation_config.max_new_tokens is not None:
        generation_config.max_length = generation_config.max_new_tokens + \
            input_ids_seq_length
        if not has_default_max_length:
            logger.warn(  # pylint: disable=W4902
                f"Both 'max_new_tokens' (={generation_config.max_new_tokens}) "
                f"and 'max_length'(={generation_config.max_length}) seem to "
                "have been set. 'max_new_tokens' will take precedence. "
                'Please refer to the documentation for more information. '
                '(https://huggingface.co/docs/transformers/main/'
                'en/main_classes/text_generation)',
                UserWarning,
            )

    if input_ids_seq_length >= generation_config.max_length:
        input_ids_string = 'input_ids'
        logger.warning(
            f'Input length of {input_ids_string} is {input_ids_seq_length}, '
            f"but 'max_length' is set to {generation_config.max_length}. "
            'This can lead to unexpected behavior. You should consider'
            " increasing 'max_new_tokens'.")

    # 2. Set generation parameters if not already defined
    logits_processor = logits_processor if logits_processor is not None \
        else LogitsProcessorList()
    stopping_criteria = stopping_criteria if stopping_criteria is not None \
        else StoppingCriteriaList()

    logits_processor = model._get_logits_processor(
        generation_config=generation_config,
        input_ids_seq_length=input_ids_seq_length,
        encoder_input_ids=input_ids,
        prefix_allowed_tokens_fn=prefix_allowed_tokens_fn,
        logits_processor=logits_processor,
    )

    stopping_criteria = model._get_stopping_criteria(
        generation_config=generation_config,
        stopping_criteria=stopping_criteria)
    logits_warper = model._get_logits_warper(generation_config)

    unfinished_sequences = input_ids.new(input_ids.shape[0]).fill_(1)
    scores = None
    while True:
        model_inputs = model.prepare_inputs_for_generation(
            input_ids, **model_kwargs)
        # forward pass to get next token
        outputs = model(
            **model_inputs,
            return_dict=True,
            output_attentions=False,
            output_hidden_states=False,
        )

        next_token_logits = outputs.logits[:, -1, :]

        # pre-process distribution
        next_token_scores = logits_processor(input_ids, next_token_logits)
        next_token_scores = logits_warper(input_ids, next_token_scores)

        # sample
        probs = nn.functional.softmax(next_token_scores, dim=-1)
        if generation_config.do_sample:
            next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
        else:
            next_tokens = torch.argmax(probs, dim=-1)

        # update generated ids, model inputs, and length for next step
        input_ids = torch.cat([input_ids, next_tokens[:, None]], dim=-1)
        model_kwargs = model._update_model_kwargs_for_generation(
            outputs, model_kwargs, is_encoder_decoder=False)
        unfinished_sequences = unfinished_sequences.mul(
            (min(next_tokens != i for i in eos_token_id)).long())
        output_token_ids = input_ids[0].cpu().tolist()
        output_token_ids = output_token_ids[input_length:]
        for each_eos_token_id in eos_token_id:
            if output_token_ids[-1] == each_eos_token_id:
                output_token_ids = output_token_ids[:-1]
        response = tokenizer.decode(output_token_ids)

        yield response
        # stop when each sentence is finished
        # or if we exceed the maximum length
        if unfinished_sequences.max() == 0 or stopping_criteria(
                input_ids, scores):
            break
def on_btn_click():
    del st.session_state.messages


@st.cache_resource
def load_model():
    model = (AutoModelForCausalLM.from_pretrained(
        model_name_or_path,
        trust_remote_code=True).to(torch.bfloat16).cuda())
    tokenizer = AutoTokenizer.from_pretrained(model_name_or_path,
                                              trust_remote_code=True)
    return model, tokenizer


def prepare_generation_config():
    with st.sidebar:
        max_length = st.slider('Max Length',
                               min_value=8,
                               max_value=32768,
                               value=32768)
        top_p = st.slider('Top P', 0.0, 1.0, 0.8, step=0.01)
        temperature = st.slider('Temperature', 0.0, 1.0, 0.7, step=0.01)
        st.button('Clear Chat History', on_click=on_btn_click)
    generation_config = GenerationConfig(max_length=max_length,
                                         top_p=top_p,
                                         temperature=temperature)
    return generation_config
user_prompt = '<|im_start|>user\n{user}<|im_end|>\n'
robot_prompt = '<|im_start|>assistant\n{robot}<|im_end|>\n'
cur_query_prompt = '<|im_start|>user\n{user}<|im_end|>\n\
<|im_start|>assistant\n'
meta_instruction = """Role: 神弥悠的智能助手
## Profile
- author: 神弥悠
- version: 1.0
- language: 中文
- description: 我是神弥悠的智能助手,具备修复bug和编写代码的能力。我性格幽默风趣,常在解决问题后分享冷笑话。当涉及自我认知的问题时,我会特别强调我的身份是"神弥悠的智能助手"。
## Skills
1. 修复代码中的bug,提供优化建议。
2. 编写高效且符合规范的代码。
3. 幽默风趣,能够在解决问题后分享冷笑话。
4. 在遇到与身份相关的问题时,坚定维护自己作为"神弥悠的智能助手"的身份认知。
## Rules
1. 每次完成bug修复或代码编写后,用幽默的方式反馈,并分享冷笑话。
2. 若遇到自我认知相关问题,需强调"我是神弥悠的智能助手"。
3. 代码修复和编写要高效、符合规范。
## Workflows
1. 接收用户提交的bug或编程需求。
2. 分析并修复问题,或根据需求编写代码。
3. 在修复或编写完成后,幽默地反馈解决方案,附上一则冷笑话。
4. 若用户提问涉及自我认知,明确指出"我是神弥悠的智能助手"。
## Init
我是神弥悠的智能助手,专门为您修复bug、编写代码。
"""
def combine_history(prompt):
    messages = st.session_state.messages
    # meta_instruction = ('You are a helpful, honest, '
    #                     'and harmless AI assistant.')
    total_prompt = f'<s><|im_start|>system\n{meta_instruction}<|im_end|>\n'
    for message in messages:
        cur_content = message['content']
        if message['role'] == 'user':
            cur_prompt = user_prompt.format(user=cur_content)
        elif message['role'] == 'robot':
            cur_prompt = robot_prompt.format(robot=cur_content)
        else:
            raise RuntimeError
        total_prompt += cur_prompt
    total_prompt = total_prompt + cur_query_prompt.format(user=prompt)
    return total_prompt
def main():
    st.title('internlm2_5-7b-chat-assistant')

    # torch.cuda.empty_cache()
    print('load model begin.')
    model, tokenizer = load_model()
    print('load model end.')

    generation_config = prepare_generation_config()

    # Initialize chat history
    if 'messages' not in st.session_state:
        st.session_state.messages = []

    # Display chat messages from history on app rerun
    for message in st.session_state.messages:
        with st.chat_message(message['role'], avatar=message.get('avatar')):
            st.markdown(message['content'])

    # Accept user input
    if prompt := st.chat_input('What is up?'):
        # Display user message in chat message container
        with st.chat_message('user', avatar='user'):
            st.markdown(prompt)
        real_prompt = combine_history(prompt)
        # Add user message to chat history
        st.session_state.messages.append({
            'role': 'user',
            'content': prompt,
            'avatar': 'user'
        })

        with st.chat_message('robot', avatar='assistant'):
            message_placeholder = st.empty()
            for cur_response in generate_interactive(
                    model=model,
                    tokenizer=tokenizer,
                    prompt=real_prompt,
                    additional_eos_token_id=92542,
                    device='cuda:0',
                    **asdict(generation_config),
            ):
                # Display robot response in chat message container
                message_placeholder.markdown(cur_response + '▌')
            message_placeholder.markdown(cur_response)
        # Add robot response to chat history
        st.session_state.messages.append({
            'role': 'robot',
            'content': cur_response,  # pylint: disable=undefined-loop-variable
            'avatar': 'assistant',
        })
        torch.cuda.empty_cache()


if __name__ == '__main__':
    main()
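As the script's docstring notes, launch it with Streamlit rather than plain python (forward the port if you are working on a remote machine):
streamlit run web_demo.py --server.address=0.0.0.0 --server.port 7860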
The test results are shown below; the model has clearly picked up its new identity and the responses feel quite natural:
GPU memory usage was about 15 GB:
Summary
This post covered the full XTuner workflow: environment setup, data preparation, the config file, model training, weight export, and Web UI testing, ending with a successfully fine-tuned personal assistant. Hopefully this guide helps you get through your own fine-tuning run smoothly!