This notebook demonstrates how to fine-tune a state-of-the-art large language model (LLM) on a single GPU. The example uses Falcon-7B because it is Apache licensed. The data used in this notebook is for reference only; do not use it unless you have permission.
About the model
The notebook uses the Falcon-7B LLM from TII in the UAE. It is a decoder-only transformer with 7 billion parameters, trained on 1.5 trillion tokens from the cleaned, curated RefinedWeb dataset. The authors attribute its state-of-the-art performance largely to the quality of the training data.
About the data
I chose the raw pre-trained version of the Falcon model rather than the chat-tuned variant to keep the fine-tuning data simple; that is, no question/answer format is expected.
The goal of this data is to help the model generate new song lyrics, but the dataset is tiny, only a few hundred examples long. A dataset with many more examples would be needed to turn this into something useful. Again, do not use this data unless you have permission.
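If you want to adapt the notebook to lyrics you actually have rights to, the simplest replacement is a datasets Dataset with a single lyrics column that can stand in for the data variable loaded later. A minimal sketch, where my_lyrics is a hypothetical list of strings:
from datasets import Dataset

# Hypothetical example lyrics; substitute your own permissively licensed text.
my_lyrics = [
    "[Intro]\nLa la la\n\n[Verse 1]\nAn opening line here",
    "[Chorus]\nA second, entirely separate song",
]
my_data = Dataset.from_dict({"lyrics": my_lyrics})
print(my_data)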
Prerequisites
This notebook was developed against a V100 machine in Google Colab. It should also work on an A100, but not on a T4. Note that adding evaluation to the training wrapper will use too much GPU memory.
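If you are not sure which GPU Colab has allocated, a quick check of the device name and free memory helps (a minimal sketch, assuming a CUDA runtime is available):
import torch

# Report which GPU we have and how much memory is free on it.
print(torch.cuda.get_device_name(0))
free_bytes, total_bytes = torch.cuda.mem_get_info()
print(f"{free_bytes / 1e9:.1f} GB free of {total_bytes / 1e9:.1f} GB")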
Python dependencies
- The bitsandbytes library provides quantization wrappers to help squeeze the model into our meagre GPU RAM.
- transformers, accelerate, and datasets provide the skeleton training code.
- peft provides fine-tuning adapters so you don't have to fine-tune the whole model.
!pip install -q bitsandbytes==0.41.1 transformers==4.33.3 accelerate==0.23.0 datasets==2.14.5 einops==0.6.1
!pip install -q -U git+https://github.com/huggingface/peft.git@69665f24e98dc5f20a430637a31f196158b6e0da
import os
import re
import bitsandbytes as bnb
import pandas as pd
import torch
import torch.nn as nn
import transformers
from datasets import load_dataset, Dataset, Value
from peft import (
LoraConfig,
PeftConfig,
get_peft_model,
prepare_model_for_kbit_training,
)
from transformers import (
AutoConfig,
AutoModelForCausalLM,
AutoTokenizer,
BitsAndBytesConfig,
)
Model
The next block of code downloads and imports the model and tokenizer.
model_id = "tiiuae/falcon-7b"
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_use_double_quant=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
model_id,
device_map="auto",
trust_remote_code=True,
quantization_config=bnb_config,
)
model = prepare_model_for_kbit_training(model)
tokenizer = AutoTokenizer.from_pretrained(model_id, add_eos_token=True)
tokenizer.pad_token = tokenizer.eos_token
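To confirm that 4-bit loading actually shrank the model, you can check its in-memory footprint (a minimal sketch; get_memory_footprint reports bytes):
# Rough size of the quantized model in GPU memory.
print(f"Model footprint: {model.get_memory_footprint() / 1e9:.2f} GB")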
Example generation
Let's generate an initial example to see how the model performs without any fine-tuning. Below is a helper function that invokes inference in the right way. Take note of the generation settings here.
def generate(prompt="[Intro]") -> str:
    inputs = tokenizer(prompt, padding=True, truncation=True, return_tensors="pt").to("cuda:0")
    # More info about generation options: https://huggingface.co/blog/how-to-generate
    outputs = model.generate(
        input_ids=inputs['input_ids'],
        attention_mask=inputs['attention_mask'],
        do_sample=True,
        top_p=0.92,
        top_k=0,
        max_new_tokens=50)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
In the test generation that follows, I have removed some offensive content that the model produced.
print(generate())
[Intro]
Yeah, yeah, yeah
Yeah, yeah, yeah
Yeah, yeah, yeah
Yeah, yeah, yeah
[Verse 1]
I'm a young $#¡!&$, I'm a young $#¡!&$
I'm
LoRA PEFT configuration
The following configuration controls the adapter used to fine-tune the model.
config = LoraConfig(
r=16,
lora_alpha=32,
target_modules=["query_key_value"],
lora_dropout=0.05,
bias="none",
task_type="CAUSAL_LM"
)
model = get_peft_model(model, config)
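As a sanity check that only the adapter weights will be trained, you can print the trainable parameter counts via the PEFT model:
# Only a small fraction of the total parameters should be trainable.
model.print_trainable_parameters()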
Data
Next, load and format your fine-tuning data. Take note of the expected format. You can see an example of the training data below.
data = load_dataset(PATH_TO_LYRICS_DATASET, split="train")
data
Dataset({
features: ['Unnamed: 0', 'number', 'title', 'artist', 'lyrics', 'album', 'lyrics_length'],
num_rows: 180
})
lyrics = data["lyrics"]
lyrics[0]
'[Intro]
Shoot me
Shoot me
Shoot me
Shoot me
[Verse 1]
Here come old flat-top, he come groovin' up slowly'
...
Cleaning the data
This code takes the raw data and produces a clean version that is ready for training. I found I got the best results when I split the lyrics into separate verses, each starting with a key indicating what kind of verse it is (e.g. verse, chorus, intro). These keys are already present in the raw data.
def raw_lyrics():
    # Yield whole songs, untouched apart from tokenization (not used below).
    for lyrics in data["lyrics"]:
        full_prompt = lyrics + tokenizer.eos_token
        tokenized_full_prompt = tokenizer(full_prompt, padding=True, truncation=True)
        yield {"lyrics": lyrics, **tokenized_full_prompt}

def split_all():
    # Split songs on the bracketed section headers, dropping the headers (not used below).
    # Relies on a create_prompt helper that is not defined in this notebook.
    for lyrics in data["lyrics"]:
        verses = re.split(r"\[.*\]", lyrics)
        verses = filter(lambda a: len(a.strip()) > 0, verses)
        for v in verses:
            full_prompt = create_prompt(v + tokenizer.eos_token)
            tokenized_full_prompt = tokenizer(full_prompt, padding=True, truncation=True)
            yield {"verse": v, **tokenized_full_prompt}

def split_verses():
    # Split songs into verses while keeping the leading [Section] key on each verse.
    for lyrics in data["lyrics"]:
        verses = re.findall(r"[\S\n\t\v ]*?(?:\n(?=\[)|$)", lyrics)
        verses = filter(lambda a: len(a.strip()) > 0, verses)
        for v in verses:
            full_prompt = v + tokenizer.eos_token
            tokenized_full_prompt = tokenizer(full_prompt, padding=True, truncation=True)
            yield {"verse": v, **tokenized_full_prompt}
dataset = Dataset.from_generator(split_verses)
print(dataset[0]["verse"])
print(dataset[1]["verse"])
print(dataset[999]["verse"])
[Intro]
Shoot me
Shoot me
Shoot me
Shoot me
[Verse 1]
Here come old flat-top, he come groovin' up slowly
He got ju-ju eyeball, he one holy roller
He got hair down to his knee
Got to be a joker, he just do what he please
[Bridge]
In a couple of years, they have built a home sweet home
With a couple of kids running in the yard
Of Desmond and Molly Jones (Ha, ha, ha, ha, ha, ha)
Training
The fine-tuning parameters are configured below. Note that this "helper" has a seemingly endless number of arguments, so read the documentation carefully.
The key settings here are the number of training steps/epochs and the batch size. Transformers is usually smart enough to work out the best settings on its own, but sometimes you need tighter control (for example, if you are using a small GPU).
I found that 30 epochs was best from a loss perspective. I am not using any useful evaluation measure here (to save GPU RAM), so I can't say whether this is actually optimal.
30 epochs takes about an hour on a V100; it is not a quick thing. ;-)
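For a rough feel of how long a run will take, you can estimate the number of optimizer steps from the dataset size. A minimal sketch; the batch size of 8 is an assumption, since auto_find_batch_size below picks the real value at runtime:
import math

assumed_batch_size = 8  # hypothetical; auto_find_batch_size chooses the actual value
steps_per_epoch = math.ceil(len(dataset) / assumed_batch_size)
print(f"~{steps_per_epoch} steps per epoch, ~{steps_per_epoch * 30} steps over 30 epochs")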
training_args = transformers.TrainingArguments(
    auto_find_batch_size=True,  # Try to auto-find a batch size.
    # Also see https://huggingface.co/google/flan-ul2/discussions/16#64c8bdaf4cc48498134a0271
    learning_rate=2e-4,
    # bf16=True,  # Only on A100
    fp16=True,  # On V100
    save_total_limit=4,
    # warmup_steps=2,
    num_train_epochs=30,  # Total number of training epochs. It stabilised after 30.
    output_dir='checkpoints',
    save_strategy='epoch',
    report_to="none",
    logging_steps=25,  # Number of steps between logs.
    save_safetensors=True,
    # The two options below require an eval dataset and a matching eval strategy,
    # which are disabled here to save GPU RAM.
    # load_best_model_at_end=True,
    # metric_for_best_model='accuracy',
)
trainer = transformers.Trainer(
model=model,
train_dataset=dataset,
# eval_dataset=dataset["test"], # 16GB GPU not big enough
args=training_args,
data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
# compute_metrics=compute_metrics,
)
model.config.use_cache = False
trainer.train(resume_from_checkpoint=False) # Set to true if resuming
trainer.save_model("final_model")
transformers.logging.set_verbosity_error()
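With no evaluation set, the training loss is the main signal of progress. If you want to inspect it after training, the logged values are kept in the trainer's state (a minimal sketch):
import pandas as pd

# Each logging step adds an entry with the epoch and training loss.
history = pd.DataFrame(trainer.state.log_history)
print(history[["epoch", "loss"]].dropna().tail())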
Storage
The following section exports the trained model to my personal GDrive for use in a separate inference notebook.
import tarfile
import os.path
def make_tarfile(output_filename, source_dir):
    with tarfile.open(output_filename, "w:gz") as tar:
        tar.add(source_dir, arcname=os.path.basename(source_dir))
make_tarfile("final_model.tar.gz", "final_model")
import locale
locale.getpreferredencoding = lambda: "UTF-8"  # Work around a Colab locale issue that can break shell commands
!cp final_model.tar.gz ./path/to/safe/location
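If the safe location is Google Drive, it needs to be mounted in Colab first (a minimal sketch; the destination folder is hypothetical):
from google.colab import drive

# Mount the drive, then copy the archive into it.
drive.mount('/content/drive')
!cp final_model.tar.gz /content/drive/MyDrive/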
Inference
Now it's time to try out our model. Let's load the saved model weights and recreate the necessary helper functions.
Unpack the fine-tuned weights
!cp ./drive/MyDrive/Demos/230927_beatles/final_model.tar.gz .
import tarfile
import os
tar = tarfile.open("final_model.tar.gz")
tar.extractall()
tar.close()
Install prerequisites
!pip install -q bitsandbytes==0.41.1 transformers==4.33.3 accelerate==0.23.0 datasets==2.14.5 einops==0.6.1
!pip install -q -U git+https://github.com/huggingface/peft.git@69665f24e98dc5f20a430637a31f196158b6e0da
import os
import re
import bitsandbytes as bnb
import pandas as pd
import torch
import torch.nn as nn
import transformers
from datasets import load_dataset, Dataset, Value
from peft import (
LoraConfig,
PeftConfig,
get_peft_model,
PeftModel,
prepare_model_for_kbit_training,
)
from transformers import (
AutoConfig,
AutoModelForCausalLM,
AutoTokenizer,
BitsAndBytesConfig,
)
Load the base model
model_id = "tiiuae/falcon-7b"
adapters_name = "final_model"
print(f"Starting to load the model {model_id} into memory")
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_use_double_quant=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
model_id,
device_map="auto",
trust_remote_code=True,
quantization_config=bnb_config,
)
model = prepare_model_for_kbit_training(model)
tokenizer = AutoTokenizer.from_pretrained(model_id, add_eos_token=True)
tokenizer.pad_token = tokenizer.eos_token
model = PeftModel.from_pretrained(model, adapters_name)
model = model.merge_and_unload()
The same inference function
def generate(prompt="[Intro]") -> str:
    inputs = tokenizer(prompt, padding=True, truncation=True, return_tensors="pt").to("cuda:0")
    # More info about generation options: https://huggingface.co/blog/how-to-generate
    outputs = model.generate(
        input_ids=inputs['input_ids'],
        attention_mask=inputs['attention_mask'],
        do_sample=True,
        top_p=0.92,
        top_k=0,
        max_new_tokens=50)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
Lyric generation
Here we go! Note how I use the keys from the training data to prompt the model. Always remember that an LLM is just "predicting" the next word.
Let's start with something generic, then try engineering the prompts to produce something more relevant.
transformers.logging.set_verbosity_error()
print("\n[" + generate("[Intro]\n").split('[')[1])
print("\n[" + generate("[Verse 1]\n").split('[')[1])
print("\n[" + generate("[Bridge]\n").split('[')[1])
print("\n[" + generate("[Chorus]\n").split('[')[1])
print("\n[" + generate("[Verse 2]\n").split('[')[1])
print("\n[" + generate("[Outro]\n").split('[')[1])
[Intro]
One, two, three, four
One, two... (One, two, three, four)
(Yahoo)
(I wanna be your dog)
(Yahoo)
(I wanna be your dog)
[Verse 1]
And the band played on
And the people came, and they saw that it was good
And they were satisfied
They were satisfied
(I say the word)
(She says the word)
(And they'll understand)
[Bridge]
Oh how long will it take
Till she sees the mistake she has made
Till she sees the mistake she has made
(One, two, three, four, five, six, seven, eight, nine, ten, eleven!)
[Chorus]
Come on (Come on), Come on (Come on)
Come on (Come on), Come on (Come on)
Please please me, whoa yeah, like I please you
Like I please you
(Come
[Verse 2]
Ring, my friend I said you'd call
Doctor Robert
Early morning rain
Doctor Robert
Fool, you don't need him does he fool you does he?
Doctor Robert
Doctor Robert
Doctor Robert
(Ring
[Outro]
I don't want to leave her now
You know I believe and how
I hope she will forgive me somehow
When I see her, I start to sing
(Sing it again)
(Oh yeah, sing it again)
print("\n[" + generate("[Intro]\nThis is a song about language models\n").split('[')[1])
print("\n[" + generate("[Verse 1]\nIn a deep dive, you learned how they work\n").split('[')[1])
print("\n[" + generate("[Bridge]\nBut wait, the data\n").split('[')[1])
print("\n[" + generate("[Chorus]\nThis is a language model\n").split('[')[1])
print("\n[" + generate("[Verse 2]\nNext you want to deploy\n").split('[')[1])
print("\n[" + generate("[Outro]\nI hoped you enjoyed this talk\n").split('[')[1])
[Intro]
This is a song about language models
And the structures that they contain
And the way that they repeat themselves
And the words that they surround
[Verse 1]
In a deep dive, you learned how they work
And now you're part of the corporation
They'll take you in, screwed up or torn
Solve all your problems for a price or a song
(Oh!)
'Cause they'll be there, awaiting your call
[Bridge]
But wait, the data
Presents another approach
You may observe the people passing by
And you'll soon realize
That they're all living lives that are prescribed
And you'll see that they're all the same
And you know it's
[Chorus]
This is a language model
It's called Smokey Tongue
It can help you if you let it
It can help you if you let it
It can help you if you let it
It can help you if you let it
(Instrumental Break
[Verse 2]
Next you want to deploy
The same old thing again
If I've said it once I've said it a hundred times
It's no use, you know, you'll never get it in your mind
If I've said it once I'
[Outro]
I hoped you enjoyed this talk
And trust you will come again
The next time we'll talk
We'll have another tea and scone
But for now it's time to say good-bye
Good-bye
(She's leaving home)