【compshare】(4): Using UCloud's compshare GPU platform — the newly added LLaMA-Factory image launches quickly and makes deployment very simple


1. Creating an instance directly from the image is supported


2. Log in to the server and start the service

ssh ubuntu@xxx.xxx.xxx.xxx

The image comes with the NVIDIA driver, CUDA, conda, and LLaMA-Factory preinstalled.
To start LLaMA-Factory:
conda activate llama_factory
cd /home/ubuntu/LLaMA-Factory
export CUDA_VISIBLE_DEVICES=0
nohup python src/train_web.py > train_web.log 2>&1 &
Once it is running, open http://ip:7860 in a browser, replacing ip with the cloud host's public IP.
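Because the server is started in the background with nohup, it is worth a quick sanity check from the shell before opening the browser. The following is a sketch assuming the default port 7860 and the train_web.log path used above:

```shell
# Check whether the train_web.py process is alive
pgrep -af train_web.py || echo "train_web.py is not running"

# Check whether anything is listening on the Gradio port yet
ss -tln 2>/dev/null | grep -q 7860 && echo "port 7860 is listening" || echo "port 7860 not listening yet"

# Inspect the last lines of the log for startup messages or errors
tail -n 20 train_web.log 2>/dev/null || echo "train_web.log not found"
```

If the port is listening locally but the browser cannot reach it, check the cloud console's firewall / security-group rules for port 7860.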


python src/train_web.py
Traceback (most recent call last):
  File "/home/ubuntu/LLaMA-Factory/src/train_web.py", line 1, in <module>
    from llmtuner import create_ui
  File "/home/ubuntu/LLaMA-Factory/src/llmtuner/__init__.py", line 3, in <module>
    from .api import create_app
  File "/home/ubuntu/LLaMA-Factory/src/llmtuner/api/__init__.py", line 1, in <module>
    from .app import create_app
  File "/home/ubuntu/LLaMA-Factory/src/llmtuner/api/app.py", line 6, in <module>
    from pydantic import BaseModel
ModuleNotFoundError: No module named 'pydantic'


python src/train_web.py
Traceback (most recent call last):
  File "/home/ubuntu/LLaMA-Factory/src/train_web.py", line 1, in <module>
    from llmtuner import create_ui
  File "/home/ubuntu/LLaMA-Factory/src/llmtuner/__init__.py", line 3, in <module>
    from .api import create_app
  File "/home/ubuntu/LLaMA-Factory/src/llmtuner/api/__init__.py", line 1, in <module>
    from .app import create_app
  File "/home/ubuntu/LLaMA-Factory/src/llmtuner/api/app.py", line 8, in <module>
    from ..chat import ChatModel
  File "/home/ubuntu/LLaMA-Factory/src/llmtuner/chat/__init__.py", line 2, in <module>
    from .chat_model import ChatModel
  File "/home/ubuntu/LLaMA-Factory/src/llmtuner/chat/chat_model.py", line 5, in <module>
    from ..hparams import get_infer_args
  File "/home/ubuntu/LLaMA-Factory/src/llmtuner/hparams/__init__.py", line 2, in <module>
    from .evaluation_args import EvaluationArguments
  File "/home/ubuntu/LLaMA-Factory/src/llmtuner/hparams/evaluation_args.py", line 5, in <module>
    from datasets import DownloadMode
ModuleNotFoundError: No module named 'datasets'

Traceback (most recent call last):
  File "/home/ubuntu/LLaMA-Factory/src/train_web.py", line 1, in <module>
    from llmtuner import create_ui
  File "/home/ubuntu/LLaMA-Factory/src/llmtuner/__init__.py", line 3, in <module>
    from .api import create_app
  File "/home/ubuntu/LLaMA-Factory/src/llmtuner/api/__init__.py", line 1, in <module>
    from .app import create_app
  File "/home/ubuntu/LLaMA-Factory/src/llmtuner/api/app.py", line 8, in <module>
    from ..chat import ChatModel
  File "/home/ubuntu/LLaMA-Factory/src/llmtuner/chat/__init__.py", line 2, in <module>
    from .chat_model import ChatModel
  File "/home/ubuntu/LLaMA-Factory/src/llmtuner/chat/chat_model.py", line 5, in <module>
    from ..hparams import get_infer_args
  File "/home/ubuntu/LLaMA-Factory/src/llmtuner/hparams/__init__.py", line 6, in <module>
    from .parser import get_eval_args, get_infer_args, get_train_args
  File "/home/ubuntu/LLaMA-Factory/src/llmtuner/hparams/parser.py", line 7, in <module>
    import transformers
ModuleNotFoundError: No module named 'transformers'

# A few more similar errors followed (not pasted here). These are the packages that need to be installed:
pip3 install pydantic datasets torch transformers peft
pip3 install "trl>=0.8.1"
pip3 install gradio==4.21.0
# needed at training time
pip3 install sentencepiece
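After installing, a quick import check confirms the environment is complete before relaunching the web UI. This is just a sketch; the package list mirrors the pip commands above (note that SentencePiece imports as sentencepiece):

```shell
python3 - <<'EOF'
# Try importing each previously missing package and report its status
import importlib
for pkg in ("pydantic", "datasets", "torch", "transformers", "peft",
            "trl", "gradio", "sentencepiece"):
    try:
        importlib.import_module(pkg)
        print(f"{pkg}: ok")
    except Exception as exc:
        print(f"{pkg}: MISSING ({exc})")
EOF
```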

Once the missing packages are installed, the service starts successfully.

Startup output:

$ python src/train_web.py
Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.
IMPORTANT: You are using gradio version 4.21.0, however version 4.29.0 is available, please upgrade.
--------

The web UI is finally up. Let's test it with Qwen 0.5B:


You can switch the UI to Chinese, adjust the parameters, and then run training. Since a few libraries are missing from the image, it is worth reporting this to compshare.


tokenizer_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 1.29k/1.29k [00:00<00:00, 13.4MB/s]
vocab.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 2.78M/2.78M [00:01<00:00, 2.49MB/s]
merges.txt: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.67M/1.67M [00:01<00:00, 1.65MB/s]
tokenizer.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 7.03M/7.03M [00:01<00:00, 4.67MB/s]
[INFO|tokenization_utils_base.py:2108] 2024-06-05 22:37:37,620 >> loading file vocab.json from cache at /home/ubuntu/.cache/huggingface/hub/models--Qwen--Qwen1.5-0.5B/snapshots/8f445e3628f3500ee69f24e1303c9f10f5342a39/vocab.json
[INFO|tokenization_utils_base.py:2108] 2024-06-05 22:37:37,620 >> loading file merges.txt from cache at /home/ubuntu/.cache/huggingface/hub/models--Qwen--Qwen1.5-0.5B/snapshots/8f445e3628f3500ee69f24e1303c9f10f5342a39/merges.txt
[INFO|tokenization_utils_base.py:2108] 2024-06-05 22:37:37,620 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2108] 2024-06-05 22:37:37,620 >> loading file special_tokens_map.json from cache at None
[INFO|tokenization_utils_base.py:2108] 2024-06-05 22:37:37,620 >> loading file tokenizer_config.json from cache at /home/ubuntu/.cache/huggingface/hub/models--Qwen--Qwen1.5-0.5B/snapshots/8f445e3628f3500ee69f24e1303c9f10f5342a39/tokenizer_config.json
[INFO|tokenization_utils_base.py:2108] 2024-06-05 22:37:37,620 >> loading file tokenizer.json from cache at /home/ubuntu/.cache/huggingface/hub/models--Qwen--Qwen1.5-0.5B/snapshots/8f445e3628f3500ee69f24e1303c9f10f5342a39/tokenizer.json
[WARNING|logging.py:314] 2024-06-05 22:37:37,803 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
06/05/2024 22:37:37 - INFO - llmtuner.data.loader

Training starts; now it's just a matter of waiting.

Training results:

GPU usage:

 nvidia-smi 
Wed Jun  5 22:49:48 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090        Off | 00000000:00:03.0 Off |                  Off |
|  0%   39C    P2              79W / 450W |   1869MiB / 24564MiB |     19%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1565      C   python                                     1860MiB |
+---------------------------------------------------------------------------------------+
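Rather than running nvidia-smi once, it can be polled to keep an eye on utilization while training runs. Both forms below are standard nvidia-smi usage (the intervals are just examples):

```shell
# Refresh the full nvidia-smi view every 2 seconds (Ctrl+C to exit)
watch -n 2 nvidia-smi

# Or sample only utilization and memory as CSV every 5 seconds
nvidia-smi --query-gpu=timestamp,utilization.gpu,memory.used --format=csv -l 5
```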