Downloading LLM Models


Requesting and downloading the LLaMA 2 open base models

1 Request the model

Visit Meta's website and submit a model access request.

Note the region restriction: select HK or another supported region; for organization you can enter "no organization".

2 Download the model

Requests are usually approved within a few minutes; the download instructions are sent by email or shown on the Meta website.


Clone the llama GitHub project:
git clone github.com/facebookres…

Make the script executable:
chmod +x download.sh

Run download.sh, follow the prompts, and enter the URL from the download instructions:

(liyan_gptq_p38) root@ubuntu:/home/liyan/llm_datas/models/llama2-7b/llama# ls
CODE_OF_CONDUCT.md  download.sh                 example_text_completion.py  llama          README.md         Responsible-Use-Guide.pdf  UPDATES.md
CONTRIBUTING.md     example_chat_completion.py  LICENSE                     MODEL_CARD.md  requirements.txt  setup.py                   USE_POLICY.md
(liyan_gptq_p38) root@ubuntu:/home/liyan/llm_datas/models/llama2-7b/llama# ./download.sh
Enter the URL from email: https://download.llamameta.net/*?XXXXXX

Enter the list of models to download without spaces (7B,13B,70B,7B-chat,13B-chat,70B-chat), or press Enter for all: 7B
Downloading LICENSE and Acceptable Usage Policy
--2024-05-30 21:39:58--  https://download.llamameta.net/LICENSE?Policy=XXXXXX
Resolving proxy.huawei.com (proxy.huawei.com)... 172.19.90.131
Connecting to proxy.huawei.com (proxy.huawei.com)|172.19.90.131|:8080... connected.
ERROR: cannot verify download.llamameta.net's certificate, issued by ‘CN=Huawei Web Secure Internet Gateway CA,OU=IT,O=Huawei,L=Shenzhen,ST=GuangDong,C=cn’:
  Self-signed certificate encountered.
To connect to download.llamameta.net insecurely, use `--no-check-certificate'.

As the error message suggests, edit download.sh to add the --no-check-certificate flag to the wget calls, then rerun the script.
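The edit itself can be scripted. A minimal sketch (the replacement is purely textual; the exact wget invocations in download.sh vary by script version, so check your copy first):

```python
# Sketch: add --no-check-certificate to every wget call in download.sh.
from pathlib import Path

def patch_wget_calls(text: str) -> str:
    # Insert the flag immediately after each "wget " occurrence.
    return text.replace("wget ", "wget --no-check-certificate ")

script = Path("download.sh")
if script.exists():
    script.write_text(patch_wget_calls(script.read_text()))
```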

Fast downloads of Hugging Face models

Step 1: Install dependencies

pip install -U huggingface_hub

pip install -U hf-transfer

Note: huggingface_hub requires Python >= 3.8, and hf-transfer plugs into and is compatible with huggingface-cli.
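A one-line check of the Python requirement before installing can save a confusing failure later:

```python
# Verify the interpreter meets huggingface_hub's minimum Python version.
import sys

if sys.version_info < (3, 8):
    raise RuntimeError("huggingface_hub requires Python >= 3.8")
print(f"Python {sys.version_info.major}.{sys.version_info.minor}: OK")
```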

Step 2: Set environment variables

export HF_ENDPOINT=https://hf-mirror.com

# enable hf-transfer
export HF_HUB_ENABLE_HF_TRANSFER=1
# disable hf-transfer
export HF_HUB_ENABLE_HF_TRANSFER=0

Step 3: Basic usage

Using the huggingface-cli command line

# show help
huggingface-cli download --help
# download a single file (repo_id + filename)
huggingface-cli download gpt2 config.json
# download an entire repository (repo_id)
huggingface-cli download HuggingFaceH4/zephyr-7b-beta
# download multiple files
huggingface-cli download gpt2 config.json model.safetensors
# download a dataset / space
huggingface-cli download HuggingFaceH4/ultrachat_200k --repo-type dataset
huggingface-cli download HuggingFaceH4/zephyr-chat --repo-type space
# download a single file from a subfolder of a repository
huggingface-cli download stabilityai/stable-diffusion-xl-base-1.0 tokenizer/vocab.json
# common usage: save to a local directory; --local-dir-use-symlinks can also be set to True
huggingface-cli download --resume-download --local-dir-use-symlinks False stabilityai/stable-diffusion-xl-base-1.0 tokenizer/vocab.json --local-dir models

Using a download script

You can configure an internal proxy and disable SSL verification:

# coding: utf-8
import os
 
import urllib3
 
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
import requests
from huggingface_hub import configure_http_backend, snapshot_download
 
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"  # mirror endpoint for faster access
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"  # enable hf-transfer
 
 
def backend_factory() -> requests.Session:
    session = requests.Session()
    proxies = {"http": "http://employee_id:password@proxyhk.xxx.com:8080",
               "https": "http://employee_id:password@proxyhk.xxxx.com:8080"}
    session.verify = False  # disable SSL certificate verification
    session.trust_env = False  # ignore proxy settings from the environment
    session.proxies.update(proxies)
    return session
 
 
# Set it as the default session factory
configure_http_backend(backend_factory=backend_factory)
snapshot_download(
    repo_type=None,  # type to download: one of [None, 'model', 'dataset', 'space']
    repo_id="openbmb/MiniCPM-2B-sft-bf16",  # repository name
    revision="main",  # branch name
    local_dir="./model/MiniCPM-2B-sft-bf16",  # local storage path
    token="xxx"  # access token from your Hugging Face account
)
 

In practice, this can still fail with [SSL: CERTIFICATE_VERIFY_FAILED].

Hugging Face download error [SSL: CERTIFICATE_VERIFY_FAILED]

The download fails with: ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1131). Workaround: disable certificate verification.

In /lib/python3.11/site-packages/requests/adapters.py, find the send method and change its default to verify=False:

<     def send(self, request, stream=False, timeout=None, verify=True,
---
>     def send(self, request, stream=False, timeout=None, verify=False,

Likewise, in sessions.py change the default of the request() method to verify=False:

<             hooks=None, stream=None, verify=None, cert=None, json=None):
---
>             hooks=None, stream=None, verify=False, cert=None, json=None):