CUDA Management for Python Deep Learning


诸神缄默不语 – personal CSDN blog post index

This post covers:

  1. How to set the GPU index when running Python deep learning code (including approaches that work for both PyTorch and TensorFlow). This mainly targets the single-GPU case; multi-GPU scenarios may be added later. Typical situation: on a multi-GPU machine, code usually defaults to GPU 0, but sometimes you need GPU 1, 2, etc., so the index has to be set manually.
  2. How to check current CUDA/GPU usage from the Linux command line
  3. Under construction: GPU memory optimization

1. Setting the GPU index in deep learning

1. CUDA_VISIBLE_DEVICES

Once this is set, the running code can only see the GPU(s) you specified. For example, if you expose only the GPU whose physical index is 1, then inside the code cuda:0 maps directly to that physical GPU 1.
You can then simply write: device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

  1. Set it in code (note: this must come before the deep-learning code; a full sketch follows after this list):
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'
  2. Set it in front of the run command: CUDA_VISIBLE_DEVICES=1 python run.py
    For multiple GPUs: CUDA_VISIBLE_DEVICES=0,3,7 python run.py
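
A minimal end-to-end sketch of the in-code variant (the index '1' here is just an example; adjust it to your machine):

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'   # must run before the framework initializes CUDA

import torch

# Only physical GPU 1 is visible now, so it shows up as cuda:0.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(torch.cuda.device_count())        # -> 1
print(torch.cuda.get_device_name(0))    # name of physical GPU 1
x = torch.randn(2, 3).to(device)
print(x.device)                         # cuda:0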

2. Moving tensors to a device directly in PyTorch

device = "cuda:1" (the number is the GPU index)

In general, for the inputs you just call .to(device) on every input tensor.
Inside the model, tensors that are already registered (parameters and buffers) are moved automatically when you call .to(device) on the model instance; tensors that are not registered with the model (e.g. helper tensors created inside forward() or other methods) have to be moved to the device manually.
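
A small sketch of the difference; ToyModel and its members are made up for illustration:

import torch
from torch import nn

class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)                 # registered parameter: moved by model.to(device)
        self.register_buffer("scale", torch.ones(4))  # registered buffer: also moved automatically

    def forward(self, x):
        # Created on the fly, NOT registered, so its device must be handled explicitly:
        offset = torch.zeros(4, device=x.device)      # e.g. pass device=, or call .to(x.device)
        return self.linear(x * self.scale) + offset

device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu")  # adjust the index to your machine
model = ToyModel().to(device)          # moves the linear weights and the "scale" buffer
x = torch.randn(2, 4).to(device)       # input tensors are moved one by one
out = model(x)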

2. Checking current CUDA/GPU usage from the Linux command line

  1. nvidia-smi: check GPU status
    Example output:
Mon Jul 24 12:17:46 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
| 64%   76C    P2   332W / 350W |   5349MiB / 24576MiB |     69%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:25:00.0 Off |                  N/A |
| 76%   66C    P2   309W / 350W |   4775MiB / 24576MiB |     99%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
omit
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   3323021      C   python                           5346MiB |
|    1   N/A  N/A   3360508      C   python                           4772MiB |
omit
+-----------------------------------------------------------------------------+

  2. [peci1/nvidia-htop: A tool for enriching the output of nvidia-smi.](https://github.com/peci1/nvidia-htop)
Installation: pip install nvidia-htop
Run: nvidia-htop.py (adding -l removes the limit on the command-column length, meaning it prints as much of the command as fits on one line, not that the full command is guaranteed to appear; -c color-codes current GPU usage red/yellow/green)
Example output:

Mon Jul 24 12:22:34 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
| 62%   77C    P2   338W / 350W |   5349MiB / 24576MiB |     99%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:25:00.0 Off |                  N/A |
| 70%   62C    P2   312W / 350W |   4775MiB / 24576MiB |     99%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
omit

+-------------------------------------------------------------------------------+
|  GPU     PID     USER    GPU MEM  %CPU  %MEM      TIME  COMMAND               |
|    0 3323021   omit    5346MiB   111   0.5  04:16:09  python -u omit  |
|    1 3360508   omit    4772MiB   115   0.1  03:30:58  python -u omit  |
omit
+-------------------------------------------------------------------------------+

  3. XuehaiPan/nvitop: An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management. When one of our GPUs dropped off and nvidia-smi stopped working, this tool still ran fine.
The output is actually colored, but the server has too many GPUs to capture in one screenshot, so I only copied the text here:
1. Show the status of all devices: nvitop -1


    Thu Aug 03 15:07:47 2023
    ╒═════════════════════════════════════════════════════════════════════════════╕
     NVITOP 1.2.0       Driver Version: 520.61.05      CUDA Driver Version: 11.8 
    ├───────────────────────────────┬──────────────────────┬──────────────────────┤
     GPU  Name        Persistence-M│ Bus-Id        Disp.A  Volatile Uncorr. ECC 
     Fan  Temp  Perf  Pwr:Usage/Cap│         Memory-Usage  GPU-Util  Compute M. 
    ╞═══════════════════════════════╪══════════════════════╪══════════════════════╪══════════════════════════════════════════════════════════════════════════════════════════════════╕
       0  GeForce RTX 3090    On    00000000:01:00.0 Off                   N/A  MEM: ███████████████▏ 17.4%                                                                      
     37%   50C    P2   117W / 350W    4278MiB / 24.00GiB       0%      Default  UTL:  0%                                                                                        
    ├───────────────────────────────┼──────────────────────┼──────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────┤
       1  GeForce RTX 3090    On    00000000:25:00.0 Off                   N/A  MEM: █████████████████████████████████████████▋ 47.9%                                            
     41%   45C    P2   106W / 350W   11760MiB / 24.00GiB       0%      Default  UTL:  0%                                                                                        
    ╘═══════════════════════════════╧══════════════════════╧══════════════════════╧══════════════════════════════════════════════════════════════════════════════════════════════════╛
    [ CPU: █████████████████▌ 13.7%                                                                                            UPTIME: 9.0 days ]  ( Load Average: 34.97 59.87 81.07 )
    [ MEM: ████████████████████████████████▎ 25.2%                                                                               USED: 185.4GiB ]  [ SWP: █▋ 7.3%                    ]

    ╒════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╕
     Processes:                                                                                                                                                     wanghuijuan@zju 
     GPU     PID      USER  GPU-MEM %SM  %CPU  %MEM      TIME  COMMAND                                                                                                              
    ╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
       0 4083851 C     user1  4274MiB   0 102.2   0.0  20:47:34  Zombie Process                                                                                                       
    ├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
       1  900475 C     user2 11756MiB   1 103.7   2.4  46:16:48  python run.py                      
    ╘════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╛

  4. To check which GPU model your machine has, see this post: Linux查看Nvidia显卡型号_linux查看显卡型号-CSDN博客 (a quick check from Python is sketched below).
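
A quick way to check the GPU model without leaving Python, assuming PyTorch is installed (nvidia-smi -L gives the same information from the shell):

import torch

# Print the model name of every GPU visible to this process.
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))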

3. GPU memory optimization

  1. Use the garbage collector:
import gc
del obj        # drop the reference to an object you no longer need
gc.collect()   # force a garbage-collection pass
  2. In PyTorch, if you don't need to accumulate any gradients (typically at test/evaluation time), wrap the computation in with torch.no_grad(): (just run the normal computation inside the block); this effectively reduces the memory occupied by gradients, which can be a considerable share. See the sketch after this list.
    If you only want to skip gradients for specific tensors, set those tensors' requires_grad attribute to False. (This is already the default for tensors that are not registered.)
    (Note that calling model.eval() alone does not achieve this.)
  3. Clear PyTorch's CUDA cache: torch.cuda.empty_cache() (official docs: torch.cuda.empty_cache — PyTorch 1.11.0 documentation; see also 【pytorch】torch.cuda.empty_cache()==>释放缓存分配器当前持有的且未占用的缓存显存_马鹏森的博客-CSDN博客_empty_cache)
  4. Material I haven't read yet that may serve as further references:
    1. Official notes: CUDA semantics - Memory management
    2. Introductory post: 深度学习中GPU和显存分析 - 知乎
    3. Transformer性能优化:运算和显存 - 知乎
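
A minimal sketch combining items 1–3 in an evaluation setting (the tiny model and random batches below are placeholders for illustration only):

import gc
import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in model and data, purely for illustration.
model = nn.Linear(128, 10).to(device)
eval_batches = [torch.randn(32, 128) for _ in range(4)]

model.eval()                    # switches dropout/batchnorm to eval mode, but does NOT disable gradients
with torch.no_grad():           # this is what actually stops gradient bookkeeping and saves memory
    for batch in eval_batches:
        outputs = model(batch.to(device))
        # ... compute metrics here ...

big_tensor = torch.randn(1024, 1024, device=device)
del big_tensor                  # drop the Python reference to a large object that is no longer needed
gc.collect()                    # let the garbage collector reclaim it
torch.cuda.empty_cache()        # return cached-but-unused blocks from PyTorch's allocator to the driver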

