Installing Multiple CUDA Versions Side by Side on Ubuntu and Switching Between Them


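I. References

CUDA Toolkit official documentation

Installing multiple CUDA versions on Ubuntu and switching between them at any time

Switching between multiple CUDA and GCC versions on Ubuntu

Coexisting CUDA versions and switching on the fly

On version compatibility between CUDA, cuDNN, TF, and the CUDA driver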

II. System environment

OS: Ubuntu 16.04
GPU: GeForce GTX 1650, 4 GB
CUDA already installed: 10.2
CUDA to be installed: 11.0

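III. Important notes

  1. Preparation: see the blog post "Looking up GPU / CUDA / cuDNN information".
  2. Keep the GPU driver as up to date as possible.
  3. The CUDA version must match the installed gcc version. See "Installing CUDA 10.0 on Ubuntu 18.04" and the version compatibility information on the NVIDIA website.
  4. Maintaining multiple CUDA versions: each toolkit is installed in its own directory under /usr/local/, and the /usr/local/cuda symlink can be re-pointed from the command line to switch versions.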
    lrwxrwxrwx  1 root root    9 Sep  4 19:58 cuda -> cuda-11.0/
    drwxr-xr-x 18 root root 4096 Sep  4 18:50 cuda-10.2/
    drwxr-xr-x 16 root root 4096 Sep  4 19:54 cuda-11.0/
    drwxr-xr-x 17 root root 4096 Aug 12 10:40 cuda-8.0/
    drwxr-xr-x 18 root root 4096 Sep  4 14:33 cuda-9.1/
    drwxr-xr-x 18 root root 4096 Sep  4 16:17 cuda-9.2/
    
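  5. Do not use a TensorFlow release that is too new: install with pip install tensorflow-gpu==1.x.0 (pip pins a version with ==; a single = is rejected).
  6. Do not reach for Google the moment something fails: analyze the cause yourself first, try a fix, and only then search.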
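
The layout described in note 4 (one directory per version plus a cuda symlink) can be inspected with a small helper. The sketch below is only an illustration: list_cuda_versions is a hypothetical function name, and it assumes every toolkit lives in <prefix>/cuda-<version>, as in the listing above.

```shell
#!/usr/bin/env bash
# List CUDA toolkit directories under a prefix and mark the one that the
# "cuda" symlink currently points to.
# Assumption: toolkits are installed as <prefix>/cuda-<version>.
list_cuda_versions() {
    local prefix="${1:-/usr/local}"
    local active=""
    # readlink prints the link target, e.g. "cuda-11.0/" or "/usr/local/cuda-11.0"
    if [ -L "$prefix/cuda" ]; then
        active="$(basename "$(readlink "$prefix/cuda")")"
    fi
    local dir name
    for dir in "$prefix"/cuda-*/; do
        [ -d "$dir" ] || continue   # the glob matched nothing
        name="$(basename "$dir")"
        if [ "$name" = "$active" ]; then
            echo "$name (active)"
        else
            echo "$name"
        fi
    done
}
```

Calling list_cuda_versions with no argument inspects /usr/local; passing another directory as the first argument inspects that prefix instead.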

IV. Key steps

  1. Install the dependencies

    sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev 
    
  2. Install CUDA; for the choices to make during installation, see "Installer prompts and choices" below

    sudo sh cuda_11.0.2_450.51.05_linux.run
    
  3. When CUDA has been installed successfully, the installer prints the following summary

    ===========
    = Summary =
    ===========
    
    Driver:   Not Selected
    Toolkit:  Installed in /usr/local/cuda-11.0/
    Samples:  Installed in /home/yichao/
    
    Please make sure that
     -   PATH includes /usr/local/cuda-11.0/bin
     -   LD_LIBRARY_PATH includes /usr/local/cuda-11.0/lib64, or, add /usr/local/cuda-11.0/lib64 to /etc/ld.so.conf and run ldconfig as root
    
    To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.0/bin
    
    Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-11.0/doc/pdf for detailed information on setting up CUDA.
    ***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least .00 is required for CUDA 11.0 functionality to work.
    To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
        sudo <CudaInstaller>.run --silent --driver
    
    Logfile is /var/log/cuda-installer.log
    
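  4. Install cuDNN; see "Installing and configuring CUDA and cuDNN on Ubuntu"

  5. Configure the CUDA environment variables

    The official TensorFlow installation guide asks you to set the PATH, LD_LIBRARY_PATH, and CUDA_HOME environment variables.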
    
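    # Edit the shell configuration file (it belongs to your user, so sudo is not needed)
    gedit ~/.bashrc
    
    # Append the following lines at the end of the file
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
    export PATH=$PATH:/usr/local/cuda/bin
    # CUDA_HOME is a single directory, not a colon-separated list
    export CUDA_HOME=/usr/local/cuda
    
    # Reload the configuration
    source ~/.bashrc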
    
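  6. Switching between CUDA versions

    # Show the current cuda symlink; here it points to CUDA 10.2
    ls -lh /usr/local
    
    # Remove the existing cuda symlink (plain rm removes only the link, never a toolkit directory)
    sudo rm /usr/local/cuda
    
    # Create the new cuda symlink
    sudo ln -s /usr/local/cuda-11.0 /usr/local/cuda
    
    # Show the cuda symlink again; it now points to CUDA 11.0
    ls -lh /usr/local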
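
The remove-and-recreate sequence in step 6 can be wrapped in a helper that checks its input before touching anything. This is a minimal sketch, not part of any CUDA tooling: switch_cuda is a hypothetical name, the optional second argument (the install prefix) is an assumption added so the function can be tried on a scratch directory, and ln -sfn replaces the link in one step instead of rm followed by ln -s.

```shell
#!/usr/bin/env bash
# Re-point <prefix>/cuda at <prefix>/cuda-<version>.
# switch_cuda is a hypothetical helper name, not an NVIDIA tool.
switch_cuda() {
    local version="$1"
    local prefix="${2:-/usr/local}"
    local target="$prefix/cuda-$version"
    local link="$prefix/cuda"

    if [ -z "$version" ]; then
        echo "usage: switch_cuda <version> [prefix]" >&2
        return 2
    fi
    if [ ! -d "$target" ]; then
        echo "error: $target does not exist" >&2
        return 1
    fi
    # Only a symlink (or nothing) may be replaced, never a real directory.
    if [ -e "$link" ] && [ ! -L "$link" ]; then
        echo "error: $link is not a symlink" >&2
        return 1
    fi
    # -s: symbolic link, -f: replace an existing link,
    # -n: treat the old link as a file instead of following it
    ln -sfn "$target" "$link"
    echo "$link -> $(readlink "$link")"
}
```

Writing to /usr/local requires root, so for the real prefix the function would have to live in a script that is run with sudo, for example one that ends with switch_cuda "$@".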
    

Installer prompts and choices

  1. An existing driver installation is detected: continue without removing it?

    Existing package manager installation of the driver found. It is strongly recommended that you remove this before continuing.
    Abort
    Continue
    

    Select [Continue] and press Enter.

  2. Accept the license agreement

    Do you accept the above EULA? (accept/decline/quit): 
    accept
    

    Type accept and press Enter.

  3. Choose what to install

    CUDA Installer                               
     - [ ] Driver                               
          [ ] 450.51.05                         
     + [X] CUDA Toolkit 11.0                     
       [X] CUDA Samples 11.0                     
       [X] CUDA Demo Suite 11.0                 
       [X] CUDA Documentation 11.0               
       Options                                   
       Install
    

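    Leave Driver unchecked, select [Install], and press Enter.

    Older text-mode runfile installers ask about the driver with a prompt instead (the 387.26 shown here appears to come from such an installer, not from the 450.51.05 bundle used above); answer no:
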
    Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 387.26?
    (y)es/(n)o/(q)uit:
    no
    
  4. Whether to update the /usr/local/cuda symlink

    A symlink already exists at /usr/local/cuda. Update to this installation?
    Yes
    No 
    
    # First installation: choose Yes. Installing an additional version: choose No.
    

    Select [No] and press Enter.
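
Choosing No leaves /usr/local/cuda pointing at the previously installed toolkit. To try the new toolkit in a single shell without re-pointing the symlink, the environment variables can be set per shell instead. This is a different technique from the symlink switch described in this post, shown only as a sketch: use_cuda is a hypothetical function name, and it assumes each toolkit is installed as <prefix>/cuda-<version> with bin/ and lib64/ subdirectories.

```shell
#!/usr/bin/env bash
# Point the current shell at one toolkit without touching <prefix>/cuda.
# use_cuda is a hypothetical helper name, not an NVIDIA tool.
use_cuda() {
    local version="$1"
    local prefix="${2:-/usr/local}"
    local cuda_dir="$prefix/cuda-$version"
    if [ -z "$version" ] || [ ! -d "$cuda_dir" ]; then
        echo "error: no toolkit found at $cuda_dir" >&2
        return 1
    fi
    export CUDA_HOME="$cuda_dir"
    # Prepend so this toolkit is found before any /usr/local/cuda entry
    # that ~/.bashrc already added.
    export PATH="$cuda_dir/bin:$PATH"
    # Add the separator only when LD_LIBRARY_PATH is already non-empty.
    export LD_LIBRARY_PATH="$cuda_dir/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}
```

With the function defined in ~/.bashrc, running use_cuda 11.0 affects only the shell it is called in; new shells still follow the /usr/local/cuda symlink.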