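This article describes how to configure a TensorFlow development environment, and how to build and install TensorFlow from source, for NVIDIA's latest GPU, the RTX 3090, on Ubuntu Server 20.04.1.
The build uses CUDA 11.1, cuDNN 8.0.5, and TensorFlow v2.3.0.
1. System installation and configuration
1.1 System installation
Install Ubuntu Server 20.04.1; the installation procedure is omitted here.
1.2 Software configuration
1.2.1 Update packages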
sudo apt update
sudo apt upgrade
1.2.2 Install SSH and system tools
sudo apt install net-tools ssh tree zip
1.2.3 Install build tools
sudo apt install vim git
sudo apt install clang cmake gcc g++ make
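2. GPU driver installation
2.1 Blacklisting the nouveau driver
Run lsmod | grep nouveau to check whether the nouveau driver is in use. Output like the following means it is loaded: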
nouveau 1949696 0
mxm_wmi 16384 1 nouveau
wmi 32768 2 mxm_wmi,nouveau
video 49152 1 nouveau
ttm 106496 2 drm_vram_helper,nouveau
drm_kms_helper 184320 4 ast,nouveau
i2c_algo_bit 16384 3 igb,ast,nouveau
drm 491520 6 drm_kms_helper,drm_vram_helper,ast,ttm,nouveau
Use the following commands to blacklist the nouveau driver, then reboot the machine.
sudo -s
echo -e "blacklist nouveau\noptions nouveau modeset=0" > /etc/modprobe.d/blacklist-nouveau.conf
update-initramfs -u
reboot
After the reboot, run lsmod | grep nouveau again. If it prints nothing, the nouveau driver has been blacklisted.
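The same check can be scripted. The sketch below is only an illustration and not part of the original procedure: it assumes the kernel's module list is readable at /proc/modules, and the helper name is_module_loaded is made up for this example.

```python
from pathlib import Path


def is_module_loaded(name: str, modules_text: str) -> bool:
    """Return True if a module called `name` appears in /proc/modules-style text."""
    # Each line starts with the module name, followed by size, use count, etc.
    for line in modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


if __name__ == "__main__":
    proc_modules = Path("/proc/modules")
    text = proc_modules.read_text() if proc_modules.exists() else ""
    if is_module_loaded("nouveau", text):
        print("nouveau is still loaded; check the blacklist file and reboot")
    else:
        print("nouveau is not loaded")
```

Only the first field of each line is compared, so a module that merely lists nouveau as a dependent does not count as a match.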
2.2 NVIDIA driver installation
2.2.1 Download the driver
wget -c "https://us.download.nvidia.com/XFree86/Linux-x86_64/455.23.04/NVIDIA-Linux-x86_64-455.23.04.run"
2.2.2 Install the driver
chmod a+x NVIDIA-Linux-x86_64-455.23.04.run
sudo ./NVIDIA-Linux-x86_64-455.23.04.run
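The installer presents a series of prompts and messages during installation; choose the options shown in the screenshots.
2.2.3 Verify the driver installation
Run nvidia-smi to check that the driver was installed successfully. The output looks like this: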
Tue Nov 24 06:20:33 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.23.04 Driver Version: 455.23.04 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 3090 Off | 00000000:24:00.0 Off | N/A |
| 30% 32C P0 101W / 350W | 0MiB / 24268MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 GeForce RTX 3090 Off | 00000000:41:00.0 Off | N/A |
| 30% 33C P0 105W / 350W | 0MiB / 24268MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 GeForce RTX 3090 Off | 00000000:81:00.0 Off | N/A |
| 30% 32C P0 103W / 350W | 0MiB / 24268MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 GeForce RTX 3090 Off | 00000000:E1:00.0 Off | N/A |
| 30% 32C P0 111W / 350W | 0MiB / 24268MiB | 52% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
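For scripted checks, nvidia-smi can also print machine-readable rows. The sketch below assumes the --query-gpu and --format=csv,noheader options of nvidia-smi; the function names parse_gpu_rows and query_gpus are hypothetical helpers, not part of any NVIDIA tooling.

```python
import shutil
import subprocess


def parse_gpu_rows(csv_text: str) -> list:
    """Turn 'name, driver_version, memory.total' CSV rows into dictionaries."""
    rows = []
    for line in csv_text.strip().splitlines():
        name, driver, memory = [field.strip() for field in line.split(",")]
        rows.append({"name": name, "driver": driver, "memory": memory})
    return rows


def query_gpus() -> list:
    """Ask nvidia-smi for one CSV row per GPU; return [] if that is not possible."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=name,driver_version,memory.total",
         "--format=csv,noheader"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_rows(result.stdout)


if __name__ == "__main__":
    for index, gpu in enumerate(query_gpus()):
        print(index, gpu["name"], gpu["driver"], gpu["memory"])
```

On the machine described here, one would expect four rows, one per RTX 3090.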
3. CUDA installation and configuration
3.1 Download CUDA
wget -c "https://developer.download.nvidia.com/compute/cuda/11.1.0/local_installers/cuda_11.1.0_455.23.05_linux.run"
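Note: downloading with wget can be slow. You can instead download the file with a download manager such as Xunlei and then upload it to the server with scp.
3.2 Install CUDA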
chmod a+x cuda_11.1.0_455.23.05_linux.run
sudo ./cuda_11.1.0_455.23.05_linux.run
See section 3.3 for the installer options to adjust. A successful installation prints the following summary.
===========
= Summary =
===========
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-11.1/
Samples: Not Selected
Please make sure that
- PATH includes /usr/local/cuda-11.1/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-11.1/lib64, or, add /usr/local/cuda-11.1/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.1/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least .00 is required for CUDA 11.1 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run --silent --driver
Logfile is /var/log/cuda-installer.log
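3.3 Adjusting the installer options
- Accept the license agreement by typing accept.
- Use the space bar and the arrow keys to deselect every component except CUDA Toolkit 11.1.
- Under Options, open Toolkit Options and uncheck every item in it.
Note: for a quicker installation, you can deselect only the Driver component after accepting the agreement and leave everything else at its default.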
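3.4 Add environment variables
3.4.1 Open the configuration file
Open the .bashrc file with vim.
vim ~/.bashrc
3.4.2 Add the environment settings
Append the following to the end of the .bashrc file.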
# cuda 11.1
export PATH="/usr/local/cuda-11.1/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-11.1/lib64:$LD_LIBRARY_PATH"
3.4.3 Apply the environment settings
source ~/.bashrc
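To confirm that the two exports took effect in the current shell, the variables can be inspected directly. The following is a minimal sketch; path_contains is a hypothetical helper, and the directories are the ones used in this article.

```python
import os


def path_contains(var_value: str, directory: str) -> bool:
    """Return True if `directory` is one of the ':'-separated entries in var_value."""
    wanted = directory.rstrip("/")
    entries = [entry.rstrip("/") for entry in var_value.split(":") if entry]
    return wanted in entries


if __name__ == "__main__":
    checks = {
        "PATH": "/usr/local/cuda-11.1/bin",
        "LD_LIBRARY_PATH": "/usr/local/cuda-11.1/lib64",
    }
    for variable, directory in checks.items():
        ok = path_contains(os.environ.get(variable, ""), directory)
        print(variable, "contains", directory, ":", ok)
```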
3.4.4 Test the environment variables
Run nvcc -V to check that the environment variables are set correctly. On success it prints the following.
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Tue_Sep_15_19:10:02_PDT_2020
Cuda compilation tools, release 11.1, V11.1.74
Build cuda_11.1.TC455_06.29069683_0
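If the toolkit version needs to be checked from a script, the release number can be pulled out of this output. The sketch below assumes the "release X.Y" wording shown above; parse_nvcc_release is a hypothetical helper name.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output: str):
    """Extract the 'release X.Y' version from `nvcc -V` output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    if shutil.which("nvcc") is None:
        print("nvcc not found on PATH; re-check the exports in ~/.bashrc")
    else:
        out = subprocess.run(["nvcc", "-V"], capture_output=True, text=True).stdout
        print("CUDA toolkit release:", parse_nvcc_release(out))
```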
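3.5 Create the symlink manually
To make it easier to install other CUDA versions later, the /usr/local/cuda symlink is created by hand with the following command: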
sudo ln -s /usr/local/cuda-11.1 /usr/local/cuda
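4. cuDNN installation and configuration
cuDNN must be downloaded from the NVIDIA website after registering an account. The version used here is 8.0.5, packaged as cudnn-11.1-linux-x64-v8.0.5.39.tgz. The download steps are shown in the figure below.
4.1 Extract
tar -zxvf cudnn-11.1-linux-x64-v8.0.5.39.tgz
The archive extracts into a cuda folder in the current directory. Run tree cuda to view its layout, which is as follows: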
cuda
├── include
│ ├── cudnn_adv_infer.h
│ ├── cudnn_adv_train.h
│ ├── cudnn_backend.h
│ ├── cudnn_cnn_infer.h
│ ├── cudnn_cnn_train.h
│ ├── cudnn.h
│ ├── cudnn_ops_infer.h
│ ├── cudnn_ops_train.h
│ └── cudnn_version.h
├── lib64
│ ├── libcudnn_adv_infer.so -> libcudnn_adv_infer.so.8
│ ├── libcudnn_adv_infer.so.8 -> libcudnn_adv_infer.so.8.0.5
│ ├── libcudnn_adv_infer.so.8.0.5
│ ├── libcudnn_adv_train.so -> libcudnn_adv_train.so.8
│ ├── libcudnn_adv_train.so.8 -> libcudnn_adv_train.so.8.0.5
│ ├── libcudnn_adv_train.so.8.0.5
│ ├── libcudnn_cnn_infer.so -> libcudnn_cnn_infer.so.8
│ ├── libcudnn_cnn_infer.so.8 -> libcudnn_cnn_infer.so.8.0.5
│ ├── libcudnn_cnn_infer.so.8.0.5
│ ├── libcudnn_cnn_train.so -> libcudnn_cnn_train.so.8
│ ├── libcudnn_cnn_train.so.8 -> libcudnn_cnn_train.so.8.0.5
│ ├── libcudnn_cnn_train.so.8.0.5
│ ├── libcudnn_ops_infer.so -> libcudnn_ops_infer.so.8
│ ├── libcudnn_ops_infer.so.8 -> libcudnn_ops_infer.so.8.0.5
│ ├── libcudnn_ops_infer.so.8.0.5
│ ├── libcudnn_ops_train.so -> libcudnn_ops_train.so.8
│ ├── libcudnn_ops_train.so.8 -> libcudnn_ops_train.so.8.0.5
│ ├── libcudnn_ops_train.so.8.0.5
│ ├── libcudnn.so -> libcudnn.so.8
│ ├── libcudnn.so.8 -> libcudnn.so.8.0.5
│ ├── libcudnn.so.8.0.5
│ └── libcudnn_static.a
└── NVIDIA_SLA_cuDNN_Support.txt
2 directories, 32 files
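4.2 Install
Copy the headers and every cuDNN library into the CUDA tree. cuDNN 8 is split into several libraries (libcudnn_ops_*, libcudnn_cnn_*, libcudnn_adv_*) that libcudnn.so loads at run time, so all of them are needed; cp -P preserves the symlinks shown above.
sudo cp cuda/include/*.h /usr/local/cuda-11.1/include/
sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda-11.1/lib64/
sudo chmod a+r /usr/local/cuda-11.1/include/cudnn*.h /usr/local/cuda-11.1/lib64/libcudnn*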
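One way to confirm which cuDNN version ended up in the CUDA tree is to read the version macros from cudnn_version.h. This is a sketch under the assumption that the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL as plain integers; parse_cudnn_version is a hypothetical helper name.

```python
import re
from pathlib import Path


def parse_cudnn_version(header_text: str):
    """Read CUDNN_MAJOR / CUDNN_MINOR / CUDNN_PATCHLEVEL from header text."""
    parts = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"^#define\s+{macro}\s+(\d+)", header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    header = Path("/usr/local/cuda-11.1/include/cudnn_version.h")
    if header.exists():
        print("cuDNN version:", parse_cudnn_version(header.read_text()))
    else:
        print("cudnn_version.h not found at", header)
```

For the package used in this article, the expected result would be version 8.0.5.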
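5. Java installation and configuration
5.1 Download Java
Go to the JDK 8 download page, download jdk-8u271-linux-x64.tar.gz, and upload it to the server. An account is required for the download. The steps are outlined in the figure below.
5.2 Install Java
Extract the archive, move it to /opt, and rename it to java-8-sun.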
tar -zxvf jdk-8u271-linux-x64.tar.gz
sudo mv jdk1.8.0_271 /opt/java-8-sun
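5.3 Add environment variables
5.3.1 Open the configuration file
vim ~/.bashrc
5.3.2 Add the environment variables
Append the following to the end of the .bashrc file, then save and exit with :wq.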
# JDK8
export JAVA_HOME=/opt/java-8-sun
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
5.3.3 Apply the environment variables
source ~/.bashrc
5.3.4 Test the environment
Run java -version to check that the environment variables are set correctly. On success it prints the following:
java version "1.8.0_271"
Java(TM) SE Runtime Environment (build 1.8.0_271-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.271-b09, mixed mode)
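6. Android environment configuration
Building TensorFlow Lite requires an Android environment. Readers who do not need it can skip this section.
In the home directory, create an Android folder and its subfolders.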
mkdir ~/Android
mkdir ~/Android/SDK
mkdir ~/Android/NDK
6.1 Install the Android SDK
6.1.1 Install the command-line tools
Enter the SDK directory, then download and extract the command-line tools.
cd ~/Android/SDK
wget -c "https://dl.google.com/android/repository/commandlinetools-linux-6858069_latest.zip"
unzip -x commandlinetools-linux-6858069_latest.zip
The archive extracts to a cmdline-tools folder. Run tree -L 2 cmdline-tools to view its layout, shown below. The sdkmanager tool used in the following steps is located in cmdline-tools/bin/.
cmdline-tools/
├── bin
│ ├── apkanalyzer
│ ├── avdmanager
│ ├── lint
│ ├── screenshot2
│ └── sdkmanager
├── lib
│ ├── analytics-library
│ ├── annotations
│ ├── apkanalyzer-classpath.jar
│ ├── apkparser
│ ├── avdmanager-classpath.jar
│ ├── build-system
│ ├── common
│ ├── ddmlib
│ ├── device_validator
│ ├── external
│ ├── layoutlib-api
│ ├── lint
│ ├── lint-classpath.jar
│ ├── misc
│ ├── README
│ ├── repository
│ ├── screenshot2-classpath.jar
│ ├── sdk-common
│ ├── sdklib
│ └── sdkmanager-classpath.jar
├── NOTICE.txt
└── source.properties
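6.1.2 About sdkmanager
See the official documentation for full details on sdkmanager; only a brief introduction is given here. sdkmanager is a command-line tool for viewing, installing, updating, and uninstalling Android SDK packages.
- List all installed and available packages, along with packages that have updates
The sdk_root parameter must be specified here, preferably as an absolute path. Replace the username in the command below.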
cmdline-tools/bin/sdkmanager --sdk_root=/home/{user_name}/Android/SDK --list
6.1.3 Install the required packages
6.1.3.1 List of required packages
- build-tools;21.1.2
- build-tools;23.0.1
- build-tools;23.0.2
- build-tools;23.0.3
- build-tools;26.0.2
- build-tools;27.0.1
- build-tools;28.0.3
- cmdline-tools;latest
- emulator
- extras;android;m2repository
- extras;google;m2repository
- patcher;v4
- platform-tools
- platforms;android-21
- platforms;android-23
- platforms;android-26
- platforms;android-28
- sources;android-21
- sources;android-23
- sources;android-26
- sources;android-28
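If you are building on macOS, you may also need to install extras;intel;Hardware_Accelerated_Execution_Manager.
6.1.3.2 Install the packages
Install them with the following command, again replacing the username in the sdk_root parameter: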
cmdline-tools/bin/sdkmanager --sdk_root=/home/ziipin/Android/SDK "build-tools;21.1.2" "build-tools;23.0.1" "build-tools;23.0.2" \
"build-tools;23.0.3" "build-tools;26.0.2" "build-tools;27.0.1" "build-tools;28.0.3" "cmdline-tools;latest" \
"emulator" "extras;android;m2repository" "extras;google;m2repository" "patcher;v4" "platform-tools" \
"platforms;android-21" "platforms;android-23" "platforms;android-26" "platforms;android-28" \
"sources;android-21" "sources;android-23" "sources;android-26" "sources;android-28"
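You must accept the license agreements during installation, as shown in the figure below.
The following warnings appear during the download; their cause is not yet clear. The emulator-2 and platform-tools-2 folders they create can be removed with rm -rf emulator-2 platform-tools-2.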
Warning: Package "com.android.repository.impl.generated.v1.RemotePackage@862a7b0d" (emulator) should be installed in
"/home/ziipin/Android/SDK/emulator" but
it already exists.
Installing in "/home/ziipin/Android/SDK/emulator-2" instead.
Warning: Package "com.android.repository.impl.generated.v1.RemotePackage@5f48a0aa" (platform-tools) should be installed in
"/home/ziipin/Android/SDK/platform-tools" but
it already exists.
Installing in "/home/ziipin/Android/SDK/platform-tools-2" instead.
6.1.3.3 Verify the installed packages
Use the following command to check that the packages were installed.
cmdline-tools/latest/bin/sdkmanager --list
On success, the output looks like this:
Installed packages:=====================] 100% Computing updates...
| Path | Version | Description | Location |
| --------------------------- | ------- | --------------------------------------- | ---------------------------- |
| build-tools;21.1.2 | 21.1.2 | Android SDK Build-Tools 21.1.2 | build-tools/21.1.2/ |
| build-tools;23.0.1 | 23.0.1 | Android SDK Build-Tools 23.0.1 | build-tools/23.0.1/ |
| build-tools;23.0.2 | 23.0.2 | Android SDK Build-Tools 23.0.2 | build-tools/23.0.2/ |
| build-tools;23.0.3 | 23.0.3 | Android SDK Build-Tools 23.0.3 | build-tools/23.0.3/ |
| build-tools;26.0.2 | 26.0.2 | Android SDK Build-Tools 26.0.2 | build-tools/26.0.2/ |
| build-tools;27.0.1 | 27.0.1 | Android SDK Build-Tools 27.0.1 | build-tools/27.0.1/ |
| build-tools;28.0.3 | 28.0.3 | Android SDK Build-Tools 28.0.3 | build-tools/28.0.3/ |
| cmdline-tools;latest | 3.0 | Android SDK Command-line Tools (latest) | cmdline-tools/latest/ |
| emulator | 30.2.6 | Android Emulator | emulator/ |
| extras;android;m2repository | 47.0.0 | Android Support Repository | extras/android/m2repository/ |
| extras;google;m2repository | 58 | Google Repository | extras/google/m2repository/ |
| patcher;v4 | 1 | SDK Patch Applier v4 | patcher/v4/ |
| platform-tools | 30.0.5 | Android SDK Platform-Tools | platform-tools/ |
| platforms;android-21 | 2 | Android SDK Platform 21 | platforms/android-21/ |
| platforms;android-23 | 3 | Android SDK Platform 23 | platforms/android-23/ |
| platforms;android-26 | 2 | Android SDK Platform 26 | platforms/android-26/ |
| platforms;android-28 | 6 | Android SDK Platform 28 | platforms/android-28/ |
| sources;android-21 | 1 | Sources for Android 21 | sources/android-21/ |
| sources;android-23 | 1 | Sources for Android 23 | sources/android-23/ |
| sources;android-26 | 1 | Sources for Android 26 | sources/android-26/ |
| sources;android-28 | 1 | Sources for Android 28 | sources/android-28/ |
......
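6.2 Install the Android NDK
Go to the NDK directory created earlier and download ndk-r16b and ndk-r18b.
Two NDK versions are downloaded here for future flexibility; downloading only ndk-r18b is also sufficient.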
cd ~/Android/NDK/
wget -c "https://dl.google.com/android/repository/android-ndk-r16b-linux-x86_64.zip"
wget -c "https://dl.google.com/android/repository/android-ndk-r18b-linux-x86_64.zip"
Extract the NDK archives in the current directory with the following commands.
unzip -x android-ndk-r16b-linux-x86_64.zip
unzip -x android-ndk-r18b-linux-x86_64.zip
7. Installing TensorFlow v2.3.0
7.1 Download the TensorFlow source code
7.1.1 Clone the source code
Clone the source code from the TensorFlow repository.
git clone https://github.com/tensorflow/tensorflow.git
7.1.2 Update the code
After cloning, enter the root of the source tree and pull the latest code.
cd tensorflow
git pull
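7.1.3 List the tags
This helps identify the release to check out. Press q to exit the pager.
git tag
7.1.4 Check out the target tag
git checkout v2.3.0
7.1.5 Confirm the checkout
git branch -a
Press q to exit again. If the checkout succeeded, the output includes: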
* (HEAD detached at v2.3.0)
7.1.6 Check the supported Bazel versions
grep "BAZEL_VERSION" configure.py
The output is as follows:
_TF_CURRENT_BAZEL_VERSION = None
_TF_MIN_BAZEL_VERSION = '3.1.0'
_TF_MAX_BAZEL_VERSION = '3.99.0'
'TF_IGNORE_MAX_BAZEL_VERSION' not in os.environ):
global _TF_CURRENT_BAZEL_VERSION
current_bazel_version = check_bazel_version(_TF_MIN_BAZEL_VERSION,
_TF_MAX_BAZEL_VERSION)
_TF_CURRENT_BAZEL_VERSION = convert_version_to_int(current_bazel_version)
The output shows that the minimum supported version is 3.1.0 and the maximum is 3.99.0. The build below uses version 3.1.0.
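The range check itself is simple to illustrate. The sketch below is not the check_bazel_version code from configure.py; it is a simplified stand-in that compares dotted version strings numerically, with the bounds taken from the grep output above. The function names are hypothetical.

```python
def version_tuple(version: str):
    """Convert '3.1.0' into (3, 1, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def bazel_supported(current: str, minimum: str = "3.1.0", maximum: str = "3.99.0") -> bool:
    """Return True if `current` lies within [minimum, maximum]."""
    return version_tuple(minimum) <= version_tuple(current) <= version_tuple(maximum)


if __name__ == "__main__":
    for candidate in ("3.0.0", "3.1.0", "3.7.2", "4.0.0"):
        print(candidate, bazel_supported(candidate))
```

Comparing tuples of integers rather than raw strings matters here: as strings, "3.10.0" would sort before "3.2.0".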
7.2 Install Bazel
7.2.1 Download Bazel
wget -c "https://github.com/bazelbuild/bazel/releases/download/3.1.0/bazel-3.1.0-installer-linux-x86_64.sh"
7.2.2 Install Bazel
chmod a+x bazel-3.1.0-installer-linux-x86_64.sh
./bazel-3.1.0-installer-linux-x86_64.sh --user
source "$HOME/.bazel/bin/bazel-complete.bash"
The --user flag installs Bazel under the current user's home directory.
7.2.3 Verify the installation
Run bazel version to verify. If the installation succeeded, the output is as follows:
Build label: 3.1.0
Build target: bazel-out/k8-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Wed Apr 22 10:32:27 2020 (1587551547)
Build timestamp: 1587551547
Build timestamp as int: 1587551547
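If it fails, append export PATH="$PATH:$HOME/bin" to the end of the .bashrc file, run source ~/.bashrc, and test again.
7.3 Install Python dependencies
7.3.1 Install pip3
sudo apt install python3-dev python3-pip
7.3.2 Install the dependencies
The packages are installed with the --user flag, which places them in the user's home directory.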
pip3 install -U --user pip six 'numpy<1.19.0' wheel setuptools mock 'future>=0.17.1' 'gast==0.3.3' typing_extensions
pip3 install -U --user keras_applications --no-deps
pip3 install -U --user keras_preprocessing --no-deps
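7.4 Build configuration
Note: gcc is selected as the compiler during configuration; the author's attempts to use clang always ran into problems.
From the root of the source tree, run ./configure to configure the build. Its output and the answers chosen are as follows: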
You have bazel 3.1.0 installed.
Please specify the location of python. [Default is /usr/bin/python3]:
Found possible Python library paths:
/usr/local/lib/python3.8/dist-packages
/usr/lib/python3/dist-packages
Please input the desired Python library path to use. Default is [/usr/local/lib/python3.8/dist-packages]
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: N
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with ROCm support? [y/N]: N
No ROCm support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.
Do you wish to build TensorFlow with TensorRT support? [y/N]: N
No TensorRT support will be enabled for TensorFlow.
Found CUDA 11.1 in:
/usr/local/cuda-11.1/targets/x86_64-linux/lib
/usr/local/cuda-11.1/targets/x86_64-linux/include
Found cuDNN 8 in:
/usr/local/cuda-11.1/targets/x86_64-linux/lib
/usr/local/cuda-11.1/targets/x86_64-linux/include
Please specify a list of comma-separated CUDA compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus. Each capability can be specified as "x.y" or "compute_xy" to include both virtual and binary GPU code, or as "sm_xy" to only include the binary code.
Please note that each additional compute capability significantly increases your build time and binary size, and that TensorFlow only supports compute capabilities >= 3.5 [Default is: 3.5,7.0]: 8.6
Do you want to use clang as CUDA compiler? [y/N]: N
nvcc will be used as CUDA compiler.
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]:
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: N
Not configuring the WORKSPACE for Android builds.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
--config=mkl # Build with MKL support.
--config=monolithic # Config for mostly static monolithic build.
--config=ngraph # Build with Intel nGraph support.
--config=numa # Build with NUMA support.
--config=dynamic_kernels # (Experimental) Build kernels into separate shared objects.
--config=v2 # Build TensorFlow 2.x instead of 1.x.
Preconfigured Bazel build configs to DISABLE default on features:
--config=noaws # Disable AWS S3 filesystem support.
--config=nogcp # Disable GCP support.
--config=nohdfs # Disable HDFS support.
--config=nonccl # Disable NVIDIA NCCL support.
Configuration finished
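7.5 Build and install
Note: Bazel stores cached build results in .cache/bazel under the user's home directory. If a build fails, you can delete the cache manually to avoid problems caused by stale state, using the following command: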
rm -rf ~/.cache/bazel/
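7.5.1 Build with Bazel
Be warned that the build can take a very long time: on a server with 64 cores and 128 threads it took about three hours. CPU utilization during the build is shown in the screenshot.
On the v2.3.0 tag, the build produces TensorFlow 2.x by default, using the commands below.
Note: the --config=v1 option in the official documentation only enables the v1 API and behavior; it does not build a standalone TensorFlow 1.x. See the related GitHub issue for details.
- CPU only
bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package
- With GPU support
bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
The GPU command was used here. After a successful build, the output (in part) is as follows.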
INFO: From Executing genrule //tensorflow/python/keras/api:keras_python_api_gen:
2020-11-25 22:21:53.458424: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.1
INFO: From Executing genrule //tensorflow/python/keras/api:keras_python_api_gen_compat_v1:
2020-11-25 22:21:53.446344: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.1
INFO: From Executing genrule //tensorflow/python/keras/api:keras_python_api_gen_compat_v2:
2020-11-25 22:21:53.445993: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.1
INFO: From Executing genrule //tensorflow:tf_python_api_gen_v2:
2020-11-25 22:21:53.449584: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.1
Target //tensorflow/tools/pip_package:build_pip_package up-to-date:
bazel-bin/tensorflow/tools/pip_package/build_pip_package
INFO: Elapsed time: 9228.987s, Critical Path: 3555.80s
INFO: 24110 processes: 24110 local.
INFO: Build completed successfully, 35277 total actions
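7.5.2 Build the pip package
After the build completes, create the pip package with the following command; it is written to the tensorflow_pkg folder in the home directory.
./bazel-bin/tensorflow/tools/pip_package/build_pip_package ~/tensorflow_pkg
7.5.3 Install the package
Install it into the user's home directory with pip3.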
pip3 install --user ~/tensorflow_pkg/tensorflow-2.3.0-cp38-cp38-linux_x86_64.whl
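The cp38 in the wheel filename reflects the Python 3.8 interpreter used for the build, so the name differs on other Python versions. The sketch below shows how the name is composed, assuming a CPython build on Linux x86_64; expected_wheel_name is a hypothetical helper, and listing ~/tensorflow_pkg remains the reliable way to find the actual file.

```python
import sys


def expected_wheel_name(tf_version: str = "2.3.0", python_version=None) -> str:
    """Build the wheel filename for a CPython build on Linux x86_64."""
    major, minor = (python_version or sys.version_info)[:2]
    tag = f"cp{major}{minor}"
    return f"tensorflow-{tf_version}-{tag}-{tag}-linux_x86_64.whl"


if __name__ == "__main__":
    # Uses the running interpreter's version when none is given.
    print(expected_wheel_name())
```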
7.5.4 Test the package
Create a test file with vim mnist_test.py, enter the following content, and run it with python3 mnist_test.py.
import tensorflow as tf
# Load MNIST and scale pixel values to the range [0, 1].
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
# A small fully connected classifier.
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# Train for five epochs, then evaluate on the held-out test set.
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test, verbose=2)
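The test run produces the following output. Training accuracy reaches about 97.7% after five epochs, and accuracy on the test set is about 97.4%.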
2020-11-26 11:16:19.776316: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.1
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11493376/11490434 [==============================] - 38s 3us/step
2020-11-26 11:17:00.750010: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-11-26 11:17:04.777428: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:24:00.0 name: GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2020-11-26 11:17:04.779964: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 1 with properties:
pciBusID: 0000:41:00.0 name: GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2020-11-26 11:17:04.782452: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 2 with properties:
pciBusID: 0000:81:00.0 name: GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2020-11-26 11:17:04.784922: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 3 with properties:
pciBusID: 0000:e1:00.0 name: GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2020-11-26 11:17:04.784954: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.1
2020-11-26 11:17:04.789936: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.11
2020-11-26 11:17:04.791692: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-11-26 11:17:04.792036: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-11-26 11:17:04.797416: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.11
2020-11-26 11:17:04.798504: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.11
2020-11-26 11:17:04.798660: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-11-26 11:17:04.818450: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0, 1, 2, 3
2020-11-26 11:17:04.869642: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2499775000 Hz
2020-11-26 11:17:04.885947: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x42c5640 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-11-26 11:17:04.886010: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-11-26 11:17:05.581808: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x387ea40 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-11-26 11:17:05.581858: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce RTX 3090, Compute Capability 8.6
2020-11-26 11:17:05.581880: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (1): GeForce RTX 3090, Compute Capability 8.6
2020-11-26 11:17:05.581899: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (2): GeForce RTX 3090, Compute Capability 8.6
2020-11-26 11:17:05.581923: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (3): GeForce RTX 3090, Compute Capability 8.6
2020-11-26 11:17:05.586577: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:24:00.0 name: GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2020-11-26 11:17:05.590535: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 1 with properties:
pciBusID: 0000:41:00.0 name: GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2020-11-26 11:17:05.593028: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 2 with properties:
pciBusID: 0000:81:00.0 name: GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2020-11-26 11:17:05.595505: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 3 with properties:
pciBusID: 0000:e1:00.0 name: GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2020-11-26 11:17:05.595552: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.1
2020-11-26 11:17:05.595593: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.11
2020-11-26 11:17:05.595616: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-11-26 11:17:05.595637: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-11-26 11:17:05.595659: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.11
2020-11-26 11:17:05.595680: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.11
2020-11-26 11:17:05.595701: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-11-26 11:17:05.615147: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0, 1, 2, 3
2020-11-26 11:17:05.615193: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.1
2020-11-26 11:17:09.220986: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-26 11:17:09.221102: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0 1 2 3
2020-11-26 11:17:09.221114: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N N N N
2020-11-26 11:17:09.221124: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 1: N N N N
2020-11-26 11:17:09.221137: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 2: N N N N
2020-11-26 11:17:09.221202: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 3: N N N N
2020-11-26 11:17:09.234266: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22430 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3090, pci bus id: 0000:24:00.0, compute capability: 8.6)
2020-11-26 11:17:09.237725: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 22430 MB memory) -> physical GPU (device: 1, name: GeForce RTX 3090, pci bus id: 0000:41:00.0, compute capability: 8.6)
2020-11-26 11:17:09.241007: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 22430 MB memory) -> physical GPU (device: 2, name: GeForce RTX 3090, pci bus id: 0000:81:00.0, compute capability: 8.6)
2020-11-26 11:17:09.244133: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 22430 MB memory) -> physical GPU (device: 3, name: GeForce RTX 3090, pci bus id: 0000:e1:00.0, compute capability: 8.6)
Epoch 1/5
2020-11-26 11:17:10.791021: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.11
1875/1875 [==============================] - 4s 2ms/step - loss: 0.3000 - accuracy: 0.9123
Epoch 2/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.1428 - accuracy: 0.9577
Epoch 3/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.1061 - accuracy: 0.9677
Epoch 4/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0887 - accuracy: 0.9729
Epoch 5/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0731 - accuracy: 0.9771
313/313 - 1s - loss: 0.0856 - accuracy: 0.9741
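GPU utilization during the test is shown in the figure below.
8. Build problems and solutions
8.1 No library found under: /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudart.so.11.1
8.1.1 Problem description
The build failed with the following error: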
ERROR: An error occurred during the fetch of repository 'local_config_cuda':
Traceback (most recent call last):
File "/home/ziipin/Codes/tensorflow/third_party/gpus/cuda_configure.bzl", line 1369
_create_local_cuda_repository(<1 more arguments>)
File "/home/ziipin/Codes/tensorflow/third_party/gpus/cuda_configure.bzl", line 1051, in _create_local_cuda_repository
_find_libs(repository_ctx, <2 more arguments>)
File "/home/ziipin/Codes/tensorflow/third_party/gpus/cuda_configure.bzl", line 598, in _find_libs
_check_cuda_libs(repository_ctx, <2 more arguments>)
File "/home/ziipin/Codes/tensorflow/third_party/gpus/cuda_configure.bzl", line 500, in _check_cuda_libs
execute(repository_ctx, <1 more arguments>)
File "/home/ziipin/Codes/tensorflow/third_party/remote_config/common.bzl", line 208, in execute
fail(<1 more arguments>)
Repository command failed
No library found under: /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudart.so.11.1
ERROR: Skipping '//tensorflow/tools/pip_package:build_pip_package': no such package '@local_config_cuda//cuda': Traceback (most recent call last):
File "/home/ziipin/Codes/tensorflow/third_party/gpus/cuda_configure.bzl", line 1369
_create_local_cuda_repository(<1 more arguments>)
File "/home/ziipin/Codes/tensorflow/third_party/gpus/cuda_configure.bzl", line 1051, in _create_local_cuda_repository
_find_libs(repository_ctx, <2 more arguments>)
File "/home/ziipin/Codes/tensorflow/third_party/gpus/cuda_configure.bzl", line 598, in _find_libs
_check_cuda_libs(repository_ctx, <2 more arguments>)
File "/home/ziipin/Codes/tensorflow/third_party/gpus/cuda_configure.bzl", line 500, in _check_cuda_libs
execute(repository_ctx, <1 more arguments>)
File "/home/ziipin/Codes/tensorflow/third_party/remote_config/common.bzl", line 208, in execute
fail(<1 more arguments>)
Repository command failed
No library found under: /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudart.so.11.1
WARNING: Target pattern parsing failed.
ERROR: no such package '@local_config_cuda//cuda': Traceback (most recent call last):
File "/home/ziipin/Codes/tensorflow/third_party/gpus/cuda_configure.bzl", line 1369
_create_local_cuda_repository(<1 more arguments>)
File "/home/ziipin/Codes/tensorflow/third_party/gpus/cuda_configure.bzl", line 1051, in _create_local_cuda_repository
_find_libs(repository_ctx, <2 more arguments>)
File "/home/ziipin/Codes/tensorflow/third_party/gpus/cuda_configure.bzl", line 598, in _find_libs
_check_cuda_libs(repository_ctx, <2 more arguments>)
File "/home/ziipin/Codes/tensorflow/third_party/gpus/cuda_configure.bzl", line 500, in _check_cuda_libs
execute(repository_ctx, <1 more arguments>)
File "/home/ziipin/Codes/tensorflow/third_party/remote_config/common.bzl", line 208, in execute
fail(<1 more arguments>)
Repository command failed
No library found under: /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudart.so.11.1
INFO: Elapsed time: 46.344s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
currently loading: tensorflow/tools/pip_package
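8.1.2 Solution
Go to the root of the source tree and edit third_party/gpus/cuda_configure.bzl. The change hard-codes the libcudart version that the check looks for to 11.0, because CUDA 11.1 ships the runtime library as libcudart.so.11.0 rather than libcudart.so.11.1. Save the file and remember to clear the build cache afterwards.
Change the code just below line 532 to the following:
"cudart": _check_cuda_lib_params(
    "cudart",
    cpu_value,
    cuda_config.config["cuda_library_dir"],
    # cuda_config.cuda_version,
    "11.0",
    static = False,
),
"cudart_static": _check_cuda_lib_params(
    "cudart_static",
    cpu_value,
    cuda_config.config["cuda_library_dir"],
    # cuda_config.cuda_version,
    "11.0",
    static = True,
),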
To be safe, also create a libcudart.so.11.1 symlink with the following commands.
cd /usr/local/cuda-11.1/lib64
sudo ln -s libcudart.so.11.1.74 libcudart.so.11.1
Clear the build cache:
rm -rf ~/.cache/bazel/
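Before and after applying the fix, it can help to see exactly which libcudart files and symlinks exist. The following sketch lists them; the directory is the one used in this article, and describe_cudart is a hypothetical helper name.

```python
import os
from pathlib import Path


def describe_cudart(lib_dir: str) -> dict:
    """Map each libcudart.so* entry in lib_dir to its symlink target (or None)."""
    directory = Path(lib_dir)
    if not directory.is_dir():
        return {}
    entries = {}
    for path in sorted(directory.glob("libcudart.so*")):
        # Regular files map to None; symlinks map to the name they point at.
        entries[path.name] = os.readlink(path) if path.is_symlink() else None
    return entries


if __name__ == "__main__":
    lib_dir = "/usr/local/cuda-11.1/lib64"
    for name, target in describe_cudart(lib_dir).items():
        print(name, "->", target if target else "(regular file)")
```

If libcudart.so.11.1 is missing from the listing, the error in 8.1.1 is the expected result of a check that looks for that exact name.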
8.2 /usr/bin/env: 'python': No such file or directory
8.2.1 Problem description
The build failed with the following error:
ERROR: /home/ziipin/.cache/bazel/_bazel_ziipin/865b49621e687085adec52dd3a822b71/external/llvm-project/llvm/BUILD:45:1: Executing genrule @llvm-project//llvm:config_gen failed (Exit 127)
/usr/bin/env: 'python': No such file or directory
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
ERROR: /home/ziipin/Codes/tensorflow/tensorflow/lite/toco/python/BUILD:84:1 Executing genrule @llvm-project//llvm:config_gen failed (Exit 127)
INFO: Elapsed time: 196.293s, Critical Path: 0.67s
INFO: 28 processes: 28 local.
FAILED: Build did NOT complete successfully
8.2.2 Solution
Create a python symlink, and remember to clear the build cache afterwards.
sudo ln -s /usr/bin/python3.8 /usr/bin/python
rm -rf ~/.cache/bazel/
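Failures like this one can be caught before starting a multi-hour build by checking that the expected commands are on PATH. The sketch below is one possible pre-flight check; missing_commands is a hypothetical helper, and the command list is only a suggestion based on the tools used in this article.

```python
import shutil


def missing_commands(names):
    """Return the commands from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_commands(["python", "python3", "gcc", "bazel", "nvcc"])
    print("missing:", ", ".join(missing) if missing else "none")
```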
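9. Installing TensorFlow master
When the author built it, master corresponded to version v2.5.0. The build and installation process is essentially the same as for v2.3.0, except that master has better support for CUDA 11.1 and cuDNN 8.0.5 and can be built and installed with essentially no source modifications.
10. Closing remarks
That concludes the process of building and installing TensorFlow v2.3.0 (or master) with CUDA 11.1 and cuDNN 8.0.5 for NVIDIA's latest GPU, the RTX 3090, on Ubuntu Server 20.04.1.
Because build environments are complex, readers may run into other problems; questions and discussion are welcome.
PS: As noted in section 7.5, --config=v1 cannot be used to build TensorFlow 1.x. The author is currently modifying the v1.15.4 source code and hopes to finish soon, so that TensorFlow 1.x can also be used on the RTX 3090.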