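Installing Anaconda:
Before installing, you can configure the conda and pip package sources; see the CSDN blog post "linux conda, pip package source setup" (无聊时看看书的博客).
Official download page: Free Download | Anaconda
Linux: download the .sh file and run it with: sh file.sh. When the installer finishes, make sure conda is added to your environment variables.
Windows: download the exe installer and accept the default settings.
Create the Python environment you need: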
conda create -n env_name python=x.x
Install the third-party Python packages you need with pip install:
Example: install torch 1.12.0 built for CUDA 11.6
pip install torch==1.12.0 torchvision torchaudio --extra-index-url /whl/cu116
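Once the install finishes, a short check from Python can confirm that the wheel was built for CUDA 11.6 and that a GPU is visible. The following is a minimal sketch rather than part of the original steps: the helper name describe_torch_cuda is made up for illustration, and it relies only on the standard torch attributes torch.__version__, torch.version.cuda and torch.cuda.is_available().

```python
import importlib.util


def describe_torch_cuda():
    """Report which torch build is installed and whether it can see a GPU.

    Hypothetical helper for illustration; returns a plain dict.
    """
    if importlib.util.find_spec("torch") is None:
        # torch is not installed in the active environment
        return {"installed": False}

    import torch

    return {
        "installed": True,
        "torch_version": torch.__version__,    # e.g. "1.12.0+cu116" for a cu116 wheel
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "gpu_visible": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(describe_torch_cuda())
```

If built_for_cuda is None, pip most likely resolved a CPU-only wheel. If gpu_visible is False, the driver is the more likely culprit.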
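Installing CUDA 11.6 on Linux:
Official site: CUDA Toolkit Archive | NVIDIA Developer
Select the version you need.
Choose the options that match your target environment, then the installer type:
The site generates the installation commands for your selection, and you can run them as given: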
wget https://developer./compute/cuda/11.6.0/local_installers/cuda_11.6.0_510.39.01_linux.run
sudo sh cuda_11.6.0_510.39.01_linux.run
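The installation proceeds as follows:
Type accept.
Select the components to install ([X] means install, [ ] means skip), then move to Install and press Enter to start the installation.
On Windows, download the exe package manually and follow the installer's instructions.
On Linux, the installer prints the following when it finishes: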
============ Summary ============
Driver: Installed
Toolkit: Installed in /usr/local/cuda-11.6/
Please make sure that
- PATH includes /usr/local/cuda-11.6/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-11.6/lib64, or, add /usr/local/cuda-11.6/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.6/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall
Logfile is /var/log/cuda-installer.log
Add the environment variables:
Linux: open ~/.bashrc with vim ~/.bashrc and append the following at the end of the file
export CUDA_HOME=/usr/local/cuda
export PATH=$PATH:$CUDA_HOME/bin
export LD_LIBRARY_PATH=/usr/local/cuda-11.6/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
source ~/.bashrc
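To confirm that a new shell picked up these settings, the variables can be inspected from Python. This is a minimal sketch under stated assumptions: the function name missing_cuda_paths is hypothetical, and the default paths are the ones used in the export lines above, so adjust them if your toolkit lives elsewhere.

```python
import os


def missing_cuda_paths(env,
                       cuda_home="/usr/local/cuda",
                       lib_dir="/usr/local/cuda-11.6/lib64"):
    """Return the names of variables that lack the expected CUDA entries.

    `env` is any mapping of variable names to values, e.g. os.environ.
    The default paths are assumptions taken from the export lines above.
    """
    missing = []
    if env.get("CUDA_HOME") != cuda_home:
        missing.append("CUDA_HOME")
    # Linux uses ":" to separate entries in PATH-like variables
    if f"{cuda_home}/bin" not in env.get("PATH", "").split(":"):
        missing.append("PATH")
    if lib_dir not in env.get("LD_LIBRARY_PATH", "").split(":"):
        missing.append("LD_LIBRARY_PATH")
    return missing


if __name__ == "__main__":
    problems = missing_cuda_paths(os.environ)
    print("all set" if not problems else f"check these variables: {problems}")
```

Passing the environment in as an argument keeps the check easy to try against a plain dict before running it against the real shell.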
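Windows: the installer adds the environment variables automatically; open the environment variable settings to verify them.
Check the installed CUDA version with this command: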
nvcc -V
It returns output like the following:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) - NVIDIA Corporation
Built on Fri_Dec_17_18:16:03_PST_
Cuda compilation tools, release 11.6, V11.6.55
Build cuda_11.6.r11.6/compiler.30794723_0
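When this check has to run in a script, the release number can be pulled out of the nvcc -V text with a regular expression. The sketch below is illustrative only: parse_nvcc_release is a hypothetical helper, and the pattern assumes the "release X.Y, VX.Y.Z" wording shown in the output above.

```python
import re

# Matches e.g. "release 11.6, V11.6.55"
_RELEASE_RE = re.compile(r"release (\d+\.\d+), V(\d+(?:\.\d+)+)")


def parse_nvcc_release(output):
    """Return (release, full_version) from `nvcc -V` text, or None if absent."""
    match = _RELEASE_RE.search(output)
    if match is None:
        return None
    return match.group(1), match.group(2)


sample = "Cuda compilation tools, release 11.6, V11.6.55"
print(parse_nvcc_release(sample))  # ('11.6', '11.6.55')
```

In practice the text would come from running nvcc -V, for example through subprocess.run with capture_output=True and text=True.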
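CUDA 11.6 no longer bundles the sample test programs, so verify the installation with your own project.
Troubleshooting
If nvcc -V still reports the old version after installation,
or nvidia-smi fails with: Failed to initialize NVML: Driver/library version mismatch
run the following command to purge the old NVIDIA packages (this removes the installed driver packages as well, so the driver has to be installed again afterwards):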
sudo apt-get --purge remove "*nvidia*"
cuDNN setup
To check which cuDNN release matches a CUDA version, see: Support Matrix - NVIDIA Docs. CUDA 11.6 pairs with cuDNN 8.7.0.
Downloading cuDNN requires logging in with a registered account: Log in | NVIDIA Developer /rdp/cudnn-download
Linux: cudnn-linux-x86_64-8.7.0.84_cuda11-archive.tar.xz
Windows: cudnn-windows-x86_64-8.7.0.84_cuda11-archive.zip
Extract on Linux:
tar -xvf cudnn-linux-x86_64-8.7.0.84_cuda11-archive.tar.xz
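After extracting, change into the extracted folder and copy the files (chmod needs sudo as well, because the copied files are owned by root):
sudo cp include/cudnn*.h /usr/local/cuda/include/
sudo cp -P lib/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn*.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
Done. Check the cuDNN version: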
cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
The output looks like this:
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 7
#define CUDNN_PATCHLEVEL 0
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
/* cannot use constexpr here since this is a C-only file */
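The same three macros can be read programmatically, which is handy when a setup script needs to assert the cuDNN version. This is a minimal sketch, not an official tool: parse_cudnn_version is a hypothetical helper, and it assumes the header keeps the #define CUDNN_MAJOR / CUDNN_MINOR / CUDNN_PATCHLEVEL layout shown above.

```python
import re


def parse_cudnn_version(header_text):
    """Extract (major, minor, patch) from the text of cudnn_version.h."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Only a line of the form "#define NAME <digits>" matches, so the
        # CUDNN_VERSION formula line is ignored.
        match = re.search(rf"^\s*#define\s+{name}\s+(\d+)", header_text, re.MULTILINE)
        if match is None:
            raise ValueError(f"{name} not found in header text")
        parts.append(int(match.group(1)))
    return tuple(parts)


sample = """#define CUDNN_MAJOR 8
#define CUDNN_MINOR 7
#define CUDNN_PATCHLEVEL 0
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
"""
major, minor, patch = parse_cudnn_version(sample)
print(major, minor, patch)                 # 8 7 0
print(major * 1000 + minor * 100 + patch)  # 8700, the CUDNN_VERSION value
```

To run it against a real install, read /usr/local/cuda/include/cudnn_version.h into a string first and pass that to the function.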
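After extracting on Windows:
Locate the CUDA install directory, C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6 (the folder is named after the installed CUDA version), and copy all files from the cuDNN bin, include and lib folders into the folders of the same name there: bin -> bin, include -> include, lib -> lib.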
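The manual copy can also be scripted. The sketch below is one possible way to do it, under clearly stated assumptions: merge_cudnn_into_cuda is a hypothetical helper, both directory arguments are supplied by you, and it needs Python 3.8+ for the dirs_exist_ok option of shutil.copytree. Writing under C:\Program Files normally requires an elevated (administrator) prompt.

```python
import shutil
from pathlib import Path


def merge_cudnn_into_cuda(cudnn_root, cuda_root, folders=("bin", "include", "lib")):
    """Copy each cuDNN folder into the CUDA folder of the same name.

    Existing CUDA files are kept; files with the same name are overwritten.
    Returns the list of folder names that were actually copied.
    """
    copied = []
    for name in folders:
        src = Path(cudnn_root) / name
        if not src.is_dir():
            continue  # skip folders the extracted archive does not contain
        # dirs_exist_ok=True merges into the existing target (Python 3.8+)
        shutil.copytree(src, Path(cuda_root) / name, dirs_exist_ok=True)
        copied.append(name)
    return copied
```

A call would look like merge_cudnn_into_cuda(extracted_cudnn_dir, cuda_install_dir), where both arguments are placeholders for the paths on your machine.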
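Installing NCCL
NCCL site: /nccl/nccl-legacy-downloads
Open the site and pick the NCCL version that matches your CUDA version; this walkthrough uses Ubuntu 18.04 with CUDA 11.6.
Following the installation steps, first set up the network repository for your distribution, then install NCCL.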
Network Installer for Ubuntu20.04
$ wget https://developer./compute/cuda/repos/ubuntu/x86_64/cuda-keyring_1.0-1_all.deb
$ sudo dpkg -i cuda-keyring_1.0-1_all.deb
$ sudo apt-get update
Network Installer for Ubuntu18.04
$ wget https://developer./compute/cuda/repos/ubuntu1804/x86_64/cuda-keyring_1.0-1_all.deb
$ sudo dpkg -i cuda-keyring_1.0-1_all.deb
$ sudo apt-get update
Network Installer for RedHat/CentOS 8
$ sudo dnf config-manager --add-repo https://developer./compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo
Network Installer for RedHat/CentOS 7
$ sudo yum-config-manager --add-repo https://developer./compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
Then run the following command to install NCCL:
For Ubuntu: sudo apt install libnccl2=2.12.12-1+cuda11.6 libnccl-dev=2.12.12-1+cuda11.6
For RHEL/CentOS: sudo yum install libnccl-2.12.12-1+cuda11.6 libnccl-devel-2.12.12-1+cuda11.6 libnccl-static-2.12.12-1+cuda11.6
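To confirm which NCCL version ended up on the system, the library can be asked directly through its C function ncclGetVersion. The following is a rough sketch, not part of the original steps: the helper names are made up, and it assumes the shared library is loadable as libnccl.so.2. NCCL reports the version as one integer, encoded as major*10000 + minor*100 + patch from 2.9 onward and major*1000 + minor*100 + patch before that.

```python
import ctypes


def decode_nccl_version(code):
    """Turn NCCL's integer version code into (major, minor, patch)."""
    if code >= 10000:
        major, rest = divmod(code, 10000)  # encoding used from NCCL 2.9 onward
    else:
        major, rest = divmod(code, 1000)   # older encoding (up to 2.8.x)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def query_nccl_version(library="libnccl.so.2"):
    """Return the installed NCCL version, or None if it cannot be queried.

    The library name is an assumption; adjust it if NCCL lives elsewhere.
    """
    try:
        nccl = ctypes.CDLL(library)
    except OSError:
        return None  # library not found or not loadable
    version = ctypes.c_int(0)
    status = nccl.ncclGetVersion(ctypes.byref(version))
    if status != 0:  # 0 is ncclSuccess
        return None
    return decode_nccl_version(version.value)


if __name__ == "__main__":
    print(query_nccl_version())
```

With the packages above installed, the expected result is (2, 12, 12), which corresponds to the version code 21212.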