Installing CUDA and PyTorch via conda on CentOS 7
  TnD0WQEygW8e 2023-11-05

First, check which graphics card the Linux machine has:

# lspci | grep -i nvidia
04:00.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1)
04:00.1 Audio device: NVIDIA Corporation GP102 HDMI Audio Controller (rev a1)
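To pull just the model name out of that output, for example inside a script, a small filter is enough. This is a minimal sketch: nvidia_gpu_models is a hypothetical helper name, and it assumes the model is the bracketed text on the VGA or 3D controller line, as in the output above.

```shell
# Print the bracketed model name of each NVIDIA display controller.
# Reads `lspci` output on stdin, so it also works on saved output.
nvidia_gpu_models() {
    grep -i 'nvidia' |
        grep -Ei 'vga compatible controller|3d controller' |
        sed -n 's/.*\[\(.*\)\].*/\1/p'
}

# Usage on a real machine:
#   lspci | nvidia_gpu_models
```

The audio function of the card is skipped on purpose, since only the display controller matters when choosing a driver.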

Check whether any process is using the NVIDIA device (if one is, stop that process first):

# lsof | grep nvidia
nvidia-mo   443                 root  cwd       DIR              253,0        254          64 /
nvidia-mo   443                 root  rtd       DIR              253,0        254          64 /
nvidia-mo   443                 root  txt   unknown                                           /proc/443/exe
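Grepping the full lsof output prints one line per open file. A rough sketch for reducing it to one line per process follows; nvidia_holders is a made-up helper name, and it assumes the usual lsof layout with the command in the first column and the PID in the second.

```shell
# Reduce `lsof` output (on stdin) to unique "PID COMMAND" pairs
# for every line that mentions nvidia.
nvidia_holders() {
    awk 'tolower($0) ~ /nvidia/ && $2 ~ /^[0-9]+$/ { print $2, $1 }' | sort -u
}

# Usage on a real machine:
#   lsof | nvidia_holders
```

On a live system, lsof /dev/nvidia* restricts the listing to processes that have the device files open. PID 443 in the output above is the nvidia-modeset kernel thread (ps shows it in brackets further down), so it cannot be stopped like an ordinary program.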

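The graphics driver can also be installed this way (recommended). nvidia-detect and kmod-nvidia come from the ELRepo repository, which must be enabled first:
sudo yum install nvidia-detect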

nvidia-detect -v 

Probing for supported NVIDIA devices...
[10de:1b06] NVIDIA Corporation GP102 [GeForce GTX 1080 Ti]
This device requires the current 440.64
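If the version number is needed later (for instance to pick the matching installer), it can be extracted from that report. A minimal sketch, assuming the "This device requires the current ..." wording shown above; required_driver_version is a hypothetical helper name.

```shell
# Print the driver version named in `nvidia-detect -v` output (read from stdin).
required_driver_version() {
    sed -n 's/^This device requires the current \([0-9][0-9.]*\).*/\1/p'
}

# Usage on a real machine:
#   nvidia-detect -v | required_driver_version
```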

yum -y install kmod-nvidia

Error: nvidia-x11-drv-390xx conflicts with nvidia-x11-drv-460.39-1.el7_9.elrepo.x86_64
Error: nvidia-x11-drv-390xx conflicts with nvidia-x11-drv-libs-460.39-1.el7_9.elrepo.x86_64
Error: nvidia-x11-drv conflicts with nvidia-x11-drv-390xx-390.138-1.el7_8.elrepo.x86_64
 You could try using --skip-broken to work around the problem
** Found 2 pre-existing rpmdb problem(s), 'yum check' output follows:
dnf-4.0.9.2-1.el7_6.noarch has missing requires of python2-dnf = ('0', '4.0.9.2', '1.el7_6')
orca-3.6.3-4.el7.x86_64 has missing requires of pyatspi

Remove the conflicting packages:

yum remove -y nvidia-x11-drv-390xx-390.138-1.el7_8.elrepo.x86_64
yum remove -y nvidia-x11-drv-460.39-1.el7_9.elrepo.x86_64
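Before retrying the install, it may help to confirm which NVIDIA packages are still present. A small sketch, where nvidia_packages is a hypothetical helper that filters a package list read from stdin:

```shell
# Keep only package names that start with kmod-nvidia or nvidia-.
nvidia_packages() {
    grep -Ei '^(kmod-nvidia|nvidia-)' | sort
}

# Usage on a real machine:
#   rpm -qa | nvidia_packages
```

An empty result suggests no driver packages remain to conflict.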

To uninstall the driver:
sudo yum remove kmod-nvidia

 

Check the installed CUDA toolkit version:

# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

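# nvidia-smi

Failed to initialize NVML: Driver/library version mismatch

This error means the NVIDIA kernel module that is currently loaded and the user-space NVIDIA libraries come from different driver versions.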
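One way to confirm such a mismatch is to compare the version of the loaded module with the version installed on disk. The sketch below is hedged: check_driver_versions is a hypothetical helper, and reading the loaded version from /sys/module/nvidia/version is an assumption about how the module exposes it (/proc/driver/nvidia/version carries the same information in a longer form).

```shell
# Compare two driver version strings and report whether they match.
check_driver_versions() {
    loaded=$1   # version of the kernel module currently loaded
    ondisk=$2   # version of the module installed on disk
    if [ "$loaded" = "$ondisk" ]; then
        echo "ok: $loaded"
    else
        echo "mismatch: loaded $loaded, installed $ondisk"
        return 1
    fi
}

# Usage on a real machine (paths are assumptions, see above):
#   check_driver_versions "$(cat /sys/module/nvidia/version)" \
#                         "$(modinfo -F version nvidia)"
```

When the two differ, unloading the old module (or rebooting) so that the newly installed one is loaded usually clears the error.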

Look up the driver that matches the card on the NVIDIA download page:

http://www.nvidia.cn/Download/Find.aspx?lang=cn

 


wget https://us.download.nvidia.com/XFree86/Linux-x86_64/440.64/NVIDIA-Linux-x86_64-440.64.run

chmod a+x NVIDIA-Linux-x86_64-440.64.run
sudo ./NVIDIA-Linux-x86_64-440.64.run

# nvidia-smi

The driver installer stops with this error because the old kernel module is still loaded:

ERROR: An NVIDIA kernel module 'nvidia-drm' appears to already be loaded in your kernel.

Switch to the text-mode target to stop the graphical session, then try to unload the module:

# sudo systemctl isolate multi-user.target
# sudo modprobe -r nvidia-drm
modprobe: FATAL: Module nvidia_drm is in use.

# sudo modprobe -r nvidia-modeset

 

# lsmod | grep nvidia.drm
nvidia_drm             43547  2
nvidia_modeset       1053327  1 nvidia_drm
drm_kms_helper        186531  1 nvidia_drm
drm                   456166  5 drm_kms_helper,nvidia_drm

Run lsmod | grep nvidia.drm and look at the numbers to the right of the nvidia_drm module name. The first number is simply the size of the module; the second is the use count (2 in the output above).

If the X11 server is running and using the nvidia driver, the nvidia_drm kernel module will almost certainly be in use. So you'll need, at the very least, to switch to a text console and shut down the X11 server. Usually this can be done by stopping whichever X display manager service you're using (this depends on your desktop environment).

As the error message said, if you are running nvidia-persistenced, you'll need to stop that too before you can unload the nvidia_drm module.
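The use count described above can be read programmatically, which is handy for checking whether it has dropped to 0 before retrying modprobe -r. A minimal sketch; module_use_count is a hypothetical helper, and it relies on the standard lsmod column order (module, size, use count).

```shell
# Print the use count (third lsmod column) of the module named in $1.
# Reads `lsmod` output on stdin.
module_use_count() {
    awk -v mod="$1" '$1 == mod { print $3 }'
}

# Usage on a real machine:
#   lsmod | module_use_count nvidia_drm
```

A count above 0 means something still holds the module, so unloading it will keep failing until that user is stopped.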

 

 

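In this case an Xvnc server was still running, as top showed:

17080 root      20   0  519316 214832  47908 S   6.3  0.1   5421:48 Xvnc

kill expects a PID rather than a process name, so kill it by PID (pkill -9 Xvnc would match by name instead):

kill -9 17080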
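Since kill works on PIDs, a process name has to be resolved to its PIDs first. The sketch below shows one way to do that from ps output; pids_by_name is a hypothetical helper, and it assumes input lines of the form "PID COMMAND".

```shell
# Read "PID COMMAND" lines on stdin and print the PIDs whose command equals $1.
pids_by_name() {
    awk -v name="$1" '$2 == name { print $1 }'
}

# Usage on a real machine (try a plain kill before resorting to -9):
#   ps -eo pid=,comm= | pids_by_name Xvnc | xargs -r kill
```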

 

 

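Afterwards, check which NVIDIA processes remain ([nvidia-modeset] in brackets is a kernel thread, not a user program):

ps aux | grep nvidia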
root       443  0.0  0.0      0     0 ?        S     2020   0:00 [nvidia-modeset]
root      8197  0.0  0.0 112832   984 pts/0    S+   22:01   0:00 grep --color=auto nvidia

 



Last edited on 2023-11-08
