Channel: Ubuntu Forums - Virtualisation

[SOLVED] Cuda On Pascal In A KVM VM via VGA Passthrough

I've just purchased a new Pascal video card, in part for deep learning applications. I wanted to be able to use the card in a VM (because this all runs on my one and only server, along with lots of other stuff including Windows games), BUT I didn't want the card attached to my monitor (I've run out of ports on the monitor on my desk); I want to address the VM using Spice, as "normal" for a VM.

Initially I tried installing from the repositories, but that just resulted in blank screens ... not very helpful, so I experimented until I could get it to work.

These two links were key for me, plus some other general investigation and experimentation:
http://docs.nvidia.com/cuda/cuda-ins...nux/index.html
https://www.pugetsystems.com/labs/hp...ascal-GPU-825/

This assumes you are able to create a VM and pass a video card to it. In my case I created a Kubuntu VM using virt-manager (libvirt), WITHOUT passing the video card initially.
I used the q35 machine type and UEFI firmware ... which meant the VM needed to be modified BEFORE the install (extra check-box on the last dialog of the virt-manager create wizard).

Then, once the VM was installed, I shut it down and used virsh edit to go in and change the XML.
Replace the existing first line with the following; it allows you to send arguments directly to qemu, which we need because Ubuntu's version of libvirt is just a little too old
Code:

<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
Then add these lines just before the </domain> tag at the end
Code:

  <qemu:commandline>
    <qemu:arg value='-cpu'/>
    <qemu:arg value='host,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,kvm=off,hv_vendor_id=some4321text'/>
  </qemu:commandline>

Note that the "hv_" parameters are Windows-specific "enlightenments" ... I just add them for consistency when creating VMs for nVidia GPU passthrough. You only NEED the "kvm=off,hv_vendor_id=some4321text" part, not the "hv_" elements.
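
As an aside, as far as I can tell the hv_vendor_id string is exposed to the guest through a 12-byte CPUID vendor field, so keep it to at most 12 characters (the example value "some4321text" is exactly 12). A quick sanity check:
Code:

```shell
# The hv_vendor_id value is reported through a 12-byte CPUID vendor field,
# so the string must be at most 12 characters long.
VENDOR_ID=some4321text
echo "length: ${#VENDOR_ID}"   # prints "length: 12"
```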

Now you can attach the nVidia video card and restart the VM. The qemu lines above hide the hypervisor from the nVidia driver and add some Windows performance optimisations (see above).

Once you have a VM, you've ensured the packages are up to date, and it's running with the video card attached, you'll need to blacklist the nouveau driver by creating a new file at /etc/modprobe.d/blacklist-nouveau.conf with the following contents
Code:

blacklist nouveau
options nouveau modeset=0

Then rebuild the initramfs by executing the following
Code:

sudo update-initramfs -u
Now reboot so the nouveau driver is not loaded, and check after the reboot that nouveau no longer appears with
Code:

lsmod | grep nouveau
Download the latest nVidia driver from here https://www.nvidia.com/Download/index.aspx?lang=en-us. I saved the driver "run" file to ~/Downloads/cuda. NOTE that you will want to download the "run" version, NOT the deb files !!

FROM THIS POINT FORWARD IT'S BEST TO SHUT DOWN THE WINDOW MANAGER (the GUI) SO THERE ARE NO CONFLICTS. If using the virt-manager GUI there are menu options to send Ctrl+Alt+F2 etc., so do that to get to a command line and then execute one of the following (depending on the flavour you're running, e.g. KDE uses SDDM; Unity and MATE use LightDM)
Code:

sudo service sddm stop
sudo service lightdm stop

That should stop the display manager and your GUI session so that the nVidia installer has a clear path to install without conflict.

Now MANUALLY install the nVidia driver. NOTE THAT THE OPTIONS ARE !!!!!CRITICAL!!!!!, other than the log-file name, which you may want to change
Code:

sudo sh NVIDIA-Linux-x86_64-367.44.run --no-opengl-files --log-file-name=~/Downloads/cuda/NVIDIA-driver-install.log --dkms -a
That should work and have created a new "nvidia" module. There's an extra step though: for some reason the nvidia-specific entries in /dev are not always created, so you'll need the following code to run at startup (I created a new script and execute it from /etc/rc.local). The script contents (as supplied by nVidia in their CUDA installation guide):
Code:

#!/bin/bash

/sbin/modprobe nvidia

if [ "$?" -eq 0 ]; then
  # Count the number of NVIDIA controllers found.
  NVDEVS=`lspci | grep -i NVIDIA`
  N3D=`echo "$NVDEVS" | grep "3D controller" | wc -l`
  NVGA=`echo "$NVDEVS" | grep "VGA compatible controller" | wc -l`

  N=`expr $N3D + $NVGA - 1`
  for i in `seq 0 $N`; do
    mknod -m 666 /dev/nvidia$i c 195 $i
  done

  mknod -m 666 /dev/nvidiactl c 195 255

else
  exit 1
fi

/sbin/modprobe nvidia-uvm

if [ "$?" -eq 0 ]; then
  # Find out the major device number used by the nvidia-uvm driver
  D=`grep nvidia-uvm /proc/devices | awk '{print $1}'`

  mknod -m 666 /dev/nvidia-uvm c $D 0
else
  exit 1
fi
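
The device-numbering arithmetic in that script can be sanity-checked without real hardware by feeding it canned lspci output (a sketch; the sample lines below are made up to look like typical lspci output for a single-GPU card):
Code:

```shell
# Sketch: exercise the script's device-counting logic against canned
# lspci output -- one VGA controller, no 3D controllers.
NVDEVS='02:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070]
02:00.1 Audio device: NVIDIA Corporation GP104 High Definition Audio Controller'
# "|| true" because grep -c exits non-zero when it counts zero matches
N3D=$(echo "$NVDEVS" | grep -c "3D controller" || true)
NVGA=$(echo "$NVDEVS" | grep -c "VGA compatible controller" || true)
N=$((N3D + NVGA - 1))   # same result as the script's `expr $N3D + $NVGA - 1`
echo "highest device index: $N"   # prints "highest device index: 0" -> only /dev/nvidia0
```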

Now reboot and check that
  1. The "nvidia" module is loaded (use lsmod | grep nv)
  2. The "nvidia" module is attached to the video card (use lspci -k)
  3. The nvidia devices have been created in /dev (use ls /dev/nv*); you should see 3 nvidia entries
  4. cat /proc/driver/nvidia/version reports the version of the nvidia driver you downloaded and installed


If that all worked OK, you're ready to install CUDA itself.

Download the latest CUDA (I need the latest because I have a very recent model card) from https://developer.nvidia.com/cuda-toolkit. In my case I downloaded V8 beta (the latest at the time of writing).

Once again it's probably best to shut the GUI session down (e.g. sudo service lightdm stop) ... probably not necessary, but it's a whole lot better to be "safe" and maximise the probability of success.

Start by installing the dependencies for the nVidia components (thanks to Puget systems for this)
Code:

sudo apt install dkms build-essential ca-certificates-java default-jre default-jre-headless fonts-dejavu-extra freeglut3 freeglut3-dev java-common libatk-wrapper-java libatk-wrapper-java-jni  libdrm-dev libgl1-mesa-dev libglu1-mesa-dev libgnomevfs2-0 libgnomevfs2-common libice-dev libpthread-stubs0-dev libsctp1 libsm-dev libx11-dev libx11-doc libx11-xcb-dev libxau-dev libxcb-dri2-0-dev libxcb-dri3-dev libxcb-glx0-dev libxcb-present-dev libxcb-randr0-dev libxcb-render0-dev libxcb-shape0-dev libxcb-sync-dev libxcb-xfixes0-dev libxcb1-dev libxdamage-dev libxdmcp-dev libxext-dev libxfixes-dev libxi-dev libxmu-dev libxmu-headers libxshmfence-dev libxt-dev libxxf86vm-dev lksctp-tools mesa-common-dev  x11proto-core-dev x11proto-damage-dev  x11proto-dri2-dev x11proto-fixes-dev x11proto-gl-dev x11proto-input-dev x11proto-kb-dev x11proto-xext-dev x11proto-xf86vidmode-dev xorg-sgml-doctools xtrans-dev libgles2-mesa-dev
Now install the toolkit and samples (note that the toolkit must be installed as root)
Code:

sudo sh ~/Downloads/cuda/cuda_8.0.27_linux.run --silent --toolkit --toolkitpath=/usr/local/cuda-8.0 --override  --no-opengl-libs
sh ~/Downloads/cuda/cuda_8.0.27_linux.run --silent --samples --samplespath=~/Downloads/cuda/cuda-8.0_samples --override  --no-opengl-libs

I found it necessary to change the ownership of the samples (substitute your own username for <me>) with
Code:

sudo chown -R <me>:<me> ~/Downloads/cuda
Then we need to add the binaries and libraries to the environment by adding these lines to ~/.bashrc
Code:

export PATH=/usr/local/cuda-8.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH
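
What those two lines do can be checked in a throwaway shell (a sketch; the paths are the CUDA 8.0 defaults used in the install above):
Code:

```shell
# Sketch: simulate the two ~/.bashrc lines and confirm the CUDA bin
# and lib64 directories end up first on their respective paths.
PATH=/usr/local/cuda-8.0/bin:$PATH
LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH
echo "${PATH%%:*}"   # prints "/usr/local/cuda-8.0/bin"
```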

You might want to reboot again at this point, just to be sure (no big deal since it's a VM).

After rebooting, open a new terminal session and
  • check the path (echo $PATH)
  • check that the toolkit installed OK by executing the following, which should display toolkit version information
    Code:

    nvcc -V


Assuming that worked OK, we can now compile all the samples.

First we need to patch around some peculiarities of the nVidia-supplied code (thanks to Puget Systems for this code)
Code:

# Fix Host config so GCC doesn't cause errors when compiling
sudo sed -i '/unsupported GNU version/ s/^/\/\//' /usr/local/cuda-8.0/include/host_config.h
# Fix hard coded driver id in Samples
sudo find /usr/local/cuda-8.0/samples -type f -exec sed -i 's/nvidia-3../nvidia-367/g' {} +
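
To see what the first sed actually does, here it is applied to a stand-in line rather than the real header (the sample text below is an approximation of the offending line in host_config.h):
Code:

```shell
# Sketch: the sed comments out (with //) any line mentioning the GCC
# version check, demonstrated on a stand-in line instead of the real file.
line='#error -- unsupported GNU version! gcc versions later than 5 are not supported!'
fixed=$(echo "$line" | sed '/unsupported GNU version/ s/^/\/\//')
echo "$fixed"
```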

Compile the samples (assuming you saved the downloads to ~/Downloads/cuda and used the above commands)
Code:

cd ~/Downloads/cuda/cuda-8.0_samples/NVIDIA_CUDA-8.0_Samples/
make

Once that all completes (it takes a while, there are lots of them), execute
Code:

~/Downloads/cuda/cuda-8.0_samples/NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery/deviceQuery
You should see a summary of the card like this
Code:

~/Downloads/cuda/cuda-8.0_samples/NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1070"
  CUDA Driver Version / Runtime Version          8.0 / 8.0
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                8112 MBytes (8505589760 bytes)
  (15) Multiprocessors, (128) CUDA Cores/MP:    1920 CUDA Cores
  GPU Max Clock rate:                            1785 MHz (1.78 GHz)
  Memory Clock rate:                            4004 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                2097152 bytes
  Maximum Texture Dimension Size (x,y,z)        1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:              65536 bytes
  Total amount of shared memory per block:      49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                    32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:          1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                            512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                    No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:      Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:  0 / 2 / 6
  Compute Mode:
    < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX 1070
Result = PASS

At this point it's all done! BE AWARE that the sample routines using OpenGL will fail (we deliberately installed without the OpenGL files/libs so that Spice keeps the display).

With any luck Spice is still the main video driver, and the CUDA routines are ready for use.
