Nvidia Driver and Cuda9.0 Installation

Tested hardware and OS configuration:

  • OS: Ubuntu 16.04 LTS
  • NVIDIA Graphic Card: Quadro M1000M
  • Cuda Version: 9.0
  • Graphic Card Driver Version: 410.xx
  • Disable secure boot in BIOS setting

The recommended way to install the Nvidia driver and Cuda is using .run files since the run files provide flexibility for configuration. You can get rid of the login loop mess-up caused by the bundled opengl libs in Nvidia drivers with care.

Let's start with the dkms(Dynamic Kernel Module Support) package. This is a super useful package when you install drivers. Sometimes your laptop may have the latest hardware that the ubuntu does not support. You have to install the corresponding drivers by building from source codes. The issue is that if the system updates the kernel automatically, you basically lose your manually-installed drivers for the hardware. You have to re-install it. But with dkms package, you don't need to worry about this problem anymore. The rebuild of the modules is handled automatically when a kernel is upgraded.

Preliminary

  1. Install dkms via apt-get
    sudo apt-get install dkms
  2. Install the kernel header with
    sudo apt-get install linux-headers-$(uname -r)
  3. Download the nvidia driver according to your graphic card model from nvidia-website and the corresponding cuda-toolkit from here. They look like something like NVIDIA-Linux-x86_64-xxx.xx.run and cuda_9.0.176_384.81_linux.run. Make them executable by
    chmod +x NVIDIA-Linux-x86_64-410.93.run
    chmod +x cuda_9.0.176_384.81_linux.run
    You may download the four patches for the cuda9.0 as well.
    chmod +x cuda_9.0.176.1_linux.run
    chmod +x cuda_9.0.176.2_linux.run
    chmod +x cuda_9.0.176.3_linux.run
    chmod +x cuda_9.0.176.4_linux.run
  4. Blacklist the nouveau. The nouveau coming with ubuntu systems will affect the installation of nvidia drivers. Blacklist it by
    # create blacklist file for nouveau
    sudo touch /etc/modprobe.d/blacklist-nouveau.conf
    # write the content
    sudo bash -c "echo 'blacklist nouveau
    options nouveau modeset=0' > /etc/modprobe.d/blacklist-nouveau.conf"
    # update the blacklist
    sudo update-initramfs -u
    If you get nothing output in the terminal by lsmod | grep nouveau, you are good to go.
  5. Purge the nvidia driver installed via PPA by
    sudo apt-get purge nvidia*

Installation

  1. Kill the x-server by
    sudo service lightdm stop
    ```
    2. Login to the system from tty by `alt+ctrl+F1`, login with your user name and password.
    3. Navigate to the directory of downloaded `run` files, install graphic driver in headless mode:
    ```bash
    sudo ./NVIDIA-Linux-x86_64-410.93.run -no-opengl-files
    ```
    4. Install cuda9.0
    ```bash
    sudo ./cuda_9.0.176_384.81_linux.run --no-opengl-libs
    During the installation,
  • accept the EULA conditions
  • say NO to installing the nvidia drivers
  • say YES to installing cuda toolkit
  • say YES to installing cuda samples
  • say YES to creating a symbolic link for cuda
  • say NO to rebuilding any Xserver configuration with nvidia
  1. Set the env variables for cuda in ~/.bashrc
    sudo bash -c "echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc"
    sudo bash -c "echo '/usr/local/cuda/lib64/' > /etc/ld.so.conf.d/cuda.conf"
    source ~/.bashrc
    sudo ldconfig
    NOTE: Be careful with >> and >, the symbol > will overwrite the file. You don't want to overwrite your .bashrc file.
    Reboot or get back to the window mode by:
    sudo service lightdm start
    You should get similar outputs as below by nvidia-smi command:
    Mon Jan  7 21:00:08 2019       
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 410.93 Driver Version: 410.93 CUDA Version: 10.0 |
    |-------------------------------+----------------------+----------------------+
    | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
    | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
    |===============================+======================+======================|
    | 0 Quadro M1000M Off | 00000000:01:00.0 Off | N/A |
    | N/A 51C P0 N/A / N/A | 0MiB / 2004MiB | 0% Default |
    +-------------------------------+----------------------+----------------------+
    +-----------------------------------------------------------------------------+
    | Processes: GPU Memory |
    | GPU PID Type Process name Usage |
    |=============================================================================|
    | No running processes found |
    +-----------------------------------------------------------------------------+
  2. Apply the patches if you need them (optional):
    sudo ./cuda_9.0.176.1_linux.run  # accept and Enter
    sudo ./cuda_9.0.176.2_linux.run # accept and Enter
    sudo ./cuda_9.0.176.3_linux.run # accept and Enter
    sudo ./cuda_9.0.176.4_linux.run # accept and Enter

Cuda Testing with Samples

  • install compiler
    sudo apt-get install -y gcc build-essential
  • compiling
    cd ~/NVIDIA_CUDA-9.0_Samples
    make -j8
  • testing
    bin/x86_64/linux/release/deviceQuery # test 1
    bin/x86_64/linux/release/bandwidthTest # test 2
  • you should get something like this:
    bin/x86_64/linux/release/deviceQuery Starting...
    CUDA Device Query (Runtime API) version (CUDART static linking)
    Detected 1 CUDA Capable device(s)
    Device 0: "Quadro M1000M"
    CUDA Driver Version / Runtime Version 10.0 / 9.0
    CUDA Capability Major/Minor version number: 5.0
    Total amount of global memory: 2004 MBytes (2101870592 bytes)
    ( 4) Multiprocessors, (128) CUDA Cores/MP: 512 CUDA Cores
    GPU Max Clock rate: 1072 MHz (1.07 GHz)
    Memory Clock rate: 2505 Mhz
    Memory Bus Width: 128-bit
    L2 Cache Size: 2097152 bytes
    Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
    Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
    Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
    Total amount of constant memory: 65536 bytes
    Total amount of shared memory per block: 49152 bytes
    Total number of registers available per block: 65536
    Warp size: 32
    Maximum number of threads per multiprocessor: 2048
    Maximum number of threads per block: 1024
    Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
    Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
    Maximum memory pitch: 2147483647 bytes
    Texture alignment: 512 bytes
    Concurrent copy and kernel execution: Yes with 1 copy engine(s)
    Run time limit on kernels: No
    Integrated GPU sharing Host Memory: No
    Support host page-locked memory mapping: Yes
    Alignment requirement for Surfaces: Yes
    Device has ECC support: Disabled
    Device supports Unified Addressing (UVA): Yes
    Supports Cooperative Kernel Launch: No
    Supports MultiDevice Co-op Kernel Launch: No
    Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
    Compute Mode:
    < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
    deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.0, CUDA Runtime Version = 9.0, NumDevs = 1
    Result = PASS

Troubleshooting

If you get stuck in the login loop by accidently installing nvidia driver with the bundled opengl library, uninstall the nvidia driver and cuda library by

sudo /usr/bin/nvidia-uninstall
sudo /usr/local/cuda-9.0/bin/uninstall_cuda_9.0.pl

and then reboot. You are back to normal.

Install cuDNN

Pretty easy!

  • Join the NVIDIA Developer Program and get the permission to download the cuDNN.
  • Download the cuDNN from here, select the deb version that matches cuda9.0:
    • cuDNN Runtime Library for Ubuntu16.04 (Deb): libcudnn7_7.4.2.24-1+cuda9.0_amd64.deb
    • cuDNN Developer Library for Ubuntu16.04 (Deb): libcudnn7-dev_7.4.2.24-1+cuda9.0_amd64.deb
    • cuDNN Code Samples and User Guide for Ubuntu16.04 (Deb): libcudnn7-doc_7.4.2.24-1+cuda9.0_amd64.deb
  • Install cuDNN by
    sudo dpkg -i libcudnn7_7.4.2.24-1+cuda9.0_amd64.deb
    sudo dpkg -i libcudnn7-dev_7.4.2.24-1+cuda9.0_amd64.deb
    sudo dpkg -i libcudnn7-doc_7.4.2.24-1+cuda9.0_amd64.deb
  • Testing
    cd /usr/src/cudnn_samples_v7/mnistCUDNN
    sudo make -j8
    ./mnistCUDNN
    You should get something like this if installed successfully
    cudnnGetVersion() : 7402 , CUDNN_VERSION from cudnn.h : 7402 (7.4.2)
    Host compiler version : GCC 5.5.0
    There are 1 CUDA capable devices on your machine :
    device 0 : sms 4 Capabilities 5.0, SmClock 1071.5 Mhz, MemSize (Mb) 2004, MemClock 2505.0 Mhz, Ecc=0, boardGroupID=0
    Using device 0
    Testing single precision
    Loading image data/one_28x28.pgm
    Performing forward propagation ...
    Testing cudnnGetConvolutionForwardAlgorithm ...
    Fastest algorithm is Algo 1
    Testing cudnnFindConvolutionForwardAlgorithm ...
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.029920 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 1.930336 time requiring 2057744 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 2.276768 time requiring 3464 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 2.370656 time requiring 203008 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 17.794369 time requiring 57600 memory
    Resulting weights from Softmax:
    0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000
    Loading image data/three_28x28.pgm
    Performing forward propagation ...
    Resulting weights from Softmax:
    0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000
    Loading image data/five_28x28.pgm
    Performing forward propagation ...
    Resulting weights from Softmax:
    0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006
    Result of classification: 1 3 5
    Test passed!
    Testing half precision (math in single precision)
    Loading image data/one_28x28.pgm
    Performing forward propagation ...
    Testing cudnnGetConvolutionForwardAlgorithm ...
    Fastest algorithm is Algo 1
    Testing cudnnFindConvolutionForwardAlgorithm ...
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.025600 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.033376 time requiring 3464 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.170336 time requiring 207360 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.340768 time requiring 2057744 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.551552 time requiring 203008 memory
    Resulting weights from Softmax:
    0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001
    Loading image data/three_28x28.pgm
    Performing forward propagation ...
    Resulting weights from Softmax:
    0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000
    Loading image data/five_28x28.pgm
    Performing forward propagation ...
    Resulting weights from Softmax:
    0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006
    Result of classification: 1 3 5
    Test passed!

Install tensorflow-gpu

Take the python3 from the system as an example:

sudo pip3 install --upgrade tensorflow-gpu

Test

python3
>>> import tensorflow as tf
>>> sess = \
tf.Session(config=tf.ConfigProto(log_device_placement=True))

The output should be something like this:

2019-01-07 22:54:32.219774: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-01-07 22:54:32.697484: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-01-07 22:54:32.697992: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: Quadro M1000M major: 5 minor: 0 memoryClockRate(GHz): 1.0715
pciBusID: 0000:01:00.0
totalMemory: 1.96GiB freeMemory: 1.92GiB
2019-01-07 22:54:32.698012: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-01-07 22:58:26.675975: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-07 22:58:26.676021: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-01-07 22:58:26.676036: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-01-07 22:58:26.676508: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1683 MB memory) -> physical GPU (device: 0, name: Quadro M1000M, pci bus id: 0000:01:00.0, compute capability: 5.0)
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Quadro M1000M, pci bus id: 0000:01:00.0, compute capability: 5.0
2019-01-07 22:58:26.678076: I tensorflow/core/common_runtime/direct_session.cc:307] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Quadro M1000M, pci bus id: 0000:01:00.0, compute capability: 5.0