I recently had an NVMe drive fault and wanted to check for a firmware update. Western Digital (formerly SanDisk), in their infinite wisdom, only offer firmware for their drives in the form of ... a Windows management application.
How hard can it really be to run that under Linux?
Moderately, it turns out, and that's by using a Windows VM under libvirt-managed kvm to do it.
First I needed VT-d and the host IOMMU enabled (the full sequence is sketched after this list):
- Ensure VT-d was enabled in the UEFI firmware (it was). You can check for the CPU virtualisation extensions with grep -oE 'svm|vmx' /proc/cpuinfo | uniq; if there's output (vmx for Intel VT-x, svm for AMD-V), they're available.
- Run sudo virt-host-validate to check kvm. The line "Checking if IOMMU is enabled by kernel" was marked with a warning.
- Add intel_iommu=on to my kernel command line and reboot
- Re-check virt-host-validate, which no longer complains
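Roughly, the whole sequence looks like this. A minimal sketch assuming an Intel host and a GRUB-based setup; adjust the bootloader step for your distro:

grep -oE 'svm|vmx' /proc/cpuinfo | uniq    # vmx = Intel VT-x, svm = AMD-V
sudo virt-host-validate                    # look for the IOMMU warning

# Append intel_iommu=on to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then:
sudo update-grub                           # or grub2-mkconfig -o /boot/grub2/grub.cfg
sudo reboot

sudo virt-host-validate                    # the IOMMU check should now pass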
Then I needed to unbind my NVMe controller, and its parent on the bus, from their host drivers so I could pass them through.
Obviously you can only do this if you're booted off something that won't need the PCI devices you're unbinding. In my case I'm booted from a USB3 HDD.
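A quick way to sanity-check that; a sketch, nothing in it is specific to my machine:

findmnt -no SOURCE /               # should show the USB disk here, not /dev/nvme0n1pX
lsblk -o NAME,TRAN,MOUNTPOINT      # the TRAN column shows usb vs sata vs nvme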
Identify the bus path to the NVMe device:
$ sudo lspci -t -nnn -vv -k
-[0000:00]-+-00.0 Intel Corporation Device [8086:9b61]
+-02.0 Intel Corporation UHD Graphics [8086:9b41]
+-04.0 Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem [8086:1903]
+-08.0 Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model [8086:1911]
+-12.0 Intel Corporation Comet Lake Thermal Subsytem [8086:02f9]
+-14.0 Intel Corporation Device [8086:02ed]
+-14.2 Intel Corporation Device [8086:02ef]
+-14.3 Intel Corporation Wireless-AC 9462 [8086:02f0]
+-16.0 Intel Corporation Comet Lake Management Engine Interface [8086:02e0]
+-17.0 Intel Corporation Comet Lake SATA AHCI Controller [8086:02d3]
+-1d.0-[04]----00.0 Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168]
+-1d.4-[07]----00.0 Sandisk Corp Device [15b7:5005]
+-1f.0 Intel Corporation Device [8086:0284]
+-1f.3 Intel Corporation Device [8086:02c8]
+-1f.4 Intel Corporation Device [8086:02a3]
\-1f.5 Intel Corporation Comet Lake SPI (flash) Controller [8086:02a4]
In this case it's device 15b7:5005, sitting on bus 07 behind root port 00:1d.4.
I originally tried to pass through just that device after unloading the nvme module, but it failed with errors like
vfio-pci 0000:03:00.0: not ready 65535ms after FLR; giving up
so I ended up having to unbind its parent on the bus too.
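If you want to see exactly which bridge a device hangs off, the sysfs path for it spells out the whole chain. A sketch using my NVMe device's address:

readlink -f /sys/bus/pci/devices/0000:07:00.0
# Resolves to /sys/devices/pci0000:00/0000:00:1d.4/0000:07:00.0 here (matching the
# lspci tree above), i.e. the NVMe controller sits under the PCIe root port at 00:1d.4.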
First, identify the kernel drivers used, bus IDs, and device IDs. In my case that's the NVMe SSD device 15b7:5005, the SATA AHCI controller 8086:02d3, and the PCIe Ethernet controller 10ec:8168, which I'd initially taken for a child of the SATA controller (the lspci tree above actually shows it on its own root port, 00:1d.0).
Use lspci -k -d {{deviceid}} to see the kernel driver bound to each device, and any kernel module(s) registered as supporting it, e.g.:
# lspci -k -d 8086:02d3
00:17.0 SATA controller: Intel Corporation Comet Lake SATA AHCI Controller
Subsystem: Lenovo Device 5079
Kernel driver in use: ahci
# lspci -k -d 15b7:5005
07:00.0 Non-Volatile memory controller: Sandisk Corp Device 5005 (rev 01)
Subsystem: Sandisk Corp Device 5005
Kernel driver in use: nvme
Kernel modules: nvme
If it's owned by a module you can often just unload the module to unbind it, e.g.
# rmmod nvme
but if it's owned by a built-in driver like "ahci" is in my kernel, you can't do that. It doesn't show up in lsmod and cannot be rmmod'd.
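A quick way to tell the two cases apart; a sketch, using ahci as the example:

lsmod | grep -w ahci                                       # no output: not a loadable module
grep -w ahci /lib/modules/"$(uname -r)"/modules.builtin    # listed here instead if it's built in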
Instead you need to use sysfs to unbind it. (You can do this for devices bound to modules too, which is handy if the host OS still needs the module for something else.)
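There's also a per-device form of the same interface under /sys/bus/pci, which unbinds just one device without touching any module. A sketch, using my NVMe device's address:

echo '0000:07:00.0' | sudo tee /sys/bus/pci/devices/0000:07:00.0/driver/unbind
# 'driver' is a symlink to whatever driver currently owns the device (nvme in this
# case, if it's still loaded), so this works the same for modular and built-in drivers.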
To unbind the ahci driver from the controller on my host, for example, here's what I did:
# ls /sys/module/ahci/drivers/
pci:ahci
# ls "/sys/module/ahci/drivers/pci:ahci/"
0000:00:17.0 bind module new_id remove_id uevent unbind
See how '0000:00:17.0' matches the bus address we saw in lspci? Cool. Now unbind it by writing that address to the driver's unbind file:
# echo '0000:00:17.0' > "/sys/module/ahci/drivers/pci:ahci/unbind"
Verify everything's unbound now; neither device should report a "Kernel driver in use" line any more:
# lspci -k -d 8086:02d3
00:17.0 SATA controller: Intel Corporation Comet Lake SATA AHCI Controller
Subsystem: Lenovo Device 5079
# lspci -k -d 15b7:5005
07:00.0 Non-Volatile memory controller: Sandisk Corp Device 5005 (rev 01)
Subsystem: Sandisk Corp Device 5005
Kernel modules: nvme
Now bind them to the vfio-pci driver with:
# modprobe vfio-pci ids=8086:02d3,15b7:5005
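If vfio-pci is already loaded (or built into your kernel), the module parameters above won't take effect; a sketch of the sysfs route instead, using the same IDs and addresses:

echo '15b7 5005' | sudo tee /sys/bus/pci/drivers/vfio-pci/new_id

# Or per-device, which avoids grabbing every device with the same ID:
echo 'vfio-pci' | sudo tee /sys/bus/pci/devices/0000:07:00.0/driver_override
echo '0000:07:00.0' | sudo tee /sys/bus/pci/drivers_probe

# Either way, lspci -k -d 15b7:5005 should now report "Kernel driver in use: vfio-pci".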
Now, with a bit of luck, it can be attached to a kvm guest so it's accessible from inside the VM.
I used virt-manager for that, because libvirt's semi-documented XML-based interface makes me want to scream. Just open the guest, "Add Hardware", "PCI Device", and pick both the NVMe controller and the parent device. I didn't bother with the Ethernet controller; it didn't seem to be needed.
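For the record, the raw libvirt equivalent is roughly this; a sketch with a made-up guest name ("win10") and my NVMe device's host address:

cat > nvme-hostdev.xml <<'EOF'
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
  </source>
</hostdev>
EOF
virsh attach-device win10 nvme-hostdev.xml --config

With managed='yes', libvirt detaches the device from its host driver and binds it to vfio-pci itself when the guest starts.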
Sadly, it still didn't work:
[ 2641.079391] vfio-pci 0000:07:00.0: vfio_ecap_init: hiding ecap 0x19@0x300
[ 2641.079395] vfio-pci 0000:07:00.0: vfio_ecap_init: hiding ecap 0x1e@0x900
[ 2641.109860] pcieport 0000:00:1d.4: DPC: containment event, status:0x1f11 source:0x0000
[ 2641.109863] pcieport 0000:00:1d.4: DPC: unmasked uncorrectable error detected
[ 2641.109867] pcieport 0000:00:1d.4: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID)
[ 2641.109868] pcieport 0000:00:1d.4: AER: device [8086:02b4] error status/mask=00200000/00010000
[ 2641.109869] pcieport 0000:00:1d.4: AER: [21] ACSViol (First)
[ 2642.319541] vfio-pci 0000:07:00.0: not ready 1023ms after FLR; waiting
[ 2643.407529] vfio-pci 0000:07:00.0: not ready 2047ms after FLR; waiting
[ 2645.519286] vfio-pci 0000:07:00.0: not ready 4095ms after FLR; waiting
[ 2650.063213] vfio-pci 0000:07:00.0: not ready 8191ms after FLR; waiting
[ 2658.767373] vfio-pci 0000:07:00.0: not ready 16383ms after FLR; waiting
[ 2675.663235] vfio-pci 0000:07:00.0: not ready 32767ms after FLR; waiting
[ 2712.526507] vfio-pci 0000:07:00.0: not ready 65535ms after FLR; giving up
[ 2713.766494] pcieport 0000:00:1d.4: AER: device recovery successful
[ 2714.435994] vfio-pci 0000:07:00.0: vfio_bar_restore: reset recovery - restoring BARs
[ 2764.090284] pcieport 0000:00:1d.4: DPC: containment event, status:0x1f15 source:0x0700
[ 2764.090286] pcieport 0000:00:1d.4: DPC: ERR_FATAL detected
[ 2764.254057] pcieport 0000:00:1d.4: AER: device recovery successful
... followed by a hang of the VM. But hey. It was worth a try.
DPC is "Downstream Port Containment" which is supposed to protect the host from failures in PCI devices by isolating them.
Since this later scrambled my host graphics and I had to force a reboot anyway, I can't call the exercise a success. At least now you know how to unbind a driver from a device on the host without a reboot and without messing around with module blacklisting etc.
yay?