xemu/hw
Alexey Kardashevskiy ec132efaa8 spapr: Support NVIDIA V100 GPU with NVLink2
NVIDIA V100 GPUs have on-board RAM which is mapped into the host memory
space and accessible as normal RAM via an NVLink bus. The VFIO-PCI driver
implements special regions for such GPUs and emulates an NVLink bridge.
NVLink2-enabled POWER9 CPUs also provide address translation services
which include an ATS shootdown (ATSD) register exported via the NVLink
bridge device.

This adds a quirk to VFIO to map the GPU memory and create an MR;
the new MR is stored in a PCI device as a QOM link. The sPAPR PCI uses
this to get the MR and map it to the system address space.
Another quirk does the same for ATSD.

This adds additional steps to sPAPR PHB setup:

1. Search for specific GPUs and NPUs, collect findings in
sPAPRPHBState::nvgpus, manage system address space mappings;

2. Add device-specific properties such as "ibm,npu", "ibm,gpu",
"memory-block", "link-speed" to advertise the NVLink2 function to
the guest;

3. Add "mmio-atsd" to vPHB to advertise the ATSD capability;

4. Add new memory blocks (with extra "linux,memory-usable" to prevent
the guest OS from accessing the new memory until it is onlined) and
npuphb# nodes representing an NPU unit for every vPHB as the GPU driver
uses it for link discovery.

This allocates space for GPU RAM and ATSD like we do for MMIOs by
adding 2 new parameters to the phb_placement() hook. Older machine types
set these to zero.
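
As a rough illustration of the placement described above, the sketch below
derives a per-vPHB GPU RAM window and ATSD window from the PHB index, the same
way an MMIO window would be derived, and returns zeros for older machine types.
All names (ex_phb_placement, struct ex_placement) and all base/size constants
are assumptions made for this example; they are not taken from the patch.

```c
#include <stdint.h>

/* Illustrative constants only; the real bases and sizes are not given here. */
#define EX_NV2RAM_WIN_BASE  (1ULL << 46)  /* assumed start of the GPU RAM area */
#define EX_NV2RAM_WIN_SIZE  (1ULL << 41)  /* assumed GPU RAM window per vPHB */
#define EX_NV2ATSD_WIN_BASE (1ULL << 47)  /* assumed start of the ATSD area */
#define EX_NV2ATSD_WIN_SIZE (1ULL << 16)  /* assumed ATSD window per vPHB */

/* Hypothetical container for the two new placement outputs. */
struct ex_placement {
    uint64_t nv2_gpa;   /* guest physical address of the GPU RAM window */
    uint64_t nv2_atsd;  /* guest physical address of the ATSD window */
};

/*
 * Compute both windows for the vPHB with the given index.
 * A non-zero legacy_machine models an older machine type, which leaves
 * both values at zero.
 */
struct ex_placement ex_phb_placement(uint32_t index, int legacy_machine)
{
    struct ex_placement p = { 0, 0 };

    if (!legacy_machine) {
        p.nv2_gpa = EX_NV2RAM_WIN_BASE + (uint64_t)index * EX_NV2RAM_WIN_SIZE;
        p.nv2_atsd = EX_NV2ATSD_WIN_BASE + (uint64_t)index * EX_NV2ATSD_WIN_SIZE;
    }
    return p;
}
```

Keeping the windows a fixed stride apart means each vPHB's ranges can be
computed from its index alone, with no allocator state.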

This puts new memory nodes in a separate NUMA node, as the GPU RAM
needs to be configured equally distant from any other node in the system.
Unlike the host setup which assigns NUMA ids from 255 downwards, this
adds new NUMA nodes after the user configures nodes or from 1 if none
were configured.
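
The numbering rule above can be sketched as follows. The function names are
hypothetical, and giving each GPU its own consecutive node id is an assumption
of this example rather than something stated in the text.

```c
#include <stdint.h>

/*
 * First NUMA node id available for GPU RAM.
 * User-configured nodes occupy ids 0..nb_user_nodes-1, so the next free id
 * is nb_user_nodes; with no user-configured nodes, numbering starts at 1.
 */
uint32_t ex_gpu_numa_first_id(uint32_t nb_user_nodes)
{
    return nb_user_nodes ? nb_user_nodes : 1;
}

/* Assumption: each GPU receives the next consecutive node id. */
uint32_t ex_gpu_numa_id(uint32_t nb_user_nodes, uint32_t gpu_index)
{
    return ex_gpu_numa_first_id(nb_user_nodes) + gpu_index;
}
```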

This adds a requirement similar to EEH: one IOMMU group per vPHB.
The reason for this is that ATSD registers belong to a physical NPU
so they cannot invalidate translations on GPUs attached to another NPU.
It is guaranteed by the host platform as it does not mix NVLink bridges
or GPUs from different NPUs in the same IOMMU group. If more than one
IOMMU group is detected on a vPHB, this disables ATSD support for that
vPHB and prints a warning.
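
A minimal sketch of the check described above, assuming the IOMMU group ids of
the devices on a vPHB have already been collected into an array. The function
name and the array representation are hypothetical; the caller would disable
ATSD and print the warning when the function returns false.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true when every device on the vPHB is in the same IOMMU group,
 * which is the condition for keeping ATSD enabled on that vPHB.
 * An empty list is treated as satisfying the condition.
 */
bool ex_phb_single_iommu_group(const int *group_ids, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (group_ids[i] != group_ids[0]) {
            return false;  /* a second group was found */
        }
    }
    return true;
}
```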

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[aw: for vfio portions]
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Message-Id: <20190312082103.130561-1-aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-04-26 10:41:23 +10:00
..
9pfs trace-events: Fix attribution of trace points to source 2019-03-22 16:18:07 +00:00
acpi acpi: verify file entries in bios_linker_loader_add_pointer() 2019-04-02 11:49:14 -04:00
adc kconfig: introduce kconfig files 2019-03-07 21:45:53 +01:00
alpha * Kconfig improvements (msi_nonbroken, imply for default PCI devices) 2019-03-28 09:18:53 +00:00
arm trace-events: Fix attribution of trace points to source 2019-03-22 16:18:07 +00:00
audio Revert "audio: fix pc speaker init" 2019-04-01 08:53:40 +02:00
block xen-block: scale sector based quantities correctly 2019-04-04 18:00:07 +01:00
bt kconfig: introduce kconfig files 2019-03-07 21:45:53 +01:00
char * Kconfig improvements (msi_nonbroken, imply for default PCI devices) 2019-03-28 09:18:53 +00:00
core loader-fit: Wean off error_printf() 2019-04-17 21:21:49 +02:00
cpu kconfig: introduce kconfig files 2019-03-07 21:45:53 +01:00
cris cris-softmmu.mak: express dependencies with Kconfig 2019-03-07 21:46:19 +01:00
display * Kconfig improvements (msi_nonbroken, imply for default PCI devices) 2019-03-28 09:18:53 +00:00
dma trace-events: Shorten file names in comments 2019-03-22 16:18:07 +00:00
gpio Pull request 2019-03-25 17:01:10 +00:00
hppa * Kconfig improvements (msi_nonbroken, imply for default PCI devices) 2019-03-28 09:18:53 +00:00
hyperv hyperv: express dependencies with kconfig 2019-03-07 21:45:53 +01:00
i2c trace-events: Shorten file names in comments 2019-03-22 16:18:07 +00:00
i386 hw/i386/pc: Fix crash when hot-plugging nvdimm on older machine types 2019-04-09 18:34:21 +02:00
ide trace-events: Fix attribution of trace points to source 2019-03-22 16:18:07 +00:00
input trace-events: Fix attribution of trace points to source 2019-03-22 16:18:07 +00:00
intc * Kconfig improvements (msi_nonbroken, imply for default PCI devices) 2019-03-28 09:18:53 +00:00
ipack build: convert pci.mak to Kconfig 2019-03-07 21:45:53 +01:00
ipmi ipmi: express dependencies with kconfig 2019-03-07 21:45:53 +01:00
isa * Kconfig improvements (msi_nonbroken, imply for default PCI devices) 2019-03-28 09:18:53 +00:00
lm32 pflash: Clean up after commit 368a354f02, part 2 2019-03-11 22:53:44 +01:00
m68k m68k-softmmu.mak: express dependencies with Kconfig 2019-03-07 21:46:19 +01:00
mem trace-events: Shorten file names in comments 2019-03-22 16:18:07 +00:00
microblaze pflash: Clean up after commit 368a354f02, part 2 2019-03-11 22:53:44 +01:00
mips mips/boston: Report errors with error_report(), not error_printf() 2019-04-17 21:21:49 +02:00
misc * Kconfig improvements (msi_nonbroken, imply for default PCI devices) 2019-03-28 09:18:53 +00:00
moxie moxie-softmmu.mak: express dependencies with Kconfig 2019-03-07 21:46:19 +01:00
net virtio-net: Fix typo in comment 2019-04-02 11:49:14 -04:00
nios2 nios2-softmmu.mak: express dependencies with Kconfig 2019-03-07 21:46:19 +01:00
nvram trace-events: Shorten file names in comments 2019-03-22 16:18:07 +00:00
openrisc or1k-softmmu.mak: express dependencies with Kconfig 2019-03-07 21:46:19 +01:00
pci pci: Report fatal errors with error_report(), not error_printf() 2019-04-17 21:21:49 +02:00
pci-bridge kconfig: add dependencies on CONFIG_MSI_NONBROKEN 2019-03-18 09:39:57 +01:00
pci-host * Kconfig improvements (msi_nonbroken, imply for default PCI devices) 2019-03-28 09:18:53 +00:00
pcmcia kconfig: introduce kconfig files 2019-03-07 21:45:53 +01:00
ppc spapr: Support NVIDIA V100 GPU with NVLink2 2019-04-26 10:41:23 +10:00
rdma * Kconfig improvements (msi_nonbroken, imply for default PCI devices) 2019-03-28 09:18:53 +00:00
riscv riscv: plic: Log guest errors 2019-04-04 16:36:21 -07:00
s390x hw/s390x/3270-ccw: avoid taking address of fields in packed struct 2019-04-03 11:19:57 +02:00
scsi trace-events: Shorten file names in comments 2019-03-22 16:18:07 +00:00
sd trace-events: Delete unused trace points 2019-03-22 16:18:07 +00:00
sh4 hw/sh4/Kconfig: r2d machine requires the rtl8139 network card 2019-03-20 11:44:13 +01:00
smbios kconfig: introduce kconfig files 2019-03-07 21:45:53 +01:00
sparc trace-events: Shorten file names in comments 2019-03-22 16:18:07 +00:00
sparc64 * Kconfig improvements (msi_nonbroken, imply for default PCI devices) 2019-03-28 09:18:53 +00:00
ssi ssi: express dependencies with kconfig 2019-03-07 21:45:53 +01:00
timer hpet: Report warnings with warn_report(), not error_printf() 2019-04-17 21:21:49 +02:00
tpm trace-events: Shorten file names in comments 2019-03-22 16:18:07 +00:00
tricore - qtest fixes 2019-03-08 16:31:34 +00:00
unicore32 unicore32-softmmu.mak: express dependencies with Kconfig 2019-03-07 21:46:19 +01:00
usb usb-mtp: fix bounds check for guest provided filename 2019-04-16 20:43:39 +01:00
vfio spapr: Support NVIDIA V100 GPU with NVLink2 2019-04-26 10:41:23 +10:00
virtio trace-events: Shorten file names in comments 2019-03-22 16:18:07 +00:00
watchdog trace-events: Fix attribution of trace points to source 2019-03-22 16:18:07 +00:00
xen trace-events: Shorten file names in comments 2019-03-22 16:18:07 +00:00
xenpv xen: Replace few mentions of xend by libxl 2019-01-14 13:45:40 +00:00
xtensa hw: Use PFLASH_CFI0{1,2} and TYPE_PFLASH_CFI0{1,2} 2019-03-11 22:53:44 +01:00
Kconfig kconfig: add dependencies on CONFIG_MSI_NONBROKEN 2019-03-18 09:39:57 +01:00
Makefile.objs i2c: express dependencies with Kconfig 2019-03-07 21:45:53 +01:00