Commit Graph

44 Commits

Author SHA1 Message Date
Alex Williamson
6dcfdbad69 vfio: cleanup includes
Starting to get messy, put the back in alphabetical order.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-04-01 13:35:40 -06:00
Alex Williamson
c29029dd88 vfio: Add bootindex support
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-04-01 13:35:24 -06:00
Alex Williamson
ba66181828 vfio-pci: Move devices to D0 on reset
Guests may leave devices in a low power state at reboot, but we expect
devices to be woken up for the next boot.  Make this happen.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-04-01 13:35:08 -06:00
Alex Williamson
82ca891283 vfio-pci: Add extra debugging
Often when debugging it's useful to be able to disable bypass paths
so no interactions with the device are missed.  Add some extra debug
options to do this.  Also add device info on read/write BAR accesses,
which is useful when debugging more than one assigned device.  A
couple DPRINTFs also had redundant "vfio:" prefixes.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-04-01 13:34:56 -06:00
Alex Williamson
7076eabcbf qemu vfio-pci: Graphics device quirks
Graphics cards have a number of different backdoors.  Some of these
are alternative ways to get PCI BAR addresses, some of them are
complete mirrors of PCI config space available through MMIO and
I/O port access.  These quirks cover a number of ATI Radeon and
Nvidia devices.  On the ATI/AMD side, this should enable HD5450
and HD7850 and hopefully a host of devices around those generations.
For Nvidia, my card selection is much more dated.  A 8400gs works
well with both the Window shipped driver and the Nvidia downloaded
driver.  A 7300le works as well, with the caveat that generating
the Window experience index with the Nvidia driver causes the card
to reset several times before generating a BSOD.  An NVS 290 card
seems to run well with the shipped Windows driver, but generates
a BSOD with the Nvidia driver.  All of the Nvidia devices work with
the Linux Nvidia proprietary driver and nouveau, the HD5450 works
with either radeon or fglrx, HD7850 works with vesa and fglrx (not
supported by radeon).  Extremely limited 3D testing.

Device reset is also an issue with graphics.  It's unfortunately
very common that the devices offer no means to reset the card or
doesn't seem effective.  Nvidia devices are pretty good about being
able to get the device to a working state through the VGA BIOS init,
Radeon devices less so, and often require a host reboot.  Work
remains to be done here.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-04-01 13:34:40 -06:00
Alex Williamson
f15689c7e4 qemu vfio-pci: Add support for VGA MMIO and I/O port access
Most VGA cards need some kind of quirk to fully operate since they
hide backdoors to get to other registers outside of PCI config space
within the registers, but this provides the base infrastructure.  If
we could identity map PCI resources for assigned devices we would need
a lot fewer quirks.

To enable this, use a kernel side vfio-pci driver that incorporates
VGA support (v3.9), and use the -vga none option and add the x-vga=on
option for the vfio-pci device.  The "x-" denotes this as an
experimental feature.  You may also need to use a cached copy of the
VGA BIOS for your device, passing it to vfio-pci using the romfile=
option.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-04-01 13:33:44 -06:00
Alex Williamson
96adc5c7c2 vfio-pci: Add PCIe capability mangling based on bus type
Windows seems to pay particular interest to the PCIe header type of
devices and will fail to load drivers if we attach Endpoint devices or
Legacy Endpoint devices to the Root Complex.  We can use
pci_bus_is_express and pci_bus_is_root to determine the bus type and
mangle the type appropriately:

* Legacy PCI
  * No change, capability is unmodified for compatibility.
* PCI Express
  * Integrated Root Complex Endpoint -> Endpoint
* PCI Express Root Complex
  * Endpoint -> Integrated Root Complex Endpoint
  * Legacy Endpoint -> none, capability hidden

We also take this opportunity to explicitly limit supported devices
to Endpoints, Legacy Endpoints, and Root Complex Integrated Endpoints.
We don't currently have support for other types and users often cause
themselves problems by assigning them.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-04-01 11:50:04 -06:00
Alex Williamson
4b5d5e87c7 vfio-pci: Generalize PCI config mangling
Kernel-side vfio virtualizes all of config space, but some parts are
unique to Qemu.  For instance we may or may not expose the ROM BAR,
Qemu manages MSI/MSIX, and Qemu manages the multi-function bit so that
single function devices can appear as multi-function and vica versa.
Generalize this into a bitmap of Qemu emulated bits.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-04-01 11:50:04 -06:00
Paolo Bonzini
83c9f4ca79 hw: include hw header files with full paths
Done with this script:

cd hw
for i in `find . -name '*.h' | sed 's/^..//'`; do
  echo '\,^#.*include.*["<]'$i'[">], s,'$i',hw/&,'
done | sed -i -f - `find . -type f`

This is so that paths remain valid as files are moved.

Instead, files in hw/dataplane are referenced with the relative path.
We know they are not going to move to include/, and they are the only
include files that are in subdirectories _and_ move.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-03-01 15:01:17 +01:00
Markus Armbruster
312fd5f290 error: Strip trailing '\n' from error string arguments (again)
Commit 6daf194d and be62a2eb got rid of a bunch, but they keep coming
back.  Tracked down with this Coccinelle semantic patch:

    @r@
	expression err, eno, cls, fmt;
	position p;
    @@
    (
	error_report(fmt, ...)@p
    |
	error_set(err, cls, fmt, ...)@p
    |
	error_set_errno(err, eno, cls, fmt, ...)@p
    |
	error_setg(err, fmt, ...)@p
    |
	error_setg_errno(err, eno, fmt, ...)@p
    )
    @script:python@
	fmt << r.fmt;
	p << r.p;
    @@
    if "\\n" in str(fmt):
	print "%s:%s:%s:%s" % (p[0].file, p[0].line, p[0].column, fmt)

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1360354939-10994-4-git-send-email-armbru@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-02-11 08:13:19 -06:00
Markus Armbruster
1a9522cc6e error: Clean up abuse of error_report() for help
Use error_printf() instead, so the help gets presented more nicely.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1360354939-10994-3-git-send-email-armbru@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-02-11 08:13:18 -06:00
Alex Williamson
6a659bbff9 vfio-pci: Enable PCIe extended config space
We don't know pre-init time whether the device we're exposing is PCIe
or legacy PCI.  We could ask for it to be specified via a device
option, but that seems like too much to ask of the user.  Instead we
can assume everything will be PCIe, which makes PCI-core allocate
enough config space.  Removing the flag during init leaves the space
allocated, but allows legacy PCI devices to report the real device
config space size to rest of Qemu.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2013-01-30 01:31:09 +02:00
Alex Williamson
8fc94e5a80 vfio-pci: Loosen sanity checks to allow future features
VFIO_PCI_NUM_REGIONS and VFIO_PCI_NUM_IRQS should never have been
used in this manner as it locks a specific kernel implementation.
Future features may introduce new regions or interrupt entries
(VGA may add legacy ranges, AER might add an IRQ for error
signalling).  Fix this before it gets us into trouble.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: qemu-stable@nongnu.org
2013-01-08 14:10:03 -07:00
Alex Williamson
b0223e29af vfio-pci: Make host MSI-X enable track guest
Guests typically enable MSI-X with all of the vectors in the MSI-X
vector table masked.  Only when the vector is enabled does the vector
get unmasked, resulting in a vector_use callback.  These two points,
enable and unmask, correspond to pci_enable_msix() and request_irq()
for Linux guests.  Some drivers rely on VF/PF or PF/fw communication
channels that expect the physical state of the device to match the
guest visible state of the device.  They don't appreciate lazily
enabling MSI-X on the physical device.

To solve this, enable MSI-X with a single vector when the MSI-X
capability is enabled and immediate disable the vector.  This leaves
the physical device in exactly the same state between host and guest.
Furthermore, the brief gap where we enable vector 0, it fires into
userspace, not KVM, so the guest doesn't get spurious interrupts.
Ideally we could call VFIO_DEVICE_SET_IRQS with the right parameters
to enable MSI-X with zero vectors, but this will currently return an
error as the Linux MSI-X interfaces do not allow it.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: qemu-stable@nongnu.org
2013-01-08 14:09:03 -07:00
Michael S. Tsirkin
bbef882cc1 msi: add API to get notified about pending bit poll
Update all users.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2012-12-26 11:49:28 +02:00
Paolo Bonzini
9c17d615a6 softmmu: move include files to include/sysemu/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:32:45 +01:00
Paolo Bonzini
1de7afc984 misc: move include files to include/qemu/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:32:39 +01:00
Paolo Bonzini
022c62cbbc exec: move include files to include/exec/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:31:31 +01:00
Paolo Bonzini
6f991980a5 Merge commit '1dd3a74d2ee2d873cde0b390b536e45420b3fe05' into HEAD
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-17 18:56:22 +01:00
Michael S. Tsirkin
a2cb15b0dd pci: update all users to look in pci/
update all users so we can remove the makefile hack.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2012-12-17 13:02:26 +02:00
Alex Williamson
d281084d3e vfio-pci: Don't use kvm_irqchip_in_kernel
kvm_irqchip_in_kernel() has an architecture specific meaning, so
we shouldn't be using it to determine whether to enabled KVM INTx
bypass.  kvm_irqfds_enabled() seems most appropriate.  Also use this
to protect our other call to kvm_check_extension() as that explodes
when KVM isn't enabled.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: qemu-stable@nongnu.org
2012-12-10 11:30:03 -07:00
Alex Williamson
a771c51703 vfio-pci: Use common msi_get_message
We can get rid of our local version now that a helper exists.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2012-11-13 12:27:40 -07:00
Alex Williamson
e1d1e5867d vfio-pci: Add KVM INTx acceleration
This makes use of the new level irqfd support enabling bypass of qemu
userspace both on INTx injection and unmask.  This significantly
boosts the performance of devices making use of legacy interrupts (ex.
~60% better netperf TCP_RR scores for an e1000e assigned to a Linux
guest and booted with pci=nomsi).  This also avoids flipping mmaps on
and off to simulate EOIs, so greatly improves performance of device
access in addition to interrupt latency.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2012-11-13 12:27:40 -07:00
Avi Kivity
a8170e5e97 Rename target_phys_addr_t to hwaddr
target_phys_addr_t is unwieldly, violates the C standard (_t suffixes are
reserved) and its purpose doesn't match the name (most target_phys_addr_t
addresses are not target specific).  Replace it with a finger-friendly,
standards conformant hwaddr.

Outstanding patchsets can be fixed up with the command

  git rebase -i --exec 'find -name "*.[ch]"
                        | xargs s/target_phys_addr_t/hwaddr/g' origin

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-10-23 08:58:25 -05:00
Anthony Liguori
248bbe7493 Merge remote-tracking branch 'awilliam/tags/vfio-pci-for-qemu-20121017.0' into staging
* awilliam/tags/vfio-pci-for-qemu-20121017.0:
  vfio-pci: Mark non-migratable
  vfio-pci: Fix debug build
2012-10-22 14:48:23 -05:00
Avi Kivity
f6790af6bc memory: use AddressSpace for MemoryListener filtering
Using the AddressSpace type reduces confusion, as you can't accidentally
supply the MemoryRegion you're interested in.

Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-10-22 14:50:07 +02:00
Alex Williamson
d9f0e63898 vfio-pci: Mark non-migratable
We haven't magically fixed this yet.  Toss in a description too.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2012-10-17 11:20:14 -06:00
Alex Williamson
a011b10e0c vfio-pci: Fix debug build
Stray variable from before MSI-X rework

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2012-10-17 11:20:11 -06:00
Avi Kivity
e71e602cb5 vfio: drop no-op MemoryListener callbacks
Removes quite a bit of useless code.

Signed-off-by: Avi Kivity <avi@redhat.com>
2012-10-15 11:43:07 +02:00
Jan Kiszka
3a4f2816fa vfio-pci: Fix BAR->VFIODevice translation in
DO_UPCAST is supposed to translate from the first member of a struct to
that struct, not from arbitrary ones. And it (usually) breaks the build
when neglecting this rule. Use container_of to fix the build breakage
and likely also the runtime behavior.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
aw: runtime behavior is actually the same, but clearly misuse of DO_UPCAST
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2012-10-08 08:45:31 -06:00
Alex Williamson
1a40313381 vfio-pci: Clang cleanup
Blue Swirl reports that Clang doesn't like the structure we define to
avoid dynamic allocation for a number of calls to VFIO_DEVICE_SET_IRQS.
Adding an element after a variable sized type is a GNU extension.
Switch back to dynamic allocation, which really isn't a problem since
this is only done on interrupt setup changes.

Cc: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2012-10-08 08:45:31 -06:00
Alex Williamson
ce59af2dba vfio-pci: Cleanup on INTx setup failure
Missing some unwind code.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2012-10-08 08:45:30 -06:00
Alex Williamson
5834a83f48 vfio-pci: Extend reset
Take what we've learned from pci-assign and apply it to vfio-pci.
On reset, disable previous interrupt config, perform a device
reset if available, re-enable INTx, and disable memory regions on
the device to prevent continuing DMA.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2012-10-08 08:45:30 -06:00
Alex Williamson
9b1e45c8f1 vfio-pci: Remove setting of MSI qsize
This was a misinterpretation of the spec, hardware doesn't get to
specify how many were actually enabled through this field.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2012-10-08 08:45:30 -06:00
Alex Williamson
5976cdd58b vfio-pci: Use uintptr_t for void* cast
We don't seem to run into any sign extension problems, but
unsigned looks more correct.

Signed-off-by: Alex williamson <alex.williamson@redhat.com>
2012-10-08 08:45:30 -06:00
Alex Williamson
e43b9a5a4f vfio-pci: Don't peak at msi_supported
Let the init function fail, just don't warn for -ENOTSUP.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2012-10-08 08:45:30 -06:00
Alex Williamson
5c97e5eba6 vfio-pci: Roll the header into the .c file
It's only ~100 lines and nobody else should be using this.
Suggested by Michael Tsirkin.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2012-10-08 08:45:30 -06:00
Alex Williamson
98cd5a5eaf vfio-pci: No spurious MSIs
FreeBSD doesn't like these spurious MSIs, remove them as they're
mostly paranoia anyway.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2012-10-08 08:45:29 -06:00
Alex Williamson
fd704adc47 vfio-pci: Rework MSIX setup/teardown
We try to do lazy initialization of MSIX since we don't actually need
to setup anything until MSIX vectors start getting used.  This leads
to problems if MSIX is enabled, but never used (we can end up trying
to re-enable INTx while it's still enabled).  We also run into
problems trying to expand our reset function to tear down interrupts
as we can then get vector release notifications after we've released
data structures.  By making explicit initialization and teardown we
can avoid both of these problems and behave more similar to bare
metal.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2012-10-08 08:45:29 -06:00
Alex Williamson
12af134487 vfio-pci: Unmap and retry DMA mapping
Occasionally we get regions added that overlap with existing mappings.
These always seems to be in the VGA ROM range.  VFIO returns EBUSY
for these mapping attempts.  We can try a little harder and assume
that the latest mapping is correct by removing any overlapping ranges
and retrying the original request.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2012-10-08 08:45:29 -06:00
Alex Williamson
af6bc27e39 vfio-pci: Re-order map/unmap
This cleans up the next patch that calls unmap from map.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2012-10-08 08:45:29 -06:00
Alex Williamson
ea486926b0 vfio-pci: Update slow path INTx algorithm
We can't afford the overhead of switching out and back into mmap mode
around each interrupt, but we can do it lazily via a timer.  On INTx
interrupt, disable the mmap'd memory regions and set a timer.  On
every interrupt, push the timer out.  If the timer expires and the
interrupt is no longer pending, switch back to mmap mode.

This has the benefit that things like graphics cards, which rarely or
never, fire an interrupt don't need manual user intervention to add
the x-intx=off parameter.  They'll just remain in mmap mode until they
trigger an interrupt, and if they don't continue to regularly fire
interrupts, they'll switch back.

The default timeout is tuned for network cards so that a ping is just
enough to keep them in non-mmap mode, where they have much better
latency.  It is tunable with an experimental option,
x-intx-mmap-timeout-ms.  A value of 0 keeps the device in non-mmap
mode after the first interrupt.

It's possible we could look at the class code of devices and come up
with reasonable per-class defaults based on expected interrupt
frequency and latency.  None of this is used for MSI interrupts and
also won't be used if we can bypass through KVM.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2012-10-08 08:45:29 -06:00
Anthony Liguori
0f41dc182c vfio_pci: fix build on 32-bit systems
We cannot cast directly from pointer to uint64.

Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Alex Barcelo <abarcelo@ac.upc.edu>
Reported-by: Alex Barcelo <abarcelo@ac.upc.edu>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-10-01 13:40:15 -05:00
Alex Williamson
65501a745d vfio: vfio-pci device assignment driver
This adds the core of the QEMU VFIO-based PCI device assignment driver.
To make use of this driver, enable CONFIG_VFIO, CONFIG_VFIO_IOMMU_TYPE1,
and CONFIG_VFIO_PCI in your host Linux kernel config.  Load the vfio-pci
module.  To assign device 0000:05:00.0 to a guest, do the following:

for dev in $(ls /sys/bus/pci/devices/0000:05:00.0/iommu_group/devices); do
    vendor=$(cat /sys/bus/pci/devices/$dev/vendor)
    device=$(cat /sys/bus/pci/devices/$dev/device)
    if [ -e /sys/bus/pci/devices/$dev/driver ]; then
        echo $dev > /sys/bus/pci/devices/$dev/driver/unbind
    fi
    echo $vendor $device > /sys/bus/pci/drivers/vfio-pci/new_id
done

See Documentation/vfio.txt in the Linux kernel tree for further
description of IOMMU groups and VFIO.

Then launch qemu including the option:

-device vfio-pci,host=0000:05:00.0

Legacy PCI interrupts (INTx) currently makes use of a kludge where we
trap BAR accesses and assume the access is in response to an interrupt,
therefore de-asserting and unmasking the interrupt.  It's not quite as
targetted as using the EOI for this, but it's self contained and seems
to work across all architectures.  The side-effect is a significant
performance slow-down for device in INTx mode.  Some devices, like
graphics cards, don't really use their interrupt, so this can be turned
off with the x-intx=off option, which disables INTx alltogether.  This
should be considered an experimental option until we refine this code.
Both MSI and MSI-X are supported and avoid these issues.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-10-01 08:04:23 -05:00