Commit Graph

46688 Commits

Author SHA1 Message Date
Bharata B Rao
6f4b5c3ec5 spapr: CPU hot unplug support
Remove the CPU core device by removing the underlying CPU thread devices.
Hot removal of CPU for sPAPR guests is achieved by sending the hot unplug
notification to the guest. Release the vCPU object after CPU hot unplug so
that vCPU fd can be parked and reused.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-17 16:33:49 +10:00
Bharata B Rao
af81cf323c spapr: CPU hotplug support
Set up device tree entries for the hotplugged CPU core and use the
exising RTAS event logging infrastructure to send CPU hotplug notification
to the guest.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-17 16:33:49 +10:00
Bharata B Rao
94a94e4c49 spapr: convert boot CPUs into CPU core devices
Introduce sPAPRMachineClass.dr_cpu_enabled to indicate support for
CPU core hotplug. Initialize boot time CPUs as core deivces and prevent
topologies that result in partially filled cores. Both of these are done
only if CPU core hotplug is supported.

Note: An unrelated change in the call to xics_system_init() is done
in this patch as it makes sense to use the local variable smt introduced
in this patch instead of kvmppc_smt_threads() call here.

TODO: We derive sPAPR core type by looking at -cpu <model>. However
we don't take care of "compat=" feature yet for boot time as well
as hotplug CPUs.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-17 16:33:49 +10:00
Bharata B Rao
afd10a0fa6 spapr: Move spapr_cpu_init() to spapr_cpu_core.c
Start consolidating CPU init related routines in spapr_cpu_core.c. As
part of this, move spapr_cpu_init() and its dependencies from spapr.c
to spapr_cpu_core.c

No functionality change in this patch.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
[dwg: Rename TIMEBASE_FREQ to SPAPR_TIMEBASE_FREQ, since it's now in a
 public(ish) header]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-17 16:33:48 +10:00
Bharata B Rao
3b54254966 spapr: Abstract CPU core device and type specific core devices
Add sPAPR specific abastract CPU core device that is based on generic
CPU core device. Use this as base type to create sPAPR CPU specific core
devices.

TODO:
- Add core types for other remaining CPU types
- Handle CPU model alias correctly

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-17 16:33:48 +10:00
Bharata B Rao
3f97b53a68 qom: API to get instance_size of a type
Add an API object_type_get_size(const char *typename) that returns the
instance_size of the give typename.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-17 16:33:48 +10:00
Bharata B Rao
aab99135b6 spapr_drc: Prevent detach racing against attach for CPU DR
If a CPU is hot removed while hotplug of the same is still in progress,
the guest crashes. Prevent this by ensuring that detach is done only
after attach has completed.

The existing code already prevents such race for PCI hotplug. However
given that CPU is a logical DR unlike PCI and starts with ISOLATED
state, we need a logic that works for CPU too.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
               [Don't set awaiting_attach for PCI devices]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-17 16:33:48 +10:00
Bharata B Rao
4a4b344c7c xics,xics_kvm: Handle CPU unplug correctly
XICS is setup for each CPU during initialization. Provide a routine
to undo the same when CPU is unplugged. While here, move ss->cs management
into xics from xics_kvm since there is nothing KVM specific in it.
Also ensure xics reset doesn't set irq for CPUs that are already unplugged.

This allows reboot of a VM that has undergone CPU hotplug and unplug
to work correctly.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-17 16:33:48 +10:00
Bharata B Rao
f1020c2c26 cpu: Abstract CPU core type
Add an abstract CPU core type that could be used by machines that want
to define and hotplug CPUs in core granularity.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
               [Integer core property]
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
[dwg: changed property names to 'core-id' and 'nr-threads']
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-17 16:33:48 +10:00
Igor Mammedov
41346263c4 qdev: hotplug: Introduce HotplugHandler.pre_plug() callback
pre_plug callback is to be called before device.realize() is executed.
This would allow to check/set device's properties from HotplugHandler.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-17 16:33:48 +10:00
Richard Henderson
2e11b15dff target-ppc: Fix rlwimi, rlwinm, rlwnm
In 63ae0915f8, I arranged to use a 32-bit rotate, without
considering the effect of a mask value that wraps around to
the high bits of the word.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-17 16:33:33 +10:00
Gavin Shan
d917e88d85 vfio: Fix broken EEH
vfio_eeh_container_op() is the backend that communicates with
host kernel to support EEH functionality in QEMU. However, the
functon should return the value from host kernel instead of 0
unconditionally.

dwg: Specifically the problem occurs for the handful of EEH
sub-operations which can return a non-zero, non-error result.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
[dwg: clarification to commit message]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-17 15:59:18 +10:00
Jakub Horak
35b5066ea7 target-ppc: Bug in BookE wait instruction
Fixed bug in code generation for the PowerPC "wait" instruction. It
doesn't make sense to store a non-initialized register.

Signed-off-by: Jakub Horak <thement@ibawizard.net>
[dwg: revised commit message]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-17 15:59:16 +10:00
Thomas Huth
fcbf4a3c0c ppc / sparc: Add a tester for checking whether OpenBIOS runs successfully
Since the mac99 and g3beige PowerPC machines recently broke without
being noticed, it would be good to have a tester for "make check"
that detects such issues immediately. A simple way to test the firmware
of these machines is to use the "-prom-env" parameter of QEMU. This
parameter can be used to put some Forth code into the 'boot-command'
firmware variable which then can signal success to the tester by
writing a magic value to a known memory location. And since some of the
Sparc machines are also using OpenBIOS, they are now tested with this
prom-env-tester, too.

Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
[dwg: Removed sparc64, because it trips a TCG bug on 32-bit hosts]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-17 15:57:59 +10:00
Michael S. Tsirkin
874a235830 MAINTAINERS: add Marcel to PCI
Marcel is reviewing PCI patches anyway, things will
be easier if people remember to Cc him.

Cc: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-06-17 03:28:03 +03:00
Cao jin
2cbb1a6826 msi_init: change return value to 0 on success
No caller use its return value as msi capability offset, also in order
to make its return behaviour consistent with msix_init().

cc: Michael S. Tsirkin <mst@redhat.com>
cc: Paolo Bonzini <pbonzini@redhat.com>
cc: Hannes Reinecke <hare@suse.de>
cc: Markus Armbruster <armbru@redhat.com>
cc: Marcel Apfelbaum <marcel@redhat.com>

Acked-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-06-17 03:28:03 +03:00
Cao jin
52ea63dea4 fix some coding style problems
It has:
1. More newlines make the code block well separated.
2. Add more comments for msi_init.
3. Fix a indentation in vmxnet3.c.
4. ioh3420 & xio3130_downstream: put PCI Express capability init function
   together, make it more readable.

cc: Michael S. Tsirkin <mst@redhat.com>
cc: Markus Armbruster <armbru@redhat.com>
cc: Marcel Apfelbaum <marcel@redhat.com>
cc: Dmitry Fleytman <dmitry@daynix.com>
cc: Jason Wang <jasowang@redhat.com>

Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-06-17 03:28:03 +03:00
Cao jin
97fe42f19b pci core: assert ENOSPC when add capability
ENOSPC is programming error, assert it for debugging.

cc: Michael S. Tsirkin <mst@redhat.com>
cc: Marcel Apfelbaum <marcel@redhat.com>
cc: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-06-17 03:28:03 +03:00
Marc-André Lureau
4616e3592b test: start vhost-user reconnect test
This is a simple reconnect test, that simply checks if vhost-user
reconnection is possible and restore the state. A more complete test
would actually manipulate and check the ring contents (such extended
testing would benefit from the libvhost-user proposed in QEMU list to
avoid duplication of ring manipulations)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-06-17 03:28:03 +03:00
Marc-André Lureau
0ee2e9daf8 tests: append i386 tests
Do not overwrite x86-64 tests, re-enable vhost-user-test.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-06-17 03:28:03 +03:00
Marc-André Lureau
bfc6cf31ce vhost-net: save & restore vring enable state
A driver may change the vring enable state at run time but vhost-user
backend may not be present (a contrived example is when the backend is
disconnected and the device is reconfigured after driver rebinding)

Restore the vring state when the vhost-user backend is started, so it
can process the ring.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-06-17 03:28:03 +03:00
Marc-André Lureau
a463215b08 vhost-net: save & restore vhost-user acked features
The initial vhost-user connection sets the features to be negotiated
with the driver. Renegotiation isn't possible without device reset.

To handle reconnection of vhost-user backend, ensure the same set of
features are provided, and reuse already acked features.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-06-17 03:28:03 +03:00
Marc-André Lureau
72b65f922b vhost-net: do not crash if backend is not present
Do not crash when backend is not present while enabling the ring. A
following patch will save the enabled state so it can be restored once
the backend is started.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-06-17 03:28:03 +03:00
Marc-André Lureau
0d572afd52 vhost-user: disconnect on start failure
If the backend failed to start (for example feature negociation failed),
do not exit, but disconnect the char device instead. Slightly more
robust for reconnect case.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-06-17 03:28:02 +03:00
Tetsuya Mukawa
7d9d17f71e qemu-char: add qemu_chr_disconnect to close a fd accepted by listen fd
The patch introduces qemu_chr_disconnect(). The function is used for
closing a fd accepted by listen fd. Though we already have qemu_chr_delete(),
but it closes not only accepted fd but also listen fd. This new function
is used when we still want to keep listen fd.

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-06-17 03:28:02 +03:00
Marc-André Lureau
523b018dde tests/vhost-user-bridge: workaround stale vring base
This patch is a similar solution to what Yuanhan Liu/Huawei Xie have
suggested for DPDK. When vubr quits (killed or crashed), a restart of
vubr would get stale vring base from QEMU. That would break the kernel
virtio net completely, making it non-work any more, unless a driver
reset is done.

So, instead of getting the stale vring base from QEMU, Huawei suggested
we could get a proper one from used->idx. This works because the queues
packets are processed in order.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-06-17 03:28:02 +03:00
Marc-André Lureau
aef8486ede tests/vhost-user-bridge: add client mode
If -c is specified, vubr will try to connect to the socket instead of
listening for connections.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-06-17 03:28:02 +03:00
Tetsuya Mukawa
a6553598be vhost-user: add ability to know vhost-user backend disconnection
Current QEMU cannot detect vhost-user backend disconnection. The
patch adds ability to know it.
To know disconnection, add watcher to detect G_IO_HUP event. When
G_IO_HUP event is detected, the disconnected socket will be read
to cause a CHR_EVENT_CLOSED.

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-06-17 03:28:02 +03:00
Peter Xu
4a94b3aa6d pci: fix pci_requester_id()
This fix SID verification failure when IOMMU IR is enabled with PCI
bridges. Existing pci_requester_id() is more like getting BDF info
only. Renaming it to pci_get_bdf(). Meanwhile, we provide the correct
implementation to get requester ID. VT-d spec 5.1.1 is a good reference
to go, though it talks only about interrupt delivery, the rule works
exactly the same for non-interrupt cases.

Currently, there are three use cases for pci_requester_id():

- PCIX status bits: here we need BDF only, not requester ID. Replacing
  with pci_get_bdf().
- PCIe Error injection and MSI delivery: for both these cases, we are
  looking for requester IDs. Here we should use the new impl.

To avoid a PCI walk every time we send MSI message, one requester_id
cache field is added to PCIDevice to cache the result when initialize
PCI device.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-06-17 03:28:02 +03:00
Thomas Huth
a1aa130989 hw/ppc/spapr: Silence deprecation message in qtest mode
When running "make check", there is currently always an error message
saying "spapr-pci-vfio-host-bridge is deprecated". This happens because
the QOM tests are instantiating all possible devices, and the error
message is currently located in the instance_init() function of the
device. Since it is legal for the tests to instantiate a device without
using it, the error message should be silenced when we're running in
test mode.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-17 09:47:59 +10:00
Peter Maydell
585fcd4b11 * KVM startup speedup (Chao Peng)
* configure fixes and cleanups (David, Thomas)
 * ctags fix (Sergey)
 * NBD cleanups (Peter, Eric)
 * "-L help" command line option (Richard)
 * More esp.c bugfixes (me, Prasad)
 * KVM_CAP_MAX_VCPU_ID support (Greg)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJXYtYuAAoJEL/70l94x66Dy68H/1lP2Y/hz6ngVuo64kvQPoNE
 Iw4gA6yaUljGkC2uFsvQKipwawgE9VahGKQfV1soABpjFKFKocFWKruGRxjBcj/h
 jJU79cczxseSXO8Wi0yuXjE2bq8cx9q3Zmxp6JLpfYQ3CTaEAf1O8zUfz85hBdCz
 4/RhPf9cBi8L2p/MagqyCd9jWZ0S/xcJFXgj54Ya+kCEq6QbpgUjFhVlt3xkeMBo
 fQUnMROunjOCMwwVp1DPXAyh3YorIfCdfIkAFjIoUzDcsnwsL8L4+z+qXR9n+emT
 ugM13M7Fq3GP6BURCcTft74xfZg0fbKcHkQlSc92tIvGrWY8BOUBcckVLBClOjg=
 =kB7q
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

* KVM startup speedup (Chao Peng)
* configure fixes and cleanups (David, Thomas)
* ctags fix (Sergey)
* NBD cleanups (Peter, Eric)
* "-L help" command line option (Richard)
* More esp.c bugfixes (me, Prasad)
* KVM_CAP_MAX_VCPU_ID support (Greg)

# gpg: Signature made Thu 16 Jun 2016 17:39:10 BST
# gpg:                using RSA key 0xBFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream: (29 commits)
  vl: smp_parse: cleanups
  scsi: esp: make cmdbuf big enough for maximum CDB size
  scsi: esp: clean up handle_ti/esp_do_dma if s->do_cmd
  scsi: esp: respect FIFO invariant after message phase
  scsi: esp: check buffer length before reading scsi command
  nbd: Avoid magic number for NBD max name size
  nbd: Detect servers that send unexpected error values
  nbd: Clean up ioctl handling of qemu-nbd -c
  nbd: Group all Linux-specific ioctl code in one place
  nbd: Reject unknown request flags
  nbd: Improve server handling of bogus commands
  nbd: Quit server after any write error
  nbd: More debug typo fixes, use correct formats
  nbd: Use BDRV_REQ_FUA for better FUA where supported
  vl.c: Add '-L help' which lists data dirs.
  KVM: use KVM_CAP_MAX_VCPU_ID
  scsi-disk: Use (unsigned long) typecasts when using "%lu" format string
  target-i386: kvm: cache KVM_GET_SUPPORTED_CPUID data
  nbd: simplify the nbd_request and nbd_reply structs
  nbd: Don't use cpu_to_*w() functions
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-06-16 17:58:45 +01:00
Andrew Jones
0544edd88a vl: smp_parse: cleanups
No functional changes; only some code movement and removal of
dead code (impossible conditions). Also, max_cpus can be
initialized to 1, like smp_cpus, because it's either set by the
user or set to smp_cpus, when smp_cpus is set by the user, or
set to 1, when nothing is set.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Message-Id: <1465580427-13596-2-git-send-email-drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:05 +02:00
Prasad J Pandit
926cde5f3e scsi: esp: make cmdbuf big enough for maximum CDB size
While doing DMA read into ESP command buffer 's->cmdbuf', it could
write past the 's->cmdbuf' area, if it was transferring more than 16
bytes.  Increase the command buffer size to 32, which is maximum when
's->do_cmd' is set, and add a check on 'len' to avoid OOB access.

Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:05 +02:00
Paolo Bonzini
7f0b6e114a scsi: esp: clean up handle_ti/esp_do_dma if s->do_cmd
Avoid duplicated code between esp_do_dma and handle_ti.  esp_do_dma
has the same code that handle_ti contains after the call to esp_do_dma;
but the code in handle_ti is never reached because it is in an "else if".
Remove the else and also the pointless return.

esp_do_dma also has a partially dead assignment of the to_device
variable.  Sink it to the point where it's actually used.

Finally, assert that the other caller of esp_do_dma (esp_transfer_data)
only transfers data and not a command.  This is true because get_cmd
cancels the old request synchronously before its caller handle_satn_stop
sets do_cmd to 1.

Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:05 +02:00
Paolo Bonzini
d020aa504c scsi: esp: respect FIFO invariant after message phase
The FIFO contains two bytes; hence the write ptr should be two bytes ahead
of the read pointer.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:05 +02:00
Prasad J Pandit
d3cdc49138 scsi: esp: check buffer length before reading scsi command
The 53C9X Fast SCSI Controller(FSC) comes with an internal 16-byte
FIFO buffer. It is used to handle command and data transfer.
Routine get_cmd() in non-DMA mode, uses 'ti_size' to read scsi
command into a buffer. Add check to validate command length against
buffer size to avoid any overrun.

Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <1464717207-7549-1-git-send-email-ppandit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:05 +02:00
Eric Blake
943cec86d0 nbd: Avoid magic number for NBD max name size
Declare a constant and use that when determining if an export
name fits within the constraints we are willing to support.

Note that upstream NBD recently documented that clients MUST
support export names of 256 bytes (not including trailing NUL),
and SHOULD support names up to 4096 bytes.  4096 is a bit big
(we would lose benefits of stack-allocation of a name array),
and we already have other limits in place (for example, qcow2
snapshot names are clamped around 1024).  So for now, just
stick to the required minimum, as that's easier to audit than
a full-scale support for larger names.

Signed-off-by: Eric Blake <eblake@redhat.com>

Message-Id: <1463006384-7734-12-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:05 +02:00
Eric Blake
f3c32fce36 nbd: Detect servers that send unexpected error values
Add some debugging to flag servers that are not compliant to
the NBD protocol.  This would have flagged the server bug
fixed in commit c0301fcc.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alex Bligh <alex@alex.org.uk>

Message-Id: <1463006384-7734-11-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:05 +02:00
Eric Blake
f57e2416aa nbd: Clean up ioctl handling of qemu-nbd -c
The kernel ioctl() interface into NBD is limited to 'unsigned long';
we MUST pass in input with that type (and not int or size_t, as
there may be platform ABIs where the wrong types promote incorrectly
through var-args).  Furthermore, on 32-bit platforms, the kernel
is limited to a maximum export size of 2T (our BLKSIZE of 512 times
a SIZE_BLOCKS constrained by 32 bit unsigned long).

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1463006384-7734-8-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:05 +02:00
Eric Blake
98494e3b92 nbd: Group all Linux-specific ioctl code in one place
NBD ioctl()s are used to manage an NBD client session where
initial handshake is done in userspace, but then the transmission
phase is handed off to the kernel through a /dev/nbdX device.
As such, all ioctls sent to the kernel on the /dev/nbdX fd belong
in client.c; nbd_disconnect() was out-of-place in server.c.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1463006384-7734-7-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:05 +02:00
Eric Blake
ab7c548e26 nbd: Reject unknown request flags
The NBD protocol says that clients should not send a command flag
that has not been negotiated (whether by the client requesting an
option during a handshake, or because we advertise support for the
flag in response to NBD_OPT_EXPORT_NAME), and that servers should
reject invalid flags with EINVAL.  We were silently ignoring the
flags instead.  The client can't rely on our behavior, since it is
their fault for passing the bad flag in the first place, but it's
better to be robust up front than to possibly behave differently
than the client was expecting with the attempted flag.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alex Bligh <alex@alex.org.uk>
Message-Id: <1463006384-7734-6-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:05 +02:00
Eric Blake
29b6c3b319 nbd: Improve server handling of bogus commands
We have a few bugs in how we handle invalid client commands:

- A client can send an NBD_CMD_DISC where from + len overflows,
convincing us to reply with an error and stay connected, even
though the protocol requires us to silently disconnect. Fix by
hoisting the special case sooner.

- A client can send an NBD_CMD_WRITE where from + len overflows,
where we reply to the client with EINVAL without consuming the
payload; this will normally cause us to fail if the next thing
read is not the right magic, but in rare cases, could cause us
to interpret the data payload as valid commands and do things
not requested by the client. Fix by adding a complete flag to
track whether we are in sync or must disconnect.

Furthermore, we have split the checks for bogus from/len across
two functions, when it is easier to do it all at once.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1463006384-7734-5-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:05 +02:00
Eric Blake
63d5ef869e nbd: Quit server after any write error
We should never ignore failure from nbd_negotiate_send_rep(); if
we are unable to write to the client, then it is not worth trying
to continue the negotiation.  Fortunately, the problem is not
too severe - chances are that the errors being ignored here (mainly
inability to write the reply to the client) are indications of
a closed connection or something similar, which will also affect
the next attempt to interact with the client and eventually reach
a point where the errors are detected to end the loop.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1463006384-7734-4-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:05 +02:00
Eric Blake
2cb347493c nbd: More debug typo fixes, use correct formats
Clean up some debug message oddities missed earlier; this includes
some typos, and recognizing that %d is not necessarily compatible
with uint32_t. Also add a couple messages that I found useful
while debugging things.

Signed-off-by: Eric Blake <eblake@redhat.com>

Message-Id: <1463006384-7734-3-git-send-email-eblake@redhat.com>
[Do not use PRIx16, clang complains. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:05 +02:00
Eric Blake
a0c303693e nbd: Use BDRV_REQ_FUA for better FUA where supported
Rather than always flushing ourselves, let the block layer
forward the FUA on to the underlying device - where all
underlying layers also understand FUA, we are now more
efficient; and where any underlying layer doesn't understand
it, now the block layer takes care of the full flush fallback
on our behalf.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1463006384-7734-2-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:04 +02:00
Richard W.M. Jones
37146e7eaf vl.c: Add '-L help' which lists data dirs.
QEMU compiles a list of data directories from various sources.  When
consuming a QEMU binary it's useful to be able to get this list of
data directories: a primary reason is so you can list what BIOSes or
keymaps ship with this version of QEMU.  However without reproducing
the method that QEMU uses internally, it's not possible to get the
list of data directories.

This commit adds a simple '-L help' option that just lists out the
data directories as qemu calculates them:

$ ./x86_64-softmmu/qemu-system-x86_64 -L help
/home/rjones/d/qemu/pc-bios
/usr/local/share/qemu

$ ./x86_64-softmmu/qemu-system-x86_64 -L /tmp -L help
/tmp
/home/rjones/d/qemu/pc-bios
/usr/local/share/qemu

Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1463416475-11728-2-git-send-email-rjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:04 +02:00
Greg Kurz
f31e326637 KVM: use KVM_CAP_MAX_VCPU_ID
As stated in linux/Documentation/virtual/kvm/api.txt:

The maximum possible value for max_vcpu_id can be retrieved using the
KVM_CAP_MAX_VCPU_ID of the KVM_CHECK_EXTENSION ioctl() at run-time.

If the KVM_CAP_MAX_VCPU_ID does not exist, you should assume that
max_vcpu_id is the same as the value returned from KVM_CAP_MAX_VCPUS.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Message-Id: <146424974323.5666.5471538288045048119.stgit@bahia.huguette.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:04 +02:00
Thomas Huth
142c21455b scsi-disk: Use (unsigned long) typecasts when using "%lu" format string
Some source code analyzers like cppcheck spill out a warning if
the sign of the argument does not match the format string.

Ticket: https://bugs.launchpad.net/qemu/+bug/1589564
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1465805418-15906-1-git-send-email-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:04 +02:00
Chao Peng
494e95e910 target-i386: kvm: cache KVM_GET_SUPPORTED_CPUID data
KVM_GET_SUPPORTED_CPUID ioctl is called frequently when initializing
CPU. Depends on CPU features and CPU count, the number of calls can be
extremely high which slows down QEMU booting significantly. In our
testing, we saw 5922 calls with switches:

    -cpu SandyBridge -smp 6,sockets=6,cores=1,threads=1

This ioctl takes more than 100ms, which is almost half of the total
QEMU startup time.

While for most cases the data returned from two different invocations
are not changed, that means, we can cache the data to avoid trapping
into kernel for the second time. To make sure the cache safe one
assumption is desirable: the ioctl is stateless. This is not true for
CPUID leaves in general (such as CPUID leaf 0xD, whose value depends
on guest XCR0 and IA32_XSS) but it is true of KVM_GET_SUPPORTED_CPUID,
which runs before there is a value for XCR0 and IA32_XSS.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Message-Id: <1465784487-23482-1-git-send-email-chao.p.peng@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:04 +02:00
Paolo Bonzini
56af2dda98 nbd: simplify the nbd_request and nbd_reply structs
These structs are never used to represent the bytes that go over the
network.  The big-endian network data is built into a uint8_t array
in nbd_{receive,send}_{request,reply}.  Remove the unused magic field,
reorder the struct to avoid holes, and remove the packed attribute.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:04 +02:00