Commit Graph

16056 Commits

Author SHA1 Message Date
Peter Maydell
2c107d7684 -----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
 
 iQEcBAABAgAGBQJXT9DWAAoJEO8Ells5jWIRgFAH/1ZDXm8V523AMDOEvBAWgqur
 Dj8ZaIwFkqJp7xtLdhS0yKF3xW+vtgx9k+Qftk0S8qEiFKPbThR8iB5VNuesErwd
 AZhWo4bnVhKwtWyMw3BDRDK1N4huAWPMZEva1xovR/Cc9v5IG5mx57/K3Zz5C8ec
 Jsn4DsLKN0q7W0D0dlnbEOkSjl6iKJchvfPCR6UfvrU7BxfXaCZ9Z7Sfh8ec6tfr
 iMgcV9u3A3Zs72gTM9/jdKx8vOrWtdKJufJ8s2Bctc7CyfBNWwnV8PjndhEe3Xvs
 vlYeJopdpDPsdMkMtYD6cevtEgvD5yhOBndJ7et807jjuCvUf837tMhodKkFk9M=
 =SjIZ
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging

# gpg: Signature made Thu 02 Jun 2016 07:23:18 BST using RSA key ID 398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F  3562 EF04 965B 398D 6211

* remotes/jasowang/tags/net-pull-request: (31 commits)
  Add ENET device to i.MX6 SOC.
  Add ENET/Gbps Ethernet support to FEC device
  i.MX: move FEC device to a register array structure.
  i.MX: Rename i.MX FEC defines to ENET_XXX
  i.MX: reset TX/RX descriptors when FEC is disabled.
  i.MX: Fix FEC code for ECR register reset value.
  i.MX: Fix FEC code for MDIO address selection
  i.MX: Fix FEC code for MDIO operation selection
  net: handle optional VLAN header in checksum computation.
  net: improve UDP/TCP checksum computation.
  e1000e: Introduce qtest for e1000e device
  net: Introduce e1000e device emulation
  e1000: Move out code that will be reused in e1000e
  e1000_regs: Add definitions for Intel 82574-specific bits
  vmxnet3: Use pci_dma_* API instead of cpu_physical_memory_*
  net_pkt: Extend packet abstraction as required by e1000e functionality
  rtl8139: Move more TCP definitions to common header
  net_pkt: Name vmxnet3 packet abstractions more generic
  vmxnet3: Use common MAC address tracing macros
  net: Add macros for MAC address tracing
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-06-02 14:26:57 +01:00
Jean-Christophe Dubois
517b5e9a17 Add ENET device to i.MX6 SOC.
This adds the ENET device to the i.MX6 SOC.

This was tested by booting Linux on a QEMU i.MX6 instance and accessing
the internet from the Linux guest.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:46 +08:00
Jean-Christophe Dubois
a699b410d7 Add ENET/Gbps Ethernet support to FEC device
The ENET device (present in i.MX6) is "derived" from FEC and backward
compatible with it.

This patch adds the necessary support of the added feature in the ENET
device to allow Linux to use it (on supported processors).

Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:46 +08:00
Jean-Christophe Dubois
db0de35268 i.MX: move FEC device to a register array structure.
This is to prepare for the ENET Gb device of the i.MX6.

Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:46 +08:00
Jean-Christophe Dubois
1bb3c37182 i.MX: Rename i.MX FEC defines to ENET_XXX
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:46 +08:00
Jean-Christophe Dubois
ff4b325f5e i.MX: reset TX/RX descriptors when FEC is disabled.
According to the FEC chapter of the i.MX25 reference manual, RX and TX
descriptors are reset when the FEC device is disabled through ECR.

Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:46 +08:00
Jean-Christophe Dubois
ccdb81d327 i.MX: Fix FEC code for ECR register reset value.
According to the FEC chapter of the i.MX25 reference manual, the ECR register
is initialized to 0xf0000000 at reset time.

This patch fixes the value.

Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:46 +08:00
Jean-Christophe Dubois
b413643a5c i.MX: Fix FEC code for MDIO address selection
According to the FEC chapter of the i.MX25 reference manual, when writing to
the MMFR register, the MDIO device and address are selected by
bits 27 to 23 and bits 22 to 18 respectively. This is a total of 10 bits
that need to be used by the PHY chip/address decoding function.

This patch fixes the number of bits used from 9 to 10.
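
The field layout described above can be sketched as follows. This is only an illustration of the bit arithmetic, not the QEMU code: the constant and function names are hypothetical, and only the bit positions come from the commit message.

```python
# Hypothetical names; only the bit positions are taken from the commit message.
MMFR_PHY_ADDR_SHIFT = 23   # bits 27..23: MDIO device (PHY) address
MMFR_REG_ADDR_SHIFT = 18   # bits 22..18: register address
MMFR_FIELD_MASK = 0x1F     # each field is 5 bits wide


def mmfr_phy_and_reg(mmfr):
    """Split an MMFR value into its (PHY address, register address) fields."""
    phy = (mmfr >> MMFR_PHY_ADDR_SHIFT) & MMFR_FIELD_MASK
    reg = (mmfr >> MMFR_REG_ADDR_SHIFT) & MMFR_FIELD_MASK
    return phy, reg


def mmfr_selector(mmfr, bits=10):
    """Return the combined selector starting at bit 18, `bits` wide."""
    return (mmfr >> MMFR_REG_ADDR_SHIFT) & ((1 << bits) - 1)


# PHY address 31, register 3: a 9-bit mask would drop the top PHY address bit.
example = (31 << MMFR_PHY_ADDR_SHIFT) | (3 << MMFR_REG_ADDR_SHIFT)
```

With a 9-bit mask the selector for `example` no longer identifies PHY address 31, which is the kind of truncation the 10-bit width avoids.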

Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:46 +08:00
Jean-Christophe Dubois
4816dc168b i.MX: Fix FEC code for MDIO operation selection
According to the FEC chapter of the i.MX25 reference manual, when writing the
MMFR register, bits 29 and 28 select the requested operation:
 * 10 means read operation with valid MII mgmt frame
 * 11 means read operation with non-compliant MII mgmt frame
 * 01 means write operation with valid MII mgmt frame
 * 00 means write operation with non-compliant MII mgmt frame

So while bit 28 does change between read and write for a valid MII mgmt frame,
its meaning is inverted for a non-compliant MII mgmt frame.

Bit 29, on the other hand, distinguishes read from write whatever the type of
mgmt frame involved.

So this patch changes the operation selection from bit 28 to bit 29, as it is
more generic.
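
A small sketch of the operation table above may help. The names are hypothetical and this is not the QEMU implementation; it only encodes the four bit patterns listed in the message and shows that bit 29 alone separates reads from writes.

```python
# Operation field (bits 29..28) as listed in the commit message.
# Values: (operation, frame is a valid MII management frame)
MMFR_OPERATIONS = {
    0b10: ("read", True),
    0b11: ("read", False),
    0b01: ("write", True),
    0b00: ("write", False),
}


def mmfr_is_read(mmfr):
    """Decide read vs. write from bit 29 only."""
    return bool(mmfr & (1 << 29))


def mmfr_bit28(mmfr):
    """Return bit 28, whose meaning depends on the frame type."""
    return (mmfr >> 28) & 1
```

For the two valid-frame encodings bit 28 is 0 on read and 1 on write, while for the two non-compliant encodings it is 1 on read and 0 on write, so it cannot serve as a general read/write flag.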

Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:46 +08:00
Dmitry Fleytman
6f3fbe4ed0 net: Introduce e1000e device emulation
This patch introduces emulation for the Intel 82574 adapter, AKA e1000e.

This implementation is derived from the e1000 emulation code, and
utilizes the TX/RX packet abstractions that were initially developed for
the vmxnet3 device. Although some parts of the introduced code may be
shared with e1000, the differences are substantial enough so that the
only shared resources for the two devices are the definitions in
hw/net/e1000_regs.h.

Similarly to vmxnet3, the new device uses virtio headers for task
offloads (for backends that support virtio extensions). Usage of
virtio headers may be forcibly disabled via a boolean device property
"vnet" (which is enabled by default). In such case task offloads
will be performed in software, in the same way it is done on
backends that do not support virtio headers.

The device code is split into two parts:

  1. hw/net/e1000e.c: QEMU-specific code for a network device;
  2. hw/net/e1000e_core.[hc]: Device emulation according to the spec.

The new device name is e1000e.

Intel specifications for the 82574 controller are available at:
http://www.intel.com/content/dam/doc/datasheet/82574l-gbe-controller-datasheet.pdf

Throughput measurement results (iperf2):

                Fedora 22 guest, TCP, RX
    4 ++------------------------------------------+
      |                                           |
      |                           X   X   X   X   X
  3.5 ++          X   X   X   X                   |
      |       X                                   |
      |                                           |
    3 ++                                          |
G     |   X                                       |
b     |                                           |
/ 2.5 ++                                          |
s     |                                           |
      |                                           |
    2 ++                                          |
      |                                           |
      |                                           |
  1.5 X+                                          |
      |                                           |
      +   +   +   +   +   +   +   +   +   +   +   +
    1 ++--+---+---+---+---+---+---+---+---+---+---+
     32  64  128 256 512  1   2   4   8  16  32  64
      B   B   B   B   B   KB  KB  KB  KB KB  KB  KB
                       Buffer size

               Fedora 22 guest, TCP, TX
  18 ++-------------------------------------------+
     |                        X                   |
  16 ++                           X   X   X   X   X
     |                   X                        |
  14 ++                                           |
     |                                            |
  12 ++                                           |
G    |               X                            |
b 10 ++                                           |
/    |                                            |
s  8 ++                                           |
     |                                            |
   6 ++          X                                |
     |                                            |
   4 ++                                           |
     |       X                                    |
   2 ++  X                                        |
     X   +   +   +   +   +    +   +   +   +   +   +
   0 ++--+---+---+---+---+----+---+---+---+---+---+
    32  64  128 256 512  1    2   4   8  16  32  64
     B   B   B   B   B   KB   KB  KB  KB KB  KB  KB
                       Buffer size

                Fedora 22 guest, UDP, RX
    3 ++------------------------------------------+
      |                                           X
      |                                           |
  2.5 ++                                          |
      |                                           |
      |                                           |
    2 ++                                 X        |
G     |                                           |
b     |                                           |
/ 1.5 ++                                          |
s     |                         X                 |
      |                                           |
    1 ++                                          |
      |                                           |
      |                 X                         |
  0.5 ++                                          |
      |        X                                  |
      X        +        +       +        +        +
    0 ++-------+--------+-------+--------+--------+
     32       64       128     256      512       1
      B        B         B       B        B      KB
                       Datagram size

                Fedora 22 guest, UDP, TX
    1 ++------------------------------------------+
      |                                           X
  0.9 ++                                          |
      |                                           |
  0.8 ++                                          |
  0.7 ++                                          |
      |                                           |
G 0.6 ++                                          |
b     |                                           |
/ 0.5 ++                                          |
s     |                                  X        |
  0.4 ++                                          |
      |                                           |
  0.3 ++                                          |
  0.2 ++                        X                 |
      |                                           |
  0.1 ++                X                         |
      X        X        +       +        +        +
    0 ++-------+--------+-------+--------+--------+
     32       64       128     256      512       1
      B        B         B       B        B      KB
                       Datagram size

              Windows 2012R2 guest, TCP, RX
  3.2 ++------------------------------------------+
      |                                   X       |
    3 ++                                          |
      |                                           |
  2.8 ++                                          |
      |                                           |
  2.6 ++                              X           |
G     |   X                   X   X           X   X
b 2.4 ++      X       X                           |
/     |                                           |
s 2.2 ++                                          |
      |                                           |
    2 ++                                          |
      |           X       X                       |
  1.8 ++                                          |
      |                                           |
  1.6 X+                                          |
      +   +   +   +   +   +   +   +   +   +   +   +
  1.4 ++--+---+---+---+---+---+---+---+---+---+---+
     32  64  128 256 512  1   2   4   8  16  32  64
      B   B   B   B   B   KB  KB  KB  KB KB  KB  KB
                       Buffer size

             Windows 2012R2 guest, TCP, TX
  14 ++-------------------------------------------+
     |                                            |
     |                                        X   X
  12 ++                                           |
     |                                            |
  10 ++                                           |
     |                                            |
G    |                                            |
b  8 ++                                           |
/    |                                    X       |
s  6 ++                                           |
     |                                            |
     |                                            |
   4 ++                               X           |
     |                                            |
   2 ++                                           |
     |           X   X            X               |
     +   X   X   +   +   X    X   +   +   +   +   +
   0 X+--+---+---+---+---+----+---+---+---+---+---+
    32  64  128 256 512  1    2   4   8  16  32  64
     B   B   B   B   B   KB   KB  KB  KB KB  KB  KB
                       Buffer size

              Windows 2012R2 guest, UDP, RX
  1.6 ++------------------------------------------X
      |                                           |
  1.4 ++                                          |
      |                                           |
  1.2 ++                                          |
      |                                  X        |
      |                                           |
G   1 ++                                          |
b     |                                           |
/ 0.8 ++                                          |
s     |                                           |
  0.6 ++                        X                 |
      |                                           |
  0.4 ++                                          |
      |                 X                         |
      |                                           |
  0.2 ++       X                                  |
      X        +        +       +        +        +
    0 ++-------+--------+-------+--------+--------+
     32       64       128     256      512       1
      B        B         B       B        B      KB
                       Datagram size

              Windows 2012R2 guest, UDP, TX
  0.6 ++------------------------------------------+
      |                                           X
      |                                           |
  0.5 ++                                          |
      |                                           |
      |                                           |
  0.4 ++                                          |
G     |                                           |
b     |                                           |
/ 0.3 ++                                 X        |
s     |                                           |
      |                                           |
  0.2 ++                                          |
      |                                           |
      |                         X                 |
  0.1 ++                                          |
      |                 X                         |
      X        X        +       +        +        +
    0 ++-------+--------+-------+--------+--------+
     32       64       128     256      512       1
      B        B         B       B        B      KB
                       Datagram size

Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:29 +08:00
Dmitry Fleytman
093454e21d e1000: Move out code that will be reused in e1000e
Code that will be shared is moved to separate files.

Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:29 +08:00
Dmitry Fleytman
06e7fa0ad7 e1000_regs: Add definitions for Intel 82574-specific bits
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:29 +08:00
Dmitry Fleytman
111710107d vmxnet3: Use pci_dma_* API instead of cpu_physical_memory_*
To make this device and the network packet
abstractions ready for IOMMU.

Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:28 +08:00
Dmitry Fleytman
eb700029c7 net_pkt: Extend packet abstraction as required by e1000e functionality
This patch extends the TX/RX packet abstractions with features that will
be used by the e1000e device implementation.

Changes are:

  1. Support iovec lists for RX buffers
  2. Deeper RX packet parsing
  3. Loopback option for TX packets
  4. Extended VLAN headers handling
  5. RSS processing for RX packets

Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:28 +08:00
Dmitry Fleytman
66409b7c8b rtl8139: Move more TCP definitions to common header
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:28 +08:00
Dmitry Fleytman
605d52e62f net_pkt: Name vmxnet3 packet abstractions more generic
This patch drops the "vmx" prefix from the packet abstraction names
to emphasize the fact that they are generic and not tied to any
specific network device.

These abstractions will be reused by the e1000e emulation implementation
introduced by the following patches, so their names need generalization.

This patch (except for renamed files, adjusted comments and changes in MAINTAINERS)
was produced by:

git grep -lz 'vmxnet_tx_pkt' | xargs -0 perl -i'' -pE "s/vmxnet_tx_pkt/net_tx_pkt/g"
git grep -lz 'vmxnet_rx_pkt' | xargs -0 perl -i'' -pE "s/vmxnet_rx_pkt/net_rx_pkt/g"
git grep -lz 'VmxnetTxPkt' | xargs -0 perl -i'' -pE "s/VmxnetTxPkt/NetTxPkt/g"
git grep -lz 'VMXNET_TX_PKT' | xargs -0 perl -i'' -pE "s/VMXNET_TX_PKT/NET_TX_PKT/g"
git grep -lz 'VmxnetRxPkt' | xargs -0 perl -i'' -pE "s/VmxnetRxPkt/NetRxPkt/g"
git grep -lz 'VMXNET_RX_PKT' | xargs -0 perl -i'' -pE "s/VMXNET_RX_PKT/NET_RX_PKT/g"
sed -ie 's/VMXNET_/NET_/g' hw/net/vmxnet_rx_pkt.c
sed -ie 's/VMXNET_/NET_/g' hw/net/vmxnet_tx_pkt.c
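
For readers less familiar with the perl and sed one-liners, the same substitutions can be sketched in Python on an in-memory string. The function names and the sample input are hypothetical; only the old/new identifier pairs come from the commands above.

```python
# Identifier pairs taken from the git grep / perl commands above.
TREE_WIDE_RENAMES = [
    ("vmxnet_tx_pkt", "net_tx_pkt"),
    ("vmxnet_rx_pkt", "net_rx_pkt"),
    ("VmxnetTxPkt", "NetTxPkt"),
    ("VMXNET_TX_PKT", "NET_TX_PKT"),
    ("VmxnetRxPkt", "NetRxPkt"),
    ("VMXNET_RX_PKT", "NET_RX_PKT"),
]


def apply_tree_wide_renames(text):
    """Apply the case-sensitive renames that the perl commands run tree-wide."""
    for old, new in TREE_WIDE_RENAMES:
        text = text.replace(old, new)
    return text


def apply_file_local_rename(text):
    """Mirror the two sed commands, which only touch the two packet files."""
    return text.replace("VMXNET_", "NET_")


sample = "struct VmxnetTxPkt *p; vmxnet_tx_pkt_init(&p); #ifndef VMXNET_RX_PKT_H"
renamed = apply_tree_wide_renames(sample)
```

Unlike the original commands, this sketch does not walk the repository or rewrite files in place; it only shows which identifiers change.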

Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:27 +08:00
Dmitry Fleytman
ab64787201 vmxnet3: Use common MAC address tracing macros
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:27 +08:00
Dmitry Fleytman
a4b387e623 vmxnet3: Use generic function for DSN capability definition
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:26 +08:00
Dmitry Fleytman
b56b9285e4 pcie: Introduce function for DSN capability creation
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:26 +08:00
Dmitry Fleytman
6383292ac8 pcie: Add support for PCIe CAP v1
Added support for PCIe CAP v1, while reusing some of the existing v2
infrastructure.

Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:26 +08:00
Dmitry Fleytman
3bdfaabbcf msix: make msix_clr_pending() visible for clients
This function will be used by e1000e device code.

Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:09 +08:00
Peter Maydell
500acc9c41 ppc patch queue for 2016-05-31
Here's another ppc patch queue.  This batch is all preliminaries
 towards two significant features:
 
 1) Full hypervisor-mode support for POWER8
     Patches 1-8 start fixing various bugs with TCG's handling of
     hypervisor mode
 
 2) CPU hotplug support
     Patches 9-12 make some preliminary fixes towards implementing CPU
     hotplug on ppc64 (and other non-x86 platforms).  These patches are
     actually to generic code, not ppc, but are included here with
     Paolo's ACK.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXTN1QAAoJEGw4ysog2bOSM4kP/2TKm/wkGo3nsGm7vG0CArs+
 JVIlTWI9Le7Cq5ijCkTwV9gjeG2CYz+Us2PCh2ZAoHpXgZtP7px2HRcDv07SbCnt
 SaCwCS+EGf3ZO9baQrzG0zfe8XrlJF+XXTejD2zWtOZw7sZ/4OPWF9KdcZbjWqFp
 PzJuXrpYOAaIyXyEPJSZFpHY+AC9NIblqHlUrKntPLLOYbqQBYP4IMxsUmOgu2IX
 rFK/5A8t20BJN0lbmx8JNKh0voorFpHY/hhaH/1T7rKxsRkKMh3VbYSxD6EYs3Uc
 nZ4ufQQW6C4CEFta3YHNwoClcsQUbnZQh3Ra+gKo9bXvqDzasVpq/mBgl3BDjGeG
 LQPSA6sfmEA8lqtRikVdgSgdXDnwy5YXJLVmIXeAIG1KHa6eRuUxC3o+ScOkcH3A
 ynLglCEBl9slsG9/yYkDcFW2u0t/txTBUvaxMfOQomAejrGjOLGZqnSWMd2UC4Gt
 KQRP+b7igkXC7+bfrpBPWKyYGvOOCukESw3OV90hLBIzOthI1dI5hO0Gj61C/nlI
 NXMRbx0qTztgj3tfSTTs6e9Ke8PEKnyXqgol0+t9Yxntlz28f2alTubUoyv/vZjx
 8J1IOlNms3PnO26TxUVBu7/KaCGCM25eTQgllbTx8rhaqin3wAH3dRc1RTlWWhwJ
 SgADl+MWf8sa7DkcxnZa
 =vAIf
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.7-20160531' into staging

# gpg: Signature made Tue 31 May 2016 01:39:44 BST using RSA key ID 20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.7-20160531:
  cpu: Add a sync version of cpu_remove()
  cpu: Reclaim vCPU objects
  exec: Do vmstate unregistration from cpu_exec_exit()
  exec: Remove cpu from cpus list during cpu_exec_exit()
  ppc: Add PPC_64H instruction flag to POWER7 and POWER8
  ppc: Get out of emulation on SMT "OR" ops
  ppc: Fix sign extension issue in mtmsr(d) emulation
  ppc: Change 'invalid' bit mask of tlbiel and tlbie
  ppc: tlbie, tlbia and tlbisync are HV only
  ppc: Do some batching of TCG tlb flushes
  ppc: Use split I/D mmu modes to avoid flushes on interrupts
  ppc: Remove MMU_MODEn_SUFFIX definitions

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-31 10:37:22 +01:00
Benjamin Herrenschmidt
cd0c6f4735 ppc: Do some batching of TCG tlb flushes
On ppc64 especially, we flush the tlb on any slbie or tlbie instruction.

However, those instructions often come in bursts of 3 or more: a context
switch will favor a series of slbie's, for example, over an slbia if the
SLB has fewer than a certain number of entries in it, and tlbie's can
happen in a series; with PAPR, H_BULK_REMOVE can remove up to 4 entries
at a time.

Doing a tlb_flush() each time is a waste of time. We end up doing a memset
of the whole TLB, reloading it for the next instruction, memset'ing again,
etc...

Those instructions don't have to take effect immediately. For slbie, they
can wait for the next context synchronizing event. For tlbie, the next
tlbsync.

This implements batching by keeping a flag that indicates that we have a
TLB in need of flushing. We check it on interrupts, rfi's, isync's and
tlbsync and flush the TLB if needed.

This reduces the number of tlb_flush() calls when booting to the Ubuntu installer's
first dialog screen from roughly 360K down to 36K.
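
The batching idea can be illustrated with a toy model. This is a sketch under stated assumptions, not the TCG code: the class and method names are hypothetical, and the counter merely stands in for a real tlb_flush() call.

```python
class BatchedTlb:
    """Toy model of deferred TLB flushing driven by a 'need flush' flag."""

    def __init__(self):
        self.need_flush = 0      # int flag, as in the commit's tlb_need_flush
        self.flush_count = 0     # how many (expensive) full flushes were done

    def invalidate(self):
        """slbie/tlbie-like instruction: only record that a flush is pending."""
        self.need_flush = 1

    def sync_point(self):
        """Interrupt, rfi, isync or tlbsync: flush once if anything is pending."""
        if self.need_flush:
            self.flush_count += 1    # stands in for tlb_flush()
            self.need_flush = 0


tlb = BatchedTlb()
for _ in range(4):        # a burst of four invalidations ...
    tlb.invalidate()
tlb.sync_point()          # ... costs a single flush at the next sync point
```

A burst of invalidations followed by one synchronizing event results in one flush rather than one per instruction, which is where the reported reduction comes from.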

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[clg: added a 'CPUPPCState *' variable in h_remove() and
      h_bulk_remove() ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
[dwg: removed spurious whitespace change, use 0/1 not true/false
      consistently, since tlb_need_flush has int type]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-30 13:20:04 +10:00
Paolo Bonzini
07bdaa4196 memory: split memory_region_from_host from qemu_ram_addr_from_host
Move the old qemu_ram_addr_from_host to memory_region_from_host and
make it return an offset within the region.  For qemu_ram_addr_from_host
return the ram_addr_t directly, similar to what it was before
commit 1b5ec23 ("memory: return MemoryRegion from qemu_ram_addr_from_host",
2013-07-04).

Reviewed-by: Marc-André Lureau <marcandre.lureau@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:12 +02:00
Paolo Bonzini
4ff87573df memory: remove qemu_get_ram_fd, qemu_set_ram_fd, qemu_ram_block_host_ptr
Remove direct uses of ram_addr_t and optimize memory_region_{get,set}_fd
now that a MemoryRegion knows its RAMBlock directly.

Reviewed-by: Marc-André Lureau <marcandre.lureau@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:12 +02:00
Fam Zheng
063143d5b1 scsi-generic: Merge block max xfer len in INQUIRY response
The rationale is similar to the MODE SENSE response interception above:
this is practically the only channel for communicating constraints from
elsewhere, such as the host and the block driver.

The scsi bus we attach onto can have a larger max xfer len than what is
accepted by the host file system (guarding between the host scsi LUN and
QEMU), in which case the SG_IO we generate would get -EINVAL.
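
As a rough illustration of merging a host-side limit into the response, the sketch below assumes the limit is carried in the MAXIMUM TRANSFER LENGTH field of the Block Limits VPD page (bytes 8 to 11, big-endian), and that the smaller non-zero value wins. The function name is hypothetical and the actual patch may combine the values differently.

```python
import struct

MAX_XFER_LEN_OFFSET = 8   # MAXIMUM TRANSFER LENGTH field, 4 bytes, big-endian


def merge_max_xfer_len(vpd_page, host_max_blocks):
    """Return a copy of the page with the host limit merged in.

    Assumption: a value of 0 on either side means "no limit reported".
    """
    if len(vpd_page) < MAX_XFER_LEN_OFFSET + 4:
        raise ValueError("page too short to hold MAXIMUM TRANSFER LENGTH")
    device_max = struct.unpack_from(">I", vpd_page, MAX_XFER_LEN_OFFSET)[0]
    if host_max_blocks == 0:
        merged = device_max
    elif device_max == 0:
        merged = host_max_blocks
    else:
        merged = min(device_max, host_max_blocks)
    patched = bytearray(vpd_page)
    struct.pack_into(">I", patched, MAX_XFER_LEN_OFFSET, merged)
    return bytes(patched)


# 16-byte page fragment whose device limit is 0x10000 blocks.
page = bytes(8) + struct.pack(">I", 0x10000) + bytes(4)
```

The guest then sees a limit no larger than what the host can accept, so it does not issue requests that would fail with -EINVAL.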

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1464243305-10661-3-git-send-email-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:12 +02:00
Paolo Bonzini
8fdc7839e4 scsi-block: always use SG_IO
Using pread/pwrite or io_submit has the advantage of eliminating the
bounce buffer, but drops the SCSI status.  This keeps the guest from
seeing unit attention codes, as well as statuses such as RESERVATION
CONFLICT.  Because we know scsi-block operates on an SBC device we can
still use the DMA helpers with SG_IO; just remember to patch the CDBs
if the transfer is split into multiple segments.

This means that scsi-block will always use the thread-pool unfortunately,
instead of respecting aio=native.
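
Patching a CDB for one segment of a split transfer can be sketched as below for the 10-byte READ/WRITE format, where the LBA occupies bytes 2 to 5 and the transfer length bytes 7 to 8, both big-endian. The function name is hypothetical, and the real code has to handle the other CDB sizes as well.

```python
import struct


def patch_rw10_cdb(cdb, block_offset, num_blocks):
    """Return a copy of a 10-byte READ/WRITE CDB rewritten for one segment.

    block_offset: segment start, in blocks, relative to the original LBA.
    num_blocks:   segment length in blocks (must fit in 16 bits).
    """
    if len(cdb) != 10:
        raise ValueError("expected a 10-byte CDB")
    base_lba = struct.unpack_from(">I", cdb, 2)[0]
    patched = bytearray(cdb)
    struct.pack_into(">I", patched, 2, base_lba + block_offset)  # new LBA
    struct.pack_into(">H", patched, 7, num_blocks)               # new length
    return bytes(patched)


# READ(10) of 8 blocks at LBA 0x1000, split into two 4-block segments.
read10 = bytes([0x28, 0, 0x00, 0x00, 0x10, 0x00, 0, 0x00, 0x08, 0])
first = patch_rw10_cdb(read10, 0, 4)
second = patch_rw10_cdb(read10, 4, 4)
```

Each segment keeps the opcode, flags and control byte of the original command and differs only in its starting LBA and length.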

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:11 +02:00
Paolo Bonzini
5b956f415a scsi-disk: introduce scsi_disk_req_check_error
Commonize all the checks for canceled requests and errors.  The next patch
will add another case to check for, in order to handle passthrough commands.

There is no semantic change here; the only nontrivial modification is in
scsi_write_do_fua, where cancellation has been checked earlier by both
callers.  Thus, the check is replaced with an assertion.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:11 +02:00
Paolo Bonzini
94f8ba1125 scsi-disk: add need_fua_emulation to SCSIDiskClass
scsi-block will be able to do FUA just by passing the request through
to the LUN (which is also more efficient); there is no need to emulate
it like we do for scsi-disk.

Add a new method to distinguish this.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:11 +02:00
Paolo Bonzini
fcaafb1001 scsi-disk: introduce dma_readv and dma_writev
These are replacements for blk_aio_readv and blk_aio_writev that allow
customization of the data path.  They reuse the DMA helpers' DMAIOFunc
callback type, so that the same function can be used in either the
QEMUSGList or the bounce-buffered case.

This customization will be needed in the next patch to do zero-copy
SG_IO on scsi-block.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:11 +02:00
Paolo Bonzini
993935f315 scsi-disk: introduce a common base class
This will be the place to add DMAIOFuncs in the next patch.  There
are also a couple DeviceClass members that can be moved to the
abstract class's initialization function.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:11 +02:00
Paolo Bonzini
141af038dd bt: rewrite csrhci_write to avoid out-of-bounds writes
The usage of INT_MAX in this function confuses Coverity.  I think
the defect is bogus; however, there is no protection against
getting more than sizeof(s->inpkt) bytes from the character device
backend.

Rewrite the function to only fill in as much data as needed from
buf into s->inpkt.  The plen variable is replaced by a simple
state machine and there is no need anymore to shift contents to
the beginning of s->inpkt.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:11 +02:00
Prasad J Pandit
b60bdd1f1e scsi: megasas: check 'read_queue_head' index value
While doing MegaRAID SAS controller command frame lookup, routine
'megasas_lookup_frame' uses 'read_queue_head' value as an index
into 'frames[MEGASAS_MAX_FRAMES=2048]' array. Limit its value
within array bounds to avoid any OOB access.
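
The message does not say whether the index is wrapped, clamped or rejected. As one possible illustration only, the sketch below wraps a guest-controlled index with a modulo so that it always lands inside the frame table; the function name is hypothetical.

```python
MEGASAS_MAX_FRAMES = 2048   # array size quoted in the commit message


def lookup_frame(frames, read_queue_head):
    """Index the frame table with a guest-supplied value kept within bounds.

    Assumption: wrapping with a modulo; the real fix may limit it differently.
    """
    index = read_queue_head % MEGASAS_MAX_FRAMES
    return frames[index]


frames = list(range(MEGASAS_MAX_FRAMES))
```

Whatever the guest writes, the computed index stays in the range 0 to MEGASAS_MAX_FRAMES - 1.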

Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <1464179110-18593-1-git-send-email-ppandit@redhat.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:11 +02:00
Prasad J Pandit
d37af74073 scsi: megasas: initialise local configuration data buffer
When reading the MegaRAID SAS controller configuration via MegaRAID
Firmware Interface (MFI) commands, the routine megasas_dcmd_cfg_read
uses an uninitialised local data buffer. Initialise this buffer
to avoid stack information leakage.

Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <1464178304-12831-1-git-send-email-ppandit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:10 +02:00
Prasad J Pandit
1b85898025 scsi: megasas: use appropriate property buffer size
When setting MegaRAID SAS controller properties via MegaRAID
Firmware Interface(MFI) commands, a user supplied size parameter
is used to set property value. Use appropriate size value to avoid
OOB access issues.

Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <1464172291-2856-2-git-send-email-ppandit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:10 +02:00
Prasad J Pandit
06630554cc scsi: mptsas: infinite loop while fetching requests
The LSI SAS1068 Host Bus Adapter emulator in QEMU periodically
looks for requests and fetches them. A loop doing that in
mptsas_fetch_requests() could run infinitely if 's->state' was
not operational. Move the check to avoid such a loop.
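
The shape of such a fix, with the state test done once before the loop
instead of inside it, might look like the sketch below. The struct,
the enum and the pending counter are hypothetical stand-ins, not the
mptsas data structures.

```c
#include <assert.h>

enum hba_state { HBA_READY, HBA_OPERATIONAL };  /* hypothetical states */

struct hba {
    enum hba_state state;
    unsigned pending;  /* requests waiting to be fetched */
};

/* Returns the number of requests fetched. */
static unsigned fetch_requests(struct hba *s)
{
    unsigned fetched = 0;

    /* Checked up front: a loop that tested this inside its body and
     * skipped the iteration would never drain 'pending' and never end. */
    if (s->state != HBA_OPERATIONAL) {
        return 0;
    }
    while (s->pending > 0) {
        s->pending--;
        fetched++;
    }
    return fetched;
}
```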

Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Cc: qemu-stable@nongnu.org
Message-Id: <1464077264-25473-1-git-send-email-ppandit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:10 +02:00
Prasad J Pandit
3e831b40e0 scsi: pvscsi: check command descriptor ring buffer size (CVE-2016-4952)
VMware Paravirtual SCSI emulation uses command descriptors to
process SCSI commands. These descriptors come with their ring
buffers. A guest could set the ring buffer size to an arbitrary
value, leading to an OOB access issue. Add a check to avoid it.

Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Cc: qemu-stable@nongnu.org
Message-Id: <1464000485-27041-1-git-send-email-ppandit@redhat.com>
Reviewed-by: Shmulik Ladkani <shmulik.ladkani@ravellosystems.com>
Reviewed-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:10 +02:00
xiaoqiang zhao
e269fbe231 hw/char: QOM'ify milkymist-uart.c
Drop qemu_char_get_next_serial and use the chardev prop instead.

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Message-Id: <1464158344-12266-6-git-send-email-zxq_yx_007@163.com>
Tested-by: Michael Walle <michael@walle.cc>
Acked-by: Michael Walle <michael@walle.cc>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:10 +02:00
xiaoqiang zhao
7aaefcaf66 hw/char: QOM'ify lm32_uart.c
* Drop the old SysBus init function and use instance_init
* Call qemu_chr_add_handlers in the realize callback
* Use qdev chardev prop instead of qemu_char_get_next_serial
* Add lm32_uart_create function to create lm32 uart device

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Message-Id: <1464158344-12266-5-git-send-email-zxq_yx_007@163.com>
Tested-by: Michael Walle <michael@walle.cc>
Acked-by: Michael Walle <michael@walle.cc>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:10 +02:00
xiaoqiang zhao
c2ddaa62b6 hw/char: QOM'ify lm32_juart.c
* Drop the old SysBus init function
* Call qemu_chr_add_handlers in the realize callback
* Use qdev chardev prop instead of qemu_char_get_next_serial

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Message-Id: <1464158344-12266-4-git-send-email-zxq_yx_007@163.com>
Tested-by: Michael Walle <michael@walle.cc>
Acked-by: Michael Walle <michael@walle.cc>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:10 +02:00
xiaoqiang zhao
8290de92b8 hw/char: QOM'ify etraxfs_ser.c
* Drop the old SysBus init function and use instance_init
* Call qemu_chr_add_handlers in the realize callback
* Use qdev chardev prop instead of qemu_char_get_next_serial
* Add etraxfs_ser_create function to create etraxfs serial device

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Message-Id: <1464158344-12266-3-git-send-email-zxq_yx_007@163.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:10 +02:00
xiaoqiang zhao
e7c9136977 hw/char: QOM'ify escc.c
* Drop the old SysBus init function and use instance_init
* Call qemu_chr_add_handlers in the realize callback

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Message-Id: <1464158344-12266-2-git-send-email-zxq_yx_007@163.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:10 +02:00
Alexey Kardashevskiy
fec5d3a1cd spapr_iommu: Move table allocation to helpers
At the moment the presence of vfio-pci devices on a bus affects the way
the guest view table is allocated. If there is no vfio-pci on a PHB
and the host kernel supports KVM acceleration of H_PUT_TCE, a table
is allocated in KVM. However, if there is vfio-pci and we do not yet
have KVM acceleration for these, the table has to be allocated by
userspace. At the moment the table is allocated once at boot time,
but the next patches will reallocate it.

This moves kvmppc_create_spapr_tce/g_malloc0 and their counterparts
to helpers.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-27 09:40:23 +10:00
Alexey Kardashevskiy
eded5bac3b spapr_pci: Use correct DMA LIOBN when composing the device tree
The user could have picked LIOBN via the CLI but the device tree
rendering code would still use the value derived from the PHB index
(which is the default fallback if LIOBN is not set in the CLI).

This replaces SPAPR_PCI_LIOBN() with the actual DMA LIOBN value.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-27 09:40:23 +10:00
Jianjun Duan
5dd5238c0b spapr: ensure device trees are always associated with DRC
There are possible racing situations involving hotplug events and
guest migration. For cases where a hotplug event is migrated, or
the guest is in the process of fetching device tree at the time of
migration, we need to ensure the device tree is created and
associated with the corresponding DRC for devices that were
hotplugged on the source, but 'coldplugged' on the target.

Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-27 09:40:23 +10:00
Zhou Jie
8afc22a20f Added negative check for get_image_size()
This patch adds a check for a negative return value from get_image_size()
where it is missing. It avoids two unnecessary function calls.
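
The pattern is to test the size for an error before doing any further
work with it. A minimal sketch, where image_size() is a hypothetical
stand-in for get_image_size() and load_image_checked() is an invented
caller:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for get_image_size(): file length or -1. */
static long image_size(const char *path)
{
    FILE *f = fopen(path, "rb");
    long size;

    if (!f) {
        return -1;
    }
    if (fseek(f, 0, SEEK_END) != 0) {
        fclose(f);
        return -1;
    }
    size = ftell(f);  /* ftell() itself returns -1 on failure */
    fclose(f);
    return size;
}

/* Returns 0 if the image could be loaded into max_size bytes, else -1. */
static int load_image_checked(const char *path, long max_size)
{
    long size = image_size(path);

    if (size < 0) {
        return -1;  /* bail out before allocating or reading anything */
    }
    if (size > max_size) {
        return -1;
    }
    /* ... allocate 'size' bytes and read the file here ... */
    return 0;
}
```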

Signed-off-by: Zhou Jie <zhoujie2011@cn.fujitsu.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-27 09:40:23 +10:00
Thomas Huth
5c29dd8c28 hw/net/spapr_llan: Provide counter with dropped rx frames to the guest
The last 8 bytes of the receive buffer list page (that has been supplied
by the guest with the H_REGISTER_LOGICAL_LAN call) contain a counter
for frames that have been dropped because there was no suitable receive
buffer available. This patch introduces code to use this field to
provide the information about dropped rx packets to the guest.
There it can be queried with "ethtool -S eth0 | grep rx_no_buffer".

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-27 09:40:23 +10:00
Thomas Huth
8836630f5d hw/net/spapr_llan: Delay flushing of the RX queue while adding new RX buffers
Currently, the spapr-vlan device is trying to flush the RX queue
after each RX buffer that has been added by the guest via the
H_ADD_LOGICAL_LAN_BUFFER hypercall. In case the receive buffer pool
was empty before, we only pass single packets to the guest this
way. This can cause very bad performance if a sender is trying
to stream fragmented UDP packets to the guest. For example when
using the UDP_STREAM test from netperf with UDP packets that are
much bigger than the MTU size, almost all UDP packets are dropped
in the guest since the chances are quite high that at least one of
the fragments got lost on the way.

When flushing the receive queue, it would be much better to have
a bunch of receive buffers available already, so that fragmented
packets can be passed to the guest in one go. To do this, the
spapr_vlan_receive() function should return 0 instead of -1 if there
are no more receive buffers available, so that receive_disabled = 1
gets temporarily set for the receive queue, and we have to delay
the queue flushing at the end of h_add_logical_lan_buffer() a little
bit by using a timer, so that the guest gets a chance to add multiple
RX buffers before we flush the queue again.

This improves the UDP_STREAM test with the spapr-vlan device a lot:
Running
 netserver -p 44444 -L <guestip> -f -D -4
in the guest, and
 netperf -p 44444 -L <hostip> -H <guestip> -t UDP_STREAM -l 60 -- -m 16384
in the host, I get the following values _without_ this patch:

Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

229376   16384   60.00     1738970      0    3798.83
229376           60.00          23              0.05

That "0.05" means that almost all UDP packets got lost/discarded
at the receiving side.
With this patch applied, the values look much better:

Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

229376   16384   60.00     1789104      0    3908.35
229376           60.00       22818             49.85
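
The throughput column can be reproduced from the message counts,
assuming it is simply messages * message size * 8 / elapsed seconds,
expressed in 10^6 bits/sec. The function name below is made up for
this sketch.

```c
#include <assert.h>

/* Throughput in 10^6 bits/sec from a message count (assumed formula). */
static double throughput_mbps(unsigned long messages,
                              unsigned long msg_bytes,
                              double secs)
{
    return (double)messages * (double)msg_bytes * 8.0 / secs / 1e6;
}
```

With the receive-side numbers above, throughput_mbps(23, 16384, 60.0)
comes out at about 0.05 and throughput_mbps(22818, 16384, 60.0) at
about 49.85, which matches the two tables.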

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-27 09:40:22 +10:00
Peter Maydell
84cfc756d1 VFIO updates 2016-05-26
 - Infrastructure and quirks to support IGD assignment (Alex Williamson)
 - Fixes to 128bit handling, IOMMU replay, IOMMU translation sanity
   checking (Alexey Kardashevskiy)

Merge remote-tracking branch 'remotes/awilliam/tags/vfio-update-20160526.1' into staging

# gpg: Signature made Thu 26 May 2016 18:50:29 BST using RSA key ID 3BB08B22
# gpg: Good signature from "Alex Williamson <alex.williamson@redhat.com>"
# gpg:                 aka "Alex Williamson <alex@shazbot.org>"
# gpg:                 aka "Alex Williamson <alwillia@redhat.com>"
# gpg:                 aka "Alex Williamson <alex.l.williamson@gmail.com>"

* remotes/awilliam/tags/vfio-update-20160526.1:
  vfio: Check that IOMMU MR translates to system address space
  memory: Fix IOMMU replay base address
  vfio: Fix 128 bit handling when deleting region
  vfio/pci: Add IGD documentation
  vfio/pci: Add a separate option for IGD OpRegion support
  vfio/pci: Intel graphics legacy mode assignment
  vfio/pci: Setup BAR quirks after capabilities probing
  vfio/pci: Consolidate VGA setup
  vfio/pci: Fix return of vfio_populate_vga()
  vfio: Create device specific region info helper
  vfio: Enable sparse mmap capability

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-26 19:18:08 +01:00
Alexey Kardashevskiy
f1f9365019 vfio: Check that IOMMU MR translates to system address space
At the moment an IOMMU MR only translates to system memory.
However, if some new code changes this, we will need a clear indication
of why it is not working, so here is the check.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-05-26 11:12:09 -06:00