The SRIOV series exposed an issued with how CC register writes are
handled and how CSTS is set in response to that. Specifically, after
applying the SRIOV series, the controller could end up in a state with
CC.EN set to '1' but with CSTS.RDY cleared to '0', causing drivers to
expect CSTS.RDY to transition to '1' but timing out.
Clean this up.
Reviewed-by: Łukasz Gieryk <lukasz.gieryk@linux.intel.com>
Reviewed-by: Lukasz Maniak <lukasz.maniak@linux.intel.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
PCI device capable of SR-IOV support is a new, still-experimental
feature with only a single working example of the Nvme device.
This patch in an attempt to fix a double-free problem when a
SR-IOV-capable Nvme device is hot-unplugged in the following scenario:
Qemu CLI:
---------
-device pcie-root-port,slot=0,id=rp0
-device nvme-subsys,id=subsys0
-device nvme,id=nvme0,bus=rp0,serial=deadbeef,subsys=subsys0,sriov_max_vfs=1,sriov_vq_flexible=2,sriov_vi_flexible=1
Guest OS:
---------
sudo nvme virt-mgmt /dev/nvme0 -c 0 -r 1 -a 1 -n 0
sudo nvme virt-mgmt /dev/nvme0 -c 0 -r 0 -a 1 -n 0
echo 1 > /sys/bus/pci/devices/0000:01:00.0/reset
sleep 1
echo 1 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs
nvme virt-mgmt /dev/nvme0 -c 1 -r 1 -a 8 -n 1
nvme virt-mgmt /dev/nvme0 -c 1 -r 0 -a 8 -n 2
nvme virt-mgmt /dev/nvme0 -c 1 -r 0 -a 9 -n 0
sleep 2
echo 01:00.1 > /sys/bus/pci/drivers/nvme/bind
Qemu monitor:
-------------
device_del nvme0
Explanation of the problem and the proposed solution:
1) The current SR-IOV implementation assumes it’s the PhysicalFunction
that creates and deletes VirtualFunctions.
2) It’s a design decision (the Nvme device at least) for the VFs to be
of the same class as PF. Effectively, they share the dc->hotpluggable
value.
3) When a VF is created, it’s added as a child node to PF’s PCI bus
slot.
4) Monitor/device_del triggers the ACPI mechanism. The implementation is
not aware of SR/IOV and ejects PF’s PCI slot, directly unrealizing all
hot-pluggable (!acpi_pcihp_pc_no_hotplug) children nodes.
5) VFs are unrealized directly, and it doesn’t work well with (1).
SR/IOV structures are not updated, so when it’s PF’s turn to be
unrealized, it works on stale pointers to already-deleted VFs.
The proposed fix is to make the PCI ACPI code aware of SR/IOV.
Signed-off-by: Łukasz Gieryk <lukasz.gieryk@linux.intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
This patch updates the initialization place for the AER queue, so it’s
initialized once, at controller initialization, and not every time
controller is enabled.
While the original version works for a non-SR-IOV device, as it’s hard
to interact with the controller if it’s not enabled, the multiple
reinitialization is not necessarily correct.
With the SR/IOV feature enabled a segfault can happen: a VF can have its
controller disabled, while a namespace can still be attached to the
controller through the parent PF. An event generated in such case ends
up on an uninitialized queue.
While it’s an interesting question whether a VF should support AER in
the first place, I don’t think it must be answered today.
Signed-off-by: Łukasz Gieryk <lukasz.gieryk@linux.intel.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
With the new command one can:
- assign flexible resources (queues, interrupts) to primary and
secondary controllers,
- toggle the online/offline state of given controller.
Signed-off-by: Łukasz Gieryk <lukasz.gieryk@linux.intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
With four new properties:
- sriov_v{i,q}_flexible,
- sriov_max_v{i,q}_per_vf,
one can configure the number of available flexible resources, as well as
the limits. The primary and secondary controller capability structures
are initialized accordingly.
Since the number of available queues (interrupts) now varies between
VF/PF, BAR size calculation is also adjusted.
Signed-off-by: Łukasz Gieryk <lukasz.gieryk@linux.intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
An NVMe device with SR-IOV capability calculates the BAR size
differently for PF and VF, so it makes sense to extract the common code
to a separate function.
Signed-off-by: Łukasz Gieryk <lukasz.gieryk@linux.intel.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
The n->reg_size parameter unnecessarily splits the BAR0 size calculation
in two phases; removed to simplify the code.
With all the calculations done in one place, it seems the pow2ceil,
applied originally to reg_size, is unnecessary. The rounding should
happen as the last step, when BAR size includes Nvme registers, queue
registers, and MSIX-related space.
Finally, the size of the mmio memory region is extended to cover the 1st
4KiB padding (see the map below). Access to this range is handled as
interaction with a non-existing queue and generates an error trace, so
actually nothing changes, while the reg_size variable is no longer needed.
--------------------
| BAR0 |
--------------------
[Nvme Registers ]
[Queues ]
[power-of-2 padding] - removed in this patch
[4KiB padding (1) ]
[MSIX TABLE ]
[4KiB padding (2) ]
[MSIX PBA ]
[power-of-2 padding]
Signed-off-by: Łukasz Gieryk <lukasz.gieryk@linux.intel.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
The NVMe device defines two properties: max_ioqpairs, msix_qsize. Having
them as constants is problematic for SR-IOV support.
SR-IOV introduces virtual resources (queues, interrupts) that can be
assigned to PF and its dependent VFs. Each device, following a reset,
should work with the configured number of queues. A single constant is
no longer sufficient to hold the whole state.
This patch tries to solve the problem by introducing additional
variables in NvmeCtrl’s state. The variables for, e.g., managing queues
are therefore organized as:
- n->params.max_ioqpairs – no changes, constant set by the user
- n->(mutable_state) – (not a part of this patch) user-configurable,
specifies number of queues available _after_
reset
- n->conf_ioqpairs - (new) used in all the places instead of the ‘old’
n->params.max_ioqpairs; initialized in realize()
and updated during reset() to reflect user’s
changes to the mutable state
Since the number of available i/o queues and interrupts can change in
runtime, buffers for sq/cqs and the MSIX-related structures are
allocated big enough to handle the limits, to completely avoid the
complicated reallocation. A helper function (nvme_update_msixcap_ts)
updates the corresponding capability register, to signal configuration
changes.
Signed-off-by: Łukasz Gieryk <lukasz.gieryk@linux.intel.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
This patch implements the Function Level Reset, a feature currently not
implemented for the Nvme device, while listed as a mandatory ("shall")
in the 1.4 spec.
The implementation reuses FLR-related building blocks defined for the
pci-bridge module, and follows the same logic:
- FLR capability is advertised in the PCIE config,
- custom pci_write_config callback detects a write to the trigger
register and performs the PCI reset,
- which, eventually, calls the custom dc->reset handler.
Depending on reset type, parts of the state should (or should not) be
cleared. To distinguish the type of reset, an additional parameter is
passed to the reset function.
This patch also enables advertisement of the Power Management PCI
capability. The main reason behind it is to announce the no_soft_reset=1
bit, to signal SR-IOV support where each VF can be reset individually.
The implementation purposedly ignores writes to the PMCS.PS register,
as even such naïve behavior is enough to correctly handle the D3->D0
transition.
It’s worth to note, that the power state transition back to to D3, with
all the corresponding side effects, wasn't and stil isn't handled
properly.
Signed-off-by: Łukasz Gieryk <lukasz.gieryk@linux.intel.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Introduce handling for Secondary Controller List (Identify command with
CNS value of 15h).
Secondary controller ids are unique in the subsystem, hence they are
reserved by it upon initialization of the primary controller to the
number of sriov_max_vfs.
ID reservation requires the addition of an intermediate controller slot
state, so the reserved controller has the address 0xFFFF.
A secondary controller is in the reserved state when it has no virtual
function assigned, but its primary controller is realized.
Secondary controller reservations are released to NULL when its primary
controller is unregistered.
Signed-off-by: Lukasz Maniak <lukasz.maniak@linux.intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Implementation of Primary Controller Capabilities data
structure (Identify command with CNS value of 14h).
Currently, the command returns only ID of a primary controller.
Handling of remaining fields are added in subsequent patches
implementing virtualization enhancements.
Signed-off-by: Lukasz Maniak <lukasz.maniak@linux.intel.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
This patch implements initial support for Single Root I/O Virtualization
on an NVMe device.
Essentially, it allows to define the maximum number of virtual functions
supported by the NVMe controller via sriov_max_vfs parameter.
Passing a non-zero value to sriov_max_vfs triggers reporting of SR-IOV
capability by a physical controller and ARI capability by both the
physical and virtual function devices.
NVMe controllers created via virtual functions mirror functionally
the physical controller, which may not entirely be the case, thus
consideration would be needed on the way to limit the capabilities of
the VF.
NVMe subsystem is required for the use of SR-IOV.
Signed-off-by: Lukasz Maniak <lukasz.maniak@linux.intel.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
There is no 'slave match interrupt' enable bit in the Interrupt
Control Register. Consider it is always enabled and extend the mask
value 'bus->regs[intr_ctrl_reg]' with the SLAVE_ADDR_RX_MATCH bit when
the interrupt is raised.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Add support for writing and reading the device address register in old
register mode.
On the AST2400 (only 1 slave address)
* no upper bits
On the AST2500 (2 possible slave addresses),
* bit[31] : Slave Address match indicator
* bit[30] : Slave Address Receiving pending
On the AST2600 (3 possible slave addresses),
* bit[31-30] : Slave Address match indicator
* bit[29] : Slave Address Receiving pending
The model could be more precise to take into account all fields but
since the Linux driver is masking the register value being set, it
should be fine. See commit 3fb2e2aeafb2 ("i2c: aspeed: disable
additional device addresses on ast2[56]xx") from Zeiv. This can be
addressed later.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
[ clg: add details to commit log ]
Message-Id: <20220601210831.67259-3-its@irrelevant.dk>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Build a single string instead of having several parameters on the trace
event.
Suggested-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
[ clg: simplified trace buffer creation ]
Message-Id: <20220601210831.67259-2-its@irrelevant.dk>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Instantiate the I2C buses in AST1030 model and create two slave device
for ast1030-evb.
Signed-off-by: Troy Lee <troy_lee@aspeedtech.com>
Signed-off-by: Jamin Lin <jamin_lin@aspeedtech.com>
Signed-off-by: Steven Lee <steven_lee@aspeedtech.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
[ clg : - adapted to current AST1030 upstream models
- changed AST2600 to AST1030 in comment
- fixed typo in commit log ]
Message-Id: <20220324100439.478317-3-troy_lee@aspeedtech.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Moves register definitions and short commonly used inlined functiosn to
the header file to help tidy up the implementation file.
Signed-off-by: Joe Komlodi <komlodi@google.com>
Change-Id: I34dff7485b6bbe3c9482715ccd94dbd65dc5f324
Message-Id: <20220331043248.2237838-8-komlodi@google.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
On AST2600, I2C has a secondary mode, called "new mode", which changes
the layout of registers, adds some minor behavior changes, and
introduces a new way to transfer data called "packet mode".
Most of the bit positions of the fields are the same between old and new
mode, so we use SHARED_FIELD_XX macros to reuse most of the code between
the different modes.
For packet mode, most of the command behavior is the same compared to
other modes, but there are some minor changes to how interrupts are
handled compared to other modes.
Signed-off-by: Joe Komlodi <komlodi@google.com>
Change-Id: I072f8301964f623afc74af1fe50c12e5caef199e
Message-Id: <20220331043248.2237838-6-komlodi@google.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Using a register array will allow us to represent old-mode and new-mode
I2C registers by using the same underlying register array, instead of
adding an entire new set of variables to represent new mode.
As part of this, we also do additional cleanup to use ARRAY_FIELD_
macros instead of FIELD_ macros on registers.
Signed-off-by: Joe Komlodi <komlodi@google.com>
Change-Id: Ib94996b17c361b8490c042b43c99d8abc69332e3
[ clg: use of memset in aspeed_i2c_bus_reset() ]
Message-Id: <20220331043248.2237838-5-komlodi@google.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
This cleans up some of the field accessing, setting, and clearing
bitwise operations, and wraps them in macros instead.
Signed-off-by: Joe Komlodi <komlodi@google.com>
Change-Id: I33018d6325fa04376e7c29dc4a49ab389a8e333a
Message-Id: <20220331043248.2237838-4-komlodi@google.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
The board has no such device. It might have been useful for some tests
in the past, it's not anymore and the same can be achieved on the
command line.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
The DEFINE_PROP* macros in pnv files are using extra spaces for no good
reason.
Cc: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20220602215351.149910-1-danielhb413@gmail.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
When accessing a thread context through the IC BAR, the offset of the
page in the BAR identifies the CPU. From that offset, we can compute
the PIR (processor ID register) of the CPU to do the data structure
lookup. On P10, the current code assumes an access for node 0 when
computing the PIR. Everything is almost in place to allow access for
other nodes though. So this patch reworks how the PIR value is
computed so that we can access all thread contexts through the IC BAR.
The PIR is already correct on P9, so no need to modify anything there.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220602165310.558810-1-fbarrat@linux.ibm.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Recent changes to pcie_host corrected size of its internal region to
match what it expects: only the low 28 bits are ever decoded. Previous
code just ignored bit 29 (if size was 1 << 29) in the address which does
not make much sense. We are now asserting on size > 1 << 28 instead,
but PPC 4xx actually allows guest to configure different sizes, and some
firmwares seem to set it to 1 << 29.
This caused e.g. qemu-system-ppc -M sam460ex to exit with an assert when
the guest writes a value to CFGMSK register when trying to map config
space. This is done in the board firmware in ppc4xx_init_pcie_port() in
roms/u-boot-sam460ex/arch/powerpc/cpu/ppc4xx/4xx_pcie.c
It's not clear what the proper fix should be but for now let's force the
size to 256MB, so anything outside the expected address range is
ignored.
Fixes: commit 1f1a7b2269 ("include/hw/pci/pcie_host: Correct PCIE_MMCFG_SIZE_MAX")
Reviewed-by: BALATON Zoltan <balaton@eik.bme.hu>
Tested-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220526224229.95183-1-mst@redhat.com>
[danielhb: changed commit msg as BALATON Zoltan suggested]
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
more CXL patches
RSA support for crypto
fixes, cleanups all over the place
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmKrYLMPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpwpwH/2IS+V7wS3q/XXPz1HndJLpUP/z+mkeu9W6+
X1U9CJ+66Ag4eD5T/jzoN0JEjiTeET/3xM+PY5NYZCh6QTAmA7EfFZv99oNWpGd1
+nyxOdaMDPSscOKjLfDziVTi/QYIZBtU6TeixL9whkipYCqmgbs5gXV8ynltmKyF
bIJVeaXm5yQLcCTGzKzdXf+HmTErpEGDCDHFjzrLVjICRDdekElGVwYTn+ycl7p7
oLsWWVDgqo0p86BITlrHUXUrxTXF3wyg2B59cT7Ilbb3o+Fa2GsP+o9IXMuVoNNp
A+zrq1QZ49UO3XwkS03xDDioUQ1T/V0L4w9dEfaGvpY4Horv0HI=
=PvmT
-----END PGP SIGNATURE-----
Merge tag 'for_upstream' of git://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging
virtio,pc,pci: fixes,cleanups,features
more CXL patches
RSA support for crypto
fixes, cleanups all over the place
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# -----BEGIN PGP SIGNATURE-----
#
# iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmKrYLMPHG1zdEByZWRo
# YXQuY29tAAoJECgfDbjSjVRpwpwH/2IS+V7wS3q/XXPz1HndJLpUP/z+mkeu9W6+
# X1U9CJ+66Ag4eD5T/jzoN0JEjiTeET/3xM+PY5NYZCh6QTAmA7EfFZv99oNWpGd1
# +nyxOdaMDPSscOKjLfDziVTi/QYIZBtU6TeixL9whkipYCqmgbs5gXV8ynltmKyF
# bIJVeaXm5yQLcCTGzKzdXf+HmTErpEGDCDHFjzrLVjICRDdekElGVwYTn+ycl7p7
# oLsWWVDgqo0p86BITlrHUXUrxTXF3wyg2B59cT7Ilbb3o+Fa2GsP+o9IXMuVoNNp
# A+zrq1QZ49UO3XwkS03xDDioUQ1T/V0L4w9dEfaGvpY4Horv0HI=
# =PvmT
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 16 Jun 2022 09:56:19 AM PDT
# gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg: issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [undefined]
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* tag 'for_upstream' of git://git.kernel.org/pub/scm/virt/kvm/mst/qemu:
acpi/erst: fix fallthrough code upon validation failure
vhost: also check queue state in the vhost_dev_set_log error routine
crypto: Introduce RSA algorithm
virtio-iommu: Add an assert check in translate routine
virtio-iommu: Use recursive lock to avoid deadlock
virtio-iommu: Add bypass mode support to assigned device
virtio/vhost-user: Fix wrong vhost notifier GPtrArray size
docs/cxl: Add switch documentation
pci-bridge/cxl_downstream: Add a CXL switch downstream port
pci-bridge/cxl_upstream: Add a CXL switch upstream port
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
At any step when any validation fail in check_erst_backend_storage(), there is
no need to continue further through other validation checks. Further, by
continuing even when record_size is 0, we run the risk of triggering a divide
by zero error if we continued with other validation checks. Hence, we should
simply return from this function upon validation failure.
CC: Peter Maydell <peter.maydell@linaro.org>
CC: Eric DeVolder <eric.devolder@oracle.com>
Signed-off-by: Ani Sinha <ani@anisinha.ca>
Message-Id: <20220513141005.1929422-1-ani@anisinha.ca>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric DeVolder <eric.devolder@oracle.com>
When check queue state in the vhost_dev_set_log routine, it miss the error
routine check, this patch also check queue state in error case.
Fixes: 1e5a050f57 ("check queue state in the vhost_dev_set_log routine")
Signed-off-by: Ni Xun <richardni@tencent.com>
Reviewed-by: Zhigang Lu <tonnylu@tencent.com>
Message-Id: <OS0PR01MB57139163F3F3955960675B52EAA79@OS0PR01MB5713.jpnprd01.prod.outlook.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
There are two parts in this patch:
1, support akcipher service by cryptodev-builtin driver
2, virtio-crypto driver supports akcipher service
In principle, we should separate this into two patches, to avoid
compiling error, merge them into one.
Then virtio-crypto gets request from guest side, and forwards the
request to builtin driver to handle it.
Test with a guest linux:
1, The self-test framework of crypto layer works fine in guest kernel
2, Test with Linux guest(with asym support), the following script
test(note that pkey_XXX is supported only in a newer version of keyutils):
- both public key & private key
- create/close session
- encrypt/decrypt/sign/verify basic driver operation
- also test with kernel crypto layer(pkey add/query)
All the cases work fine.
Run script in guest:
rm -rf *.der *.pem *.pfx
modprobe pkcs8_key_parser # if CONFIG_PKCS8_PRIVATE_KEY_PARSER=m
rm -rf /tmp/data
dd if=/dev/random of=/tmp/data count=1 bs=20
openssl req -nodes -x509 -newkey rsa:2048 -keyout key.pem -out cert.pem -subj "/C=CN/ST=BJ/L=HD/O=qemu/OU=dev/CN=qemu/emailAddress=qemu@qemu.org"
openssl pkcs8 -in key.pem -topk8 -nocrypt -outform DER -out key.der
openssl x509 -in cert.pem -inform PEM -outform DER -out cert.der
PRIV_KEY_ID=`cat key.der | keyctl padd asymmetric test_priv_key @s`
echo "priv key id = "$PRIV_KEY_ID
PUB_KEY_ID=`cat cert.der | keyctl padd asymmetric test_pub_key @s`
echo "pub key id = "$PUB_KEY_ID
keyctl pkey_query $PRIV_KEY_ID 0
keyctl pkey_query $PUB_KEY_ID 0
echo "Enc with priv key..."
keyctl pkey_encrypt $PRIV_KEY_ID 0 /tmp/data enc=pkcs1 >/tmp/enc.priv
echo "Dec with pub key..."
keyctl pkey_decrypt $PRIV_KEY_ID 0 /tmp/enc.priv enc=pkcs1 >/tmp/dec
cmp /tmp/data /tmp/dec
echo "Sign with priv key..."
keyctl pkey_sign $PRIV_KEY_ID 0 /tmp/data enc=pkcs1 hash=sha1 > /tmp/sig
echo "Verify with pub key..."
keyctl pkey_verify $PRIV_KEY_ID 0 /tmp/data /tmp/sig enc=pkcs1 hash=sha1
echo "Enc with pub key..."
keyctl pkey_encrypt $PUB_KEY_ID 0 /tmp/data enc=pkcs1 >/tmp/enc.pub
echo "Dec with priv key..."
keyctl pkey_decrypt $PRIV_KEY_ID 0 /tmp/enc.pub enc=pkcs1 >/tmp/dec
cmp /tmp/data /tmp/dec
echo "Verify with pub key..."
keyctl pkey_verify $PUB_KEY_ID 0 /tmp/data /tmp/sig enc=pkcs1 hash=sha1
Reviewed-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: lei he <helei.sig11@bytedance.com
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Message-Id: <20220611064243.24535-2-pizhenwei@bytedance.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
With address space switch supported, dma access translation only
happen after endpoint is attached to a non-bypass domain.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Message-Id: <20220613061010.2674054-4-zhenzhong.duan@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When switching address space with mutex lock hold, mapping will be
replayed for assigned device. This will trigger relock deadlock.
Also release the mutex resource in unrealize routine.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Message-Id: <20220613061010.2674054-3-zhenzhong.duan@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Currently assigned devices can not work in virtio-iommu bypass mode.
Guest driver fails to probe the device due to DMA failure. And the
reason is because of lacking GPA -> HPA mappings when VM is created.
Add a root container memory region to hold both bypass memory region
and iommu memory region, so the switch between them is supported
just like the implementation in virtual VT-d.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Message-Id: <20220613061010.2674054-2-zhenzhong.duan@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In fetch_or_create_notifier, idx begins with 0. So the GPtrArray size
should be idx + 1 and g_ptr_array_set_size should be called with idx + 1.
This wrong GPtrArray size causes fetch_or_create_notifier return an invalid
address. Passing this invalid pointer to vhost_user_host_notifier_remove
causes assert fail:
qemu/include/qemu/int128.h:27: int128_get64: Assertion `r == a' failed.
shutting down, reason=crashed
Backends like dpdk-vdpa which sends out vhost notifier requests almost always
hit qemu crash.
Fixes: 503e355465 ("virtio/vhost-user: dynamically assign VhostUserHostNotifiers")
Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Parav Pandit <parav@nvidia.com>
Change-Id: I87e0f7591ca9a59d210879b260704a2d9e9d6bcd
Message-Id: <20220526034851.683258-1-yajunw@nvidia.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Emulation of a simple CXL Switch downstream port.
The Device ID has been allocated for this use.
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20220616145126.8002-3-Jonathan.Cameron@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
An initial simple upstream port emulation to allow the creation
of CXL switches. The Device ID has been allocated for this use.
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20220616145126.8002-2-Jonathan.Cameron@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Actual fix is patch 5, whereas patch 4 being preparatory, all other
patches are test cases to guard this Twalk issue.
-----BEGIN PGP SIGNATURE-----
iQJLBAABCgA1FiEEltjREM96+AhPiFkBNMK1h2Wkc5UFAmKrDSAXHHFlbXVfb3Nz
QGNydWRlYnl0ZS5jb20ACgkQNMK1h2Wkc5VgoQ//bA/lXYa6hds4f73+opq7iiJ/
88gnJO8uPctNWXJ5f6ufXcTFtC99QRcl97jgSQhSIUdaZCfcpg7Pq3fONc060cMt
MNxi5Da31Fq7xz4UhSQHgWlgAfomfClYoBSOtrrxjVbXChA2rB7FXhD9aewimUtt
TlolXdJuPbGR3F6H0glN1itij12Ay5c0DMqFPy5npYlzjNhxmPb8QgFZ8E+lxhcT
hG+OMmS9O5Mk7WKYWC1Iij7tWm45RbThPEUsfCPt6jIJYQqheOQs0ohJG9wyCZu3
JUCgSBPG1nNY0hgBJ/X7un7e89BoRw8edwqP+sSigfDf+LquUggqRFgz+joTbfvj
Prq+1NTDIckDRZF6CDUSkZE3+Gq3qlIhw/2vS+bjYZrk04lP4x8d9JYPSkT3i8xc
+YT/apDUkT68FjJ6PudfS2j6xRtYt86nOuWuhYukTZ2z5FJ0c9XAJlJX2ZS9Az3n
AqKFCT+8UE4VYKnAJ61xDdqvAdEmKJUi5YutfuwH+j6sS4peLX0gg8mGlNi7y8JK
bsqNjE1ve8rkp24DuUoHmivs/m1ogJi9Jxp5IjB4d26MPhgojrxOpaYUVg98QS7d
os2ES47CSn4KFxqsFMZnZpgzKxIvRQ4C9bBbSClDOffHWHRJub6PCw5F9eCTH4dO
z/QPJ+smDY7bolF+gSg=
=3ejn
-----END PGP SIGNATURE-----
Merge tag 'pull-9p-20220616' of https://github.com/cschoenebeck/qemu into staging
9pfs: fix 'Twalk' protocol violation
Actual fix is patch 5, whereas patch 4 being preparatory, all other
patches are test cases to guard this Twalk issue.
# -----BEGIN PGP SIGNATURE-----
#
# iQJLBAABCgA1FiEEltjREM96+AhPiFkBNMK1h2Wkc5UFAmKrDSAXHHFlbXVfb3Nz
# QGNydWRlYnl0ZS5jb20ACgkQNMK1h2Wkc5VgoQ//bA/lXYa6hds4f73+opq7iiJ/
# 88gnJO8uPctNWXJ5f6ufXcTFtC99QRcl97jgSQhSIUdaZCfcpg7Pq3fONc060cMt
# MNxi5Da31Fq7xz4UhSQHgWlgAfomfClYoBSOtrrxjVbXChA2rB7FXhD9aewimUtt
# TlolXdJuPbGR3F6H0glN1itij12Ay5c0DMqFPy5npYlzjNhxmPb8QgFZ8E+lxhcT
# hG+OMmS9O5Mk7WKYWC1Iij7tWm45RbThPEUsfCPt6jIJYQqheOQs0ohJG9wyCZu3
# JUCgSBPG1nNY0hgBJ/X7un7e89BoRw8edwqP+sSigfDf+LquUggqRFgz+joTbfvj
# Prq+1NTDIckDRZF6CDUSkZE3+Gq3qlIhw/2vS+bjYZrk04lP4x8d9JYPSkT3i8xc
# +YT/apDUkT68FjJ6PudfS2j6xRtYt86nOuWuhYukTZ2z5FJ0c9XAJlJX2ZS9Az3n
# AqKFCT+8UE4VYKnAJ61xDdqvAdEmKJUi5YutfuwH+j6sS4peLX0gg8mGlNi7y8JK
# bsqNjE1ve8rkp24DuUoHmivs/m1ogJi9Jxp5IjB4d26MPhgojrxOpaYUVg98QS7d
# os2ES47CSn4KFxqsFMZnZpgzKxIvRQ4C9bBbSClDOffHWHRJub6PCw5F9eCTH4dO
# z/QPJ+smDY7bolF+gSg=
# =3ejn
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 16 Jun 2022 03:59:44 AM PDT
# gpg: using RSA key 96D8D110CF7AF8084F88590134C2B58765A47395
# gpg: issuer "qemu_oss@crudebyte.com"
# gpg: Good signature from "Christian Schoenebeck <qemu_oss@crudebyte.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: ECAB 1A45 4014 1413 BA38 4926 30DB 47C3 A012 D5F4
# Subkey fingerprint: 96D8 D110 CF7A F808 4F88 5901 34C2 B587 65A4 7395
* tag 'pull-9p-20220616' of https://github.com/cschoenebeck/qemu:
tests/9pfs: check fid being unaffected in fs_walk_2nd_nonexistent
tests/9pfs: guard recent 'Twalk' behaviour fix
9pfs: fix 'Twalk' to only send error if no component walked
9pfs: refactor 'name_idx' -> 'nwalked' in v9fs_walk()
tests/9pfs: compare QIDs in fs_walk_none() test
tests/9pfs: Twalk with nwname=0
tests/9pfs: walk to non-existent dir
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* virtio reset cleanups
* build system cleanups
* fix Cirrus CI
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmKpooQUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroNlFwf+OugLGRZl3KVc7akQwUJe9gg2T31h
VkC+7Tei8FAwe8vDppVd+CYEIi0M3acxD2amRrv2etCCGSuySN1PbkfRcSfPBX01
pRWpasdhfqnZR8Iidi7YW1Ou5CcGqKH49nunBhW10+osb/mu5sVscMuOJgTDj/lK
CpsmDyk6572yGmczjNLlmhYcTU36clHpAZgazZHwk1PU+B3fCKlYYyvUpT3ItJvd
cK92aIUWrfofl3yTy0k4IwvZwNjTBirlstOIomZ333xzSA+mm5TR+mTvGRTZ69+a
v+snpMp4ILDMoB5kxQ42kK5WpdiN//LnriA9CBFDtOidsDDn8kx7gJe2RA==
=Dxwa
-----END PGP SIGNATURE-----
Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging
* statistics subsystem
* virtio reset cleanups
* build system cleanups
* fix Cirrus CI
# -----BEGIN PGP SIGNATURE-----
#
# iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmKpooQUHHBib256aW5p
# QHJlZGhhdC5jb20ACgkQv/vSX3jHroNlFwf+OugLGRZl3KVc7akQwUJe9gg2T31h
# VkC+7Tei8FAwe8vDppVd+CYEIi0M3acxD2amRrv2etCCGSuySN1PbkfRcSfPBX01
# pRWpasdhfqnZR8Iidi7YW1Ou5CcGqKH49nunBhW10+osb/mu5sVscMuOJgTDj/lK
# CpsmDyk6572yGmczjNLlmhYcTU36clHpAZgazZHwk1PU+B3fCKlYYyvUpT3ItJvd
# cK92aIUWrfofl3yTy0k4IwvZwNjTBirlstOIomZ333xzSA+mm5TR+mTvGRTZ69+a
# v+snpMp4ILDMoB5kxQ42kK5WpdiN//LnriA9CBFDtOidsDDn8kx7gJe2RA==
# =Dxwa
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 15 Jun 2022 02:12:36 AM PDT
# gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg: issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [undefined]
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* tag 'for-upstream' of https://gitlab.com/bonzini/qemu: (21 commits)
build: include pc-bios/ part in the ROMS variable
meson: put cross compiler info in a separate section
q35:Enable TSEG only when G_SMRAME and TSEG_EN both enabled
build: fix check for -fsanitize-coverage-allowlist
tests/vm: allow running tests in an unconfigured source tree
configure: cleanup -fno-pie detection
configure: update list of preserved environment variables
virtio-mmio: cleanup reset
virtio: stop ioeventfd on reset
virtio-mmio: stop ioeventfd on legacy reset
s390x: simplify virtio_ccw_reset_virtio
block: add more commands to preconfig mode
hmp: add filtering of statistics by name
qmp: add filtering of statistics by name
hmp: add filtering of statistics by provider
qmp: add filtering of statistics by provider
hmp: add basic "info stats" implementation
cutils: add functions for IEC and SI prefixes
qmp: add filtering of statistics by target vCPU
kvm: Support for querying fd-based stats
...
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Current implementation of 'Twalk' request handling always sends an 'Rerror'
response if any error occured. The 9p2000 protocol spec says though:
"
If the first element cannot be walked for any reason, Rerror is returned.
Otherwise, the walk will return an Rwalk message containing nwqid qids
corresponding, in order, to the files that are visited by the nwqid
successful elementwise walks; nwqid is therefore either nwname or the index
of the first elementwise walk that failed.
"
http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor33
For that reason we are no longer leaving from an error path in function
v9fs_walk(), unless really no path component could be walked successfully or
if the request has been interrupted.
Local variable 'nwalked' counts and reflects the number of path components
successfully processed by background I/O thread, whereas local variable
'name_idx' subsequently counts and reflects the number of path components
eventually accepted successfully by 9p server controller portion.
New local variable 'any_err' is an aggregate variable reflecting whether any
error occurred at all, while already existing variable 'err' only reflects
the last error.
Despite QIDs being delivered to client in a more relaxed way now, it is
important to note though that fid still must remain unaffected if any error
occurred.
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <bc73e24258a75dc29458024c7936c8a036c3eac5.1647339025.git.qemu_oss@crudebyte.com>
The local variable 'name_idx' is used in two loops in function v9fs_walk().
Let the first loop use its own variable 'nwalked' instead, which we will
use in subsequent patch as the number of (requested) path components
successfully walked by background I/O thread.
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <d506308e7e343023c4db95d0e6053dd2627ed3c1.1647339025.git.qemu_oss@crudebyte.com>
Adds handler to reset a remote device
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 112eeadf3bc4c6cdb100bc3f9a6fcfc20b467c1b.1655151679.git.jag.raman@oracle.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Forward remote device's interrupts to the guest
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Message-id: 9523479eaafe050677f4de2af5dd0df18c27cfd9.1655151679.git.jag.raman@oracle.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Determine the BARs used by the PCI device and register handlers to
manage the access to the same.
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 3373e10b5be5f42846f0632d4382466e1698c505.1655151679.git.jag.raman@oracle.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Define and register callbacks to manage the RAM regions used for
device DMA
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: faacbcd45c4d02c591f0dbfdc19041fbb3eae7eb.1655151679.git.jag.raman@oracle.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Assign separate address space for each device in the remote processes.
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: afe0b0a97582cdad42b5b25636a29c523265a10a.1655151679.git.jag.raman@oracle.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Define and register handlers for PCI config space accesses
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: be9d2ccf9b1d24e50dcd9c23404dbf284142cec7.1655151679.git.jag.raman@oracle.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Setup a handler to run vfio-user context. The context is driven by
messages to the file descriptor associated with it - get the fd for
the context and hook up the handler with it
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: e934b0090529d448b6a7972b21dfc3d7421ce494.1655151679.git.jag.raman@oracle.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Find the PCI device with specified id. Initialize the device context
with the QEMU PCI device
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 7798dbd730099b33fdd00c4c202cfe79e5c5c151.1655151679.git.jag.raman@oracle.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
create a context with the vfio-user library to run a PCI device
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: a452871ac8c812ff96fc4f0ce6037f4769953fab.1655151679.git.jag.raman@oracle.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>