xemu-project/xemu - xemu - Gitea: Git with a cup of tea

mirror of https://github.com/xemu-project/xemu.git synced 2024-11-25 20:49:49 +00:00

Author	SHA1	Message	Date
Cédric Le Goater	b168a138a8	ppc/pnv: change powernv_ prefix to pnv_ for overall naming consistency The 'pnv' prefix is now used for all and the routines populating the device tree start with 'pnv_dt'. The handler of the PnvXScomInterface is also renamed to 'dt_xscom' which should reflect that it is populating the device tree under the 'xscom@' node of the chip. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-01-10 12:53:00 +11:00
Greg Kurz	bb2d8ab636	spapr: fix LSI interrupt specifiers in the device tree LoPAPR 1.1 B.6.9.1.2 describes the "#interrupt-cells" property of the PowerPC External Interrupt Source Controller node as follows: “#interrupt-cells” Standard property name to define the number of cells in an interrupt- specifier within an interrupt domain. prop-encoded-array: An integer, encoded as with encode-int, that denotes the number of cells required to represent an interrupt specifier in its child nodes. The value of this property for the PowerPC External Interrupt option shall be 2. Thus all interrupt specifiers (as used in the standard “interrupts” property) shall consist of two cells, each containing an integer encoded as with encode-int. The first integer represents the interrupt number the second integer is the trigger code: 0 for edge triggered, 1 for level triggered. This patch fixes the interrupt specifiers in the "interrupt-map" property of the PHB node, that were setting the second cell to 8 (confusion with IRQ_TYPE_LEVEL_LOW ?) instead of 1. VIO devices and RTAS event sources use the same format for interrupt specifiers: while here, we introduce a common helper to handle the encoding details. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Tested-by: Cédric Le Goater <clg@kaod.org> -- v3: - reference public LoPAPR instead of internal PAPR+ in changelog - change helper name to spapr_dt_xics_irq() v2: - drop the erroneous changes to the "interrupts" prop in PCI device nodes - introduce a common helper to encode interrupt specifiers Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-12-15 09:49:24 +11:00
Cédric Le Goater	7718375584	spapr: introduce a spapr_qirq() helper xics_get_qirq() is only used by the sPAPR machine. Let's move it there and change its name to reflect its scope. It will be useful for XIVE support which will use its own set of qirqs. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-12-15 09:49:24 +11:00
Cédric Le Goater	60c6823b9b	spapr: move the IRQ allocation routines under the machine Also change the prototype to use a sPAPRMachineState and prefix them with spapr_irq_. It will let us synchronise the IRQ allocation with the XIVE interrupt mode when available. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-12-15 09:49:24 +11:00
Cédric Le Goater	4f7a47beeb	ppc/xics: introduce an icp_create() helper The sPAPR and the PowerNV core objects create the interrupt presenter object of the CPUs in a very similar way. Let's provide a common routine in which we use the presenter 'type' as a child identifier. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-12-15 09:49:24 +11:00
Greg Kurz	94ad93bd97	spapr_cpu_core: instantiate CPUs separately The current code assumes that only the CPU core object holds a reference on each individual CPU object, and happily frees their allocated memory when the core is unrealized. This is dangerous as some other code can legitimely keep a pointer to a CPU if it calls object_ref(), but it would end up with a dangling pointer. Let's allocate all CPUs with object_new() and let QOM free them when their reference count reaches zero. This greatly simplify the code as we don't have to fiddle with the instance size anymore. Signed-off-by: Greg Kurz <groug@kaod.org> Acked-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-12-15 09:49:23 +11:00
Greg Kurz	dcb556fc6a	xics/kvm: synchonize state before 'info pic' When using the emulated XICS, the 'info pic' monitor command shows: CPU 0 XIRR=ff000000 ((nil)) PP=ff MFRR=ff ICS 1000..13ff 0x10040060340 1000 MSI 05 00 1001 MSI 05 00 1002 MSI 05 00 1003 MSI ff 00 1004 LSI ff 00 1005 LSI ff 00 1006 LSI ff 00 1007 LSI ff 00 1008 MSI 05 00 1009 MSI 05 00 100a MSI 05 00 100b MSI 05 00 100c MSI 05 00 but when using the in-kernel XICS with the very same guest, we get: CPU 0 XIRR=00000000 ((nil)) PP=ff MFRR=ff ICS 1000..13ff 0x10032e00340 1000 MSI ff 00 1001 MSI ff 00 1002 MSI ff 00 1003 MSI ff 00 1004 LSI ff 00 1005 LSI ff 00 1006 LSI ff 00 1007 LSI ff 00 1008 MSI ff 00 1009 MSI ff 00 100a MSI ff 00 100b MSI ff 00 100c MSI ff 00 ie, all irqs are masked and XIRR is null, while we should get the same output as with the emulated XICS. If the guest is then migrated, 'info pic' shows the expected values on both source and destination. The problem is that QEMU doesn't synchronize with KVM before printing the XICS state. Migration happens to fix the output because it enforces synchronization with KVM. To fix the invalid output of 'info pic', this patch introduces a new synchronize_state operation for both ICPStateClass and ICSStateClass. The ICP operation relies on run_on_cpu() in order to kick the vCPU and avoid sleeping on KVM_GET_ONE_REG. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-11-14 11:12:42 +11:00
Igor Mammedov	40abf43f72	ppc: pnv: drop PnvChipClass::cpu_model field deduce core type directly from chip type instead of maintaining type mapping in PnvChipClass::cpu_model. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-10-17 10:34:01 +11:00
Igor Mammedov	35bdb9def2	ppc: pnv: drop PnvCoreClass::cpu_oc field deduce cpu type directly from core type instead of maintaining type mapping in PnvCoreClass::cpu_oc and doing extra cpu_model parsing in pnv_core_class_init() Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-10-17 10:34:01 +11:00
Igor Mammedov	7fd544d8a7	ppc: pnv: normalize core/chip type names typically for cpus/core type names following convention is used new_type_prefix-superclass_typename make PNV core/chip to follow common convention. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-10-17 10:34:01 +11:00
Igor Mammedov	4a12c699d3	ppc: pnv: use generic cpu_model parsing use common cpu_model prasing in vl.c and set default cpu_model using generic MachineClass::default_cpu_type. Beside of switching to generic infrastructure it solves several issues. * ppc_cpu_class_by_name() is used to deal with lower/upper case and alias translations into actual cpu type, which fixes '-M powernv -cpu power8' and '-M powernv -cpu power9_v1.0' usecases which error out with: 'invalid CPU model 'FOO' for powernv machine' * allows to switch to lower-case typenames in pnv chip/core name (by convention typnames should be lower-case) * replace aliased names /power8, power9, .../ with exact cpu model names (i.e. typenames should be stable but aliases might decide to point to other cpu model withi family or changed by kvm). It will also help to simplify pnv_chip/core code and get rid of dependency on cpu_model parsing. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> [dwg: Updated to make DD2.0 as default POWER9 chip] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-10-17 10:34:01 +11:00
Igor Mammedov	2e9c10eba0	ppc: spapr: use generic cpu_model parsing use generic cpu_model parsing introduced by (`6063d4c0f` vl.c: convert cpu_model to cpu type and set of global properties before machine_init()) it allows to: * replace sPAPRMachineClass::tcg_default_cpu with MachineClass::default_cpu_type * drop cpu_parse_cpu_model() from hw/ppc/spapr.c and reuse one in vl.c * simplify spapr_get_cpu_core_type() by removing not needed anymore recurrsion since alias look up happens earlier at vl.c and spapr_get_cpu_core_type() works only with resulted from that cpu type. * spapr no more needs to parse/depend on being phased out MachineState::cpu_model, all tha parsing done by generic code and target specific callback. Signed-off-by: Igor Mammedov <imammedo@redhat.com> [dwg: Correct minor compile error] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-10-17 10:34:01 +11:00
Igor Mammedov	5bbb264186	ppc: spapr: register 'host' core type along with the rest of core types consolidate 'host' core type registration by moving it from KVM specific code into spapr_cpu_core.c, similar like it's done in x86 target. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-10-17 10:34:00 +11:00
Igor Mammedov	b51d3c8818	ppc: spapr: use cpu type name directly replace sPAPRCPUCoreClass::cpu_class with cpu type name since it were needed just to get that at points it were accessed. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-10-17 10:34:00 +11:00
Igor Mammedov	44cd95e31a	ppc: spapr: define core types statically spapr core type definition doesn't have any fields that require it to be defined at runtime. So replace code that fills in TypeInfo at runtime with static TypeInfo array that does the same at complie time. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-10-17 10:34:00 +11:00
Igor Mammedov	b8e999673b	ppc: move '-cpu foo,compat=xxx' parsing into ppc_cpu_parse_featurestr() there is a dedicated callback CPUClass::parse_features which purpose is to convert -cpu features into a set of global properties AND deal with compat/legacy features that couldn't be directly translated into CPU's properties. Create ppc variant of it (ppc_cpu_parse_featurestr) and move 'compat=val' handling from spapr_cpu_core.c into it. That removes a dependency of board/core code on cpu_model parsing and would let to reuse common -cpu parsing introduced by `6063d4c0` Set "max-cpu-compat" property only if it exists, in practice it should limit 'compat' hack to spapr machine and allow to avoid including machine/spapr headers in target/ppc/cpu.c Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-10-17 10:34:00 +11:00
Igor Mammedov	a1063aa8a5	ppc: spapr: replace ppc_cpu_parse_features() with cpu_parse_cpu_model() ppc_cpu_parse_features() is doing practically the same thing as generic cpu_parse_cpu_model(). So remove duplicated impl. and reuse generic one. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-10-17 10:34:00 +11:00
Mark Cave-Ayland	ecba28dbf2	mac_dbdma: remove DBDMA_init() function Instead we can now instantiate the MAC_DBDMA object directly within the macio device. We also add the DBDMA device as a child property so that it is possible to retrieve later. Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-27 13:05:41 +10:00
Mark Cave-Ayland	1d27f351af	mac_dbdma: QOMify Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-27 13:05:41 +10:00
Mark Cave-Ayland	2bb4a98f90	mac_dbdma: remove unused IO fields from DBDMAState These fields were used to manually handle IO requests that weren't aligned to a sector boundary before this feature was supported by the block API. Once the block API changed to support byte-aligned IO requests, the macio controller was switched over to use it in commit `be1e343` but these fields were accidentally left behind. Remove them, including the initialisation in DBDMA_init(). Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-27 13:05:41 +10:00
Eric Blake	5261158d21	ppc/pnv: Improve macro parenthesization Although none of the existing macro call-sites were broken, it's always better to write macros that properly parenthesize arguments that can be complex expressions, so that the intended order of operations is not broken. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-27 13:05:41 +10:00
Benjamin Herrenschmidt	58b6283586	ppc: Fix OpenPIC model Apple uses an IBM MPIC2A without timers, it has 64 sources. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-27 13:05:41 +10:00
Cédric Le Goater	21f3f8db0e	ppc/xive: fix OV5_XIVE_EXPLOIT bits On POWER9, the Client Architecture Support (CAS) negotiation process determines whether the guest operates in XIVE Legacy compatibility or in XIVE exploitation mode. Now that we have initial guest support for the XIVE interrupt controller, let's fix the bits definition which have evolved in the latest specs. The platform advertises the XIVE Exploitation Mode support using the property "ibm,arch-vec-5-platform-support-vec-5", byte 23 bits 0-1 : - 0b00 XIVE legacy mode Only - 0b01 XIVE exploitation mode Only - 0b10 XIVE legacy or exploitation mode The OS asks for XIVE Exploitation Mode support using the property "ibm,architecture-vec-5", byte 23 bits 0-1: - 0b00 XIVE legacy mode Only - 0b01 XIVE exploitation mode Only Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-15 10:29:48 +10:00
Sam Bobroff	fa98fbfcdf	PPC: KVM: Support machine option to set VSMT mode KVM now allows writing to KVM_CAP_PPC_SMT which has previously been read only. Doing so causes KVM to act, for that VM, as if the host's SMT mode was the given value. This is particularly important on Power 9 systems because their default value is 1, but they are able to support values up to 8. This patch introduces a way to control this capability via a new machine property called VSMT ("Virtual SMT"). If the value is not set on the command line a default is chosen that is, when possible, compatible with legacy systems. Note that the intialization of KVM_CAP_PPC_SMT has changed slightly because it has changed (in KVM) from a global capability to a VM-specific one. This won't cause a problem on older KVMs because VM capabilities fall back to global ones. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-08 09:30:55 +10:00
BALATON Zoltan	0453428047	ppc4xx: Make MAL emulation more generic Allow MAL with more RX and TX channels as found in newer versions. Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-08 09:30:55 +10:00
BALATON Zoltan	517284a771	ppc4xx: Move MAL from ppc405_uc to ppc4xx_devs This device appears in other SoCs as well not just in 405 ones Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-08 09:30:55 +10:00
Sam Bobroff	2e886fb391	ppc: spapr: Make VCPU ID handling private to SPAPR The concept of a VCPU ID that differs from the CPU's index (cpu->cpu_index) exists only within SPAPR machines so, move the functions ppc_get_vcpu_id() and ppc_get_cpu_by_vcpu_id() into spapr.c and rename them appropriately. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-08 09:30:55 +10:00
Daniel Henrique Barboza	10f12e6450	hw/ppc: CAS reset on early device hotplug This patch is a follow up on the discussions made in patch "hw/ppc: disable hotplug before CAS is completed" that can be found at [1]. At this moment, we do not support CPU/memory hotplug in early boot stages, before CAS. When a hotplug occurs, the event is logged in an internal RTAS event log queue and an IRQ pulse is fired. In regular conditions, the guest handles the interrupt by executing check_exception, fetching the generated hotplug event and enabling the device for use. In early boot, this IRQ isn't caught (SLOF does not handle hotplug events), leaving the event in the rtas event log queue. If the guest executes check_exception due to another hotplug event, the re-assertion of the IRQ ends up de-queuing the first hotplug event as well. In short, a device hotplugged before CAS is considered coldplugged by SLOF. This leads to device misbehavior and, in some cases, guest kernel Ooops when trying to unplug the device. A proper fix would be to turn every device hotplugged before CAS as a colplugged device. This is not trivial to do with the current code base though - the FDT is written in the guest memory at ppc_spapr_reset and can't be retrieved without adding extra state (fdt_size for example) that will need to managed and migrated. Adding the hotplugged DT in the middle of CAS negotiation via the updated DT tree works with CPU devs, but panics the guest kernel at boot. Additional analysis would be necessary for LMBs and PCI devices. There are questions to be made in QEMU/SLOF/kernel level about how we can make this change in a sustainable way. With Linux guests, a fix would be the kernel executing check_exception at boot time, de-queueing the events that happened in early boot and processing them. However, even if/when the newer kernels start fetching these events at boot time, we need to take care of older kernels that won't be doing that. This patch works around the situation by issuing a CAS reset if a hotplugged device is detected during CAS: - the DRC conditions that warrant a CAS reset is the same as those that triggers a DRC migration - the DRC must have a device attached and the DRC state is not equal to its ready_state. With that in mind, this patch makes use of 'spapr_drc_needed' to determine if a CAS reset is needed. - In the middle of CAS negotiations, the function 'spapr_hotplugged_dev_before_cas' goes through all the DRCs to see if there are any DRC that requires a reset, using spapr_drc_needed. If that happens, returns '1' in 'spapr_h_cas_compose_response' which will set spapr->cas_reboot to true, causing the machine to reboot. No changes are made for coldplug devices. [1] http://lists.nongnu.org/archive/html/qemu-devel/2017-08/msg02855.html Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-08 09:30:54 +10:00
Daniel Henrique Barboza	5625817423	hw/ppc: clear pending_events on machine reset The sPAPR machine isn't clearing up the pending events QTAILQ on machine reboot. This allows for unprocessed hotplug/epow events to persist in the queue after reset and, when reasserting the IRQs in check_exception later on, these will be being processed by the OS. This patch implements a new function called 'spapr_clear_pending_events' that clears up the pending_events QTAILQ. This helper is then called inside ppc_spapr_reset to clear up the events queue, preventing old/deprecated events from persisting after a reset. Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-08 09:30:54 +10:00
David Gibson	2772cf6be9	pseries: Use smaller default hash page tables when guest can resize We've now implemented a PAPR extension allowing PAPR guest to resize their hash page table (HPT) during runtime. This patch makes use of that facility to allocate smaller HPTs by default. Specifically when a guest is aware of the HPT resize facility, qemu sizes the HPT to the initial memory size, rather than the maximum memory size on the assumption that the guest will resize its HPT if necessary for hot plugged memory. When the initial memory size is much smaller than the maximum memory size (a common configuration with e.g. oVirt / RHEV) then this can save significant memory on the HPT. If the guest does not advertise HPT resize awareness when it makes the ibm,client-architecture-support call, qemu resizes the HPT for maxmimum memory size (unless it's been configured not to allow such guests at all). For now we make that reallocation assuming the guest has not yet used the HPT at all. That's true in practice, but not, strictly, an architectural or PAPR requirement. If we need to in future we can fix this by having the client-architecture-support call reboot the guest with the revised HPT size (the client-architecture-support call is explicitly permitted to trigger a reboot in this way). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>	2017-07-17 15:07:05 +10:00
David Gibson	0b0b831016	pseries: Implement HPT resizing This patch implements hypercalls allowing a PAPR guest to resize its own hash page table. This will eventually allow for more flexible memory hotplug. The implementation is partially asynchronous, handled in a special thread running the hpt_prepare_thread() function. The state of a pending resize is stored in SPAPR_MACHINE->pending_hpt. The H_RESIZE_HPT_PREPARE hypercall will kick off creation of a new HPT, or, if one is already in progress, monitor it for completion. If there is an existing HPT resize in progress that doesn't match the size specified in the call, it will cancel it, replacing it with a new one matching the given size. The H_RESIZE_HPT_COMMIT completes transition to a resized HPT, and can only be called successfully once H_RESIZE_HPT_PREPARE has successfully completed initialization of a new HPT. The guest must ensure that there are no concurrent accesses to the existing HPT while this is called (this effectively means stop_machine() for Linux guests). For now H_RESIZE_HPT_COMMIT goes through the whole old HPT, rehashing each HPTE into the new HPT. This can have quite high latency, but it seems to be of the order of typical migration downtime latencies for HPTs of size up to ~2GiB (which would be used in a 256GiB guest). In future we probably want to move more of the rehashing to the "prepare" phase, by having H_ENTER and other hcalls update both current and pending HPTs. That's a project for another day, but should be possible without any changes to the guest interface. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-17 15:07:05 +10:00
David Gibson	30f4b05bd0	pseries: Stubs for HPT resizing This introduces stub implementations of the H_RESIZE_HPT_PREPARE and H_RESIZE_HPT_COMMIT hypercalls which we hope to add in a PAPR extension to allow run time resizing of a guest's hash page table. It also adds a new machine property for controlling whether this new facility is available. For now we only allow resizing with TCG, allowing it with KVM will require kernel changes as well. Finally, it adds a new string to the hypertas property in the device tree, advertising to the guest the availability of the HPT resizing hypercalls. This is a tentative suggested value, and would need to be standardized by PAPR before being merged. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com>	2017-07-17 15:07:05 +10:00
Alexey Kardashevskiy	2ee77040f5	ppc/pnv: Remove unused XICSState reference `e6f7e110ee` "ppc/xics: remove the XICSState classes" got rid of XICSState, this is just an leftover. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-17 15:07:05 +10:00
David Gibson	67fea71bf3	spapr: Implement DR-indicator for physical DRCs only According to PAPR, the DR-indicator should only be valid for physical DRCs, not logical DRCs. At the moment we implement it for all DRCs, so restrict it to physical ones only. We move the state to the physical DRC subclass, which means adding some QOM boilerplate to handle the newly distinct type. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Daniel Barboza <danielhb@linux.vnet.ibm.com> Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>	2017-07-17 15:07:05 +10:00
David Gibson	4445b1d27e	spapr: Remove sPAPRConfigureConnectorState sub-structure Most of the time, the state of a DRC object is contained in the single 'state' variable. However, during the transition from UNISOLATE to CONFIGURED state requires multiple calls to the ibm,configure-connector RTAS call to retrieve the device tree for the attached device. We need some extra state to keep track of where we're up to in delivering the device tree information to the guest. Currently that extra state is in a sPAPRConfigureConnectorState substructure which is only allocated when we're in the middle of the configure connector process. That sounds like a good idea, but the extra state is only two integers - on many platforms that will take up the same room as the (maybe NULL) ccs pointer even before malloc() overhead. Plus it's another object whose lifetime we need to manage. In short, it's not worth it. So, fold the sPAPRConfigureConnectorState substructure directly into the DRC object. Previously the structure was allocated lazily when the configure-connector call discovers it's not there. Now, we need to initialize the subfields pre-emptively, as soon as we enter UNISOLATE state. Although it's not strictly necessary (the field values should only ever be consulted when in UNISOLATE state), we try to keep them at -1 when in other states, as a debugging aid. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Daniel Barboza <danielhb@linux.vnet.ibm.com> Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>	2017-07-17 15:07:05 +10:00
David Gibson	9d4c0f4f0a	spapr: Consolidate DRC state variables Each DRC has three fields describing its state: isolation_state, allocation_state and configured. At first this seems like a reasonable representation, since its based directly on the PAPR defined isolation-state and allocation-state indicators. However: * Only a few combinations of the two fields' values are permitted * allocation_state isn't used at all for physical DRCs * The indicators are write only so they don't really have a well defined current value independent of each other This replaces these variables with a single state variable, whose names and numbers are based on the diagram in LoPAPR section 13.4. Along with this we add code to check the current state on various operations and make sure the requested transition is permitted. Strictly speaking, this makes guest visible changes to behaviour (since we probably allowed some transitions we shouldn't have before). However, a hypothetical guest broken by that wasn't PAPR compliant, and probably wouldn't have worked under PowerVM. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Daniel Barboza <danielhb@linux.vnet.ibm.com> Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>	2017-07-17 15:07:05 +10:00
David Gibson	f1c52354e5	spapr: Cleanups relating to DRC awaiting_release field 'awaiting_release' indicates that the host has requested an unplug of the device attached to the DRC, but the guest has not (yet) put the device into a state where it is safe to complete removal. 1. Rename it to 'unplug_requested' which to me at least is clearer 2. Remove the ->release_pending() method used to check this from outside spapr_drc.c. The method only plausibly has one implementation, so use a plain function (spapr_drc_unplug_requested()) instead. 3. Remove it from the migration stream. Attempting to migrate mid-unplug is broken not just for spapr - in general management has no good way to determine if the device should be present on the destination or not. So, until that's fixed, there's no point adding extra things to the stream. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>	2017-07-17 15:07:05 +10:00
David Gibson	a8dc47fd82	spapr: Refactor spapr_drc_detach() This function has two unused parameters - remove them. It also sets awaiting_release on all paths, except one. On that path setting it is harmless, since it will be immediately cleared by spapr_drc_release(). So factor it out of the if statements. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>	2017-07-17 15:07:05 +10:00
David Gibson	82a93a1d30	spapr: Remove 'awaiting_allocation' DRC flag The awaiting_allocation flag in the DRC was introduced by `aab9913` "spapr_drc: Prevent detach racing against attach for CPU DR", allegedly to prevent a guest crash on racing attach and detach. Except.. information from the BZ actually suggests a qemu crash, not a guest crash. And there shouldn't be a problem here anyway: if the guest has already moved the DRC away from UNUSABLE state, the detach would already be deferred, and if it hadn't it should be safe to detach it (the guest should fail gracefully when it attempts to change the allocation state). I think this was probably just a bandaid for some other problem in the state management. So, remove awaiting_allocation and associated code. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Greg Kurz <groug@kaod.org> Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>	2017-07-17 15:07:05 +10:00
Laurent Vivier	94fd9cbaa3	spapr: Treat devices added before inbound migration as coldplugged When migrating a guest which has already had devices hotplugged, libvirt typically starts the destination qemu with -incoming defer, adds those hotplugged devices with qmp, then initiates the incoming migration. This causes problems for the management of spapr DRC state. Because the device is treated as hotplugged, it goes into a DRC state for a device immediately after it's plugged, but before the guest has acknowledged its presence. However, chances are the guest on the source machine has acknowledged the device's presence and configured it. If the source has fully configured the device, then DRC state won't be sent in the migration stream: for maximum migration compatibility with earlier versions we don't migrate DRCs in coldplug-equivalent state. That means that the DRC effectively changes state over the migrate, causing problems later on. In addition, logging hotplug events for these devices isn't what we want because a) those events should already have been issued on the source host and b) the event queue should get wiped out by the incoming state anyway. In short, what we really want is to treat devices added before an incoming migration as if they were coldplugged. To do this, we first add a spapr_drc_hotplugged() helper which determines if the device is hotplugged in the sense relevant for DRC state management. We only send hotplug events when this is true. Second, when we add a device which isn't hotplugged in this sense, we force a reset of the DRC state - this ensures the DRC is in a coldplug-equivalent state (there isn't usually a system reset between these device adds and the incoming migration). This is based on an earlier patch by Laurent Vivier, cleaned up and extended. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>	2017-07-17 15:07:05 +10:00
David Gibson	5341258e86	spapr: Minor cleanups to events handling The rtas_error_log structure is marked packed, which strongly suggests its precise layout is important to match an external interface. Along with that one could expect it to have a fixed endianness to match the same interface. That used to be the case - matching the layout of PAPR RTAS event format and requiring BE fields. Now, however, it's only used embedded within sPAPREventLogEntry with the fields in native order, since they're processed internally. Clear that up by removing the nested structure in sPAPREventLogEntry. struct rtas_error_log is moved back to spapr_events.c where it is used as a temporary to help convert the fields in sPAPREventLogEntry to the correct in memory format when delivering an event to the guest. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-17 15:07:05 +10:00
Daniel Henrique Barboza	fd38804b38	spapr: migrate pending_events of spapr state In racing situations between hotplug events and migration operation, a rtas hotplug event could have not yet be delivered to the source guest when migration is started. In this case the pending_events of spapr state need be transmitted to the target so that the hotplug event can be finished on the target. To achieve the minimal VMSD possible to migrate the pending_events list, this patch makes the changes in spapr_events.c: - 'log_type' of sPAPREventLogEntry struct deleted. This information can be derived by inspecting the rtas_error_log summary field. A new function called 'spapr_event_log_entry_type' was added to retrieve the type of a given sPAPREventLogEntry. - sPAPREventLogEntry, epow_log_full and hp_log_full were redesigned. The only data we're going to migrate in the VMSD is the event log data itself, which can be divided in two parts: a rtas_error_log header and an extended event log field. The rtas_error_log header contains information about the size of the extended log field, which can be used inside VMSD as the size parameter of the VBUFFER_ALOC field that will store it. To allow this use, the header.extended_length field must be exposed inline to the VMSD instead of embedded into a 'data' field that holds everything. With this in mind, the following changes were done: * a new 'header' field was added to sPAPREventLogEntry. This field holds a a struct rtas_error_log inline. * the declaration of the 'rtas_error_log' struct was moved to spapr.h to be visible to the VMSD macros. * 'data' field of sPAPREventLogEntry was renamed to 'extended_log' and now holds only the contents of the extended event log. * 'struct rtas_error_log hdr' were taken away from both epow_log_full and hp_log_full. This information is now available at the header field of sPAPREventLogEntry. * epow_log_full and hp_log_full were renamed to epow_extended_log and hp_extended_log respectively. This rename makes it clearer to understand the new purpose of both structures: hold the information of an extended event log field. * spapr_powerdown_req and spapr_hotplug_req_event now creates a sPAPREventLogEntry structure that contains the full rtas log entry. * rtas_event_log_queue and rtas_event_log_dequeue now receives a sPAPREventLogEntry pointer as a parameter instead of a void pointer. - the endianess of the sPAPREventLogEntry header is now native instead of be32. We can use the fields in native endianess internally and write them in be32 in the guest physical memory inside 'check_exception'. This allows the VMSD inside spapr.c to read the correct size of the entended_log field. - inside spapr.c, pending_events is put in a subsection in the spapr state VMSD to make sure migration across different versions is not broken. A small change in rtas_event_log_queue and rtas_event_log_dequeue were also made: instead of calling qdev_get_machine(), both functions now receive a pointer to the sPAPRMachineState. This pointer is already available in the callers of these functions and we don't need to waste resources calling qdev() again. Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-17 15:07:05 +10:00
Alexey Kardashevskiy	1221a47467	memory/iommu: introduce IOMMUMemoryRegionClass This finishes QOM'fication of IOMMUMemoryRegion by introducing a IOMMUMemoryRegionClass. This also provides a fastpath analog for IOMMU_MEMORY_REGION_GET_CLASS(). This makes IOMMUMemoryRegion an abstract class. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170711035620.4232-3-aik@ozlabs.ru> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-07-14 12:04:41 +02:00
Alexey Kardashevskiy	3df9d74806	memory/iommu: QOM'fy IOMMU MemoryRegion This defines new QOM object - IOMMUMemoryRegion - with MemoryRegion as a parent. This moves IOMMU-related fields from MR to IOMMU MR. However to avoid dymanic QOM casting in fast path (address_space_translate, etc), this adds an @is_iommu boolean flag to MR and provides new helper to do simple cast to IOMMU MR - memory_region_get_iommu. The flag is set in the instance init callback. This defines memory_region_is_iommu as memory_region_get_iommu()!=NULL. This switches MemoryRegion to IOMMUMemoryRegion in most places except the ones where MemoryRegion may be an alias. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20170711035620.4232-2-aik@ozlabs.ru> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-07-14 12:04:41 +02:00
Cédric Le Goater	f2b14e3a9f	spapr: introduce the XIVE_EXPLOIT option in CAS On POWER9, the Client Architecture Support (CAS) negotiation process determines whether the guest operates in XIVE Legacy compatibility (the former POWER8 interrupt model) or in XIVE exploitation mode (the newer POWER9 interrupt model). Bit 7 of Byte 23 of vector 5 is used for this purpose. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-11 11:04:02 +10:00
David Gibson	5c1da81215	spapr: Remove unnecessary differences between hotplug and coldplug paths spapr_drc_attach() has a 'coldplug' parameter which sets the DRC into configured state initially, instead of the usual ISOLATED/UNUSABLE state. It turns out this is unnecessary: although coldplugged devices do need to be in CONFIGURED state once the guest starts, that will already be accomplished by the reset code which will move DRCs for already plugged devices into a coldplug equivalent state. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org>	2017-07-11 11:04:01 +10:00
David Gibson	6b762f29a8	spapr: Add DRC release method At the moment, spapr_drc_release() has an ugly switch on the DRC type to call the right, device-specific release function. This cleans it up by doing that via a proper QOM method. It's still arguably an abstraction violation for the DRC code to call into the specific device code, but one mess at a time. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org>	2017-07-11 11:04:01 +10:00
Greg Kurz	498cd99544	spapr: refresh "platform-specific" hcalls comment We have more of these since the addition of KVMPPC_H_LOGICAL_MEMOP in 2012. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-11 11:04:01 +10:00
Greg Kurz	04d0ffbd52	spapr: make spapr_populate_hotplug_cpu_dt() static Since commit `ff9006ddbf` ("spapr: move spapr_core_[foo]plug() callbacks close to machine code in spapr.c"), this function doesn't need to be extern anymore. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-11 11:04:01 +10:00
David Gibson	0dfabd39d5	spapr: Clean up DRC set_isolation_state() path There are substantial differences in the various paths through set_isolation_state(), both for setting to ISOLATED versus UNISOLATED state and for logical versus physical DRCs. So, split the set_isolation_state() method into isolate() and unisolate() methods, and give it different implementations for the two DRC types. Factor some minimal common checks, including for valid indicator values (which we weren't previously checking) into rtas_set_isolation_state(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2017-06-30 14:03:32 +10:00
David Gibson	617367321e	spapr: Clean up DRC set_allocation_state path The allocation-state indicator should only actually be implemented for "logical" DRCs, not physical ones. Factor a check for this, and also for valid indicator state values into rtas_set_allocation_state(). Because they don't exist for physical DRCs, there's no reason that we'd ever want more than one method implementation, so it can just be a plain function. In addition, the setting to USABLE and setting to UNUSABLE paths in set_allocation_state() don't actually have much in common. So, split the method separate functions for each parameter value (drc_set_usable() and drc_set_unusable()). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2017-06-30 14:03:32 +10:00
David Gibson	307b7715d0	spapr: Eliminate DRC 'signalled' state variable The 'signalled' field in the DRC appears to be entirely a torturous workaround for the fact that PCI devices were started in UNISOLATED state for unclear reasons. 1) 'signalled' is already meaningless for logical (so far, all non PCI) DRCs. It's always set to true (at least at any point it might be tested), and can't be assigned any real meaning due to the way signalling works for logical DRCs. 2) For PCI DRCs, the only time signalled would be false is when non-zero functions of a multifunction device are hotplugged, followed by function zero (the other way around is explicitly not permitted). In that case the secondary function DRCs are attached, but the notification isn't sent to the guest until function 0 is plugged. 3) signalled being false is used to allow a DRC detach to switch mode back to ISOLATED state, which allows a secondary function to be hotplugged then unplugged with function 0 never inserted. Without this a secondary function starting in UNISOLATED state couldn't be detached again without function 0 being inserted, all the functions configured by the guest, then sent back to ISOLATED state. 4) But now that PCI DRCs start in ISOLATED state, there's nothing to be done. If the guest doesn't get the notification, it won't switch the device to UNISOLATED state, so nothing prevents it from being unplugged. If the guest does move it to UNISOLATED state without the signal (due to a manual drmgr call, for instance) then it really isn't safe to unplug it. So, this patch removes the signalled variable and all code related to it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2017-06-30 14:03:32 +10:00
Greg Kurz	46f7afa370	spapr: fix migration of ICPState objects from/to older QEMU Commit `5bc8d26de2` ("spapr: allocate the ICPState object from under sPAPRCPUCore") moved ICPState objects from the machine to CPU cores. This is an improvement since we no longer allocate ICPState objects that will never be used. But it has the side-effect of breaking migration of older machine types from older QEMU versions. This patch allows spapr to register dummy "icp/server" entries to vmstate. These entries use a dedicated VMStateDescription that can swallow and discard state of an incoming migration stream, and that don't send anything on outgoing migration. As for real ICPState objects, the instance_id is the cpu_index of the corresponding vCPU, which happens to be equal to the generated instance_id of older machine types. The machine can unregister/register these entries when CPUs are dynamically plugged/unplugged. This is only available for pseries-2.9 and older machines, thanks to a compat property. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-06-30 14:03:31 +10:00
David Gibson	7843c0d60d	pseries: Move CPU compatibility property to machine Server class POWER CPUs have a "compat" property, which is used to set the backwards compatibility mode for the processor. However, this only makes sense for machine types which don't give the guest access to hypervisor privilege - otherwise the compatibility level is under the guest's control. To reflect this, this removes the CPU 'compat' property and instead creates a 'max-cpu-compat' property on the pseries machine. Strictly speaking this breaks compatibility, but AFAIK the 'compat' option was never (directly) used with -device or device_add. The option was used with -cpu. So, to maintain compatibility, this patch adds a hack to the cpu option parsing to strip out any compat options supplied with -cpu and set them on the machine property instead of the now deprecated cpu property. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Tested-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Greg Kurz <groug@kaod.org> Tested-by: Andrea Bolognani <abologna@redhat.com>	2017-06-30 14:03:31 +10:00
Laurent Vivier	593080936a	Revert "spapr: fix memory hot-unplugging" This reverts commit `fe6824d126`. Conflicts hw/ppc/spapr_drc.c, because get_index() has been renamed spapr_get_index(). This didn't fix the problem. Once the hotplug has been started some memory is allocated and some structures are allocated. We don't free it when we ignore the unplug, and we can't because they can be in use by the kernel. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-06-09 12:35:46 +10:00
Greg Kurz	b1fd36c363	xics: drop ICPStateClass::cpu_setup() handler The cpu_setup() handler is only implemented by xics_kvm, where it really does a typical "realize" job. Moreover, the realize() handler is called shortly after cpu_setup(), on the same path. This patch converts xics_kvm to implement realize() instead of cpu_setup(). Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-06-09 12:17:59 +10:00
Greg Kurz	9ed656631d	xics: setup cpu at realize time Until recently, spapr used to allocate ICPState objects for the lifetime of the machine. They would only be associated to vCPUs in xics_cpu_setup() when plugging a CPU core. Now that ICPState objects have the same lifecycle as vCPUs, it is possible to associate them during realization. This patch hence open-codes xics_cpu_setup() in icp_realize(). The vCPU is passed as a property. Note that vCPU now needs to be realized first for the IRQs to be allocated. It also needs to resetted before ICPState realization in order to synchronize with KVM. Since ICPState objects are freed when unrealized, xics_cpu_destroy() isn't needed anymore and can be safely dropped. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-06-09 12:15:57 +10:00
Greg Kurz	100f738850	xics: pass appropriate types to realize() handlers. It makes more sense to pass an IPCState * to handlers of ICPStateClass instead of a DeviceState *, if only to benefit from compile time type checking. The same goes with ICSStateClass. While here, we also change the declaration of ICPStateClass in xics.h for consistency. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-06-09 12:12:34 +10:00
Greg Kurz	ad265631c0	xics: introduce macros for ICP/ICS link properties These properties are part of the XICS API. They deserve to appear explicitely in the XICS header file. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-06-09 12:12:34 +10:00
Greg Kurz	a4d4edce7a	xics: add reset() handler to ICPStateClass Taking into account that qemu_set_irq() returns immediatly if its first argument is NULL, icp_kvm_reset() largely duplicates icp_reset(). This patch introduces a reset() handler, so that the common logic can be implemented in icp_reset() only. While there we can also drop icp_kvm_realize() and icp_kvm_unrealize(). This causes icp-kvm to be realized in icp_realize(), which sets icp->xics, but it has no impact. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-06-08 14:38:27 +10:00
David Gibson	7980833619	spapr: Rework DRC name handling DRC objects have a get_name method which returns the DRC name generated when the DRC is created. Replace that with a fixed spapr_drc_name() function which generates the name on the fly from other information. This means: * We get rid of a method with only one implementation, and only local callers * We don't have to carry the name string around for the lifetime of the DRC * We use information added to the class structure to generate the name in standard format, so we don't need an explicit switch on drc type any more We also eliminate the 'name' property; it's basically useless since the only information in it can easily be deduced from other things. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2017-06-08 14:38:27 +10:00
David Gibson	0be4e88621	spapr: Change DRC attach & detach methods to functions DRC objects have attach & detach methods, but there's only one implementation. Although there are some differences in its behaviour for different DRC types, the overall structure is the same, so while we might want different method implementations for some parts, we're unlikely to want them for the top-level functions. So, replace them with direct function calls. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2017-06-08 14:38:26 +10:00
David Gibson	cd74d27e42	spapr: Clean up handling of DR-indicator There are 3 types of "indicator" associated with hotplug in the PAPR spec the "allocation state", "isolation state" and "DR-indicator". The first two are intimately tied to the various state transitions associated with hotplug. The DR-indicator, however, is different and simpler. It's basically just a guest controlled variable which can be used by the guest to flag state or problems associated with a device. The idea is that the hypervisor can use it to present information back on management consoles (on some machines with PowerVM it may even control physical LEDs on the machine case associated with the relevant device). For that reason, there's only ever likely to be a single update implementation so the set_indicator_state method isn't useful. Replace it with a direct function call. While we're there, make some small associated cleanups: * PAPR doesn't use the term "indicator state", just "DR-indicator" and the allocation state and isolation state are also considered "indicators". Rename things to be less confusing * Fold set_indicator_state() and rtas_set_indicator_state() into a single rtas_set_dr_indicator() function. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2017-06-08 14:38:26 +10:00
David Gibson	f224d35be9	spapr: Clean up DR entity sense handling DRC classes have an entity_sense method to determine (in a specific PAPR sense) the presence or absence of a device plugged into a DRC. However, we only have one implementation of the method, which explicitly tests for different DRC types. This changes it to instead have different method implementations for the two cases: "logical" and "physical" DRCs. While we're at it, the entity sense method always returns RTAS_OUT_SUCCESS, and the interesting value is returned via pass-by-reference. Simplify this to directly return the value we care about Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2017-06-08 14:38:26 +10:00
David Gibson	1693ea1685	spapr: Eliminate spapr_drc_get_type_str() This function was used in generating the device tree. However, now that we have different QOM types for different DRC types we can easily store the information we need in the class structure and avoid this specialized lookup function. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2017-06-06 09:24:21 +10:00
David Gibson	b8fdd530be	spapr: Move configure-connector state into DRC Currently the sPAPRMachineState contains a list of sPAPRConfigureConnector structures which store intermediate state for the ibm,configure-connector RTAS call. This was an attempt to separate this state from the core of the DRC state. However the configure connector process is intimately tied to the DRC model, so there's really no point trying to have two levels of interface here. Moving the configure-connector state into its corresponding DRC allows removal of a number of helpers for maintaining the anciliary list. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2017-06-06 09:24:17 +10:00
David Gibson	fbf5539718	spapr: Clean up spapr_dr_connector_by_() Change names to something less ludicrously verbose * Now that we have QOM subclasses for the different DRC types, use a QOM typename instead of a PAPR type value parameter The latter allows removal of the get_type_shift() helper. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2017-06-06 09:24:08 +10:00
David Gibson	2d33581899	spapr: Introduce DRC subclasses Currently we only have a single QOM type for all DRCs, but lots of places where we switch behaviour based on the DRC's PAPR defined type. This is a poor use of our existing type system. So, instead create QOM subclasses for each PAPR defined DRC type. We also introduce intermediate subclasses for physical and logical DRCs, a division which will be useful later on. Instead of being stored in the DRC object itself, the PAPR type is now stored in the class structure. There are still many places where we switch directly on the PAPR type value, but this at least provides the basis to start to remove those. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2017-06-06 09:23:46 +10:00
David Gibson	0b55aa91c9	spapr: Make DRC get_index and get_type methods into plain functions These two methods only have one implementation, and the spec they're implementing means any other implementation is unlikely, verging on impossible. So replace them with simple functions. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>	2017-06-06 08:53:24 +10:00
David Gibson	4f65ce00ab	spapr: Abolish DRC set_configured method DRConnectorClass has a set_configured method, however: * There is only one implementation, and only ever likely to be one * There's exactly one caller, and that's (now) local * The implementation is very straightforward So abolish the method entirely, and just open-code what we need. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>	2017-06-06 08:53:24 +10:00
David Gibson	88af6ea568	spapr: Abolish DRC get_fdt method The DRConnectorClass includes a get_fdt method. However * There's only one implementation, and there's only likely to ever be one * Both callers are local to spapr_drc * Each caller only uses one half of the actual implementation So abolish get_fdt() entirely, and just open-code what we need. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>	2017-06-06 08:53:24 +10:00
Daniel Henrique Barboza	318347234d	hw/ppc: removing drc->detach_cb and drc->detach_cb_opaque The pointer drc->detach_cb is being used as a way of informing the detach() function inside spapr_drc.c which cb to execute. This information can also be retrieved simply by checking drc->type and choosing the right callback based on it. In this context, detach_cb is redundant information that must be managed. After the previous spapr_lmb_release change, no detach_cb_opaques are being used by any of the three callbacks functions. This is yet another information that is now unused and, on top of that, can't be migrated either. This patch makes the following changes: - removal of detach_cb_opaque. the 'opaque' argument was removed from the callbacks and from the detach() function of sPAPRConnectorClass. The attribute detach_cb_opaque of sPAPRConnector was removed. - removal of detach_cb from the detach() call. The function pointer detach_cb of sPAPRConnector was removed. detach() now uses a switch(drc->type) to execute the apropriate callback. To achieve this, spapr_core_release, spapr_lmb_release and spapr_phb_remove_pci_device_cb callbacks were made public to be visible inside detach(). Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-25 11:31:33 +10:00
David Gibson	0cffce56ae	hw/ppc/spapr.c: adding pending_dimm_unplugs to sPAPRMachineState The LMB DRC release callback, spapr_lmb_release(), uses an opaque parameter, a sPAPRDIMMState struct that stores the current LMBs that are allocated to a DIMM (nr_lmbs). After each call to this callback, the nr_lmbs is decremented by one and, when it reaches zero, the callback proceeds with the qdev calls to hot unplug the LMB. Using drc->detach_cb_opaque is problematic because it can't be migrated in the future DRC migration work. This patch makes the following changes to eliminate the usage of this opaque callback inside spapr_lmb_release: - sPAPRDIMMState was moved from spapr.c and added to spapr.h. A new attribute called 'addr' was added to it. This is used as an unique identifier to associate a sPAPRDIMMState to a PCDIMM element. - sPAPRMachineState now hosts a new QTAILQ called 'pending_dimm_unplugs'. This queue of sPAPRDIMMState elements will store the DIMM state of DIMMs that are currently going under an unplug process. - spapr_lmb_release() will now retrieve the nr_lmbs value by getting the correspondent sPAPRDIMMState. A helper function called spapr_dimm_get_address was created to fetch the address of a PCDIMM device inside spapr_lmb_release. When nr_lmbs reaches zero and the callback proceeds with the qdev hot unplug calls, the sPAPRDIMMState struct is removed from spapr->pending_dimm_unplugs. After these changes, the opaque argument for spapr_lmb_release is now unused and is passed as NULL inside spapr_del_lmbs. This and the other opaque arguments can now be safely removed from the code. As an additional cleanup made by this patch, the spapr_del_lmbs function was merged with spapr_memory_unplug_request. The former was being called only by the latter and both were small enough to fit one single function. Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> [dwg: Minor stylistic cleanups] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-25 11:31:28 +10:00
Daniel Henrique Barboza	bff3063837	hw/ppc/spapr_events.c: removing 'exception' from sPAPREventLogEntry Currenty we do not have any RTAS event that is reported by the event-scan interface. The existing events, RTAS_LOG_TYPE_EPOW and RTAS_LOG_TYPE_HOTPLUG, are being reported by the check-exception interface and, as such, marked as 'exception=true'. Commit `79853e18d9`, 'spapr_events: event-scan RTAS interface', added the event_scan interface because the guest kernel requires it to initialize other required interfaces. It is acting since then as a stub because no events that would be reported by it were added since then. However, the existence of the 'exception' boolean adds an unnecessary load in the future migration of the pending_events, sPAPREventLogEntry QTAILQ that hosts the pending RTAS events. To make the code cleaner and ease the future migration changes, this patch makes the following changes: - remove the 'exception' boolean that filter these events. There is nothing to filter since all events are reported by check-exception; - functions rtas_event_log_queue, rtas_event_log_dequeue and rtas_event_log_contains don't receive the 'exception' boolean as parameter; - event_scan function was simplified. It was calling 'rtas_event_log_dequeue(mask, false)' that was always returning 'NULL' because we have no events that are created with exception=false, thus in the end it would execute a jump to 'out_no_events' all the time. The function now assumes that this will always be the case and all the remaining logic were deleted. In the future, when or if we add new RTAS events that should be reported with the event_scan interface, we can refer to the changes made in this patch to add the event_scan logic back. Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-24 11:39:53 +10:00
Greg Kurz	de86eccc0c	xics_kvm: cache already enabled vCPU ids Since commit `a45863bda9` ("xics_kvm: Don't enable KVM_CAP_IRQ_XICS if already enabled"), we were able to re-hotplug a vCPU that had been hot- unplugged ealier, thanks to a boolean flag in ICPState that we set when enabling KVM_CAP_IRQ_XICS. This could work because the lifecycle of all ICPState objects was the same as the machine. Commit `5bc8d26de2` ("spapr: allocate the ICPState object from under sPAPRCPUCore") broke this assumption and now we always pass a freshly allocated ICPState object (ie, with the flag unset) to icp_kvm_cpu_setup(). This cause re-hotplug to fail with: Unable to connect CPU8 to kernel XICS: Device or resource busy Let's fix this by caching all the vCPU ids for which KVM_CAP_IRQ_XICS was enabled. This also drops the now useless boolean flag from ICPState. Reported-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Greg Kurz <groug@kaod.org> Tested-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-24 11:39:52 +10:00
Bharata B Rao	06ec79e865	spapr: Consolidate HPT freeing code into a routine Consolidate the code that frees HPT into a separate routine spapr_free_hpt() as the same chunk of code is called from two places. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-24 11:39:52 +10:00
Greg Kurz	f63ebfe0ac	ppc/xics: simplify prototype of xics_spapr_init() This function only does hypercall and RTAS-call registration, and thus never returns an error. This patch adapt the prototype to reflect that. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-24 11:39:52 +10:00
Stefan Hajnoczi	ba9915e1f8	x86 and machine queue, 2017-05-11 Highlights: * New "-numa cpu" option * NUMA distance configuration * migration/i386 vmstatification -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJZFLh3AAoJECgHk2+YTcWmppQQAJe9Y5a3VwHXqvHbwBHX2ysn RZDAUPd9DWpbM+UUydyKOVIZ7u5RXbbVq4E0NeCD8VYYd+grZB5Wo1cAzy3b4U2j 2s+MDqaPMtZtGoqxTsyQOVoVxazT5Kf1zglK+iUEzik44J7LGdro+ty2Z7Ut2c11 q9rE/GNS78czBm7c4lxgkxXW4N95K/tEGlLtDQ7uct//3U/ZimF+mO6GcbVFlOWT 4iEbOz2sqvBVv22nLJRufiPgFNIW4hizAz5KBWxwGFCCKvT3N6yYNKKjzEpCw+jE lpjIRODU02yIZZZY841fLRtyrk7p4zORS8jRaHTdEJgb5bGc/YazxxVL8nzRQT1W VxFwAMd+UNrDkV24hpN++Ln2O+b3kwcGZ7uA/qu9d5WvSYUKXlHqcMJ35q6zuhAI /ecfYO7EZfVP86VjIt5IH04iV8RChA9Q6de+kQEFa6wHUxufeCOwCFqukGo8zj07 plX8NcjnzYmSXKnYjHOHao4rKT+DiJhRB60rFiMeKP/qvKbZPjtgsIeonhHm53qZ /QwkhowahHKkpAnetIl0QHm8KS4YudAofMi/Fl+he4gRkEbSQVAo6iQb2L4cjcLC LNSDDsIVWGem4gCR+vcsFqB3lggRDfltHXm15JKh92UMpOr6RI6s8pD55T7EdnPC CfdxWB5kYM6/lLbOHj94 =48wH -----END PGP SIGNATURE----- Merge remote-tracking branch 'ehabkost/tags/x86-and-machine-pull-request' into staging x86 and machine queue, 2017-05-11 Highlights: * New "-numa cpu" option * NUMA distance configuration * migration/i386 vmstatification # gpg: Signature made Thu 11 May 2017 08:16:07 PM BST # gpg: using RSA key 0x2807936F984DC5A6 # gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>" # gpg: Note: This key has expired! # Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6 * ehabkost/tags/x86-and-machine-pull-request: (29 commits) migration/i386: Remove support for pre-0.12 formats vmstatification: i386 FPReg migration/i386: Remove old non-softfloat 64bit FP support tests: check -numa node,cpu=props_list usecase numa: add '-numa cpu,...' option for property based node mapping numa: remove node_cpu bitmaps as they are no longer used numa: use possible_cpus for not mapped CPUs check machine: call machine init from wrapper numa: remove no longer need numa_post_machine_init() tests: numa: add case for QMP command query-cpus QMP: include CpuInstanceProperties into query_cpus output output virt-arm: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu() spapr: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu() pc: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu() numa: do default mapping based on possible_cpus instead of node_cpu bitmaps numa: mirror cpu to node mapping in MachineState::possible_cpus numa: add check that board supports cpu_index to node mapping virt-arm: add node-id property to CPU pc: add node-id property to CPU spapr: add node-id property to sPAPR core ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-15 14:12:03 +01:00
Igor Mammedov	0b8497f08c	spapr: add node-id property to sPAPR core it will allow switching from cpu_index to core based numa mapping in follow up patches. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <1494415802-227633-3-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-05-11 16:08:48 -03:00
David Gibson	eaf87a3976	pnv: Fix build failures on some host platforms This makes some changes to fix build failures on the 'min-glib' docker image, and maybe other platforms with a buildchain that's less tolerant about duplicated typedefs. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-11 09:45:15 +10:00
Paolo Bonzini	5c6b487d67	ppc: xics: fix compilation with CentOS 6 The PowerPCCPU typedef is included twice if a file includes both hw/ppc/xics.h and target/ppc/cpu-qom.h. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-11 09:45:15 +10:00
Sam Bobroff	229e16fd24	ppc/xics: preserve P and Q bits for KVM IRQs Kernel commit 17d48610ae0f ("KVM: PPC: Book 3S: XICS: Implement ICS P/Q states") added new bits to the state used by KVM IRQs. Currently, QEMU does not preserve these bits, so migrating (or otherwise saving and restoring) the guest state causes the P and Q bits to be cleared. Clearing the P bit has no effect, because the kernel will set it based on other data, but the loss of a set Q bit will cause a lost interrupt. This patch preserves the P and Q bits, correcting the problem. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-11 09:45:15 +10:00
Cédric Le Goater	bce0b69159	ppc/pnv: generate an OEM SEL event on shutdown OpenPOWER systems expect to be notified with such an event before a shutdown or a reboot. An OEM SEL message is sent with specific identifiers and a user data containing the request : OFF or REBOOT. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:41:56 +10:00
Cédric Le Goater	aeaef83dab	ppc/pnv: add initial IPMI sensors for the BMC simulator Skiboot, the firmware for the PowerNV platform, expects the BMC to provide some specific IPMI sensors. These sensors are exposed in the device tree and their values are updated by the firmware at boot time. Sensors of interest are : "FW Boot Progress" "Boot Count" As such a device is defined on the command line, we can only detect its presence at reset time. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:41:56 +10:00
Benjamin Herrenschmidt	4d1df88b63	ppc/pnv: Add support for POWER8+ LPC Controller It adds the Naples chip which supports proper LPC interrupts via the LPC controller rather than via an external CPLD. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [clg: - updated for qemu-2.9 - ported on latest PowerNV patchset - moved the IRQ handler in pnv_lpc.c - introduced pnv_lpc_isa_irq_create() to create the ISA IRQs ] Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:41:55 +10:00
Cédric Le Goater	71cd4dace9	spapr: remove the 'nr_servers' field from the machine xics_system_init() does not need 'nr_servers' anymore as it is only used to define the 'interrupt-controller' node in the device tree. So let's just compute the value when calling spapr_dt_xics(). This also gives us an opportunity to simplify the xics_system_init() routine and introduce a specific spapr_ics_create() helper to create the sPAPR ICS object. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:41:55 +10:00
Benjamin Herrenschmidt	0722d05ad8	ppc/pnv: Add OCC model stub with interrupt support The OCC is an on-chip microcontroller based on a ppc405 core used for various power management tasks. It comes with a pile of additional hardware sitting on the PIB (aka XSCOM bus). At this point we don't emulate it (nor plan to do so). However there is one facility which is provided by the surrounding hardware that we do need, which is the interrupt generation facility. OPAL uses it to send itself interrupts under some circumstances and there are other uses around the corner. So this implement just enough to support this. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [clg: - updated for qemu-2.9 - changed the XSCOM interface to fit new model - QOMified the model ] Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:42 +10:00
Cédric Le Goater	54f59d786c	ppc/pnv: Add cut down PSI bridge model and hookup external interrupt The Processor Service Interface (PSI) Controller is one of the engines of the "Bridge" unit which connects the different interfaces to the Power Processor. This adds just enough of the PSI bridge to handle various on-chip and the one external interrupt. The rest of PSI has to do with the link to the IBM FSP service processor which we don't plan to emulate (not used on OpenPower machines). The ics_get() and ics_resend() handlers of the XICSFabric interface of the PowerNV machine are now defined to handle the Interrupt Control Source of PSI. The InterruptStatsProvider interface is also modified to dump the new ICS. Originally from Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:42 +10:00
Cédric Le Goater	bf5615e77c	ppc/pnv: add memory regions for the ICP registers This provides to a PowerNV chip (POWER8) access to the Interrupt Management area, which contains the registers of the Interrupt Control Presenters of each thread. These are used to accept, return, forward interrupts in the system. This area is modeled with a per-chip container memory region holding all the ICP registers. Each thread of a chip is then associated with its ICP registers using a memory subregion indexed by its PIR number in the overall region. The device tree is populated accordingly. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:42 +10:00
Cédric Le Goater	5509db4aec	ppc/pnv: add a helper to calculate MMIO addresses registers Some controllers (ICP, PSI) have a base register address which is calculated using the chip id. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:42 +10:00
Cédric Le Goater	99285aae16	ppc/pnv: add a PnvICPState object This provides a new ICPState object for the PowerNV machine (POWER8). Access to the Interrupt Management area is done though a memory region. It contains the registers of the Interrupt Control Presenters of each thread which are used to accept, return, forward interrupts in the system. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:42 +10:00
Cédric Le Goater	439071a92d	ppc/xics: add a realize() handler to ICPStateClass It will be used by derived classes in PowerNV for customization. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:42 +10:00
Cédric Le Goater	5bc8d26de2	spapr: allocate the ICPState object from under sPAPRCPUCore Today, all the ICPs are created before the CPUs, stored in an array under the sPAPR machine and linked to the CPU when the core threads are realized. This modeling brings some complexity when a lookup in the array is required and it can be simplified by allocating the ICPs when the CPUs are. This is the purpose of this proposal which introduces a new 'icp_type' field under the machine and creates the ICP objects of the right type (KVM or not) before the PowerPCCPU object are. This change allows more cleanups : the removal of the icps array under the sPAPR machine and the removal of the xics_get_cpu_index_by_dt_id() helper. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:42 +10:00
Cédric Le Goater	ad5d1add86	ppc/xics: introduce an 'intc' backlink under PowerPCCPU Today, the ICPState array of the sPAPR machine is indexed with 'cpu_index' of the CPUState. This numbering of CPUs is internal to QEMU and the guest only knows about what is exposed in the device tree, that is the 'cpu_dt_id'. This is why sPAPR uses the helper xics_get_cpu_index_by_dt_id() to do the mapping in a couple of places. To provide a more generic XICS layer, we need to abstract the IRQ 'server' number and remove any assumption made on its nature. It should not be used as a 'cpu_index' for lookups like xics_cpu_setup() and xics_cpu_destroy() do. To reach that goal, we choose to introduce a generic 'intc' backlink under PowerPCCPU, and let the machine core init routine do the ICPState lookup. The resulting object is passed on to xics_cpu_setup() which does the store under PowerPCCPU. The IRQ 'server' number in XICS is now generic. sPAPR uses 'cpu_dt_id' and PowerNV will use 'PIR' number. This also has the benefit of simplifying the sPAPR hcall routines which do not need to do any ICPState lookups anymore. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:42 +10:00
Sam Bobroff	e957f6a9b9	spapr: Workaround for broken radix guests For a little while around 4.9, Linux kernels that saw the radix bit in ibm,pa-features would attempt to set up the MMU as if they were a hypervisor, even if they were a guest, which would cause them to crash. Work around this by detecting pre-ISA 3.0 guests by their lack of that bit in option vector 1, and then removing the radix bit from ibm,pa-features. Note: This now requires regeneration of that node after CAS negotiation. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> [dwg: Fix style nits] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:41 +10:00
Sam Bobroff	9fb4541f58	spapr: Enable ISA 3.0 MMU mode selection via CAS Add the new node, /chosen/ibm,arch-vec-5-platform-support to the device tree. This allows the guest to determine which modes are supported by the hypervisor. Update the option vector processing in h_client_architecture_support() to handle the new MMU bits. This allows guests to request hash or radix mode and QEMU to create the guest's HPT at this time if it is necessary but hasn't yet been done. QEMU will terminate the guest if it requests an unavailable mode, as required by the architecture. Extend the ibm,pa-features node with the new ISA 3.0 values and set the radix bit if KVM supports radix mode. This probably won't be used directly by guests to determine the availability of radix mode (that is indicated by the new node added above) but the architecture requires that it be set when the hardware supports it. If QEMU is using KVM, and KVM is capable of running in radix mode, guests can be run in real-mode without allocating a HPT (because KVM will use a minimal RPT). So in this case, we avoid creating the HPT at reset time and later (during CAS) create it if it is necessary. ISA 3.0 guests will now begin to call h_register_process_table(), which has been added previously. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> [dwg: Strip some unneeded prefix from error messages] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:41 +10:00
Suraj Jitindar Singh	b4db54132f	target/ppc: Implement H_REGISTER_PROCESS_TABLE H_CALL The H_REGISTER_PROCESS_TABLE H_CALL is used by a guest to indicate to the hypervisor where in memory its process table is and how translation should be performed using this process table. Provide the implementation of this H_CALL for a guest. We first check for invalid flags, then parse the flags to determine the operation, and then check the other parameters for valid values based on the operation (register new table/deregister table/maintain registration). The process table is then stored in the appropriate location and registered with the hypervisor (if running under KVM), and the LPCR_[UPRT/GTSE] bits are updated as required. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> [dwg: Correct missing prototype and uninitialized variable] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:41 +10:00
Suraj Jitindar Singh	d77a98b015	target/ppc: Add new H-CALL shells for in memory table translation The use of the new in memory tables introduced in ISAv3.00 for translation, also referred to as process tables, requires the introduction of 3 new H-CALLs; H_REGISTER_PROCESS_TABLE, H_CLEAN_SLB, and H_INVALIDATE_PID. Add shells for each of these and register them as the hypercall handlers. Currently they all log an unimplemented hypercall and return H_FUNCTION. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> [dwg: Fix style nits] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:41 +10:00
Cédric Le Goater	147ff8079e	ppc/spapr: QOM'ify sPAPRRTCState Also use an 'sPAPRRTCState' attribute under the sPAPR machine to hold the RTC object. Overall, these changes remove an unnecessary and implicit dependency on SysBus. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:41 +10:00
Laurent Vivier	fe6824d126	spapr: fix memory hot-unplugging If, once the kernel has booted, we try to remove a memory hotplugged while the kernel was not started, QEMU crashes on an assert: qemu-system-ppc64: hw/virtio/vhost.c:651: vhost_commit: Assertion `r >= 0' failed. ... #4 in vhost_commit #5 in memory_region_transaction_commit #6 in pc_dimm_memory_unplug #7 in spapr_memory_unplug #8 spapr_machine_device_unplug #9 in hotplug_handler_unplug #10 in spapr_lmb_release #11 in detach #12 in set_allocation_state #13 in rtas_set_indicator ... If we take a closer look to the guest kernel log, we can see when we try to unplug the memory: pseries-hotplug-mem: Attempting to hot-add 4 LMB(s) What happens: 1- The kernel has ignored the memory hotplug event because it was not started when it was generated. 2- When we hot-unplug the memory, QEMU starts to remove the memory, generates an hot-unplug event, and signals the kernel of the incoming new event 3- as the kernel is started, on the QEMU signal, it reads the event list, decodes the hotplug event and tries to finish the hotplugging. 4- QEMU receive the the hotplug notification while it is trying to hot-unplug the memory. This moves the memory DRC to an invalid state This patch prevents this by not allowing to set the allocation state to USABLE while the DRC is awaiting release. RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1432382 Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-29 11:35:16 +11:00

1 2 3 4 5 ...

366 Commits