linux/arch/arm
Lorenzo Pieralisi 64270d82d4 ARM: vexpress: tc2: fix hotplug/idle/kexec race on cluster power down
On the TC2 testchip, when all CPUs in a cluster enter standbywfi
and commit a power down request, the power controller will wait
for standbywfil2 coming from L2 cache controller to shut the
cluster down.
Once all CPUs in a cluster have committed a power down request
and entered wfi, the power controller cannot backtrack. Put
another way, a CPU must not be allowed to complete wfi execution
independently of the power controller; the only way for it to
resume properly is upon a pending wake-up IRQ and a subsequent
reset triggered by the power controller.

The current MCPM back-end for TC2 disables the GIC CPU IF only
when power down is committed through the tc2_pm_suspend() method.
That makes sense since a suspended CPU is still online and can
receive interrupts, whereas a hotplugged CPU, since it is offline,
has migrated all IRQs and shut down the per-CPU peripherals, hence
their PPIs.

The flaw with this reasoning is the following. If all CPUs in
a cluster are entering a power down state, either through CPU
idle or CPU hotplug, when the last man successfully completes
the MCPM power down sequence (and executes wfi), the power
controller waits for the L2 wfi signal to quiesce the cluster and
shut it down.
If, when all CPUs are sitting in wfi, an online CPU hotplugs back
in one of the CPUs in the cluster being shut down, that CPU
receives an IPI that causes wfi to complete (since the
tc2_pm_down() method does not disable the GIC CPU IF in that
case: the CPU is being hotplugged out, not idle). The power
controller will then never see the standbywfil2 signal coming from
L2 that is required for shutdown to happen, and the system
deadlocks.
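
The scenario above can be sketched as a toy model in plain C. This
is not kernel code: the struct names, NR_CPUS_IN_CLUSTER and the
helper functions are all hypothetical, and the shutdown condition is
reduced to "every CPU is still sitting in wfi". It only illustrates
why an IPI delivered through a still-enabled GIC CPU IF prevents the
cluster from ever reaching the state the power controller waits for.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_IN_CLUSTER 2	/* hypothetical cluster size for the model */

struct cpu_model {
	bool in_wfi;		/* CPU is sitting in wfi */
	bool gic_cpu_if_on;	/* GIC CPU IF still forwards IRQs to the CPU */
};

struct cluster_model {
	struct cpu_model cpu[NR_CPUS_IN_CLUSTER];
};

/*
 * A CPU commits to power down and executes wfi. The flag models the
 * pre-patch behaviour: the idle path disables the GIC CPU IF, the
 * hotplug path leaves it enabled.
 */
static void cpu_enter_wfi(struct cluster_model *cl, int cpu,
			  bool disable_gic_cpu_if)
{
	assert(cpu >= 0 && cpu < NR_CPUS_IN_CLUSTER);
	cl->cpu[cpu].gic_cpu_if_on = !disable_gic_cpu_if;
	cl->cpu[cpu].in_wfi = true;
}

/* An IPI completes wfi only if the GIC CPU IF is still enabled. */
static void send_ipi(struct cluster_model *cl, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_IN_CLUSTER);
	if (cl->cpu[cpu].in_wfi && cl->cpu[cpu].gic_cpu_if_on)
		cl->cpu[cpu].in_wfi = false;
}

/*
 * Simplified stand-in for the standbywfil2 condition: the cluster
 * can only be shut down while every CPU remains in wfi.
 */
static bool cluster_can_shut_down(const struct cluster_model *cl)
{
	for (int i = 0; i < NR_CPUS_IN_CLUSTER; i++)
		if (!cl->cpu[i].in_wfi)
			return false;
	return true;
}
```

In this model, if both CPUs enter wfi via the hotplug-style path
(disable_gic_cpu_if == false) and one of them then receives an IPI,
cluster_can_shut_down() turns false and stays false, which mirrors
the deadlock. If both enter with the GIC CPU IF disabled, the IPI
has no effect on the wfi state.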

Further to this issue, kexec hotplugs secondary CPUs out during
kernel reload/restart.
Because kexec may (deliberately) trash the old kernel text, it is
not OK for CPUs to follow the MCPM soft reboot path, since
instructions after the WFI may have been replaced by kexec.

If tc2_pm_down() does not disable the GIC cpu interface, there is a
race between CPU powerdown in the old kernel and the IPI from the
new kernel that triggers secondary boot, particularly if the
powerdown is slow (due to L2 cache cleaning for example).  If the
new kernel wins the race, the affected CPU(s) will not really be
reset and may execute garbage after the WFI.

The only solution to this problem is to disable the GIC
CPU IF on a CPU committed to power down regardless of the power
down entry method (CPU hotplug or CPU idle). This way, CPU wake-up
is under power controller control, which prevents unexpected wfi
exit caused by a pending IRQ.

This patch moves the GIC CPU IF disable call in the TC2 MCPM
implementation from the tc2_pm_suspend() method to the
tc2_pm_down() method to fix the mentioned race condition(s).
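
As a rough illustration of the shape of this change, the sketch
below stubs out the hardware and keeps only the call structure. It
is not the actual TC2 back-end: the function bodies are placeholders,
the flags standing in for hardware state are hypothetical, the
hotplug wrapper example_hotplug_power_down() is an invented name, and
the stub gic_cpu_if_down() merely borrows the name of the GIC helper
and flips a flag. Any conditions the real tc2_pm_down() applies
before disabling the interface are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for hardware state, for illustration only. */
static bool gic_cpu_if_enabled = true;
static bool resume_addr_set;

/* Stub: in this sketch it only records that the CPU IF was disabled. */
static void gic_cpu_if_down(void)
{
	gic_cpu_if_enabled = false;
}

/* Common power down path, reached from both CPU hotplug and CPU idle. */
static void tc2_pm_down(unsigned long long residency)
{
	(void)residency;	/* unused in this simplified model */

	/* ... bookkeeping and cache maintenance elided ... */

	/* After the patch: disabled here, whatever the entry method. */
	gic_cpu_if_down();

	/* ... wfi would be executed here ... */
}

/* Hypothetical hotplug entry point: goes straight to the common path. */
static void example_hotplug_power_down(void)
{
	tc2_pm_down(0);
}

/* CPU idle entry point. */
static void tc2_pm_suspend(unsigned long long residency)
{
	/* Placeholder for programming the resume address. */
	resume_addr_set = true;

	/* Before the patch, gic_cpu_if_down() was called only here. */
	tc2_pm_down(residency);
}
```

With the call on the common path, both entry points leave the
modelled GIC CPU IF disabled before wfi, so wake-up stays under
power controller control in either case.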

Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Tested-by: Dave Martin <Dave.Martin@arm.com> (for kexec)
Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
2013-09-30 09:27:36 -07:00
..
boot Fourth Round of Renesas ARM based SoC fixes for v3.12 2013-09-30 09:24:20 -07:00
common Merge branch 'timers/core' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-09-16 16:10:26 -04:00
configs ARM: multi_v7: add HREFv60 to multi_v7 defconfig 2013-09-18 12:16:08 -07:00
crypto
include ARM: SoC late changes for v3.12 2013-09-09 16:35:29 -07:00
kernel ARM: SoC platform changes for 3.12 2013-09-06 13:30:06 -07:00
kvm Merge branch 'for-linus' of git://git.linaro.org/people/rmk/linux-arm 2013-09-05 18:07:32 -07:00
lib ARM: 7835/2: fix modular build of xor_blocks() with NEON enabled 2013-09-09 15:24:47 +01:00
mach-at91 ARM: at91: remove IRQF_DISABLED 2013-09-19 15:36:35 +02:00
mach-bcm
mach-bcm2835
mach-clps711x
mach-cns3xxx
mach-davinci ARM: davinci: dm365 evm: fix unused variable warning 2013-09-19 14:56:03 +05:30
mach-dove ARM: SoC board updates for 3.12 2013-09-06 13:34:43 -07:00
mach-ebsa110
mach-ep93xx ARM: SoC fixes for 3.12 2013-09-12 13:59:31 -07:00
mach-exynos ARM: SoC late changes for v3.12 2013-09-09 16:35:29 -07:00
mach-footbridge
mach-gemini
mach-highbank ARM: SoC late changes for v3.12 2013-09-09 16:35:29 -07:00
mach-imx ARM: imx: i.mx6d/q: disable the double linefill feature of PL310 2013-09-17 10:04:24 +08:00
mach-integrator ARM: mach-integrator: Add stub for pci_v3_early_init() for !CONFIG_PCI 2013-09-25 21:59:52 -07:00
mach-iop13xx
mach-iop32x
mach-iop33x
mach-ixp4xx
mach-keystone Omap fixes for the merge window that are not urgent enough 2013-08-29 19:12:04 -07:00
mach-kirkwood ARM: SoC board updates for 3.12 2013-09-06 13:34:43 -07:00
mach-ks8695
mach-lpc32xx
mach-mmp ARM: SoC late changes for v3.12 2013-09-09 16:35:29 -07:00
mach-msm ARM: SoC cleanups for 3.12 2013-09-06 13:21:16 -07:00
mach-mv78xx0 ARM: SoC platform changes for 3.12 2013-09-06 13:30:06 -07:00
mach-mvebu ARM: mvebu: add missing of_node_put() to fix reference leak 2013-09-18 16:40:53 +00:00
mach-mxs
mach-netx
mach-nomadik
mach-nspire
mach-omap1
mach-omap2 ARM: OMAP2+: mux: fix trivial typo in name 2013-09-18 12:02:01 -07:00
mach-orion5x ARM: SoC board updates for 3.12 2013-09-06 13:34:43 -07:00
mach-picoxcell
mach-prima2 ARM: SoC platform changes for 3.12 2013-09-06 13:30:06 -07:00
mach-pxa ARM: SoC DT updates for 3.12 2013-09-06 13:26:27 -07:00
mach-realview ARM: SoC cleanups for 3.12 2013-09-06 13:21:16 -07:00
mach-rockchip
mach-rpc
mach-s3c24xx
mach-s3c64xx
mach-s5p64x0
mach-s5pc100
mach-s5pv210
mach-sa1100 ARM: sa1100: collie.c: fall back to jedec_probe flash detection 2013-09-18 08:20:27 -07:00
mach-shark
mach-shmobile ARM: shmobile: armadillo: fixup ether pinctrl naming 2013-09-22 21:10:31 +09:00
mach-socfpga
mach-spear ARM: SoC cleanups for 3.12 2013-09-06 13:21:16 -07:00
mach-sti
mach-sunxi
mach-tegra ARM: SoC platform changes for 3.12 2013-09-06 13:30:06 -07:00
mach-u300 ARM: u300: hide submenus 2013-09-18 08:16:46 -07:00
mach-ux500 ARM: ux500: disable outer cache debug 2013-09-17 09:08:13 -07:00
mach-versatile Merge branch 'versatile/fixes' into fixes 2013-09-09 17:31:04 -07:00
mach-vexpress ARM: vexpress: tc2: fix hotplug/idle/kexec race on cluster power down 2013-09-30 09:27:36 -07:00
mach-virt
mach-vt8500
mach-w90x900
mach-zynq
mm arch: mm: pass userspace fault flag to generic fault handler 2013-09-12 15:38:01 -07:00
net
nwfpe
oprofile
plat-iop
plat-omap ARM: SoC platform changes for 3.12 2013-09-06 13:30:06 -07:00
plat-orion
plat-pxa ARM: pxa: ssp: Check return values from phandle lookups 2013-09-09 17:14:09 -07:00
plat-samsung ARM: SoC cleanups for 3.12 2013-09-06 13:21:16 -07:00
plat-versatile
tools
vfp
xen Linux 3.11-rc7 2013-09-09 12:05:37 -04:00
Kconfig Remove GENERIC_HARDIRQ config option 2013-09-13 15:09:52 +02:00
Kconfig-nommu
Kconfig.debug ARM: SoC cleanups for 3.12 2013-09-06 13:21:16 -07:00
Makefile