All instances of this device are now coming from device tree-
enabled platforms probing without using platform data.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Since we still need to rely on a mix of device tree initialized
drivers and legacy platform data initialize drivers, let's fix
the passing of platform data to twl4030-gpio.
As the twl4030 GPIO is initialized by twl-core.c, we need to register
the auxdata for twl4030 GPIO in twl-core.c.
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Since currently nobody uses TWL603x platform data and all new
users will supply it through device tree, handling of these
data within twl-core will never be used, so remove it.
Signed-off-by: Ruslan Ruslichenko <ruslan.ruslichenko@globallogic.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Add "const" to "static struct regmap_irq" and "static struct
regmap_config".
Acked-by: Sangbeom Kim <sbkim73@samsung.com>
Reviewed-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Match max14577 regulator driver by of_compatible specified in mfd_cell.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
This patch adds max14577 core/irq driver to support MUIC(Micro USB IC)
device and charger device and support irq domain method to control
internal interrupt of max14577 device. Also, this patch supports DT
binding with max14577_i2c_parse_dt().
The MAXIM 14577 chip contains Micro-USB Interface Circuit and Li+ Battery
Charger. It contains accessory and USB charger detection logic. It supports
USB 2.0 Hi-Speed, UART and stereo audio signals over Micro-USB connector.
The battery charger is compliant with the USB Battery Charging Specification
Revision 1.1. It has also SFOUT LDO output for powering USB devices.
Reviewed-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Add PM suspend/resume ops to the sec MFD core driver and make it a wake
up source. This allows proper waking from suspend to RAM and also fixes
broken interrupts after resuming:
[ 42.705703] sec_pmic 7-0066: Failed to read IRQ status: -5
Interrupts stop working after first resume initiated by them (e.g. by
RTC Alarm interrupt) because interrupt registers were not cleared properly.
When device is woken up from suspend by RTC Alarm, an interrupt occurs
before resuming I2C bus controller. The interrupt is handled by
regmap_irq_thread which tries to read RTC registers. This read fails
(I2C is still suspended) and RTC Alarm interrupt is disabled.
Disable the S5M8767 interrupts during suspend (disable_irq()) and enable
them during resume so the device will be still woken up but the interrupt
won't happen before resuming I2C bus.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Rename cros_ec_{probe,remove}_i2c() to cros_ec_i2c_{probe,remove}() for
consistency.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Rename cros_ec_{probe,remove}_spi() to cros_ec_spi_{probe,remove}() for
consistency.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
The driver core clears the driver data to NULL after device_release
or on probe failure. Thus, it is not needed to manually clear the
device driver data to NULL.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Enable regmap's regcache for the audio registers:
i2c address 0x49, register range 0x01 - 0x49
Mark all other registers as volatile to avoid any side effect for the non
audio functions behind 0x49 i2c address.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Acked-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
If the regcache is enabled on the regmap module drivers might need to access
to HW register(s) in certain cases in cache bypass mode.
As an example of this is the audio block's ANAMICL register. In normal
operation the content can be cached but during initialization one bit from
the register need to be monitored. With the twl_set_regcache_bypass() the
client driver can switch regcache bypass on and off when it is needed so
we can utilize the regcache for more registers.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Acked-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
The new twl_get_regmap() function will return a pointer to the regmap needed
for the given module.
Since both read and write function were using the same code to do the lookup
we can reuse this in both places to simplify the code.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Acked-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
The ADC driver always programs all possible ADC values and discards
them except for the value IIO asked for. On the am335x-evm the driver
programs four values and it takes 500us to gather them. Reducing the number
of conversations down to the (required) one also reduces the busy loop down
to 125us.
This leads to another error, namely the FIFOCOUNT register is sometimes
(like one out of 10 attempts) not updated in time leading to EBUSY.
The next read has the FIFOCOUNT register updated.
Checking for the ADCSTAT register for being idle isn't a good choice either.
The problem is that if TSC is used at the same time, the HW completes the
conversation for ADC *and* before the driver noticed it, the HW begins to
perform a TSC conversation and so the driver never seen the HW idle. The
next time we would have two values in the FIFO but since the driver reads
everything we always see the current one.
So instead of polling for the IDLE bit in ADCStatus register, we should
check the FIFOCOUNT register. It should be one instead of zero because we
request one value.
This change in turn leads to another error. Sometimes if TSC & ADC are
used together the TSC starts generating interrupts even if nobody
actually touched the touchscreen. The interrupts seem valid because TSC's
FIFO is filled with values for each channel of the TSC. This condition stops
after a few ADC reads but will occur again. Not good.
On top of this (even without the changes I just mentioned) there is a ADC
& TSC lockup condition which was reported to me by Jeff Lance including the
following test case:
A busy loop of "cat /sys/bus/iio/devices/iio\:device0/in_voltage4_raw"
and a mug on touch screen. With this setup, the hardware will lockup after
something between 20 minutes and it could take up to a couple of hours.
During that lockup, the ADCSTAT register says 0x30 (or 0x70) which means
STEP_ID = IDLE and FSM_BUSY = yes. That means the hardware says that it is
idle and busy at the same time which is an invalid condition.
For all this reasons I decided to rework this TSC/ADC part and add a
handshake / synchronization here:
First the ADC signals that it needs the HW and writes a 0 mask into the
SE register. The HW (if active) will complete the current conversation
and become idle. The TSC driver will gather the values from the FIFO
(woken up by an interrupt) and won't "enable" another conversation.
Instead it will wake up the ADC driver which is already waiting. The ADC
driver will start "its" conversation and once it is done, it will
enable the TSC steps so the TSC will work again.
After this rework I haven't observed the lockup so far. Plus the busy
loop has been reduced from 500us to 125us.
The continues-read mode remains unchanged.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
The update of the SE register in MFD doesn't look right as it has
nothing to do with it. The better place to do it is in TSC driver (which
is already doing it) and in the ADC driver which needs this only in the
continues mode.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
The purpose of reg_se_cache has been defeated. It should avoid the
read-back of the register to avoid the latency and the fact that the
bits are reset to 0 after the individual conversation took place.
The reason why this is required like this to work, is that read-back of
the register removes the bits of the ADC so they do not start another
conversation after the register is re-written from the TSC side for the
update.
To avoid the not required read-back I introduce a "set once" variant which
does not update the cache mask. After the conversation completes, the
bit is removed from the SE register anyway and we don't plan a new
conversation "any time soon". The current set function is renamed to
set_cache to distinguish the two operations.
This is a small preparation for a larger sync-rework.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Since the "recent" changes, am335x_tsc_se_update() has no longer any
users outside of this file so make it local.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
It somehow looks like the ending bracket belongs to the if statement but
it does belong to the while loop. This patch moves the bracket where it
belongs.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
As of commit 03e361b25e ("mfd: Stop setting
refcounting pointers in original mfd_cell arrays"), the "cell" parameter of
mfd_add_devices() is "const" again. Hence make all cell data passed to
mfd_add_devices() const where possible.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
As of commit 03e361b25e ("mfd: Stop setting
refcounting pointers in original mfd_cell arrays"), the "cell" parameter of
mfd_add_devices() is "const" again. Hence make all cell data passed to
mfd_add_devices() const where possible.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
As of commit 03e361b25e ("mfd: Stop setting
refcounting pointers in original mfd_cell arrays"), the "cell" parameter of
mfd_add_devices() is "const" again. Hence make all cell data passed to
mfd_add_devices() const where possible.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: devel@driverdev.osuosl.org
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
As of commit 03e361b25e ("mfd: Stop setting
refcounting pointers in original mfd_cell arrays"), the "cell" parameter of
mfd_add_devices() is "const" again. Hence make all cell data passed to
mfd_add_devices() const where possible.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
As of commit 03e361b25e ("mfd: Stop setting
refcounting pointers in original mfd_cell arrays"), the "cell" parameter of
mfd_add_devices() is "const" again. Hence make all cell data passed to
mfd_add_devices() const where possible.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
As of commit 03e361b25e ("mfd: Stop setting
refcounting pointers in original mfd_cell arrays"), the "cell" parameter of
mfd_add_devices() is "const" again. Hence make all cell data passed to
mfd_add_devices() const where possible.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
As of commit 03e361b25e ("mfd: Stop setting
refcounting pointers in original mfd_cell arrays"), the "cell" parameter of
mfd_add_devices() is "const" again. Hence make all cell data passed to
mfd_add_devices() const where possible.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
As of commit 03e361b25e ("mfd: Stop setting
refcounting pointers in original mfd_cell arrays"), the "cell" parameter of
mfd_add_devices() is "const" again. Hence make all cell data passed to
mfd_add_devices() const where possible.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
As of commit 03e361b25e ("mfd: Stop setting
refcounting pointers in original mfd_cell arrays"), the "cell" parameter of
mfd_add_devices() is "const" again. Hence make all cell data passed to
mfd_add_devices() const where possible.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
As of commit 03e361b25e ("mfd: Stop setting
refcounting pointers in original mfd_cell arrays"), the "cell" parameter of
mfd_add_devices() is "const" again. Hence make all cell data passed to
mfd_add_devices() const where possible.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
The dev_cont() symbol doesn't exist, so replace it with pr_cont(). While
at it, also append a newline to the debug output to make it look nicer.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
50 us is not a long enough delay between EC transactions. At least 70 us
are needed for the 16 MHz STM32L part. Increase the delay to 200 us for
an extra safety margin.
Reviewed-by: Randall Spangler <rspangler@chromium.org>
Signed-off-by: Derek Basehore <dbasehore@chromium.org>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
All OMAP IP blocks expect LE data, but CPU may operate in BE mode.
Need to use endian neutral functions to read/write h/w registers.
I.e instead of __raw_read[lw] and __raw_write[lw] functions code
need to use read[lw]_relaxed and write[lw]_relaxed functions.
If the first simply reads/writes register, the second will byteswap
it if host operates in BE mode.
Changes are trivial sed like replacement of __raw_xxx functions
with xxx_relaxed variant.
Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
Signed-off-by: Taras Kondratiuk <taras.kondratiuk@linaro.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
All OMAP IP blocks expect LE data, but CPU may operate in BE mode.
Need to use endian neutral functions to read/write h/w registers.
I.e instead of __raw_read[lw] and __raw_write[lw] functions code
need to use read[lw]_relaxed and write[lw]_relaxed functions.
If the first simply reads/writes register, the second will byteswap
it if host operates in BE mode.
Changes are trivial sed like replacement of __raw_xxx functions
with xxx_relaxed variant.
Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
Signed-off-by: Taras Kondratiuk <taras.kondratiuk@linaro.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
We dereference "desc" before check if it is NULL. I've shifted it
around so we check first before dereferencing.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
- VGA switcheroo was broken for some users as a result of the ACPI-based
PCI hotplug (ACPIPHP) changes in 3.12, because some previously ignored
hotplug events started to be handled. The fix causes them to be
ignored again.
- There are two more issues related to cpufreq's suspend/resume handling
changes from the 3.12 cycle addressed by Viresh Kumar's fixes.
- intel_pstate triggers a divide error in a timer function if the P-state
information it needs is missing during initialization. This leads to
kernel panics on nested KVM clients and is fixed by failing the
initialization cleanly in those cases.
- PCI initalization code changes during the 3.9 cycle uncovered BIOS
issues related to ACPI wakeup notifications (some BIOSes send them
for devices that aren't supposed to support ACPI wakeup). Work around
them by installing an ACPI wakeup notify handler for all PCI devices
with ACPI support.
- The Calxeda cpuilde driver's probe function is tagged as __init, which
is incorrect and causes a section mismatch to occur during build. Fix
from Andre Przywara removes the __init tag from there.
- During the 3.12 cycle ACPIPHP started to print warnings about missing
_ADR for devices that legitimately don't have it. Fix from Toshi Kani
makes it only print the warnings where they make sense.
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIcBAABCAAGBQJSxrJgAAoJEILEb/54YlRxRPkP/ifzrrVhdzqXIEy44b93JeDx
oSmZW6yTO51GZlDx2bjt6CGJcIUDC4ExYV6S2tB44/DL19CYdIxi7oBaXtUvzGRs
oZ6B1wfvKOIxZ0RQguaGd1uerQU304CGwUXu/jpRZ/UuZZFKq5Uts6O3bilGzCfR
Y+MUH+qECwdBXaFHUISdWFsa3lxj0U0kglszh+DsxwS4gy/pLbCu5fKLgHLuNNQC
hhEEToQ6uF4o8hbkGJvgUPo3V3aUSXObgvJh4ntP09YE1AEJScLB4wKmqL0zN8Qj
pbBf1WC5OpGXv8zGM9ErrY64YaKA36uhJvOi6RtBGLbG+pYM6E6IM9zNf4Ku+T79
JNEulpq27aEx2JghNSgMFYQZEOGTH+q24iXZdZlOIvqWpMymATlqP/gAQQIpg3VC
OIIdocMFRsbgwFXf41uyUqs458fg5xREz5k6geWZeyriM45wFShR+JnMopQWc5OB
a3sbcWUShFBL1T0pqYR4SDLDvH4NdEP2NKO2jlqMXUewLXsVRRt/42etGoe0rI3C
cMWPQq7z0GNN+NboUviqwHdxUKqONWGt+pd/3u8FI/Y1IlXEeXQYGawhSu81uCpT
5gLaKDkwOrCSwOw68Msuod0Cce6TnoTowi6hP2aAEu8mDJwQY+toqA3+CPoO8nty
DdhZjP1afEgsVVyjErX4
=LXh0
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-3.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and PM fixes and new device IDs from Rafael Wysocki:
"These commits, except for one, are regression fixes and the remaining
one fixes a divide error leading to a kernel panic. The majority of
the regressions fixed here were introduced during the 3.12 cycle, one
of them is from this cycle and one is older.
Specifics:
- VGA switcheroo was broken for some users as a result of the
ACPI-based PCI hotplug (ACPIPHP) changes in 3.12, because some
previously ignored hotplug events started to be handled. The fix
causes them to be ignored again.
- There are two more issues related to cpufreq's suspend/resume
handling changes from the 3.12 cycle addressed by Viresh Kumar's
fixes.
- intel_pstate triggers a divide error in a timer function if the
P-state information it needs is missing during initialization.
This leads to kernel panics on nested KVM clients and is fixed by
failing the initialization cleanly in those cases.
- PCI initalization code changes during the 3.9 cycle uncovered BIOS
issues related to ACPI wakeup notifications (some BIOSes send them
for devices that aren't supposed to support ACPI wakeup). Work
around them by installing an ACPI wakeup notify handler for all PCI
devices with ACPI support.
- The Calxeda cpuilde driver's probe function is tagged as __init,
which is incorrect and causes a section mismatch to occur during
build. Fix from Andre Przywara removes the __init tag from there.
- During the 3.12 cycle ACPIPHP started to print warnings about
missing _ADR for devices that legitimately don't have it. Fix from
Toshi Kani makes it only print the warnings where they make sense"
* tag 'pm+acpi-3.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPIPHP / radeon / nouveau: Fix VGA switcheroo problem related to hotplug
intel_pstate: Fail initialization if P-state information is missing
ARM/cpuidle: remove __init tag from Calxeda cpuidle probe function
PCI / ACPI: Install wakeup notify handlers for all PCI devs with ACPI
cpufreq: preserve user_policy across suspend/resume
cpufreq: Clean up after a failing light-weight initialization
ACPI / PCI / hotplug: Avoid warning when _ADR not present
Merge patches from Andrew Morton:
"Ten fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
epoll: do not take the nested ep->mtx on EPOLL_CTL_DEL
sh: add EXPORT_SYMBOL(min_low_pfn) and EXPORT_SYMBOL(max_low_pfn) to sh_ksyms_32.c
drivers/dma/ioat/dma.c: check DMA mapping error in ioat_dma_self_test()
mm/memory-failure.c: transfer page count from head page to tail page after split thp
MAINTAINERS: set up proper record for Xilinx Zynq
mm: remove bogus warning in copy_huge_pmd()
memcg: fix memcg_size() calculation
mm: fix use-after-free in sys_remap_file_pages
mm: munlock: fix deadlock in __munlock_pagevec()
mm: munlock: fix a bug where THP tail page is encountered
The EPOLL_CTL_DEL path of epoll contains a classic, ab-ba deadlock.
That is, epoll_ctl(a, EPOLL_CTL_DEL, b, x), will deadlock with
epoll_ctl(b, EPOLL_CTL_DEL, a, x). The deadlock was introduced with
commmit 67347fe4e6 ("epoll: do not take global 'epmutex' for simple
topologies").
The acquistion of the ep->mtx for the destination 'ep' was added such
that a concurrent EPOLL_CTL_ADD operation would see the correct state of
the ep (Specifically, the check for '!list_empty(&f.file->f_ep_links')
However, by simply not acquiring the lock, we do not serialize behind
the ep->mtx from the add path, and thus may perform a full path check
when if we had waited a little longer it may not have been necessary.
However, this is a transient state, and performing the full loop
checking in this case is not harmful.
The important point is that we wouldn't miss doing the full loop
checking when required, since EPOLL_CTL_ADD always locks any 'ep's that
its operating upon. The reason we don't need to do lock ordering in the
add path, is that we are already are holding the global 'epmutex'
whenever we do the double lock. Further, the original posting of this
patch, which was tested for the intended performance gains, did not
perform this additional locking.
Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: Nathan Zimmer <nzimmer@sgi.com>
Cc: Eric Wong <normalperson@yhbt.net>
Cc: Nelson Elhage <nelhage@nelhage.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Min_low_pfn and max_low_pfn were used in pfn_valid macro if defined
CONFIG_FLATMEM. When the functions that use the pfn_valid is used in
driver module, max_low_pfn and min_low_pfn is to undefined, and fail to
build.
ERROR: "min_low_pfn" [drivers/block/aoe/aoe.ko] undefined!
ERROR: "max_low_pfn" [drivers/block/aoe/aoe.ko] undefined!
make[2]: *** [__modpost] Error 1
make[1]: *** [modules] Error 2
This patch fix this problem.
Signed-off-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>
Cc: Kuninori Morimoto <kuninori.morimoto.gx@gmail.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Memory failures on thp tail pages cause kernel panic like below:
mce: [Hardware Error]: Machine check events logged
MCE exception done on CPU 7
BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
IP: [<ffffffff811b7cd1>] dequeue_hwpoisoned_huge_page+0x131/0x1e0
PGD bae42067 PUD ba47d067 PMD 0
Oops: 0000 [#1] SMP
...
CPU: 7 PID: 128 Comm: kworker/7:2 Tainted: G M O 3.13.0-rc4-131217-1558-00003-g83b7df08e462 #25
...
Call Trace:
me_huge_page+0x3e/0x50
memory_failure+0x4bb/0xc20
mce_process_work+0x3e/0x70
process_one_work+0x171/0x420
worker_thread+0x11b/0x3a0
? manage_workers.isra.25+0x2b0/0x2b0
kthread+0xe4/0x100
? kthread_create_on_node+0x190/0x190
ret_from_fork+0x7c/0xb0
? kthread_create_on_node+0x190/0x190
...
RIP dequeue_hwpoisoned_huge_page+0x131/0x1e0
CR2: 0000000000000058
The reasoning of this problem is shown below:
- when we have a memory error on a thp tail page, the memory error
handler grabs a refcount of the head page to keep the thp under us.
- Before unmapping the error page from processes, we split the thp,
where page refcounts of both of head/tail pages don't change.
- Then we call try_to_unmap() over the error page (which was a tail
page before). We didn't pin the error page to handle the memory error,
this error page is freed and removed from LRU list.
- We never have the error page on LRU list, so the first page state
check returns "unknown page," then we move to the second check
with the saved page flag.
- The saved page flag have PG_tail set, so the second page state check
returns "hugepage."
- We call me_huge_page() for freed error page, then we hit the above panic.
The root cause is that we didn't move refcount from the head page to the
tail page after split thp. So this patch suggests to do this.
This panic was introduced by commit 524fca1e73 ("HWPOISON: fix
misjudgement of page_action() for errors on mlocked pages"). Note that we
did have the same refcount problem before this commit, but it was just
ignored because we had only first page state check which returned "unknown
page." The commit changed the refcount problem from "doesn't work" to
"kernel panic."
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: <stable@vger.kernel.org> [3.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sasha Levin reported the following warning being triggered
WARNING: CPU: 28 PID: 35287 at mm/huge_memory.c:887 copy_huge_pmd+0x145/ 0x3a0()
Call Trace:
copy_huge_pmd+0x145/0x3a0
copy_page_range+0x3f2/0x560
dup_mmap+0x2c9/0x3d0
dup_mm+0xad/0x150
copy_process+0xa68/0x12e0
do_fork+0x96/0x270
SyS_clone+0x16/0x20
stub_clone+0x69/0x90
This warning was introduced by "mm: numa: Avoid unnecessary disruption
of NUMA hinting during migration" for paranoia reasons but the warning
is bogus. I was thinking of parallel races between NUMA hinting faults
and forks but this warning would also be triggered by a parallel reclaim
splitting a THP during a fork. Remote the bogus warning.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The mem_cgroup structure contains nr_node_ids pointers to
mem_cgroup_per_node objects, not the objects themselves.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@openvz.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>