578116 Commits

Author SHA1 Message Date
Linus Torvalds
555f8160b2 regulator: Updates for v4.6
This has been an extremely quiet release for the regulator API, aside
 from bugfixes and small enhancements the only thing that really stands
 out are the new drivers for Action Semiconductors ACT8945A, HiSilicon
 HI665x, and the Maxim MAX20024 and MAX77620.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJW58EWAAoJECTWi3JdVIfQMGMH/1v4ojMy0X/yPlFk/523u7uq
 6ZJQ4zSTPyv0ccR+hsDjZ81BmM/vATrnQOllvTSwfnGMcs5amE3/9EhzdF5EciCB
 8gUff/GfugXNqpVgpfYYfPhbQVTGllOFnZPmXsLx2dGlu094i69f4fZtoJCFlkmo
 5c0L5Fnkt4w5XtFAA6ILuRtpwZ4jfJZ3VFKrtYwkgLcKw9GgSEqhVKvTPfkNqtjg
 hzEl2P2K/Ug2uTqeZusKH30Sa1J7MW4B+C9dUVyTRB29+uA60y0foGTQjW9uCzEi
 l1sqY9VDSKp9WwnshVrgVFViAhkrA7fWl4qaC/EDe7avvbh5SBpR/+gjZbmdRJU=
 =jEl0
 -----END PGP SIGNATURE-----

Merge tag 'regulator-v4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator updates from Mark Brown:
 "This has been an extremely quiet release for the regulator API, aside
  from bugfixes and small enhancements the only thing that really stands
  out are the new drivers for Action Semiconductors ACT8945A, HiSilicon
  HI665x, and the Maxim MAX20024 and MAX77620"

* tag 'regulator-v4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (46 commits)
  regulator: pwm: Add support to have multiple instance of pwm regulator
  regulator: pwm: Fix calculation of voltage-to-duty cycle
  regulator: of: Use of_property_read_u32() for reading min/max
  regulator: pv88060: fix incorrect clear of event register
  regulator: pv88090: fix incorrect clear of event register
  regulator: max77620: Add support to configure active-discharge
  regulator: core: Add support for active-discharge configuration
  regulator: helper: Add helper to configure active-discharge using regmap
  regulator: core: Add support for active-discharge configuration
  regulator: DT: Add DT property for active-discharge configuration
  regulator: act8865: Specify fixed voltage of 3.3V for ACT8600's REG9
  regulator: act8865: Rename platform_data field to init_data
  regulator: act8865: Remove "static" from local variable
  ASoC: cs4271: add regulator consumer support
  regulator: max77620: Remove duplicate module alias
  regulator: max77620: Eliminate duplicate code
  regulator: max77620: Remove unused fields
  regulator: core: fix crash in error path of regulator_register
  regulator: core: Request GPIO before creating sysfs entries
  regulator: gpio: don't print error on EPROBE_DEFER
  ...
2016-03-15 21:34:35 -07:00
Linus Torvalds
b7aae4a9d0 regmap: Updates for v4.6
This has been a very busy release for regmap, not just in cleaning up
 the mess we got ourselves into with the endianness handling but also in
 other areas too:
 
  - Fixes for the endianness handling so that we now explicltly default
    to little endian (the code used to do this by accident).  This
    fixes handling of explictly specified endianness on big endian
    systems.
  - Optimisation of the implementation of register striding.
  - A refectoring of the _update_bits() code to reduce duplication.
  - Fixes and enhancements for the interrupt implementation which
    make it easier to use in a wider range of systems.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJW5v4RAAoJECTWi3JdVIfQcP0H/R+22ZgPNo76YUHtnMi5zCiW
 /TtUzcKg3QXMXpCxQx2p8shRg1eckjb2zeImDJKPteIsO9y04lrlO2ljVnio/Ut9
 6uBGwmmcCa9/haL7waDrw5kQKmCjNAcXhAn7etsRcvpMaPdEwL71ZWfjCY3HtwyJ
 zGoArFNveHzTeZKRNzGZemSA7r1TOLIkHNvS4yRD4H9K7bGfn1HWwumCgurTvpiB
 nxuhpwO7GQy28fPQRfBlfg2TGI+B8GD+VV4qwWnGYR2pijyIE8Kxqawizxueq70f
 YsjiRNepxgPSdWHGMdjcUwQVB2iOSh8LvObxuyW8oGZEFeUxC3JHUiJ79zkKGsg=
 =PBtQ
 -----END PGP SIGNATURE-----

Merge tag 'regmap-v4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap

Pull regmap updates from Mark Brown:
 "This has been a very busy release for regmap, not just in cleaning up
  the mess we got ourselves into with the endianness handling but also
  in other areas too:

   - Fixes for the endianness handling so that we now explicitly default
     to little endian (the code used to do this by accident).  This
     fixes handling of explictly specified endianness on big endian
     systems.

   - Optimisation of the implementation of register striding.

   - A refectoring of the _update_bits() code to reduce duplication.

   - Fixes and enhancements for the interrupt implementation which make
     it easier to use in a wider range of systems"

* tag 'regmap-v4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: (28 commits)
  regmap: irq: add devm apis for regmap_{add,del}_irq_chip
  regmap: replace regmap_write_bits()
  regmap: irq: Enable irq retriggering for nested irqs
  regmap: add regmap_fields_force_xxx() macros
  regmap: add regmap_field_force_xxx() macros
  regmap: merge regmap_fields_update_bits() into macro
  regmap: merge regmap_fields_write() into macro
  regmap: add regmap_fields_update_bits_base()
  regmap: merge regmap_field_update_bits() into macro
  regmap: merge regmap_field_write() into macro
  regmap: add regmap_field_update_bits_base()
  regmap: merge regmap_update_bits_check_async() into macro
  regmap: merge regmap_update_bits_check() into macro
  regmap: merge regmap_update_bits_async() into macro
  regmap: merge regmap_update_bits() into macro
  regmap: add regmap_update_bits_base()
  regcache: flat: Introduce register strider order
  regcache: Introduce the index parsing API by stride order
  regmap: core: Introduce register stride order
  regmap: irq: add devm apis for regmap_{add,del}_irq_chip
  ...
2016-03-15 21:22:26 -07:00
Linus Torvalds
ff280e3639 spi: Updates for v4.6
Not the biggest set of changes for SPI but a bit of a pickup in activity
 on the core:
 
  - Support for memory mapped read from flash devices via a SPI
    controller.
  - The beginnings of a message rewriting framework in the core which
    should in time allow us to support transforming messages to work
    around the limits of controllers or optimise the performance for
    controllers transparently to calling drivers.
  - Updates to the PXA2xx, the main functional change being to improve
    the ACPI support.
  - A new driver for the Analog Devices AXI SPI engine.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJW5vmhAAoJECTWi3JdVIfQbs8H/25sgjyLHNFUy97An1CbfF29
 YI3KCTTi/RlvtQSOl2WvfPu/FML3vJYClkCIG5/oUE1y5FfX+8vdh0oObtaG9ewn
 IwiuhHXH68GcAKTFNFzXrT52aLWo3V67jINwbDyleN2gwk4QchTqUsWvHpcqibCt
 KIOKOqmIhxLXeat/014KAMzsB1snaSVDBv2ao+1DstRBn/2Tb6uFKzHL4ehBAPDy
 mQ/f7ogBlTG3aNrldf+841NbgMl6ekMGapLIllec/j54C6H2Yig8hn0zoWmL8/BG
 eHumwBjdqQMJmLFAS9+cz8Q8YZy2EVG8gCimzQD6uQOMG6CrTGnfgBDmrFhqvRI=
 =WSqI
 -----END PGP SIGNATURE-----

Merge tag 'spi-v4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi updates from Mark Brown:
 "Not the biggest set of changes for SPI but a bit of a pickup in
  activity on the core:

   - Support for memory mapped read from flash devices via a SPI
     controller.

   - The beginnings of a message rewriting framework in the core which
     should in time allow us to support transforming messages to work
     around the limits of controllers or optimise the performance for
     controllers transparently to calling drivers.

   - Updates to the PXA2xx, the main functional change being to improve
     the ACPI support.

   - A new driver for the Analog Devices AXI SPI engine"

* tag 'spi-v4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (66 commits)
  spi: Add gfp parameter to kernel-doc to fix build warning
  spi: Fix htmldocs build error due struct spi_replaced_transfers
  spi: rockchip: covert rsd_nsecs to u32 type
  spi: rockchip: header file cleanup
  spi: xilinx: Add devicetree binding for spi-xilinx
  spi: respect the maximum segment size of DMA device
  spi: rockchip: check requesting dma channel with EPROBE_DEFER
  spi: rockchip: migrate to dmaengine_terminate_async
  spi: rockchip: check return value of dmaengine_prep_slave_sg
  spi: core: Fix deadlock when sending messages
  spi/rockchip: fix endian mode for 16-bit transfers
  spi/rockchip: Make sure spi clk is on in rockchip_spi_set_cs
  spi: pxa2xx: Use newer more explicit DMAengine terminate API
  spi: pxa2xx: Add support for Intel Broxton B-Step
  spi: lp-8841: return correct error code from probe
  spi: imx: drop bogus tests for rx/tx bufs in DMA transfer
  spi: imx: set MX51_ECSPI_CTRL_SMC bit in setup function
  spi: imx: make some register defines simpler
  spi: imx: remove unnecessary bit clearing in mx51_ecspi_config
  spi: imx: add support for all SPI word width for DMA
  ...
2016-03-15 21:07:33 -07:00
Linus Torvalds
5ca5446ec5 Pin control changes for kernel v4.6:
An almost purely driver related set of changes with no
 major changes to the framework, only one patch adding
 an unlocked version of the pinctrl_find_gpio_range_from_pin()
 library call.
 
 New drivers:
 - ST Microelectronics STM32 MCU support: this is a non-MMU
   low-end platform for IoT things (etc).
 - Microchip PIC32 MCU support: same story as for STM32.
 
 New subdrivers:
 - Allwinner SunXi H3 R_PIO controller support.
 - Qualcomm IPQ4019 support.
 - MediaTek MT2701 and MT7623.
 - Allwinner A64
 
 Non-critical fixes:
 - gpio_disable_free() for the Vybrid.
 - pinctrl single: use a separate lockdep class.
 
 Misc:
 - Substantial cleanups and rewrites for the Super-H PFC
   driver and subdrivers.
 - Various fixes and cleanups, especially Paul Gortmakers
   work to make nonmodular drivers nonmodular.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW5rS1AAoJEEEQszewGV1zskIQALNriHdVmPGNIuSZOUk2gqAv
 hFNIzFUYM6BK6AL2wD+ZVqQe/EbZtWUF2RqhI2juP8j2WBVMxo8B9ypGjm8qDZ7q
 Xtdnd8l22VP0fEmYKTwDgrjSCcMxbXFiPurZBlqCCyb/9raFqLKwx2kcxWqD4PR2
 dcC8i/t2JSjGYRRCMcGn+zcKW2zja36xci/ZOExdioHYgFmorCZb9+4NYz+coijv
 VEjEy98CG6itBFJ0MS3jHT949TyxpDzWp7hO8LiOAiLR50xCxTlZRT1dObOUQJqk
 qhiLK2sTUFJFlTNBUGN0gfMoJo2MpfJIkVre4EV5QNkfy8vjtXeLrbjnSnTXf0r7
 5OLMEJPK7mfHd7jsw/nwJMYUoqgeRO9VBXtL2JyGWpsNFBFlJv1YRXaT0AZXkgUq
 63BfhTrbtge2xECJ9iqWVdGmmNv2x7lMK6RWqDr72fjONdtmNLDusfNdgYaBWc50
 K910IijMX2t2HGQFzuqwQC5HgADPhqBRb2eMcilwwUs5rxzLXX/wer1rhc8guUtK
 4DvGZ+wPZd2znQvNAWxsG5azoSfO8J3ibVIkMmaW3NTqyLvnbT9KiNvYuUVg/9NQ
 Vcb1d9UnrhAIWzpfpeo4rzr+QIq4j46YHBqreiW/l952Apxpp2lJmV/btK0noPmA
 MDTknHc2772QYIBtn01D
 =TsdU
 -----END PGP SIGNATURE-----

Merge tag 'pinctrl-v4.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control updates from Linus Walleij:
 "An almost purely driver related set of changes with no major changes
  to the framework, only one patch adding an unlocked version of the
  pinctrl_find_gpio_range_from_pin() library call.

  New drivers:
   - ST Microelectronics STM32 MCU support: this is a non-MMU low-end
     platform for IoT things (etc).
   - Microchip PIC32 MCU support: same story as for STM32.

  New subdrivers:
   - Allwinner SunXi H3 R_PIO controller support.
   - Qualcomm IPQ4019 support.
   - MediaTek MT2701 and MT7623.
   - Allwinner A64

  Non-critical fixes:
   - gpio_disable_free() for the Vybrid.
   - pinctrl single: use a separate lockdep class.

  Misc:
   - Substantial cleanups and rewrites for the Super-H PFC driver and
     subdrivers.
   - Various fixes and cleanups, especially Paul Gortmakers work to make
     nonmodular drivers nonmodular"

* tag 'pinctrl-v4.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (75 commits)
  pinctrl: single: Use a separate lockdep class
  drivers: pinctrl: add driver for Allwinner A64 SoC
  pinctrl: Broadcom Northstar2 pinctrl device tree bindings
  pinctrl: amlogic: Make driver independent from two-domain configuration
  pinctrl: amlogic: Separate some pin functions for Meson8 / Meson8b
  pinctrl: at91: use __maybe_unused to hide pm functions
  pinctrl: sh-pfc: core: don't open code of_device_get_match_data()
  pinctrl: uniphier: rename CONFIG options and file names
  pinctrl: sunxi: make A80 explicitly non-modular
  pinctrl: stm32: make explicitly non-modular
  pinctrl: sh-pfc: make explicitly non-modular
  pinctrl: meson: make explicitly non-modular
  pinctrl: pinctrl-mt6397 driver explicitly non-modular
  pinctrl: sunxi: does not need module.h
  pinctrl: pxa2xx: export symbols
  pinctrl: sunxi: Change mux setting on PI irq pins
  pinctrl: sunxi: Remove non existing irq's
  pinctrl: imx: attach iomuxc device to gpr syscon
  pinctrl-bcm2835: Fix cut-and-paste error in "pull" parsing
  pinctrl: lpc1850-scu: document nxp,gpio-pin-interrupt
  ...
2016-03-15 20:23:13 -07:00
Ian Kent
63c06227a2 autofs4: fix string.h include in auto_dev-ioctl.h
Since including linux/string.h will now do the right thing remove the
conditional check.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Ian Kent
8a78d59304 autofs4: use pr_xxx() macros directly for logging
Use the standard pr_xxx() log macros directly for log prints instead of
the AUTOFS_XXX() macros.

Signed-off-by: Ian Kent <ikent@redhat.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Ian Kent
90967c87e3 autofs4: change log print macros to not insert newline
Common kernel coding practice is to include the newline of log prints
within the log text rather than hidden away in a macro.

To avoid introducing inconsistencies as changes are made change the log
macros to not include the newline.

Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Ian Kent
cab49f9ed8 autofs4: make autofs log prints consistent
Use the pr_*() print in AUTOFS_*() macros instead of printks and include
the module name in log message macros.  Also use the AUTOFS_*() macros
everywhere instead of raw printks.

Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Ian Kent
0266725ad4 autofs4: fix some white space errors
Fix some white space format errors.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Ian Kent
e3cd8067c1 autofs4: fix invalid ioctl return in autofs4_root_ioctl_unlocked()
The return from an ioctl if an invalid ioctl is passed in should be
EINVAL not ENOSYS.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Ian Kent
b3f67a988c autofs4: fix coding style line length in autofs4_wait()
The need for this is questionable but checkpatch.pl complains about the
line length and it's a straightfoward change.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Ian Kent
aa330ddc53 autofs4: fix coding style problem in autofs4_get_set_timeout()
Refactor autofs4_get_set_timeout() to eliminate coding style error.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Ian Kent
e9a7c2f1a5 autofs4: coding style fixes
Try and make the coding style completely consistent throughtout the
autofs module and inline with kernel coding style recommendations.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Stanislav Kinsburskiy
c83aa55d0b autofs: show pipe inode in mount options
This is required for CRIU (Checkpoint Restart In Userspace) to migrate a
mount point when write end in user space is closed.

Below is a brief description of the problem.

To migrate a non-catatonic autofs mount point, one has to restore the
control pipe between kernel and autofs master process.

One of the autofs masters is systemd, which closes pipe write end after
passing it to the kernel with mount call.

To be able to restore the systemd control pipe one has to know which
read pipe end in systemd corresponds to the write pipe end in the
kernel.  The pipe "fd" in mount options is not enough because it was
closed and probably replaced by some other descriptor.

Thus, some other attribute is required to be able to find the read pipe
end.  The best attribute to use to find the correct pipe end is inode
number becuase it's unique for the whole system and can't be reused
while the autofs mount exists.

This attribute can also be used to recognize a situation where an autofs
mount has no master (no process with specified "pgrp" or no file
descriptor with "pipe_ino", specified in autofs mount options).

Signed-off-by: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Ard Biesheuvel
2213e9a66b kallsyms: add support for relative offsets in kallsyms address table
Similar to how relative extables are implemented, it is possible to emit
the kallsyms table in such a way that it contains offsets relative to
some anchor point in the kernel image rather than absolute addresses.

On 64-bit architectures, it cuts the size of the kallsyms address table
in half, since offsets between kernel symbols can typically be expressed
in 32 bits.  This saves several hundreds of kilobytes of permanent
.rodata on average.  In addition, the kallsyms address table is no
longer subject to dynamic relocation when CONFIG_RELOCATABLE is in
effect, so the relocation work done after decompression now doesn't have
to do relocation updates for all these values.  This saves up to 24
bytes (i.e., the size of a ELF64 RELA relocation table entry) per value,
which easily adds up to a couple of megabytes of uncompressed __init
data on ppc64 or arm64.  Even if these relocation entries typically
compress well, the combined size reduction of 2.8 MB uncompressed for a
ppc64_defconfig build (of which 2.4 MB is __init data) results in a ~500
KB space saving in the compressed image.

Since it is useful for some architectures (like x86) to retain the
ability to emit absolute values as well, this patch also adds support
for capturing both absolute and relative values when
KALLSYMS_ABSOLUTE_PERCPU is in effect, by emitting absolute per-cpu
addresses as positive 32-bit values, and addresses relative to the
lowest encountered relative symbol as negative values, which are
subtracted from the runtime address of this base symbol to produce the
actual address.

Support for the above is enabled by default for all architectures except
IA-64 and Tile-GX, whose symbols are too far apart to capture in this
manner.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Kees Cook <keescook@chromium.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Ard Biesheuvel
8c996940b3 kallsyms: don't overload absolute symbol type for percpu symbols
Commit c6bda7c988a5 ("kallsyms: fix percpu vars on x86-64 with
relocation") overloaded the 'A' (absolute) symbol type to signify that a
symbol is not subject to dynamic relocation.  However, the original A
type does not imply that at all, and depending on the version of the
toolchain, many A type symbols are emitted that are in fact relative to
the kernel text, i.e., if the kernel is relocated at runtime, these
symbols should be updated as well.

For instance, on sparc32, the following symbols are emitted as absolute
(kindly provided by Guenter Roeck):

  f035a420 A _etext
  f03d9000 A _sdata
  f03de8c4 A jiffies
  f03f8860 A _edata
  f03fc000 A __init_begin
  f041bdc8 A __init_text_end
  f0423000 A __bss_start
  f0423000 A __init_end
  f044457d A __bss_stop
  f044457d A _end

On x86_64, similar behavior can be observed:

  ffffffff81a00000 A __end_rodata_hpage_align
  ffffffff81b19000 A __vvar_page
  ffffffff81d3d000 A _end

Even if only a couple of them pass the symbol range check that results
in them to be taken into account for the final kallsyms symbol table, it
is obvious that 'A' does not mean the symbol does not need to be updated
at relocation time, and overloading its meaning to signify that is
perhaps not a good idea.

So instead, add a new percpu_absolute member to struct sym_entry, and
when --absolute-percpu is in effect, use it to record symbols whose
addresses should be emitted as final values rather than values that
still require relocation at runtime.  That way, we can drop the check
against the 'A' type.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Kees Cook <keescook@chromium.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michal Marek <mmarek@suse.cz>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Ard Biesheuvel
4d5d5664c9 x86: kallsyms: disable absolute percpu symbols on !SMP
scripts/kallsyms.c has a special --absolute-percpu command line option
which deals with the zero based per cpu offsets that are used when
building for SMP on x86_64.  This means that the option should only be
passed in that case, so add a Kconfig symbol with the correct predicate,
and use that instead.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Kees Cook <keescook@chromium.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Geyslan G. Bem
6b8c69e438 checkpatch: fix another left brace warning
This patch escapes a regex that uses left brace.

Using checkpatch.pl with Perl 5.22.0 generates the warning: "Unescaped
left brace in regex is deprecated, passed through in regex;"

Comment from regcomp.c in Perl source: "Currently we don't warn when the
lbrace is at the start of a construct.  This catches it in the middle of
a literal string, or when it's the first thing after something like
"\b"."

This works as a complement to 4e5d56bd ("checkpatch: fix left brace
warning").

Signed-off-by: Geyslan G. Bem <geyslan@gmail.com>
Signed-off-by: Joe Perches <joe@perches.com>
Suggested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Cc: Eddie Kovsky <ewk@edkovsky.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Joe Perches
207a8e8465 checkpatch: improve UNSPECIFIED_INT test for bare signed/unsigned uses
Improve the test to allow casts to (unsigned) or (signed) to be found
and fixed if desired.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Joe Perches
a1ce18e4f9 checkpatch: warn on bare unsigned or signed declarations without int
Kernel style prefers "unsigned int <foo>" over "unsigned <foo>" and
"signed int <foo>" over "signed <foo>".

Emit a warning for these simple signed/unsigned <foo> declarations.  Fix
it too if desired.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Joe Perches
42e152931d checkpatch: exclude asm volatile from complex macro check
asm volatile and all its variants like __asm__ __volatile__ ("<foo>")
are reported as errors with "Macros with with complex values should be
enclosed in parentheses".

Make an exception for these asm volatile macro definitions by converting
the "asm volatile" to "asm_volatile" so it appears as a single function
call and the error isn't reported.

Signed-off-by: Joe Perches <joe@perches.com>
Reported-by: Jeff Merkey <linux.mdb@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Johannes Weiner
9cf7666ace mm: memcontrol: drop unnecessary lru locking from mem_cgroup_migrate()
Migration accounting in the memory controller used to have to handle
both oldpage and newpage being on the LRU already; fuse's page cache
replacement used to pass a recycled newpage that had been uncharged but
not freed and removed from the LRU, and the memcg migration code used to
uncharge oldpage to "pass on" the existing charge to newpage.

Nowadays, pages are no longer uncharged when truncated from the page
cache, but rather only at free time, so if a LRU page is recycled in
page cache replacement it'll also still be charged.  And we bail out of
the charge transfer altogether in that case.  Tell commit_charge() that
we know newpage is not on the LRU, to avoid taking the zone->lru_lock
unnecessarily from the migration path.

But also, oldpage is no longer uncharged inside migration.  We only use
oldpage for its page->mem_cgroup and page size, so we don't care about
its LRU state anymore either.  Remove any mention from the kernel doc.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Suggested-by: Hugh Dickins <hughd@google.com>
Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Johannes Weiner
74485cf2bc mm: migrate: consolidate mem_cgroup_migrate() calls
Rather than scattering mem_cgroup_migrate() calls all over the place,
have a single call from a safe place where every migration operation
eventually ends up in - migrate_page_copy().

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Suggested-by: Hugh Dickins <hughd@google.com>
Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Joonsoo Kim
7cf91a98e6 mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous
There is a performance drop report due to hugepage allocation and in
there half of cpu time are spent on pageblock_pfn_to_page() in
compaction [1].

In that workload, compaction is triggered to make hugepage but most of
pageblocks are un-available for compaction due to pageblock type and
skip bit so compaction usually fails.  Most costly operations in this
case is to find valid pageblock while scanning whole zone range.  To
check if pageblock is valid to compact, valid pfn within pageblock is
required and we can obtain it by calling pageblock_pfn_to_page().  This
function checks whether pageblock is in a single zone and return valid
pfn if possible.  Problem is that we need to check it every time before
scanning pageblock even if we re-visit it and this turns out to be very
expensive in this workload.

Although we have no way to skip this pageblock check in the system where
hole exists at arbitrary position, we can use cached value for zone
continuity and just do pfn_to_page() in the system where hole doesn't
exist.  This optimization considerably speeds up in above workload.

Before vs After
  Max: 1096 MB/s vs 1325 MB/s
  Min: 635 MB/s 1015 MB/s
  Avg: 899 MB/s 1194 MB/s

Avg is improved by roughly 30% [2].

[1]: http://www.spinics.net/lists/linux-mm/msg97378.html
[2]: https://lkml.org/lkml/2015/12/9/23

[akpm@linux-foundation.org: don't forget to restore zone->contiguous on error path, per Vlastimil]
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Reported-by: Aaron Lu <aaron.lu@intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Aaron Lu <aaron.lu@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Joonsoo Kim
e1409c325f mm/compaction: pass only pageblock aligned range to pageblock_pfn_to_page
pageblock_pfn_to_page() is used to check there is valid pfn and all
pages in the pageblock is in a single zone.  If there is a hole in the
pageblock, passing arbitrary position to pageblock_pfn_to_page() could
cause to skip whole pageblock scanning, instead of just skipping the
hole page.  For deterministic behaviour, it's better to always pass
pageblock aligned range to pageblock_pfn_to_page().  It will also help
further optimization on pageblock_pfn_to_page() in the following patch.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Joonsoo Kim
623446e4dc mm/compaction: fix invalid free_pfn and compact_cached_free_pfn
free_pfn and compact_cached_free_pfn are the pointer that remember
restart position of freepage scanner.  When they are reset or invalid,
we set them to zone_end_pfn because freepage scanner works in reverse
direction.  But, because zone range is defined as [zone_start_pfn,
zone_end_pfn), zone_end_pfn is invalid to access.  Therefore, we should
not store it to free_pfn and compact_cached_free_pfn.  Instead, we need
to store zone_end_pfn - 1 to them.  There is one more thing we should
consider.  Freepage scanner scan reversely by pageblock unit.  If
free_pfn and compact_cached_free_pfn are set to middle of pageblock, it
regards that sitiation as that it already scans front part of pageblock
so we lose opportunity to scan there.  To fix-up, this patch do
round_down() to guarantee that reset position will be pageblock aligned.

Note that thanks to the current pageblock_pfn_to_page() implementation,
actual access to zone_end_pfn doesn't happen until now.  But, following
patch will change pageblock_pfn_to_page() so this patch is needed from
now on.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Alexander Kuleshov
5aa174801f mm/memblock.c: remove unnecessary memblock_type variable
We define struct memblock_type *type in the memblock_add_region() and
memblock_reserve_region() functions only for passing it to the
memlock_add_range() and memblock_reserve_range() functions.  Let's
remove these variables and will pass a type directly.

Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Christian Borntraeger
a75e1f637c x86: also use debug_pagealloc_enabled() for free_init_pages
we want to couple all debugging features with debug_pagealloc_enabled()
and not with the config option CONFIG_DEBUG_PAGEALLOC.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Suggested-by: David Rientjes <rientjes@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Christian Borntraeger
10917b8393 s390: query dynamic DEBUG_PAGEALLOC setting
We can use debug_pagealloc_enabled() to check if we can map the identity
mapping with 1MB/2GB pages as well as to print the current setting in
dump_stack.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Christian Borntraeger
288cf3c64e x86: query dynamic DEBUG_PAGEALLOC setting
We can use debug_pagealloc_enabled() to check if we can map the identity
mapping with 2MB pages.  We can also add the state into the dump_stack
output.

The patch does not touch the code for the 1GB pages, which ignored
CONFIG_DEBUG_PAGEALLOC.  Do we need to fence this as well?

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Kirill A. Shutemov
8df651c705 thp: cleanup split_huge_page()
After one of bugfixes to freeze_page(), we don't have freezed pages in
rmap, therefore mapcount of all subpages of freezed THP is zero.  And we
have assert for that.

Let's drop code which deal with non-zero mapcount of subpages.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Matthew Wilcox
88193f7ce6 mm: use linear_page_index() in do_fault()
do_fault() assumes that PAGE_SIZE is the same as PAGE_CACHE_SIZE.  Use
linear_page_index() to calculate pgoff in the correct units.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Johannes Weiner
fdf1cdb91b mm: remove unnecessary uses of lock_page_memcg()
There are several users that nest lock_page_memcg() inside lock_page()
to prevent page->mem_cgroup from changing.  But the page lock prevents
pages from moving between cgroups, so that is unnecessary overhead.

Remove lock_page_memcg() in contexts with locked contexts and fix the
debug code in the page stat functions to be okay with the page lock.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Johannes Weiner
62cccb8c8e mm: simplify lock_page_memcg()
Now that migration doesn't clear page->mem_cgroup of live pages anymore,
it's safe to make lock_page_memcg() and the memcg stat functions take
pages, and spare the callers from memcg objects.

[akpm@linux-foundation.org: fix warnings]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Suggested-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Johannes Weiner
6a93ca8fde mm: migrate: do not touch page->mem_cgroup of live pages
Changing a page's memcg association complicates dealing with the page,
so we want to limit this as much as possible.  Page migration e.g.  does
not have to do that.  Just like page cache replacement, it can forcibly
charge a replacement page, and then uncharge the old page when it gets
freed.  Temporarily overcharging the cgroup by a single page is not an
issue in practice, and charging is so cheap nowadays that this is much
preferrable to the headache of messing with live pages.

The only place that still changes the page->mem_cgroup binding of live
pages is when pages move along with a task to another cgroup.  But that
path isolates the page from the LRU, takes the page lock, and the move
lock (lock_page_memcg()).  That means page->mem_cgroup is always stable
in callers that have the page isolated from the LRU or locked.  Lighter
unlocked paths, like writeback accounting, can use lock_page_memcg().

[akpm@linux-foundation.org: fix build]
[vdavydov@virtuozzo.com: fix lockdep splat]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Johannes Weiner
23047a96d7 mm: workingset: per-cgroup cache thrash detection
Cache thrash detection (see a528910e12ec "mm: thrash detection-based
file cache sizing" for details) currently only works on the system
level, not inside cgroups.  Worse, as the refaults are compared to the
global number of active cache, cgroups might wrongfully get all their
refaults activated when their pages are hotter than those of others.

Move the refault machinery from the zone to the lruvec, and then tag
eviction entries with the memcg ID.  This makes the thrash detection
work correctly inside cgroups.

[sergey.senozhatsky@gmail.com: do not return from workingset_activation() with locked rcu and page]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Johannes Weiner
612e44939c mm: workingset: eviction buckets for bigmem/lowbit machines
For per-cgroup thrash detection, we need to store the memcg ID inside
the radix tree cookie as well.  However, on 32 bit that doesn't leave
enough bits for the eviction timestamp to cover the necessary range of
recently evicted pages.  The radix tree entry would look like this:

[ RADIX_TREE_EXCEPTIONAL(2) | ZONEID(2) | MEMCGID(16) | EVICTION(12) ]

12 bits means 4096 pages, means 16M worth of recently evicted pages.
But refaults are actionable up to distances covering half of memory.  To
not miss refaults, we have to stretch out the range at the cost of how
precisely we can tell when a page was evicted.  This way we can shave
off lower bits from the eviction timestamp until the necessary range is
covered.  E.g.  grouping evictions into 1M buckets (256 pages) will
stretch the longest representable refault distance to 4G.

This patch implements eviction buckets that are automatically sized
according to the available bits and the necessary refault range, in
preparation for per-cgroup thrash detection.

The maximum actionable distance is currently half of memory, but to
support memory hotplug of up to 200% of boot-time memory, we size the
buckets to cover double the distance.  Beyond that, thrashing won't be
detectable anymore.

During boot, the kernel will print out the exact parameters, like so:

  [    0.113929] workingset: timestamp_bits=12 max_order=18 bucket_order=6

In this example, there are 12 radix entry bits available for the
eviction timestamp, to cover a maximum distance of 2^18 pages (this is a
1G machine).  Consequently, evictions must be grouped into buckets of
2^6 pages, or 256K.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Johannes Weiner
162453bfbd mm: workingset: separate shadow unpacking and refault calculation
Per-cgroup thrash detection will need to derive a live memcg from the
eviction cookie, and doing that inside unpack_shadow() will get nasty
with the reference handling spread over two functions.

In preparation, make unpack_shadow() clearly about extracting static
data, and let workingset_refault() do all the higher-level handling.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Johannes Weiner
689c94f03a mm: workingset: #define radix entry eviction mask
This is a compile-time constant, no need to calculate it on refault.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Johannes Weiner
81f8c3a461 mm: memcontrol: generalize locking for the page->mem_cgroup binding
These patches tag the page cache radix tree eviction entries with the
memcg an evicted page belonged to, thus making per-cgroup LRU reclaim
work properly and be as adaptive to new cache workingsets as global
reclaim already is.

This should have been part of the original thrash detection patch
series, but was deferred due to the complexity of those patches.

This patch (of 5):

So far the only sites that needed to exclude charge migration to
stabilize page->mem_cgroup have been per-cgroup page statistics, hence
the name mem_cgroup_begin_page_stat().  But per-cgroup thrash detection
will add another site that needs to ensure page->mem_cgroup lifetime.

Rename these locking functions to the more generic lock_page_memcg() and
unlock_page_memcg().  Since charge migration is a cgroup1 feature only,
we might be able to delete it at some point, and these now easy to
identify locking sites along with it.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Suggested-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Michal Hocko
0db2cb8da8 mm, vmscan: make zone_reclaimable_pages more precise
zone_reclaimable_pages() is used in should_reclaim_retry() which uses it
to calculate the target for the watermark check.  This means that
precise numbers are important for the correct decision.
zone_reclaimable_pages uses zone_page_state which can contain stale data
with per-cpu diffs not synced yet (the last vmstat_update might have run
1s in the past).

Use zone_page_state_snapshot() in zone_reclaimable_pages() instead.
None of the current callers is in a hot path where getting the precise
value (which involves per-cpu iteration) would cause an unreasonable
overhead.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Suggested-by: David Rientjes <rientjes@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Naoya Horiguchi
d7206a70af mm/madvise: update comment on sys_madvise()
Some new MADV_* advices are not documented in sys_madvise() comment.  So
let's update it.

[akpm@linux-foundation.org: modifications suggested by Michal]
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Vladimir Davydov
cecf257b62 mm: vmscan: do not clear SHRINKER_NUMA_AWARE if nr_node_ids == 1
Currently, on shrinker registration we clear SHRINKER_NUMA_AWARE if
there's the only NUMA node present.  The comment states that this will
allow us to save some small loop time later.  It used to be true when
this code was added (see commit 1d3d4437eae1b ("vmscan: per-node
deferred work")), but since commit 6b4f7799c6a57 ("mm: vmscan: invoke
slab shrinkers from shrink_zone()") it doesn't make any difference.
Anyway, running on non-NUMA machine shouldn't make a shrinker NUMA
unaware, so zap this hunk.

Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Vitaly Kuznetsov
703fc13a3f xen_balloon: support memory auto onlining policy
Add support for the newly added kernel memory auto onlining policy to
Xen ballon driver.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Suggested-by: Daniel Kiper <daniel.kiper@oracle.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Acked-by: David Vrabel <david.vrabel@citrix.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Kay Sievers <kay@vrfy.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Vitaly Kuznetsov
31bc3858ea memory-hotplug: add automatic onlining policy for the newly added memory
Currently, all newly added memory blocks remain in 'offline' state
unless someone onlines them, some linux distributions carry special udev
rules like:

  SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", ATTR{state}="online"

to make this happen automatically.  This is not a great solution for
virtual machines where memory hotplug is being used to address high
memory pressure situations as such onlining is slow and a userspace
process doing this (udev) has a chance of being killed by the OOM killer
as it will probably require to allocate some memory.

Introduce default policy for the newly added memory blocks in
/sys/devices/system/memory/auto_online_blocks file with two possible
values: "offline" which preserves the current behavior and "online"
which causes all newly added memory blocks to go online as soon as
they're added.  The default is "offline".

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Kay Sievers <kay@vrfy.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Mika Penttilä
9cb65bc3b1 mm/memory.c: make apply_to_page_range() more robust
Arm and arm64 used to trigger this BUG_ON() - this has now been fixed.

But a WARN_ON() here is sufficient to catch future buggy callers.

Signed-off-by: Mika Penttilä <mika.penttila@nextfour.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Liang Chen
4355c018c2 mm/mempolicy.c: skip VM_HUGETLB and VM_MIXEDMAP VMA for lazy mbind
VM_HUGETLB and VM_MIXEDMAP vma needs to be excluded to avoid compound
pages being marked for migration and unexpected COWs when handling
hugetlb fault.

Thanks to Naoya Horiguchi for reminding me on these checks.

Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
Signed-off-by: Gavin Guo <gavin.guo@canonical.com>
Suggested-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: SeongJae Park <sj38.park@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Wang Xiaoqiang
0b94f17507 mm/memory-failure.c: remove useless "undef"s
Remove the useless #undef, since the corresponding #define has already
been removed.

Signed-off-by: Wang Xiaoqiang <wangxq10@lzu.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Naoya Horiguchi
23a003bfd2 mm/madvise: pass return code of memory_failure() to userspace
Currently the return value of memory_failure() is not passed to
userspace when madvise(MADV_HWPOISON) is used.  This is inconvenient for
test programs that want to know the result of error handling.  So let's
return it to the caller as we already do in the MADV_SOFT_OFFLINE case.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Vlastimil Babka
5b3810e5c6 mm, sl[au]b: print gfp_flags as strings in slab_out_of_memory()
We can now print gfp_flags more human-readable.  Make use of this in
slab_out_of_memory() for SLUB and SLAB.  Also convert the SLAB variant
it to pr_warn() along the way.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00