These changes include:
* Support for the ACPI IORT table on ARM systems and patches to
make the ARM-SMMU driver make use of it
* Conversion of the Exynos IOMMU driver to device dependency
links and implementation of runtime pm support based on that
conversion
* Update the Mediatek IOMMU driver to use the new
struct device->iommu_fwspec member
* Implementation of dma_map/unmap_resource in the generic ARM
dma-iommu layer
* A number of smaller fixes and improvements all over the place
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJYUskLAAoJECvwRC2XARrjGnQQALfaIJffrh+5vjs08h3J86TF
lUEvxu1/5PGCwXEWbIH+NQh4pMgBuOL0exa6sR55CqtHPkUKMndv+sVXWUkCrl9Z
Q4jro+IuPxbHHdQvULjdqn1gtvPvBXN/OFCbQsfj6G+GblAZCIn0OtkKm2MqdGa4
xHPTAhW+QAt7X8O7PoMsvCNa0loZJx05ToIPbhN23U0H12TwU4aHaMg13+KSGgz/
wEXGDYDU8EnGubcM/JEpy84qywRU1D+TDwzaIFT2sy7Ewhmgz6rSf+SOa+vn4u5z
G9xhc6b8Vq+2ZKr1j28pPdrCYMRpQxBWru7t2rcdV0Xa+J1JEPKJfyEPxeldDeXJ
4bqld/D/0nq7MFvGL73qff+oiU2qOJ1VnPrQO8SbHyueeFd4axMYlPsU7GeDWqoz
LX1gKuHE1t2GqsNgn/dLouyJGDKjyYzZbio0HuPHGjegkx+dkTXEM0yzLtZJG8hw
cZSlYijrRJeEROpM8/5BGid3BOAIx8qlOXO/QzI41e0KnQikkgQBgr1L2+Ki8ZaB
mNs/S/BqDr6LSUP5ArQHlZ0wu/pk1ehwYkmzp9j7Ui1jGdSWwbqw7KkLDXd0pL4g
NMhsprJLYlnY2GUdrbcDOEWHS9Ex0vRsPKNWoqCggzg/8h8cQmuiDZO346vhlvq/
ORMwDO7LNZuPHqDM0WA+
=EYLL
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU updates from Joerg Roedel:
"These changes include:
- support for the ACPI IORT table on ARM systems and patches to make
the ARM-SMMU driver make use of it
- conversion of the Exynos IOMMU driver to device dependency links
and implementation of runtime pm support based on that conversion
- update the Mediatek IOMMU driver to use the new struct
device->iommu_fwspec member
- implementation of dma_map/unmap_resource in the generic ARM
dma-iommu layer
- a number of smaller fixes and improvements all over the place"
* tag 'iommu-updates-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (44 commits)
ACPI/IORT: Make dma masks set-up IORT specific
iommu/amd: Missing error code in amd_iommu_init_device()
iommu/s390: Drop duplicate header pci.h
ACPI/IORT: Introduce iort_iommu_configure
ACPI/IORT: Add single mapping function
ACPI/IORT: Replace rid map type with type mask
iommu/arm-smmu: Add IORT configuration
iommu/arm-smmu: Split probe functions into DT/generic portions
iommu/arm-smmu-v3: Add IORT configuration
iommu/arm-smmu-v3: Split probe functions into DT/generic portions
ACPI/IORT: Add support for ARM SMMU platform devices creation
ACPI/IORT: Add node match function
ACPI: Implement acpi_dma_configure
iommu/arm-smmu-v3: Convert struct device of_node to fwnode usage
iommu/arm-smmu: Convert struct device of_node to fwnode usage
iommu: Make of_iommu_set/get_ops() DT agnostic
ACPI/IORT: Add support for IOMMU fwnode registration
ACPI/IORT: Introduce linker section for IORT entries probing
ACPI: Add FWNODE_ACPI_STATIC fwnode type
iommu/arm-smmu: Set SMTNMB_TLBEN in ACR to enable caching of bypass entries
...
- Shared mlx5 updates with net stack (will drop out on merge if Dave's
tree has already been merged)
- Driver updates: cxgb4, hfi1, hns-roce, i40iw, mlx4, mlx5, qedr, rxe
- Debug cleanups
- New connection rejection helpers
- SRP updates
- Various misc fixes
- New paravirt driver from vmware
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJYUbAPAAoJELgmozMOVy/dMXcP/iuG5MNzfN8Ny1JftyBQGWg3
cqoQ2OLj9CsXjwVB+5EqbcZHRZY852lKONaLoDKkIOx4YAXO2YuIKOp944vN7EQx
96wfqzT1F5jzAcy5mYZXgLaStGFDAwejKMqeHd0LfJj3OEtemGnVPWYzyqSQmSKo
dzJraS1Z9GIRppzU5WaRpB9PtRBkqIqGJ5vZ0EKLGhed5hYY5r0iMJB0GfriMRDO
lJ4UUVfpsAoLPnqDBFH6IMn2V2UeAw9IR5zNa1mrM1RBfvt/uYTxrw1w3p9WoaNs
GRodhk4DCeAfeyqzVPNBLyXZ4Zq4FzGe3UWM4qysJ1RR4oFNw9Cuw0Fqk8mrfznr
7hv5TpGIckRZiKf8l6e+qLirF0qGtXJg29j2vPVQI9i5nSj95g1agA81PnLQlLLb
flWyxeMj81my7lfMHN1xcV6pqPEKMCOysZmfcvVfJd2XxpjuVD7ekl/YXWp8o8kU
YPdQMqPD626XsD8VpPdMszb9FPmx0JD0HEv+Y1rIFX8JegEI+c3H2X0dqC27T/Ou
FEPWOy025EgHm0Fh/7eIzkG6tjZ4JHoCugJAcxNZGj2XW4eB6r5vY8UwJ8iQRv+n
PVYHiy0UoIRePh0mrdOSSphGZMi/GO/DsqKwCtAMEK43WqZQju6wR7QSIGkh66mp
4uSHJqpf3YEYylxGMhk3
=QeGy
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma updates from Doug Ledford:
"This is the complete update for the rdma stack for this release cycle.
Most of it is typical driver and core updates, but there is the
entirely new VMWare pvrdma driver. You may have noticed that there
were changes in DaveM's pull request to the bnxt Ethernet driver to
support a RoCE RDMA driver. The bnxt_re driver was tentatively set to
be pulled in this release cycle, but it simply wasn't ready in time
and was dropped (a few review comments still to address, and some
multi-arch build issues like prefetch() not working across all
arches).
Summary:
- shared mlx5 updates with net stack (will drop out on merge if
Dave's tree has already been merged)
- driver updates: cxgb4, hfi1, hns-roce, i40iw, mlx4, mlx5, qedr, rxe
- debug cleanups
- new connection rejection helpers
- SRP updates
- various misc fixes
- new paravirt driver from vmware"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (210 commits)
IB: Add vmw_pvrdma driver
IB/mlx4: fix improper return value
IB/ocrdma: fix bad initialization
infiniband: nes: return value of skb_linearize should be handled
MAINTAINERS: Update Intel RDMA RNIC driver maintainers
MAINTAINERS: Remove Mitesh Ahuja from emulex maintainers
IB/core: fix unmap_sg argument
qede: fix general protection fault may occur on probe
IB/mthca: Replace pci_pool_alloc by pci_pool_zalloc
mlx5, calc_sq_size(): Make a debug message more informative
mlx5: Remove a set-but-not-used variable
mlx5: Use { } instead of { 0 } to init struct
IB/srp: Make writing the add_target sysfs attr interruptible
IB/srp: Make mapping failures easier to debug
IB/srp: Make login failures easier to debug
IB/srp: Introduce a local variable in srp_add_one()
IB/srp: Fix CONFIG_DYNAMIC_DEBUG=n build
IB/multicast: Check ib_find_pkey() return value
IPoIB: Avoid reading an uninitialized member variable
IB/mad: Fix an array index check
...
When the dummy timer callback is invoked before the real timer callbacks,
then it tries to install that timer for the starting CPU. If the platform
does not have a broadcast timer installed the installation fails with a
kernel crash. The crash happens due to a unconditional deference of the non
available broadcast device. This needs to be fixed in the timer core code.
But even when this is fixed in the core code then installing the dummy
timer before the real timers is a pointless exercise.
Move it to the end of the callback list.
Fixes: 00c1d17aab51 ("clocksource/dummy_timer: Convert to hotplug state machine")
Reported-and-tested-by: Mason <slash.tmp@free.fr>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Richard Cochran <rcochran@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Cc: Sebastian Frias <sf84@laposte.net>
Cc: Thibaud Cornic <thibaud_cornic@sigmadesigns.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Link: http://lkml.kernel.org/r/1147ef90-7877-e4d2-bb2b-5c4fa8d3144b@free.fr
Add a description of the HW_EVENT_ERR_DEFERRED type that wasn't included
with commit d12a969ebbfc ("EDAC, amd64: Add Deferred Error type").
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Instead of storing the concepts dictionary inside header file,
move it to the subsystem documentation.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
As this file was never added to the driver-api, the kernel-doc
markups there were never tested. Some of them have issues.
Fix them.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
The edac_core.h header contain data structures and function
definitions for the 3 parts of EDAC: MC, PCI and device.
Let's move the PCI ones to a separate header file, as part
of a header reorganization.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
* patchwork: (496 commits)
[media] v4l: tvp5150: Add missing break in set control handler
[media] v4l: tvp5150: Don't inline the tvp5150_selmux() function
[media] v4l: tvp5150: Compile tvp5150_link_setup out if !CONFIG_MEDIA_CONTROLLER
[media] em28xx: don't store usb_device at struct em28xx
[media] em28xx: use usb_interface for dev_foo() calls
[media] em28xx: don't change the device's name
[media] mn88472: fix chip id check on probe
[media] mn88473: fix chip id check on probe
[media] lirc: fix error paths in lirc_cdev_add()
[media] s5p-mfc: Add support for MFC v8 available in Exynos 5433 SoCs
[media] s5p-mfc: Rework clock handling
[media] s5p-mfc: Don't keep clock prepared all the time
[media] s5p-mfc: Kill all IS_ERR_OR_NULL in clocks management code
[media] s5p-mfc: Remove dead conditional code
[media] s5p-mfc: Ensure that clock is disabled before turning power off
[media] s5p-mfc: Remove special clock rate management
[media] s5p-mfc: Use printk_ratelimited for reporting ioctl errors
[media] s5p-mfc: Set DMA_ATTR_ALLOC_SINGLE_PAGES
[media] vivid: Set color_enc on HSV formats
[media] v4l2-tpg: Init hv_enc field with a valid value
...
prefill_possible_map() reinitializes the cpu_possible_map by setting the
possible cpu bits and clearing all other bits up to NR_CPUS.
This is technically always correct because cpu_possible_map is statically
allocated and sized NR_CPUS. With CPUMASK_OFFSTACK and DEBUG_PER_CPU_MAPS
enabled the bounds check of cpu masks happens on nr_cpu_ids. nr_cpu_ids is
initialized to NR_CPUS and only limited after the set/clear bit loops have
been executed.
But if the system was booted with "nr_cpus=N" on the command line, where N
is < NR_CPUS then nr_cpu_ids is limited in the parameter parsing function
before prefill_possible_map() is invoked. As a consequence the cpumask
bounds check triggers when clearing the bits past nr_cpu_ids.
Add a helper which allows to reset cpu_possible_map w/o the bounds check
and then set only the possible bits which are well inside bounds.
Reported-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: 0x7f454c46@gmail.com
Cc: Jan Beulich <JBeulich@novell.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1612131836050.3415@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Contained in this update:
- DAX PMD vaults via iomap infrastructure
- Direct-io support in iomap infrastructure
- removal of now-redundant XFS inode iolock, replaced with VFS i_rwsem
- synchronisation with fixes and changes in userspace libxfs code
- extent tree lookup helpers
- lots of little corruption detection improvements to verifiers
- optimised CRC calculations
- faster buffer cache lookups
- deprecation of barrier/nobarrier mount options - we always use
REQ_FUA/REQ_FLUSH where appropriate for data integrity now
- cleanups to speculative preallocation
- miscellaneous minor bug fixes and cleanups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJYUgqdAAoJEK3oKUf0dfodQgsP/1dJ4qUc6cRk8kL+f10FoIek
oFzdViRHZj8cROGe2n2YTBJtPa9KjU5DNHnxaxWZBN4ZpItp/uN1sAQhgtNQ4/cN
C3JF6B/+/dIbNSbd7DwvSl0dMWknzmrB+Myfs2ZPpMA1S4GInk1MOJSj7AQdYAvJ
dS0dQWAuIB20cahwuGA4y7zUniYL1IcF/BH8hlmzpcUNUoJ9AkR1hTg5/aVfmga3
w2p1vZyT2E4xs/Ff4FYW5MzPGxLVQMZVNIAXAcJl+c61z46ndXqidSmVHGvc+Tlt
ouxftHy/7KqowZlCFss1pSXg9HlXHhjS+iLbZerfcjO2qldriZS+QqQyASmQzPAz
+PpnMfVOj+yjsXKyIHWuS1G35aV16pPWwdA0ECeU6yv9iZ7tSz5rvSrsPZPLFz4x
RVhcKbmXR3y8DugkmtznU5ozxPt5hbbstEV3leCzxJpZu5reRJThUW7nYkSd0CEJ
ZyT/GP6Aq/MM8O/hOgVutAH409dsrYok8m/lq1J7VbNUt8inylcsMWsBeX/0/AHY
aC7I2Vx8bnbfL+C8wYKYhuShOGSch93O5hDUXdH2K/Sm5cK4y2asWge6MfFsS6Lu
waVYwd5aYBlNbzkvUMm2I5EV4cCCR3YwWYwfBEP7kPYUDxN14huOz6lVXnQPDLQ1
qsV1aNfK9PPiw6Fcaop0
=HwDG
-----END PGP SIGNATURE-----
Merge tag 'xfs-for-linus-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs
Pull xfs updates from Dave Chinner:
"There is quite a varied bunch of stuff in this update, and some of it
you will have already merged through the ext4 tree which imported the
dax-4.10-iomap-pmd topic branch from the XFS tree.
There is also a new direct IO implementation that uses the iomap
infrastructure. It's much simpler, faster, and has lower IO latency
than the existing direct IO infrastructure.
Summary:
- DAX PMD faults via iomap infrastructure
- Direct-io support in iomap infrastructure
- removal of now-redundant XFS inode iolock, replaced with VFS
i_rwsem
- synchronisation with fixes and changes in userspace libxfs code
- extent tree lookup helpers
- lots of little corruption detection improvements to verifiers
- optimised CRC calculations
- faster buffer cache lookups
- deprecation of barrier/nobarrier mount options - we always use
REQ_FUA/REQ_FLUSH where appropriate for data integrity now
- cleanups to speculative preallocation
- miscellaneous minor bug fixes and cleanups"
* tag 'xfs-for-linus-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (63 commits)
xfs: nuke unused tracepoint definitions
xfs: use GPF_NOFS when allocating btree cursors
xfs: use xfs_vn_setattr_size to check on new size
xfs: deprecate barrier/nobarrier mount option
xfs: Always flush caches when integrity is required
xfs: ignore leaf attr ichdr.count in verifier during log replay
xfs: use rhashtable to track buffer cache
xfs: optimise CRC updates
xfs: make xfs btree stats less huge
xfs: don't cap maximum dedupe request length
xfs: don't allow di_size with high bit set
xfs: error out if trying to add attrs and anextents > 0
xfs: don't crash if reading a directory results in an unexpected hole
xfs: complain if we don't get nextents bmap records
xfs: check for bogus values in btree block headers
xfs: forbid AG btrees with level == 0
xfs: several xattr functions can be void
xfs: handle cow fork in xfs_bmap_trace_exlist
xfs: pass state not whichfork to trace_xfs_extlist
xfs: Move AGI buffer type setting to xfs_read_agi
...
Fairly routine update this time around with all changes specific to drivers.
o New driver for STMicroelectronics FDMA
o Memory-to-memory transfers on dw dmac
o Support for slave maps on pl08x devices
o Bunch of driver fixes to use dma_pool_zalloc
o Bunch of compile and warning fixes spread across drivers
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJYUg7NAAoJEHwUBw8lI4NH5/gP/j81+2RzCUX8PiLQxNUt0Vj+
tVJEizpWCwN1cnhc8ibZdI1DAwyj+GbN2lghYTjqqEng4yOm3czPzUl99grBrpQl
t+Qylr9PSpck/paRhd2lgZzG8Nk+B5HJDcxBQbW4pwmbc69YAbqYzt44i4bDpR5K
u3mBve1Ulng7peP45EZB8BA32ffCpOEAC/9SdkaPokrSv6XxxPEFvzewy+mLtioU
a0zY0iuHqVGpOTABK65fXO/zkGiZLPXJ1T5vK7Iz8mOwuvtYVif0yktQSrx3BWbc
9r64W7Si633wWt/C9LkuMMSmQ7nI/PyHk811cDOcxp3SA79JV5SWwdQl+5QPdtoP
hyToaISfAY0BiNI9ltdscx3MPjlwSp08xXvi46RjSs8E2TXnbHUw+J5mTsxYuocl
Yi61nlL5ClhCbySf9Q3GFsuAJ3O2Nq9WkCTNRIvJtrMhe3NeqDDTfBZJRD4Bfg1G
q8RAc5oqGZDtqKHtLfwULr7Ec2Ru0hIZAyN907OwW+4jBR/eBJB1y+nGrNPtTWPT
OOcvrxe85/+ZNROGCZKr0L8UA/MBBMZtjvMY8RMXjBE4YJbakq7tV+7l5VolKeNH
G6I/1CC06qVPHrnetM6YejhtnmOQ4F8P1sE0wvpG0QTyHJoFq+aOhHNKJC8F9Eln
CQM2apvL4BHvS7OHt9XL
=Pf0d
-----END PGP SIGNATURE-----
Merge tag 'dmaengine-4.10-rc1' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine updates from Vinod Koul:
"Fairly routine update this time around with all changes specific to
drivers:
- New driver for STMicroelectronics FDMA
- Memory-to-memory transfers on dw dmac
- Support for slave maps on pl08x devices
- Bunch of driver fixes to use dma_pool_zalloc
- Bunch of compile and warning fixes spread across drivers"
[ The ST FDMA driver already came in earlier through the remoteproc tree ]
* tag 'dmaengine-4.10-rc1' of git://git.infradead.org/users/vkoul/slave-dma: (68 commits)
dmaengine: sirf-dma: remove unused ‘sdesc’
dmaengine: pl330: remove unused ‘regs’
dmaengine: s3c24xx: remove unused ‘cdata’
dmaengine: stm32-dma: remove unused ‘src_addr’
dmaengine: stm32-dma: remove unused ‘dst_addr’
dmaengine: stm32-dma: remove unused ‘sfcr’
dmaengine: pch_dma: remove unused ‘cookie’
dmaengine: mic_x100_dma: remove unused ‘data’
dmaengine: img-mdc: remove unused ‘prev_phys’
dmaengine: usb-dmac: remove unused ‘uchan’
dmaengine: ioat: remove unused ‘res’
dmaengine: ioat: remove unused ‘ioat_dma’
dmaengine: ioat: remove unused ‘is_raid_device’
dmaengine: pl330: do not generate unaligned access
dmaengine: k3dma: move to dma_pool_zalloc
dmaengine: at_hdmac: move to dma_pool_zalloc
dmaengine: at_xdmac: don't restore unsaved status
dmaengine: ioat: set error code on failures
dmaengine: ioat: set error code on failures
dmaengine: DW DMAC: add multi-block property to device tree
...
Summary of modules changes for the 4.10 merge window:
* The rodata= cmdline parameter has been extended to additionally
apply to module mappings
* Fix a hard to hit race between module loader error/clean up
handling and ftrace registration
* Some code cleanups, notably panic.c and modules code use a
unified taint_flags table now. This is much cleaner than
duplicating the taint flag code in modules.c
Signed-off-by: Jessica Yu <jeyu@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQIcBAABCgAGBQJYUf6/AAoJEMBFfjjOO8Fy5NoP+gOIus26yWWGymI495jVnX7n
wCga5JgwOL0SLBIPmiDVI7K+jz4eoQZb94eJcwkWDuw2/IvOdF1kB8ha1EOBRMSg
nb9HfIDlWiAPKkyUxe+k6XDb+BMPN3FUSYmBAKD3utsQkD1JWBLY8Id4e234y8Fo
sb3a6rLJbvIEXANrMeU7zO4/y1bVxQAeQPQbVPwlid5s76RKYH6JdGXoo6FKK0uE
Z3I8uQjqjmJ5U4vpjjWl0w+Qa7hIm/x05GpirtNxN6ztxjR+98c/4uRIry8oOX+I
KqRXDOnJ1l/rCwhp+pGLwPfCoDds+V3bknyOwYoxK3hqVVUAd8H0qd1JQ8XClwyJ
jnE0+EQpTt9brOO1Oq2XC+EDjpiuyYm3u91TFwE2VFmP98daBZsX6qY7bm03/GQq
ZLRthWPILNX9glGj4nbHQgdAKmRvYDO3SzWjFZNA75Mr2hbRKLJoWNvfgupDgjsF
giawxV/OcWXvEX92fzkwoUszpfWwoDhGsbimG2SCKYB87vNniG7wrgdjp5aWHhOL
qCUpUhCvE9/dO7kPRinqk5tnpAUGY2jMZ0QgVbpToF6FiHJJSyDjWHR9n0Bl1QTX
uAEZB/Hoav9frZ+MQC/1Yzhq5ejDbEm1ByjolJgbjl6YHBlQceL6NQpFmyEkrn7c
Tx+Q/PvG7/gfxFGMirf1
=bhCS
-----END PGP SIGNATURE-----
Merge tag 'modules-for-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux
Pull modules updates from Jessica Yu:
"Summary of modules changes for the 4.10 merge window:
- The rodata= cmdline parameter has been extended to additionally
apply to module mappings
- Fix a hard to hit race between module loader error/clean up
handling and ftrace registration
- Some code cleanups, notably panic.c and modules code use a unified
taint_flags table now. This is much cleaner than duplicating the
taint flag code in modules.c"
* tag 'modules-for-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
module: fix DEBUG_SET_MODULE_RONX typo
module: extend 'rodata=off' boot cmdline parameter to module mappings
module: Fix a comment above strong_try_module_get()
module: When modifying a module's text ignore modules which are going away too
module: Ensure a module's state is set accordingly during module coming cleanup code
module: remove trailing whitespace
taint/module: Clean up global and module taint flags handling
modpost: free allocated memory
Merge more updates from Andrew Morton:
- a few misc things
- kexec updates
- DMA-mapping updates to better support networking DMA operations
- IPC updates
- various MM changes to improve DAX fault handling
- lots of radix-tree changes, mainly to the test suite. All leading up
to reimplementing the IDA/IDR code to be a wrapper layer over the
radix-tree. However the final trigger-pulling patch is held off for
4.11.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (114 commits)
radix tree test suite: delete unused rcupdate.c
radix tree test suite: add new tag check
radix-tree: ensure counts are initialised
radix tree test suite: cache recently freed objects
radix tree test suite: add some more functionality
idr: reduce the number of bits per level from 8 to 6
rxrpc: abstract away knowledge of IDR internals
tpm: use idr_find(), not idr_find_slowpath()
idr: add ida_is_empty
radix tree test suite: check multiorder iteration
radix-tree: fix replacement for multiorder entries
radix-tree: add radix_tree_split_preload()
radix-tree: add radix_tree_split
radix-tree: add radix_tree_join
radix-tree: delete radix_tree_range_tag_if_tagged()
radix-tree: delete radix_tree_locate_item()
radix-tree: improve multiorder iterators
btrfs: fix race in btrfs_free_dummy_fs_info()
radix-tree: improve dump output
radix-tree: make radix_tree_find_next_bit more useful
...
Pull fs meta data unmap optimization from Jens Axboe:
"A series from Jan Kara, providing a more efficient way for unmapping
meta data from in the buffer cache than doing it block-by-block.
Provide a general helper that existing callers can use"
* 'for-4.10/fs-unmap' of git://git.kernel.dk/linux-block:
fs: Remove unmap_underlying_metadata
fs: Add helper to clean bdev aliases under a bh and use it
ext2: Use clean_bdev_aliases() instead of iteration
ext4: Use clean_bdev_aliases() instead of iteration
direct-io: Use clean_bdev_aliases() instead of handmade iteration
fs: Provide function to unmap metadata for a range of blocks
In preparation for merging the IDR and radix tree, reduce the fanout at
each level from 256 to 64. If this causes a performance problem then a
bisect will point to this commit, and we'll have a better idea about
what we might do to fix it.
Link: http://lkml.kernel.org/r/1480369871-5271-66-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add idr_get_cursor() / idr_set_cursor() APIs, and remove the reference
to IDR_SIZE.
Link: http://lkml.kernel.org/r/1480369871-5271-65-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Two of the USB Gadgets were poking around in the internals of struct ida
in order to determine if it is empty. Add the appropriate abstraction.
Link: http://lkml.kernel.org/r/1480369871-5271-63-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Acked-by: Konstantin Khlebnikov <koct9i@gmail.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Calculate how many nodes we need to allocate to split an old_order entry
into multiple entries, each of size new_order. The test suite checks
that we allocated exactly the right number of nodes; neither too many
(checked by rtp->nr == 0), nor too few (checked by comparing
nr_allocated before and after the call to radix_tree_split()).
Link: http://lkml.kernel.org/r/1480369871-5271-60-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This new function splits a larger multiorder entry into smaller entries
(potentially multi-order entries). These entries are initialised to
RADIX_TREE_RETRY to ensure that RCU walkers who see this state aren't
confused. The caller should then call radix_tree_for_each_slot() and
radix_tree_replace_slot() in order to turn these retry entries into the
intended new entries. Tags are replicated from the original multiorder
entry into each new entry.
Link: http://lkml.kernel.org/r/1480369871-5271-59-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This new function allows for the replacement of many smaller entries in
the radix tree with one larger multiorder entry. From the point of view
of an RCU walker, they may see a mixture of the smaller entries and the
large entry during the same walk, but they will never see NULL for an
index which was populated before the join.
Link: http://lkml.kernel.org/r/1480369871-5271-58-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is an exceptionally complicated function with just one caller
(tag_pages_for_writeback). We devote a large portion of the runtime of
the test suite to testing this one function which has one caller. By
introducing the new function radix_tree_iter_tag_set(), we can eliminate
all of the complexity while keeping the performance. The caller can now
use a fairly standard radix_tree_for_each() loop, and it doesn't need to
worry about tricksy things like 'start' wrapping.
The test suite continues to spend a large amount of time investigating
this function, but now it's testing the underlying primitives such as
radix_tree_iter_resume() and the radix_tree_for_each_tagged() iterator
which are also used by other parts of the kernel.
Link: http://lkml.kernel.org/r/1480369871-5271-57-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This rather complicated function can be better implemented as an
iterator. It has only one caller, so move the functionality to the only
place that needs it. Update the test suite to follow the same pattern.
Link: http://lkml.kernel.org/r/1480369871-5271-56-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Konstantin Khlebnikov <koct9i@gmail.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes several interlinked problems with the iterators in the
presence of multiorder entries.
1. radix_tree_iter_next() would only advance by one slot, which would
result in the iterators returning the same entry more than once if
there were sibling entries.
2. radix_tree_next_slot() could return an internal pointer instead of
a user pointer if a tagged multiorder entry was immediately followed by
an entry of lower order.
3. radix_tree_next_slot() expanded to a lot more code than it used to
when multiorder support was compiled in. And I wasn't comfortable with
entry_to_node() being in a header file.
Fixing radix_tree_iter_next() for the presence of sibling entries
necessarily involves examining the contents of the radix tree, so we now
need to pass 'slot' to radix_tree_iter_next(), and we need to change the
calling convention so it is called *before* dropping the lock which
protects the tree. Also rename it to radix_tree_iter_resume(), as some
people thought it was necessary to call radix_tree_iter_next() each time
around the loop.
radix_tree_next_slot() becomes closer to how it looked before multiorder
support was introduced. It only checks to see if the next entry in the
chunk is a sibling entry or a pointer to a node; this should be rare
enough that handling this case out of line is not a performance impact
(and such impact is amortised by the fact that the entry we just
processed was a multiorder entry). Also, radix_tree_next_slot() used to
force a new chunk lookup for untagged entries, which is more expensive
than the out of line sibling entry skipping.
Link: http://lkml.kernel.org/r/1480369871-5271-55-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I want to be able to reference node->parent after freeing node.
Currently node->parent is in a union with rcu_head, so it is overwritten
when the node is put on the RCU list. We know that private_list is not
referenced after the node is freed, so it is safe for these two members
to share space.
Link: http://lkml.kernel.org/r/1480369871-5271-50-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
DAX will need to implement its own version of page_check_address(). To
avoid duplicating page table walking code, export follow_pte() which
does what we need.
Link: http://lkml.kernel.org/r/1479460644-25076-18-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Provide a helper function for finishing write faults due to PTE being
read-only. The helper will be used by DAX to avoid the need of
complicating generic MM code with DAX locking specifics.
Link: http://lkml.kernel.org/r/1479460644-25076-16-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move final handling of COW faults from generic code into DAX fault
handler. That way generic code doesn't have to be aware of
peculiarities of DAX locking so remove that knowledge and make locking
functions private to fs/dax.c.
Link: http://lkml.kernel.org/r/1479460644-25076-11-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce finish_fault() as a helper function for finishing page faults.
It is rather thin wrapper around alloc_set_pte() but since we'd want to
call this from DAX code or filesystems, it is still useful to avoid some
boilerplate code.
Link: http://lkml.kernel.org/r/1479460644-25076-10-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "dax: Clear dirty bits after flushing caches", v5.
Patchset to clear dirty bits from radix tree of DAX inodes when caches
for corresponding pfns have been flushed. In principle, these patches
enable handlers to easily update PTEs and do other work necessary to
finish the fault without duplicating the functionality present in the
generic code. I'd like to thank Kirill and Ross for reviews of the
series!
This patch (of 20):
To allow full handling of COW faults add memcg field to struct vm_fault
and a return value of ->fault() handler meaning that COW fault is fully
handled and memcg charge must not be canceled. This will allow us to
remove knowledge about special DAX locking from the generic fault code.
Link: http://lkml.kernel.org/r/1479460644-25076-9-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add orig_pte field to vm_fault structure to allow ->page_mkwrite
handlers to fully handle the fault.
This also allows us to save some passing of extra arguments around.
Link: http://lkml.kernel.org/r/1479460644-25076-8-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Every single user of vmf->virtual_address typed that entry to unsigned
long before doing anything with it so the type of virtual_address does
not really provide us any additional safety. Just use masked
vmf->address which already has the appropriate type.
Link: http://lkml.kernel.org/r/1479460644-25076-3-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently we have two different structures for passing fault information
around - struct vm_fault and struct fault_env. DAX will need more
information in struct vm_fault to handle its faults so the content of
that structure would become event closer to fault_env. Furthermore it
would need to generate struct fault_env to be able to call some of the
generic functions. So at this point I don't think there's much use in
keeping these two structures separate. Just embed into struct vm_fault
all that is needed to use it for both purposes.
Link: http://lkml.kernel.org/r/1479460644-25076-2-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Unexport the low-level __get_user_pages_unlocked() function and replaces
invocations with calls to more appropriate higher-level functions.
In hva_to_pfn_slow() we are able to replace __get_user_pages_unlocked()
with get_user_pages_unlocked() since we can now pass gup_flags.
In async_pf_execute() and process_vm_rw_single_vec() we need to pass
different tsk, mm arguments so get_user_pages_remote() is the sane
replacement in these cases (having added manual acquisition and release
of mmap_sem.)
Additionally get_user_pages_remote() reintroduces use of the FOLL_TOUCH
flag. However, this flag was originally silently dropped by commit
1e9877902dc7 ("mm/gup: Introduce get_user_pages_remote()"), so this
appears to have been unintentional and reintroducing it is therefore not
an issue.
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20161027095141.2569-3-lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "mm: unexport __get_user_pages_unlocked()".
This patch series continues the cleanup of get_user_pages*() functions
taking advantage of the fact we can now pass gup_flags as we please.
It firstly adds an additional 'locked' parameter to
get_user_pages_remote() to allow for its callers to utilise
VM_FAULT_RETRY functionality. This is necessary as the invocation of
__get_user_pages_unlocked() in process_vm_rw_single_vec() makes use of
this and no other existing higher level function would allow it to do
so.
Secondly existing callers of __get_user_pages_unlocked() are replaced
with the appropriate higher-level replacement -
get_user_pages_unlocked() if the current task and memory descriptor are
referenced, or get_user_pages_remote() if other task/memory descriptors
are referenced (having acquiring mmap_sem.)
This patch (of 2):
Add a int *locked parameter to get_user_pages_remote() to allow
VM_FAULT_RETRY faulting behaviour similar to get_user_pages_[un]locked().
Taking into account the previous adjustments to get_user_pages*()
functions allowing for the passing of gup_flags, we are now in a
position where __get_user_pages_unlocked() need only be exported for his
ability to allow VM_FAULT_RETRY behaviour, this adjustment allows us to
subsequently unexport __get_user_pages_unlocked() as well as allowing
for future flexibility in the use of get_user_pages_remote().
[sfr@canb.auug.org.au: merge fix for get_user_pages_remote API change]
Link: http://lkml.kernel.org/r/20161122210511.024ec341@canb.auug.org.au
Link: http://lkml.kernel.org/r/20161027095141.2569-2-lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "Clean up watchdog handlers", v2.
This is an attempt to cleanup watchdog handlers. Right now,
kernel/watchdog.c implements both softlockup and hardlockup detectors.
Softlockup code is generic. Hardlockup code is arch specific. Some
architectures don't use hardlockup detectors. They use their own
watchdog detectors. To make both these combination work, we have
numerous #ifdefs in kernel/watchdog.c.
We are trying here to make these handlers independent of each other.
Also provide an interface for architectures to implement their own
handlers. watchdog_nmi_enable and watchdog_nmi_disable will be defined
as weak such that architectures can override its definitions.
Thanks to Don Zickus for his suggestions.
Here are our previous discussions
http://www.spinics.net/lists/sparclinux/msg16543.htmlhttp://www.spinics.net/lists/sparclinux/msg16441.html
This patch (of 3):
Move shared macros and definitions to nmi.h so that watchdog.c, new file
watchdog_hld.c or any other architecture specific handler can use those
definitions.
Link: http://lkml.kernel.org/r/1478034826-43888-2-git-send-email-babu.moger@oracle.com
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Josh Hunt <johunt@akamai.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kdb_trap_printk allows to pass normal printk() messages to kdb via
vkdb_printk(). For example, it is used to get backtrace using the
classic show_stack(), see kdb_show_stack().
vkdb_printf() tries to avoid a potential infinite loop by disabling the
trap. But this approach is racy, for example:
CPU1 CPU2
vkdb_printf()
// assume that kdb_trap_printk == 0
saved_trap_printk = kdb_trap_printk;
kdb_trap_printk = 0;
kdb_show_stack()
kdb_trap_printk++;
Problem1: Now, a nested printk() on CPU0 calls vkdb_printf()
even when it should have been disabled. It will not
cause a deadlock but...
// using the outdated saved value: 0
kdb_trap_printk = saved_trap_printk;
kdb_trap_printk--;
Problem2: Now, kdb_trap_printk == -1 and will stay like this.
It means that all messages will get passed to kdb from
now on.
This patch removes the racy saved_trap_printk handling. Instead, the
recursion is prevented by a check for the locked CPU.
The solution is still kind of racy. A non-related printk(), from
another process, might get trapped by vkdb_printf(). And the wanted
printk() might not get trapped because kdb_printf_cpu is assigned. But
this problem existed even with the original code.
A proper solution would be to get_cpu() before setting kdb_trap_printk
and trap messages only from this CPU. I am not sure if it is worth the
effort, though.
In fact, the race is very theoretical. When kdb is running any of the
commands that use kdb_trap_printk there is a single active CPU and the
other CPUs should be in a holding pen inside kgdb_cpu_enter().
The only time this is violated is when there is a timeout waiting for
the other CPUs to report to the holding pen.
Finally, note that the situation is a bit schizophrenic. vkdb_printf()
explicitly allows recursion but only from KDB code that calls
kdb_printf() directly. On the other hand, the generic printk()
recursion is not allowed because it might cause an infinite loop. This
is why we could not hide the decision inside vkdb_printf() easily.
Link: http://lkml.kernel.org/r/1480412276-16690-4-git-send-email-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kdb_event state variable is only set but never checked in the kernel
code.
http://www.spinics.net/lists/kdb/msg01733.html suggests that this
variable affected WARN_CONSOLE_UNLOCKED() in the original
implementation. But this check never went upstream.
The semantic is unclear and racy. The value is updated after the
kdb_printf_lock is acquired and after it is released. It should be
symmetric at minimum. The value should be manipulated either inside or
outside the locked area.
Fortunately, it seems that the original function is gone and we could
simply remove the state variable.
Link: http://lkml.kernel.org/r/1480412276-16690-2-git-send-email-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Suggested-by: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a function that allows us to batch free a page that has multiple
references outstanding. Specifically this function can be used to drop
a page being used in the page frag alloc cache. With this drivers can
make use of functionality similar to the page frag alloc cache without
having to do any workarounds for the fact that there is no function that
frees multiple references.
Link: http://lkml.kernel.org/r/20161110113606.76501.70752.stgit@ahduyck-blue-test.jf.intel.com
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
Cc: Helge Deller <deller@gmx.de>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Keguang Zhang <keguang.zhang@gmail.com>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Tobias Klauser <tklauser@distanz.ch>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add support for mapping and unmapping a page with attributes.
The primary use for this is currently to allow for us to pass the
DMA_ATTR_SKIP_CPU_SYNC attribute when mapping and unmapping a page. On
some architectures such as ARM the synchronization has significant
overhead and if we are already taking care of the sync_for_cpu and
sync_for_device from the driver there isn't much need to handle this in
the map/unmap calls as well.
Link: http://lkml.kernel.org/r/20161110113601.76501.46095.stgit@ahduyck-blue-test.jf.intel.com
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit 0549a3c02efb ("kdump, vmcoreinfo: report memory
sections virtual addresses").
Commit 0549a3c02efb tells the userspace utility makedumpfile the
randomized base address of these memmory sections when mm kaslr is
enabled. However the following patch "kexec: export the value of
phys_base instead of symbol address" makes makedumpfile not need these
addresses any more.
Besides we should use VMCOREINFO_NUMBER to export the value of the
variable so that we can use the existing number_table mechanism of
Makedumpfile to fetch it. So revert it now. If needed we can add it
later.
http://lists.infradead.org/pipermail/kexec/2016-October/017540.html
Link: http://lkml.kernel.org/r/1478568596-30060-1-git-send-email-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Xunlei Pang <xlpang@redhat.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Eugene Surovegin <surovegin@google.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com>
Cc: Dave Anderson <anderson@redhat.com>
Cc: Pratyush Anand <panand@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When running certain database workload on a high-end system with many
CPUs, it was found that spinlock contention in the sigprocmask syscalls
became a significant portion of the overall CPU cycles as shown below.
9.30% 9.30% 905387 dataserver /proc/kcore 0x7fff8163f4d2
[k] _raw_spin_lock_irq
|
---_raw_spin_lock_irq
|
|--99.34%-- __set_current_blocked
| sigprocmask
| sys_rt_sigprocmask
| system_call_fastpath
| |
| |--50.63%-- __swapcontext
| | |
| | |--99.91%-- upsleepgeneric
| |
| |--49.36%-- __setcontext
| | ktskRun
Looking further into the swapcontext function in glibc, it was found that
the function always call sigprocmask() without checking if there are
changes in the signal mask.
A check was added to the __set_current_blocked() function to avoid taking
the sighand->siglock spinlock if there is no change in the signal mask.
This will prevent unneeded spinlock contention when many threads are
trying to call sigprocmask().
With this patch applied, the spinlock contention in sigprocmask() was
gone.
Link: http://lkml.kernel.org/r/1474979209-11867-1-git-send-email-Waiman.Long@hpe.com
Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stas Sergeev <stsp@list.ru>
Cc: Scott J Norton <scott.norton@hpe.com>
Cc: Douglas Hatch <doug.hatch@hpe.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull namespace updates from Eric Biederman:
"After a lot of discussion and work we have finally reachanged a basic
understanding of what is necessary to make unprivileged mounts safe in
the presence of EVM and IMA xattrs which the last commit in this
series reflects. While technically it is a revert the comments it adds
are important for people not getting confused in the future. Clearing
up that confusion allows us to seriously work on unprivileged mounts
of fuse in the next development cycle.
The rest of the fixes in this set are in the intersection of user
namespaces, ptrace, and exec. I started with the first fix which
started a feedback cycle of finding additional issues during review
and fixing them. Culiminating in a fix for a bug that has been present
since at least Linux v1.0.
Potentially these fixes were candidates for being merged during the rc
cycle, and are certainly backport candidates but enough little things
turned up during review and testing that I decided they should be
handled as part of the normal development process just to be certain
there were not any great surprises when it came time to backport some
of these fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
Revert "evm: Translate user/group ids relative to s_user_ns when computing HMAC"
exec: Ensure mm->user_ns contains the execed files
ptrace: Don't allow accessing an undumpable mm
ptrace: Capture the ptracer's creds not PT_PTRACE_CAP
mm: Add a user_ns owner to mm_struct and fix ptrace permission checks
r_safe_completion is currently, and has always been, signaled only if
on-disk ack was requested. It's there for fsync and syncfs, which wait
for in-flight writes to flush - all data write requests set ONDISK.
However, the pool perm check code introduced in 4.2 sends a write
request with only ACK set. An unfortunately timed syncfs can then hang
forever: r_safe_completion won't be signaled because only an unsafe
reply was requested.
We could patch ceph_osdc_sync() to skip !ONDISK write requests, but
that is somewhat incomplete and yet another special case. Instead,
rename this completion to r_done_completion and always signal it when
the OSD client is done with the request, whether unsafe, safe, or
error. This is a bit cleaner and helps with the cancellation code.
Reported-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Pull crypto updates from Herbert Xu:
"Here is the crypto update for 4.10:
API:
- add skcipher walk interface
- add asynchronous compression (acomp) interface
- fix algif_aed AIO handling of zero buffer
Algorithms:
- fix unaligned access in poly1305
- fix DRBG output to large buffers
Drivers:
- add support for iMX6UL to caam
- fix givenc descriptors (used by IPsec) in caam
- accelerated SHA256/SHA512 for ARM64 from OpenSSL
- add SSE CRCT10DIF and CRC32 to ARM/ARM64
- add AEAD support to Chelsio chcr
- add Armada 8K support to omap-rng"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (148 commits)
crypto: testmgr - fix overlap in chunked tests again
crypto: arm/crc32 - accelerated support based on x86 SSE implementation
crypto: arm64/crc32 - accelerated support based on x86 SSE implementation
crypto: arm/crct10dif - port x86 SSE implementation to ARM
crypto: arm64/crct10dif - port x86 SSE implementation to arm64
crypto: testmgr - add/enhance test cases for CRC-T10DIF
crypto: testmgr - avoid overlap in chunked tests
crypto: chcr - checking for IS_ERR() instead of NULL
crypto: caam - check caam_emi_slow instead of re-lookup platform
crypto: algif_aead - fix AIO handling of zero buffer
crypto: aes-ce - Make aes_simd_algs static
crypto: algif_skcipher - set error code when kcalloc fails
crypto: caam - make aamalg_desc a proper module
crypto: caam - pass key buffers with typesafe pointers
crypto: arm64/aes-ce-ccm - Fix AEAD decryption length
MAINTAINERS: add crypto headers to crypto entry
crypt: doc - remove misleading mention of async API
crypto: doc - fix header file name
crypto: api - fix comment typo
crypto: skcipher - Add separate walker for AEAD decryption
..
Pull HID updates from Jiri Kosina:
- support for new Wacom "MobileStudio Pro" class of tablets from Jason
Gerecke
- Microsoft Surface 3 support from Benjamin Tissoires and Microsoft
Surface 4 support from Daniel Keller
- uDraw PS3 tablet support from Bastien Nocera
- timeout scheduling fixes for intel-ish-hid from Even Xu
- HID_QUIRK_MULTI_INPUT in order to simplify LED handling from Benjamin
Tissoires
- support for Sony DS4 dongle and various other fixes to Sony driver
from Roderick Colenbrander
- other assorted smaller fixes and device ID additions
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (63 commits)
HID: fix missing irq field
HID: i2c-hid: fix build
HID: i2c-hid: Disable IRQ before freeing buffers
HID: usbhid: fix improper return value
HID: wacom: generic: Don't sync input on empty input packets
HID: wacom: generic: Pad supports more than buttons
HID: wacom: generic: Send data only when the interface is defined
HID: wacom: generic: Don't return a value for wacom_wac_event
HID: intel_ish-hid: use %pUL for uuid formatting
HID: cp2112: explicitly require irqchip support in gpiolib
HID: asus: Add i2c touchpad support
HID: intel-ish-hid: Fix potential race condition
HID: sony: Support DS4 dongle
HID: sony: Comply to Linux gamepad spec for DS4
HID: sony: Make the DS4 touchpad a separate device
HID: sony: Fix memory issue when connecting device using both Bluetooth and USB
HID: cp2112: add IRQ chip handling
HID: i2c-hid: force the IRQ level trigger only when not set
HID: multitouch: do not retrieve all reports for all devices
HID: multitouch: enable the Surface 3 Type Cover to report multitouch data
...
This update includes the usual round of major driver updates (ncr5380,
lpfc, hisi_sas, megaraid_sas, ufs, ibmvscsis, mpt3sas). There's also
an assortment of minor fixes, mostly in error legs or other not very
user visible stuff. The major change is the pci_alloc_irq_vectors
replacement for the old pci_msix_.. calls; this effectively makes IRQ
mapping generic for the drivers and allows blk_mq to use the
information.
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJYUJOvAAoJEAVr7HOZEZN42B0P/1lj1W2N7y0LOAbR2MasyQvT
fMD/SSip/v+R/zJiTv+5M/IDQT6ez62JnQGWyX3HZTob9VEfoqagbUuHH6y+fmib
tHyqiYc9FC/NZSRX/0ib+rpnSVhC/YRSVV7RrAqilbpOKAzeU25FlN/vbz+Nv/XL
ReVBl+2nGjJtHyWqUN45Zuf74c0zXOWPPUD0hRaNclK5CfZv5wDYupzHzTNSQTkj
nWvwPYT0OgSMNe7mR+IDFyOe3UQ/OYyeJB0yBNqO63IiaUabT8/hgrWR9qJAvWh8
LiH+iSQ69+sDUnvWvFjuls/GzcFuuTljwJbm+FyTsmNHONPVY8JRCLNq7CNDJ6Vx
HwpNuJdTSJpne4lAVBGPwgjs+GhlMvUP/xYVLWAXdaBxU9XGePrwqQDcFu1Rbx3P
yfMiVaY1+e45OEjLRCbDAwFnMPevC3kyymIvSsTySJxhTbYrOsyrrWt5kwWsvE3r
SKANsub+xUnpCkyg57nXRQStJSCiSfGIDsydKmMX+pf1SR4k6gCUQZlcchUX0uOa
dcY6re0c7EJIQQiT7qeGP5TRBblxARocCA/Igx6b5U5HmuQ48tDFlMCps7/TE84V
JBsBnmkXcEi/ALShL/Tui+3YKA1DfOtEnwHtXx/9Ecx/nxP2Sjr9LJwCKiONv8NY
RgLpGfccrix34lQumOk5
=sPXh
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This update includes the usual round of major driver updates (ncr5380,
lpfc, hisi_sas, megaraid_sas, ufs, ibmvscsis, mpt3sas).
There's also an assortment of minor fixes, mostly in error legs or
other not very user visible stuff. The major change is the
pci_alloc_irq_vectors replacement for the old pci_msix_.. calls; this
effectively makes IRQ mapping generic for the drivers and allows
blk_mq to use the information"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (256 commits)
scsi: qla4xxx: switch to pci_alloc_irq_vectors
scsi: hisi_sas: support deferred probe for v2 hw
scsi: megaraid_sas: switch to pci_alloc_irq_vectors
scsi: scsi_devinfo: remove synchronous ALUA for NETAPP devices
scsi: be2iscsi: set errno on error path
scsi: be2iscsi: set errno on error path
scsi: hpsa: fallback to use legacy REPORT PHYS command
scsi: scsi_dh_alua: Fix RCU annotations
scsi: hpsa: use %phN for short hex dumps
scsi: hisi_sas: fix free'ing in probe and remove
scsi: isci: switch to pci_alloc_irq_vectors
scsi: ipr: Fix runaway IRQs when falling back from MSI to LSI
scsi: dpt_i2o: double free on error path
scsi: cxlflash: Migrate scsi command pointer to AFU command
scsi: cxlflash: Migrate IOARRIN specific routines to function pointers
scsi: cxlflash: Cleanup queuecommand()
scsi: cxlflash: Cleanup send_tmf()
scsi: cxlflash: Remove AFU command lock
scsi: cxlflash: Wait for active AFU commands to timeout upon tear down
scsi: cxlflash: Remove private command pool
...
from the ->drop_link method.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJYUUZ8AAoJEA+eU2VSBFGDjHIP/RKCBerQNkQ5hdUPnwJkXOh1
b4SWiPNsNgkzpKJsE/vuL6R2qq9KX5sxWLi8a3Wx5jyMst1Xj2m4SvI4fUew/hBC
QZM86aMJLcWZ8BfqLk2tLys759z/MSZSrXcUU8EM+JJqb6E0+i+5pgX8gOk7Pxwo
FUagFzuGXTGORwkWRf47ludBuDEGMCrLQZ6WaXEKQULTUwfPrnP+n9EhZcWzzsyu
0YxW9SFD73LgRSPLgxc+rw875D8rb3WSClWj/2LLQUy8z8QEJ83Mgt9hcbBV0Ppa
efP0kPZbpDnVx6TjpldRKW9GivkbFXNnChMmgTkBGYTRjn8IHDsyAb6ZABw/O37N
oyvd42xVDCE+GSImaMgCPL/5MEsQ+v9xCfgkBcyhWVQYFFj89Nmz/8VKp6AtTW3j
X7MQuzdzKGoWTVpAOgw/SvjrRx+fcciTg31AhhGjE5cmARVoBJuCDa6NM3WXFQf4
Pq74zetWDB38sBZwQV/6Y1m3OJGquD4MxX9b5SbNzwuROrKyJCAe3CCw7CKvuuWj
RPkwZkHiCawRijxNDCWU8zpMcuUCdt9yjTWbUW/WrvKR6BGF4IwUHf9k8oorv1Qc
Bo7enURnYBmcb7cijOLO+onzCf5iWAwMNk8KKJ1zthaZbloIGw8jdp5kOe5spEvF
Uju3n/GNA/OWe2MAuutE
=nOOH
-----END PGP SIGNATURE-----
Merge tag 'configfs-for-4.10' of git://git.infradead.org/users/hch/configfs
Pull configfs update from Christoph Hellwig:
"Just one simple change from Andrzej to drop the pointless return value
from the ->drop_link method"
* tag 'configfs-for-4.10' of git://git.infradead.org/users/hch/configfs:
fs: configfs: don't return anything from drop_link