linux

mirror of https://github.com/FEX-Emu/linux.git synced 2024-12-28 20:37:27 +00:00

Michal Hocko 72b39cfc4d mm, memory_hotplug: do not fail offlining too early		2017-11-15 18:21:02 -08:00

Patch series "mm, memory_hotplug: redefine memory offline retry logic", v2.

While testing memory hotplug on a large 4TB machine we have noticed that memory offlining is just too eager to fail. The primary reason is that the retry logic is just too easy to give up. We have 4 ways out of the offline

- we have a permanent failure (isolation or memory notifiers fail, or hugetlb pages cannot be dropped)
- userspace sends a signal
- a hardcoded 120s timeout expires
- page migration fails 5 times

This is way too convoluted and it doesn't scale very well. We have seen both temporary migration failures as well as 120s being triggered. After removing those restrictions we were able to pass stress testing during memory hot remove without any other negative side effects observed. Therefore I suggest dropping both hard coded policies. I couldn't have found any specific reason for them in the changelog. I neither didn't get any response [1] from Kamezawa. If we need some upper bound - e.g. timeout based - then we should have a proper and user defined policy for that. In any case there should be a clear use case when introducing it.

This patch (of 2):

Memory offlining can fail too eagerly under heavy memory pressure.

  page:ffffea22a646bd00 count:255 mapcount:252 mapping:ffff88ff926c9f38 index:0x3
  flags: 0x9855fe40010048(uptodate|active|mappedtodisk)
  page dumped because: isolation failed
  page->mem_cgroup:ffff8801cd662000
  memory offlining [mem 0x18b580000000-0x18b5ffffffff] failed

Isolation has failed here because the page is not on LRU. Most probably because it was on the pcp LRU cache or it has been removed from the LRU already but it hasn't been freed yet. In both cases the page doesn't look non-migrable so retrying more makes sense.

__offline_pages seems rather cluttered when it comes to the retry logic. We have 5 retries at maximum and a timeout. We could argue whether the timeout makes sense but failing just because of a race when somebody isoltes a page from LRU or puts it on a pcp LRU lists is just wrong. It only takes it to race with a process which unmaps some pages and remove them from the LRU list and we can fail the whole offline because of something that is a temporary condition and actually not harmful for the offline.

Please note that unmovable pages should be already excluded during start_isolate_page_range. We could argue that has_unmovable_pages is racy and MIGRATE_MOVABLE check doesn't provide any hard guarantee either but kernel zones (aka < ZONE_MOVABLE) will very likely detect unmovable pages in most cases and movable zone shouldn't contain unmovable pages at all. Some of those pages might be pinned but not for ever because that would be a bug on its own. In any case the context is still interruptible and so the userspace can easily bail out when the operation takes too long. This is certainly better behavior than a hardcoded retry loop which is racy.

Fix this by removing the max retry count and only rely on the timeout resp. interruption by a signal from the userspace. Also retry rather than fail when check_pages_isolated sees some !free pages because those could be a result of the race as well.

Link: http://lkml.kernel.org/r/20170918070834.13083-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
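
The retry policy described in the commit message above can be illustrated with a small user-space model. This is only a sketch and not the kernel code: `offline_with_retry`, `make_flaky_range`, the status strings, and the injectable `clock` are all hypothetical names invented for the illustration. It assumes that a failed attempt is always a transient condition (a page briefly off the LRU), so the loop has no retry cap and stops only on success, on a pending signal, or on an optional timeout.

```python
import time
from typing import Callable, Optional, Tuple


def offline_with_retry(
    try_migrate_and_check: Callable[[], bool],
    interrupted: Callable[[], bool],
    timeout_s: Optional[float] = None,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[str, int]:
    """Toy model of an offline loop without a maximum retry count.

    All names here are hypothetical. Returns (status, attempts), where
    status is "ok", "interrupted", or "timed_out".
    """
    deadline = None if timeout_s is None else clock() + timeout_s
    attempts = 0
    while True:
        # The caller can always bail out, as with a signal from userspace.
        if interrupted():
            return ("interrupted", attempts)
        # Optional upper bound; there is no per-attempt failure budget.
        if deadline is not None and clock() >= deadline:
            return ("timed_out", attempts)
        attempts += 1
        if try_migrate_and_check():
            return ("ok", attempts)
        # A failed attempt is treated as a temporary race, so loop again.


def make_flaky_range(transient_failures: int) -> Callable[[], bool]:
    """Hypothetical stand-in for a memory range that fails a few times first."""
    remaining = [transient_failures]

    def attempt() -> bool:
        if remaining[0] > 0:
            remaining[0] -= 1
            return False
        return True

    return attempt


if __name__ == "__main__":
    # Seven transient failures would have exceeded a cap of five retries.
    print(offline_with_retry(make_flaky_range(7), interrupted=lambda: False))
```

With a cap of five retries, the range in the usage example would have been reported as failed even though nothing about it was permanently unmovable; without the cap, the same range goes offline on the eighth attempt.
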
kasan	slab, slub, slob: add slab_flags_t	2017-11-15 18:21:01 -08:00
backing-dev.c	backing-dev: kill unused pdflush_proc_obsolete()	2017-10-06 08:15:15 -06:00
balloon_compaction.c	mm/migrate: new migrate mode MIGRATE_SYNC_NO_COPY	2017-09-08 18:26:46 -07:00
bootmem.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
cleancache.c	fs: switch ->s_uuid to uuid_t	2017-06-05 16:59:12 +02:00
cma_debug.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
cma.c	mm/cma.c: take __GFP_NOWARN into account in cma_alloc()	2017-10-13 16:18:32 -07:00
cma.h	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
compaction.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
debug_page_ref.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
debug.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
dmapool.c	lib/vsprintf.c: remove %Z support	2017-02-27 18:43:47 -08:00
early_ioremap.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
fadvise.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
failslab.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
filemap.c	mm: have filemap_check_and_advance_wb_err clear AS_EIO/AS_ENOSPC	2017-10-03 17:54:24 -07:00
frame_vector.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
frontswap.c	mm, frontswap: convert frontswap_enabled to static key	2016-07-26 16:19:19 -07:00
gup.c	Merge branch 'x86/urgent' into x86/mm, to pick up fixes	2017-10-20 13:06:52 +02:00
highmem.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
hmm.c	mm/hmm: avoid bloating arch that do not make use of HMM	2017-09-08 18:26:46 -07:00
huge_memory.c	Merge branch 'for-linus' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jikos/trivial	2017-11-15 10:14:11 -08:00
hugetlb_cgroup.c	mm, hugetlb_cgroup: round limit_in_bytes down to hugepage size	2016-05-20 17:58:30 -07:00
hugetlb.c	userfaultfd: hugetlbfs: prevent UFFDIO_COPY to fill beyond the end of i_size	2017-11-03 07:39:19 -07:00
hwpoison-inject.c	mm: hwpoison: call shake_page() unconditionally	2017-05-03 15:52:12 -07:00
init-mm.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
internal.h	mm, oom: do not rely on TIF_MEMDIE for memory reserves access	2017-09-06 17:27:30 -07:00
interval_tree.c	lib/interval_tree: fast overlap detection	2017-09-08 18:26:49 -07:00
Kconfig	mm/hmm: avoid bloating arch that do not make use of HMM	2017-09-08 18:26:46 -07:00
Kconfig.debug	mm: enable page poisoning early at boot	2017-05-03 15:52:10 -07:00
khugepaged.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
kmemcheck.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
kmemleak-test.c	mm: convert printk(KERN_<LEVEL> to pr_<level>	2016-03-17 15:09:34 -07:00
kmemleak.c	mm: kmemleak: treat vm_struct as alternative reference to vmalloc'ed objects	2017-07-06 16:24:34 -07:00
ksm.c	ksm: fix unlocked iteration over vmas in cmp_and_merge_page()	2017-10-03 17:54:23 -07:00
list_lru.c	mm: memcontrol: use vmalloc fallback for large kmem memcg arrays	2017-10-03 17:54:25 -07:00
maccess.c	x86: remove more uaccess_32.h complexity	2016-05-22 17:21:27 -07:00
madvise.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
Makefile	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
memblock.c	mm/memblock.c: reversed logic in memblock_discard()	2017-08-25 16:12:46 -07:00
memcontrol.c	mm: slabinfo: remove CONFIG_SLABINFO	2017-11-15 18:21:01 -08:00
memory_hotplug.c	mm, memory_hotplug: do not fail offlining too early	2017-11-15 18:21:02 -08:00
memory-failure.c	x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages	2017-08-17 10:30:49 +02:00
memory.c	mm: swap: SWP_SYNCHRONOUS_IO: skip swapcache only if swapped page has no other reference	2017-11-15 18:21:02 -08:00
mempolicy.c	mm/mempolicy: fix NUMA_INTERLEAVE_HIT counter	2017-10-13 16:18:32 -07:00
mempool.c	mm/mempool.c: use kmalloc_array_node()	2017-11-15 18:21:02 -08:00
memtest.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
migrate.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
mincore.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
mlock.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
mm_init.c	mm: convert printk(KERN_<LEVEL> to pr_<level>	2016-03-17 15:09:34 -07:00
mmap.c	lib/interval_tree: fast overlap detection	2017-09-08 18:26:49 -07:00
mmu_context.c	sched/headers: Prepare to move the task_lock()/unlock() APIs to <linux/sched/task.h>	2017-03-02 08:42:38 +01:00
mmu_notifier.c	mm/mmu_notifier: kill invalidate_page	2017-08-31 16:13:00 -07:00
mmzone.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
mprotect.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
mremap.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
msync.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
nobootmem.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
nommu.c	Merge branch 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2017-09-14 18:13:32 -07:00
oom_kill.c	mm: oom: show unreclaimable slab info when unreclaimable slabs > user memory	2017-11-15 18:21:01 -08:00
page_alloc.c	mm, page_alloc: fail has_unmovable_pages when seeing reserved pages	2017-11-15 18:21:02 -08:00
page_counter.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
page_ext.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
page_idle.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
page_io.c	mm, swap: skip swapcache for swapin of synchronous device	2017-11-15 18:21:02 -08:00
page_isolation.c	mm: distinguish CMA and MOVABLE isolation in has_unmovable_pages()	2017-11-15 18:21:02 -08:00
page_owner.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
page_poison.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
page_vma_mapped.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
page-writeback.c	mm/page-writeback.c: remove unused parameter from balance_dirty_pages()	2017-11-15 18:21:02 -08:00
pagewalk.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
percpu-internal.h	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
percpu-km.c	percpu: replace area map allocator with bitmap	2017-07-26 17:41:05 -04:00
percpu-stats.c	percpu: fix starting offset for chunk statistics traversal	2017-09-27 14:45:57 -07:00
percpu-vm.c	percpu: fix static checker warnings in pcpu_destroy_chunk	2017-06-29 11:23:38 -04:00
percpu.c	mm, percpu: add support for __GFP_NOWARN flag	2017-10-19 13:13:49 +01:00
pgtable-generic.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
process_vm_access.c	sched/headers: Prepare for new header dependencies before moving code to <linux/sched/mm.h>	2017-03-02 08:42:28 +01:00
quicklist.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
readahead.c	mm: don't cap request size based on read-ahead setting	2016-12-12 18:55:08 -08:00
rmap.c	lib/interval_tree: fast overlap detection	2017-09-08 18:26:49 -07:00
rodata_test.c	mm: fix RODATA_TEST failure "rodata_test: test data was not read only"	2017-10-03 17:54:24 -07:00
shmem.c	mm: treewide: remove GFP_TEMPORARY allocation flag	2017-09-13 18:53:16 -07:00
slab_common.c	slab, slub, slob: add slab_flags_t	2017-11-15 18:21:01 -08:00
slab.c	slab, slub, slob: convert slab_flags_t to 32-bit	2017-11-15 18:21:01 -08:00
slab.h	slab, slub, slob: add slab_flags_t	2017-11-15 18:21:01 -08:00
slob.c	slab, slub, slob: add slab_flags_t	2017-11-15 18:21:01 -08:00
slub.c	slub: fix sysfs duplicate filename creation when slub_debug=O	2017-11-15 18:21:01 -08:00
sparse-vmemmap.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
sparse.c	Merge branch 'x86/mm' into x86/asm, to merge branches	2017-11-10 08:05:30 +01:00
swap_cgroup.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
swap_slots.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
swap_state.c	mm, swap: fix false error message in __swp_swapcount()	2017-11-15 18:21:02 -08:00
swap.c	mm: avoid marking swap cached page as lazyfree	2017-10-03 17:54:24 -07:00
swapfile.c	mm: swap: SWP_SYNCHRONOUS_IO: skip swapcache only if swapped page has no other reference	2017-11-15 18:21:02 -08:00
truncate.c	mm/truncate.c: fix THP handling in invalidate_mapping_pages()	2017-07-10 16:32:32 -07:00
usercopy.c	mm/usercopy: Drop extra is_vmalloc_or_module() check	2017-04-05 12:30:18 -07:00
userfaultfd.c	userfaultfd: shmem: wire up shmem_mfill_zeropage_pte	2017-09-06 17:27:28 -07:00
util.c	mm: rename global_page_state to global_zone_page_state	2017-09-06 17:27:29 -07:00
vmacache.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
vmalloc.c	Revert "vmalloc: back off when the current task is killed"	2017-10-13 16:18:32 -07:00
vmpressure.c	mm, vmpressure: pass-through notification support	2017-07-10 16:32:31 -07:00
vmscan.c	Merge branch 'for-4.15/block' of git://git.kernel.dk/linux-block	2017-11-14 15:32:19 -08:00
vmstat.c	mm: consider the number in local CPUs when reading NUMA stats	2017-09-08 18:26:47 -07:00
workingset.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
z3fold.c	z3fold: fix stale list handling	2017-10-03 17:54:24 -07:00
zbud.c	mm/zbud.c: use list_last_entry() instead of list_tail_entry()	2016-01-15 11:40:52 -08:00
zpool.c	mm: zsmalloc: constify struct zs_pool name	2015-11-06 17:50:42 -08:00
zsmalloc.c	mm/zsmalloc.c: change stat type parameter to int	2017-09-08 18:26:47 -07:00
zswap.c	mm/zswap.c: delete an error message for a failed memory allocation in zswap_dstmem_prepare()	2017-07-06 16:24:35 -07:00