linux/mm
Nishanth Aravamudan f5bf18fa22 bootmem/sparsemem: remove limit constraint in alloc_bootmem_section
While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory
Overcommit) on powerpc, we tripped the following:

  kernel BUG at mm/bootmem.c:483!
  cpu 0x0: Vector: 700 (Program Check) at [c000000000c03940]
      pc: c000000000a62bd8: .alloc_bootmem_core+0x90/0x39c
      lr: c000000000a64bcc: .sparse_early_usemaps_alloc_node+0x84/0x29c
      sp: c000000000c03bc0
     msr: 8000000000021032
    current = 0xc000000000b0cce0
    paca    = 0xc000000001d80000
      pid   = 0, comm = swapper
  kernel BUG at mm/bootmem.c:483!
  enter ? for help
  [c000000000c03c80] c000000000a64bcc
  .sparse_early_usemaps_alloc_node+0x84/0x29c
  [c000000000c03d50] c000000000a64f10 .sparse_init+0x12c/0x28c
  [c000000000c03e20] c000000000a474f4 .setup_arch+0x20c/0x294
  [c000000000c03ee0] c000000000a4079c .start_kernel+0xb4/0x460
  [c000000000c03f90] c000000000009670 .start_here_common+0x1c/0x2c

This is

        BUG_ON(limit && goal + size > limit);

and after some debugging, it seems that

	goal = 0x7ffff000000
	limit = 0x80000000000

and sparse_early_usemaps_alloc_node ->
sparse_early_usemaps_alloc_pgdat_section calls

	return alloc_bootmem_section(usemap_size() * count, section_nr);

This is on a system with 8TB available via the AMS pool, and as a quirk
of AMS in firmware, all of that memory shows up in node 0.  So, we end
up with an allocation that will fail the goal/limit constraints.

In theory, we could "fall-back" to alloc_bootmem_node() in
sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE
defined, we'll BUG_ON() instead.  A simple solution appears to be to
unconditionally remove the limit condition in alloc_bootmem_section,
meaning allocations are allowed to cross section boundaries (necessary
for systems of this size).

Johannes Weiner pointed out that if alloc_bootmem_section() no longer
guarantees section-locality, we need check_usemap_section_nr() to print
possible cross-dependencies between node descriptors and the usemaps
allocated through it.  That makes the two loops in
sparse_early_usemaps_alloc_node() identical, so re-factor the code a
bit.

[akpm@linux-foundation.org: code simplification]
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Anton Blanchard <anton@au1.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: <stable@vger.kernel.org>	[3.3.1]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:54:58 -07:00
..
backing-dev.c backing-dev: fix wakeup timer races with bdi_unregister() 2012-02-01 16:52:49 +08:00
bootmem.c bootmem/sparsemem: remove limit constraint in alloc_bootmem_section 2012-03-21 17:54:58 -07:00
bounce.c mm: remove the second argument of k[un]map_atomic() 2012-03-20 21:48:27 +08:00
cleancache.c
compaction.c mm: compaction: make compact_control order signed 2012-03-21 17:54:56 -07:00
debug-pagealloc.c
dmapool.c
fadvise.c
failslab.c
filemap_xip.c mm/filemap_xip.c: fix race condition in xip_file_fault() 2012-02-03 16:16:41 -08:00
filemap.c mm: remove the second argument of k[un]map_atomic() 2012-03-20 21:48:27 +08:00
fremap.c
highmem.c
huge_memory.c thp: optimize away unnecessary page table locking 2012-03-21 17:54:57 -07:00
hugetlb.c mm: hugetlb: bail out unmapping after serving reference page 2012-03-21 17:54:57 -07:00
hwpoison-inject.c
init-mm.c
internal.h
Kconfig
Kconfig.debug
kmemcheck.c
kmemleak-test.c
kmemleak.c kmemleak: Disable early logging when kmemleak is off by default 2012-01-20 16:57:05 +00:00
ksm.c mm: remove the second argument of k[un]map_atomic() 2012-03-20 21:48:27 +08:00
maccess.c
madvise.c
Makefile
memblock.c memblock: Fix size aligning of memblock_alloc_base_nid() 2012-03-01 10:53:18 +01:00
memcontrol.c mm: thp: fix pmd_bad() triggering in code paths holding mmap_sem read mode 2012-03-21 17:54:54 -07:00
memory_hotplug.c mm: compaction: introduce sync-light migration for use by compaction 2012-01-12 20:13:09 -08:00
memory-failure.c thp: allow a hwpoisoned head page to be put back to LRU 2012-03-21 17:54:58 -07:00
memory.c mm: make get_mm_counter static-inline 2012-03-21 17:54:55 -07:00
mempolicy.c mm: fix move/migrate_pages() race on task struct 2012-03-21 17:54:58 -07:00
mempool.c mempool: fix first round failure behavior 2012-01-10 16:30:45 -08:00
migrate.c mm: fix move/migrate_pages() race on task struct 2012-03-21 17:54:58 -07:00
mincore.c mm: thp: fix pmd_bad() triggering in code paths holding mmap_sem read mode 2012-03-21 17:54:54 -07:00
mlock.c vm: avoid using find_vma_prev() unnecessarily 2012-03-06 18:23:36 -08:00
mm_init.c
mmap.c mm: search from free_area_cache for the bigger size 2012-03-21 17:54:56 -07:00
mmu_context.c
mmu_notifier.c
mmzone.c
mprotect.c mm: replace PAGE_MIGRATION with IS_ENABLED(CONFIG_MIGRATION) 2012-03-21 17:54:57 -07:00
mremap.c
msync.c
nobootmem.c
nommu.c NOMMU: Don't need to clear vm_mm when deleting a VMA 2012-02-24 08:59:04 -08:00
oom_kill.c mm, oom: force oom kill on sysrq+f 2012-03-21 17:54:58 -07:00
page_alloc.c mm: drain percpu lru add/rotate page-vectors on cpu hot-unplug 2012-03-21 17:54:58 -07:00
page_cgroup.c page_cgroup: fix horrid swap accounting regression 2012-03-06 08:18:23 -08:00
page_io.c
page_isolation.c
page-writeback.c Merge branch 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux 2012-01-10 16:59:59 -08:00
pagewalk.c mm: thp: fix pmd_bad() triggering in code paths holding mmap_sem read mode 2012-03-21 17:54:54 -07:00
percpu-km.c
percpu-vm.c percpu: use bitmap_clear 2012-01-20 09:23:16 -08:00
percpu.c Kmemleak patches 2012-01-14 18:11:11 -08:00
pgtable-generic.c
prio_tree.c
process_vm_access.c Fix race in process_vm_rw_core 2012-02-02 12:55:17 -08:00
quicklist.c
readahead.c
rmap.c rmap: anon_vma_prepare: Reduce code duplication by calling anon_vma_chain_link 2012-03-21 17:54:57 -07:00
shmem.c tmpfs: security xattr setting on inode creation 2012-03-21 17:54:58 -07:00
slab.c Merge branch 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux 2012-01-11 18:52:23 -08:00
slob.c
slub.c mm,x86,um: move CMPXCHG_DOUBLE config option 2012-01-12 20:13:03 -08:00
sparse-vmemmap.c
sparse.c bootmem/sparsemem: remove limit constraint in alloc_bootmem_section 2012-03-21 17:54:58 -07:00
swap_state.c mm: make swapin readahead skip over holes 2012-03-21 17:54:56 -07:00
swap.c mm: drain percpu lru add/rotate page-vectors on cpu hot-unplug 2012-03-21 17:54:58 -07:00
swapfile.c mm: make swapin readahead skip over holes 2012-03-21 17:54:56 -07:00
thrash.c
truncate.c mm: fix comment typo of truncate_inode_pages_range 2012-02-23 11:52:19 +01:00
util.c procfs: mark thread stack correctly in proc/<pid>/maps 2012-03-21 17:54:58 -07:00
vmalloc.c mm: remove the second argument of k[un]map_atomic() 2012-03-20 21:48:27 +08:00
vmscan.c vmscan: handle isolated pages with lru lock released 2012-03-21 17:54:57 -07:00
vmstat.c mm,x86,um: move CMPXCHG_LOCAL config option 2012-01-12 20:13:03 -08:00