linux/mm
Andy Whitcroft 641c767389 [PATCH] sparsemem swiss cheese numa layouts
The part of the sparsemem patch which modifies memmap_init_zone() has recently
become a problem.  It changes behavior so that there is a call to
pfn_to_page() for each individual page inside of a node's range:
node_start_pfn through node_end_pfn.  It used to simply do this once, at the
beginning of the node, but having sparsemem's non-contiguous mem_map[]s inside
of a node made it necessary to change.

Mike Kravetz recently wrote a patch which made the NUMA code accept some new
kinds of layouts.  The system's memory was laid out like this, with node 0's
memory in two pieces: one before and one after node 1's memory:

	Node 0: +++++     +++++
	Node 1:      +++++

Previous behavior before Mike's patch was to assign nodes like this:

	Node 0: 00000     XXXXX
	Node 1:      11111

Where the 'X' areas were simply thrown away.  The new behavior was to make the
pg_data_t span node 0 across all of its areas, including areas that are really
node 1's: Node 0: 000000000000000 Node 1: 11111

This wastes a little bit of mem_map space, but ends up being OK, and more
fully utilizes the system's memory.  memmap_init_zone() initializes all of the
"struct page"s for node 0, even for the "hole", but those never get used,
because there is no pfn_to_page() that resolves to those pages.  However, only
calling pfn_to_page() once, memmap_init_zone() always uses the pages that were
allocated for node0->node_mem_map because:

	struct page *start = pfn_to_page(start_pfn);
	// effectively start = &node->node_mem_map[0]
	for (page = start; page < (start + size); page++) {
		init_page_here();...
		page++;
	}

Slow, and wasteful, but generally harmless.

But, modify that to call pfn_to_page() for each loop iteration (like sparsemem
does):

	for (pfn = start_pfn; pfn < < (start_pfn + size); pfn++++) {
		page = pfn_to_page(pfn);
	}

And you end up trying to initialize node 1's pages too early, along with bogus
data from node 0.  This patch checks for those weird layouts and declines to
touch the pages, making the more frequent pfn_to_page() calls OK to do.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:05 -07:00
..
bootmem.c [PATCH] sparsemem memory model 2005-06-23 09:45:04 -07:00
fadvise.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
filemap.c [PATCH] broken fault_in_pages_readable call in generic_file_buffered_write() 2005-06-06 14:42:23 -07:00
fremap.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
highmem.c [PATCH] count bounce buffer pages in vmstat 2005-05-01 08:58:37 -07:00
hugetlb.c [PATCH] Hugepage consolidation 2005-06-21 18:46:15 -07:00
internal.h Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
Kconfig [PATCH] sparsemem memory model 2005-06-23 09:45:04 -07:00
madvise.c [PATCH] madvise: merge the maps 2005-06-21 18:46:13 -07:00
Makefile [PATCH] sparsemem memory model 2005-06-23 09:45:04 -07:00
memory.c [PATCH] sparsemem memory model 2005-06-23 09:45:04 -07:00
mempolicy.c [PATCH] mbind: check_range use standard ptwalk 2005-06-21 18:46:19 -07:00
mempool.c [PATCH] use smp_mb/wmb/rmb where possible 2005-05-01 08:58:47 -07:00
mincore.c [PATCH] freepgt: sys_mincore ignore FIRST_USER_PGD_NR 2005-04-19 13:29:20 -07:00
mlock.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
mmap.c [PATCH] mmap topdown fix for large stack limit, large allocation 2005-06-21 18:46:16 -07:00
mprotect.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
mremap.c [PATCH] mm acct accounting fix 2005-05-17 07:59:12 -07:00
msync.c [PATCH] msync: check pte dirty earlier 2005-06-21 18:46:21 -07:00
nommu.c [PATCH] Avoiding mmap fragmentation 2005-06-21 18:46:16 -07:00
oom_kill.c [PATCH] add OOM debug 2005-06-21 18:46:17 -07:00
page_alloc.c [PATCH] sparsemem swiss cheese numa layouts 2005-06-23 09:45:05 -07:00
page_io.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
page-writeback.c [PATCH] DocBook: fix some descriptions 2005-05-01 08:59:26 -07:00
pdflush.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
prio_tree.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
readahead.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
rmap.c [PATCH] can_share_swap_page: use page_mapcount 2005-06-21 18:46:21 -07:00
shmem.c [PATCH] shmem: restore superblock info 2005-06-21 18:46:18 -07:00
slab.c [PATCH] Periodically drain non local pagesets 2005-06-21 18:46:18 -07:00
sparse.c [PATCH] sparsemem memory model 2005-06-23 09:45:04 -07:00
swap_state.c [PATCH] mm: use __GFP_NOMEMALLOC 2005-05-01 08:58:37 -07:00
swap.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
swapfile.c [PATCH] can_share_swap_page: use page_mapcount 2005-06-21 18:46:21 -07:00
thrash.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
tiny-shmem.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
truncate.c [PATCH] DocBook: fix some descriptions 2005-05-01 08:59:26 -07:00
vmalloc.c [PATCH] x86_64: Fixed guard page handling again in iounmap 2005-05-20 15:48:20 -07:00
vmscan.c [PATCH] vm: try_to_free_pages unused argument 2005-06-21 18:46:17 -07:00