linux/mm
Paul Jackson bdd804f478 [PATCH] Cpuset: might sleep checking zones allowed fix
Fix a couple of infrequently encountered 'sleeping function called from
invalid context' in the cpuset hooks in __alloc_pages.  Could sleep while
interrupts disabled.

The routine cpuset_zone_allowed() is called by code in mm/page_alloc.c
__alloc_pages() to determine if a zone is allowed in the current tasks
cpuset.  This routine can sleep, for certain GFP_KERNEL allocations, if the
zone is on a memory node not allowed in the current cpuset, but might be
allowed in a parent cpuset.

But we can't sleep in __alloc_pages() if in interrupt, nor if called for a
GFP_ATOMIC request (__GFP_WAIT not set in gfp_flags).

The rule was intended to be:
  Don't call cpuset_zone_allowed() if you can't sleep, unless you
  pass in the __GFP_HARDWALL flag set in gfp_flag, which disables
  the code that might scan up ancestor cpusets and sleep.

This rule was being violated in a couple of places, due to a bogus change
made (by myself, pj) to __alloc_pages() as part of the November 2005 effort
to cleanup its logic, and also due to a later fix to constrain which swap
daemons were awoken.

The bogus change can be seen at:
  http://linux.derkeiler.com/Mailing-Lists/Kernel/2005-11/4691.html
  [PATCH 01/05] mm fix __alloc_pages cpuset ALLOC_* flags

This was first noticed on a tight memory system, in code that was disabling
interrupts and doing allocation requests with __GFP_WAIT not set, which
resulted in __might_sleep() writing complaints to the log "Debug: sleeping
function called ...", when the code in cpuset_zone_allowed() tried to take
the callback_sem cpuset semaphore.

We haven't seen a system hang on this 'might_sleep' yet, but we are at
decent risk of seeing it fairly soon, especially since the additional
cpuset_zone_allowed() check was added, conditioning wakeup_kswapd(), in
March 2006.

Special thanks to Dave Chinner, for figuring this out, and a tip of the hat
to Nick Piggin who warned me of this back in Nov 2005, before I was ready
to listen.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-21 12:59:18 -07:00
..
bootmem.c [PATCH] x86_64: Handle empty PXMs that only contain hotplug memory 2006-04-09 11:53:16 -07:00
fadvise.c [PATCH] sys_sync_file_range() 2006-03-31 12:18:54 -08:00
filemap_xip.c [PATCH] replace inode_update_time with file_update_time 2006-01-10 08:01:30 -08:00
filemap.c [PATCH] Add find_get_pages_contig(): contiguous variant of find_get_pages() 2006-04-27 08:59:48 +02:00
filemap.h
fremap.c VM: add common helper function to create the page tables 2005-11-29 14:03:14 -08:00
highmem.c BUG_ON() Conversion in mm/highmem.c 2006-04-02 13:47:35 +02:00
hugetlb.c [PATCH] hugetlb: don't allow free hugetlb count fall below reserved count 2006-03-31 12:18:50 -08:00
internal.h [PATCH] remove set_page_count() outside mm/ 2006-03-22 07:54:02 -08:00
Kconfig [PATCH] mm: make page migration dependent on swap and NUMA 2006-03-25 08:22:50 -08:00
madvise.c [PATCH] Fix MADV_REMOVE protection checking 2006-04-17 18:22:18 -07:00
Makefile [PATCH] uninline zone helpers 2006-03-27 08:44:48 -08:00
memory_hotplug.c [PATCH] spufs: fix for CONFIG_NUMA 2006-05-01 18:17:46 -07:00
memory.c [PATCH] Don't pass boot parameters to argv_init[] 2006-03-31 12:18:53 -08:00
mempolicy.c [PATCH] Remove cond_resched in gather_stats() 2006-04-20 07:54:03 -07:00
mempool.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial 2006-03-26 09:41:18 -08:00
migrate.c [PATCH] page migration: Fix fallback behavior for dirty pages 2006-05-01 18:17:45 -07:00
mincore.c
mlock.c [PATCH] move capable() to capability.h 2006-01-11 18:42:13 -08:00
mmap.c [PATCH] overcommit: use totalreserve_pages 2006-04-11 06:18:32 -07:00
mmzone.c [PATCH] uninline zone helpers 2006-03-27 08:44:48 -08:00
mprotect.c [PATCH] Enable mprotect on huge pages 2006-03-22 07:54:03 -08:00
mremap.c [PATCH] move capable() to capability.h 2006-01-11 18:42:13 -08:00
msync.c The comment describing how MS_ASYNC works in msync.c is confusing 2006-03-24 18:30:53 +01:00
nommu.c [PATCH] overcommit: use totalreserve_pages for nommu 2006-04-11 06:18:32 -07:00
oom_kill.c [PATCH] mm: fix mm_struct reference counting bugs in mm/oom_kill.c 2006-04-19 09:13:50 -07:00
page_alloc.c [PATCH] Cpuset: might sleep checking zones allowed fix 2006-05-21 12:59:18 -07:00
page_io.c
page-writeback.c [PATCH] page-writeback comment fixes 2006-04-11 06:18:46 -07:00
pdflush.c [PATCH] Swap Migration V5: PF_SWAPWRITE to allow writing to swap 2006-01-08 20:12:41 -08:00
prio_tree.c
readahead.c [PATCH] ext3_readdir: use generic readahead 2006-03-23 07:38:09 -08:00
rmap.c [PATCH] mm: more CONFIG_DEBUG_VM 2006-03-22 07:54:02 -08:00
shmem.c [PATCH] add migratepage address space op to shmem 2006-04-22 09:19:52 -07:00
slab.c [PATCH] slab: Fix kmem_cache_destroy() on NUMA 2006-05-16 07:59:32 -07:00
slob.c [PATCH] mm/slob.c: for_each_possible_cpu(), not NR_CPUS 2006-04-19 09:13:49 -07:00
sparse.c [PATCH] SPARSEMEM incorrectly calculates section number 2006-05-21 12:59:17 -07:00
swap_state.c BUG_ON() Conversion in mm/swap_state.c 2006-04-01 01:25:12 +02:00
swap.c [PATCH] for_each_possible_cpu: fixes for generic part 2006-03-28 09:16:05 -08:00
swapfile.c [PATCH] mm: schedule find_trylock_page() removal 2006-03-31 12:18:49 -08:00
thrash.c [PATCH] temporarily disable swap token on memory pressure 2005-11-28 14:42:25 -08:00
tiny-shmem.c [PATCH] do_truncate() call fix in tiny-shmem.c 2006-01-12 09:08:49 -08:00
truncate.c [PATCH] mutex subsystem, semaphore to mutex: VFS, ->i_sem 2006-01-09 15:59:24 -08:00
util.c [PATCH] slab: optimize constant-size kzalloc calls 2006-03-25 08:22:49 -08:00
vmalloc.c BUG_ON() Conversion in mm/vmalloc.c 2006-04-01 01:26:09 +02:00
vmscan.c [PATCH] Remove __devinit and __cpuinit from notifier_call definitions 2006-04-26 08:30:03 -07:00