Badari Pulavarty f6b3ec238d [PATCH] madvise(MADV_REMOVE): remove pages from tmpfs shm backing store
Here is the patch to implement madvise(MADV_REMOVE) - which frees up a
given range of pages & its associated backing store.  Current
implementation supports only shmfs/tmpfs and other filesystems return
-ENOSYS.

"Some app allocates large tmpfs files, then when some task quits and some
client disconnect, some memory can be released.  However the only way to
release tmpfs-swap is to MADV_REMOVE". - Andrea Arcangeli

Databases want to use this feature to drop a section of their bufferpool
(shared memory segments) - without writing back to disk/swap space.

This feature is also useful for supporting hot-plug memory on UML.

Concerns raised by Andrew Morton:

- "We have no plan for holepunching!  If we _do_ have such a plan (or
  might in the future) then what would the API look like?  I think
  sys_holepunch(fd, start, len), so we should start out with that."

- Using madvise is very weird, because people will ask "why do I need to
  mmap my file before I can stick a hole in it?"

- None of the other madvise operations call into the filesystem in this
  manner.  A broad question is: is this capability an MM operation or a
  filesytem operation?  truncate, for example, is a filesystem operation
  which sometimes has MM side-effects.  madvise is an mm operation and with
  this patch, it gains FS side-effects, only they're really, really
  significant ones."

Comments:

- Andrea suggested the fs operation too but then it's more efficient to
  have it as a mm operation with fs side effects, because they don't
  immediatly know fd and physical offset of the range.  It's possible to
  fixup in userland and to use the fs operation but it's more expensive,
  the vmas are already in the kernel and we can use them.

Short term plan &  Future Direction:

- We seem to need this interface only for shmfs/tmpfs files in the short
  term.  We have to add hooks into the filesystem for correctness and
  completeness.  This is what this patch does.

- In the future, plan is to support both fs and mmap apis also.  This
  also involves (other) filesystem specific functions to be implemented.

- Current patch doesn't support VM_NONLINEAR - which can be addressed in
  the future.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Andrea Arcangeli <andrea@suse.de>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:22 -08:00

82 lines
2.9 KiB
C

/*
* include/asm-xtensa/mman.h
*
* Xtensa Processor memory-manager definitions
*
* This file is subject to the terms and conditions of the GNU General Public
* License. See the file "COPYING" in the main directory of this archive
* for more details.
*
* Copyright (C) 1995 by Ralf Baechle
* Copyright (C) 2001 - 2005 Tensilica Inc.
*/
#ifndef _XTENSA_MMAN_H
#define _XTENSA_MMAN_H
/*
* Protections are chosen from these bits, OR'd together. The
* implementation does not necessarily support PROT_EXEC or PROT_WRITE
* without PROT_READ. The only guarantees are that no writing will be
* allowed without PROT_WRITE and no access will be allowed for PROT_NONE.
*/
#define PROT_NONE 0x0 /* page can not be accessed */
#define PROT_READ 0x1 /* page can be read */
#define PROT_WRITE 0x2 /* page can be written */
#define PROT_EXEC 0x4 /* page can be executed */
#define PROT_SEM 0x10 /* page may be used for atomic ops */
#define PROT_GROWSDOWN 0x01000000 /* mprotect flag: extend change to start of growsdown vma */
#define PROT_GROWSUP 0x02000000 /* mprotect flag: extend change to end fo growsup vma */
/*
* Flags for mmap
*/
#define MAP_SHARED 0x001 /* Share changes */
#define MAP_PRIVATE 0x002 /* Changes are private */
#define MAP_TYPE 0x00f /* Mask for type of mapping */
#define MAP_FIXED 0x010 /* Interpret addr exactly */
/* not used by linux, but here to make sure we don't clash with ABI defines */
#define MAP_RENAME 0x020 /* Assign page to file */
#define MAP_AUTOGROW 0x040 /* File may grow by writing */
#define MAP_LOCAL 0x080 /* Copy on fork/sproc */
#define MAP_AUTORSRV 0x100 /* Logical swap reserved on demand */
/* These are linux-specific */
#define MAP_NORESERVE 0x0400 /* don't check for reservations */
#define MAP_ANONYMOUS 0x0800 /* don't use a file */
#define MAP_GROWSDOWN 0x1000 /* stack-like segment */
#define MAP_DENYWRITE 0x2000 /* ETXTBSY */
#define MAP_EXECUTABLE 0x4000 /* mark it as an executable */
#define MAP_LOCKED 0x8000 /* pages are locked */
#define MAP_POPULATE 0x10000 /* populate (prefault) pagetables */
#define MAP_NONBLOCK 0x20000 /* do not block on IO */
/*
* Flags for msync
*/
#define MS_ASYNC 0x0001 /* sync memory asynchronously */
#define MS_INVALIDATE 0x0002 /* invalidate mappings & caches */
#define MS_SYNC 0x0004 /* synchronous memory sync */
/*
* Flags for mlockall
*/
#define MCL_CURRENT 1 /* lock all current mappings */
#define MCL_FUTURE 2 /* lock all future mappings */
#define MADV_NORMAL 0x0 /* default page-in behavior */
#define MADV_RANDOM 0x1 /* page-in minimum required */
#define MADV_SEQUENTIAL 0x2 /* read-ahead aggressively */
#define MADV_WILLNEED 0x3 /* pre-fault pages */
#define MADV_DONTNEED 0x4 /* discard these pages */
#define MADV_REMOVE 0x5 /* remove these pages & resources */
/* compatibility flags */
#define MAP_ANON MAP_ANONYMOUS
#define MAP_FILE 0
#endif /* _XTENSA_MMAN_H */