2014-10-09 15:28:37 -07:00
/*
 * mm/debug.c
 *
 * mm/ specific debug routines.
 *
 */

2014-10-09 15:28:34 -07:00
#include <linux/kernel.h>
#include <linux/mm.h>
2015-04-29 14:36:05 -04:00
#include <linux/trace_events.h>
2014-10-09 15:28:34 -07:00
#include <linux/memcontrol.h>
mm, tracing: unify mm flags handling in tracepoints and printk

In tracepoints, it's possible to print gfp flags in a human-friendly
format through a macro show_gfp_flags(), which defines a translation
array and passes it to __print_flags(). Since the following patch will
introduce support for gfp flags printing in printk(), it would be nice
to reuse the array. This is not straightforward, since __print_flags()
can't simply reference an array defined in a .c file such as mm/debug.c
- it has to be a macro to allow the macro magic to communicate the
format to userspace tools such as trace-cmd.

The solution is to create a macro __def_gfpflag_names which is used both
in show_gfp_flags(), and to define the gfpflag_names[] array in
mm/debug.c.

On the other hand, mm/debug.c also defines translation tables for page
flags and vma flags, and a desire was expressed (but not implemented in
this series) to use these also from tracepoints. Thus, this patch also
renames the events/gfpflags.h file to events/mmflags.h and moves the
table definitions there, using the same macro approach as for gfpflags.
This allows translating all three kinds of mm-specific flags both in
tracepoints and printk.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Michal Hocko <mhocko@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 14:55:52 -07:00
#include <trace/events/mmflags.h>
2016-03-15 14:56:18 -07:00
#include <linux/migrate.h>
2016-03-15 14:56:21 -07:00
#include <linux/page_owner.h>
2014-10-09 15:28:34 -07:00

mm, printk: introduce new format string for flags

In mm we use several kinds of flags bitfields that are sometimes printed
for debugging purposes, or exported to userspace via sysfs. To make
them easier to interpret independently of kernel version and config, we
also want to dump the symbolic flag names. So far this has been done
with repeated calls to pr_cont(), which is unreliable on SMP, and not
usable for e.g. sysfs export.

To get a more reliable and universal solution, this patch extends
printk() format string for pointers to handle the page flags (%pGp),
gfp_flags (%pGg) and vma flags (%pGv). Existing users of
dump_flag_names() are converted and simplified.

It would be possible to pass flags by value instead of pointer, but the
%p format string for pointers already has extensions for various kernel
structures, so it's a good fit, and the extra indirection in a
non-critical path is negligible.

[linux@rasmusvillemoes.dk: lots of good implementation suggestions]
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 14:55:56 -07:00
#include "internal.h"

2016-03-15 14:56:18 -07:00
char *migrate_reason_names[MR_TYPES] = {
	"compaction",
	"memory_failure",
	"memory_hotplug",
	"syscall_or_cpuset",
	"mempolicy_mbind",
	"numa_misplaced",
	"cma",
};

2016-03-15 14:55:56 -07:00
const struct trace_print_flags pageflag_names[] = {
	__def_pageflag_names,
	{0, NULL}
};

const struct trace_print_flags gfpflag_names[] = {
	__def_gfpflag_names,
	{0, NULL}
2016-03-15 14:55:52 -07:00
};

2016-03-15 14:55:56 -07:00
const struct trace_print_flags vmaflag_names[] = {
	__def_vmaflag_names,
	{0, NULL}
2014-10-09 15:28:34 -07:00
};

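The three tables above are each built from a shared list macro, so that one list of {mask, name} pairs can feed both an array in a .c file and a macro-based caller. The following is a minimal userspace sketch of that pattern, not the kernel's definitions: the names DEMO_DEF_FLAG_NAMES, demo_print_flags, demo_flag_names, demo_flag_name() and demo_count_inline(), and the three flag bits, are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical flag bits standing in for real page/gfp/vma flags. */
#define DEMO_LOCKED	0x1UL
#define DEMO_DIRTY	0x2UL
#define DEMO_LRU	0x4UL
#define DEMO_NR_FLAGS	3

struct demo_print_flags {
	unsigned long mask;
	const char *name;
};

/* Single source of truth: the list of {mask, name} pairs lives in a macro. */
#define DEMO_DEF_FLAG_NAMES		\
	{DEMO_LOCKED, "locked"},	\
	{DEMO_DIRTY,  "dirty"},		\
	{DEMO_LRU,    "lru"}

/* First user: a NULL-terminated array, as a .c file would define it. */
static const struct demo_print_flags demo_flag_names[] = {
	DEMO_DEF_FLAG_NAMES,
	{0, NULL}
};

/* Compile-time size check, similar in spirit to a BUILD_BUG_ON(). */
_Static_assert(sizeof(demo_flag_names) / sizeof(demo_flag_names[0]) ==
	       DEMO_NR_FLAGS + 1, "flag name table out of sync");

/* Second user: expand the same list inline, as a macro-based caller would. */
static size_t demo_count_inline(void)
{
	const struct demo_print_flags local[] = { DEMO_DEF_FLAG_NAMES };

	return sizeof(local) / sizeof(local[0]);
}

/* Look up the name for an exact mask; returns NULL if it is not listed. */
static const char *demo_flag_name(unsigned long mask)
{
	const struct demo_print_flags *p;

	for (p = demo_flag_names; p->name; p++)
		if (p->mask == mask)
			return p->name;
	return NULL;
}
```

Because both users expand the same macro, adding a flag to the list updates the array and the inline copy together, and the size check fails at build time if DEMO_NR_FLAGS is not updated to match.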
2016-03-15 14:56:24 -07:00
void __dump_page(struct page *page, const char *reason)
2014-10-09 15:28:34 -07:00
{
2016-10-07 17:01:40 -07:00
	/*
	 * Avoid VM_BUG_ON() in page_mapcount().
	 * page->_mapcount space in struct page is used by sl[aou]b pages to
	 * encode own info.
	 */
2016-09-19 14:44:07 -07:00
	int mapcount = PageSlab(page) ? 0 : page_mapcount(page);

2016-01-15 16:53:42 -08:00
	pr_emerg("page:%p count:%d mapcount:%d mapping:%p index:%#lx",
2016-09-19 14:44:07 -07:00
		page, page_ref_count(page), mapcount,
		page->mapping, page_to_pgoff(page));
2016-01-15 16:53:42 -08:00
	if (PageCompound(page))
		pr_cont(" compound_mapcount: %d", compound_mapcount(page));
	pr_cont("\n");
2016-03-15 14:55:56 -07:00
	BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1);
2016-03-15 14:56:24 -07:00

2016-03-15 14:55:59 -07:00
	pr_emerg("flags: %#lx(%pGp)\n", page->flags, &page->flags);

2016-12-12 16:44:35 -08:00
	print_hex_dump(KERN_ALERT, "raw: ", DUMP_PREFIX_NONE, 32,
		sizeof(unsigned long), page,
		sizeof(struct page), false);

2014-10-09 15:28:34 -07:00
	if (reason)
		pr_alert("page dumped because: %s\n", reason);
2016-03-15 14:55:59 -07:00

2014-12-10 15:44:58 -08:00
#ifdef CONFIG_MEMCG
	if (page->mem_cgroup)
		pr_alert("page->mem_cgroup:%p\n", page->mem_cgroup);
#endif
2014-10-09 15:28:34 -07:00
}

void dump_page(struct page *page, const char *reason)
{
2016-03-15 14:56:24 -07:00
	__dump_page(page, reason);
2016-03-15 14:56:21 -07:00
	dump_page_owner(page);
2014-10-09 15:28:34 -07:00
}
EXPORT_SYMBOL(dump_page);

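The "flags: %#lx(%pGp)" line in __dump_page() prints the raw flags word followed by its symbolic names. As a rough userspace approximation of how a {mask, name} table terminated by {0, NULL} can be turned into a "name|name" string, consider the sketch below. It is an illustration only and not the kernel's vsprintf code; demo_print_flags, demo_names, demo_append() and demo_format_flags() are hypothetical names, and the three flag bits are made up.

```c
#include <stdio.h>
#include <string.h>

struct demo_print_flags {
	unsigned long mask;
	const char *name;
};

/* Hypothetical table, terminated by a {0, NULL} sentinel. */
static const struct demo_print_flags demo_names[] = {
	{0x1UL, "locked"},
	{0x2UL, "dirty"},
	{0x4UL, "lru"},
	{0, NULL}
};

/* Append s at offset len if room is left; return the untruncated length. */
static size_t demo_append(char *buf, size_t size, size_t len, const char *s)
{
	if (len < size)
		snprintf(buf + len, size - len, "%s", s);
	return len + strlen(s);
}

/*
 * Write the names of all set bits, separated by '|'. Bits with no entry
 * in the table are appended as one hex number. Returns the length the
 * full string needs, which may exceed what fit into buf.
 */
static size_t demo_format_flags(char *buf, size_t size, unsigned long flags,
				const struct demo_print_flags *names)
{
	char hex[2 + 2 * sizeof(unsigned long) + 1];
	size_t len = 0;

	if (size)
		buf[0] = '\0';

	for (; flags && names->name; names++) {
		if ((flags & names->mask) != names->mask)
			continue;
		len = demo_append(buf, size, len, names->name);
		flags &= ~names->mask;	/* this bit is now accounted for */
		if (flags)
			len = demo_append(buf, size, len, "|");
	}
	if (flags) {
		snprintf(hex, sizeof(hex), "%#lx", flags);
		len = demo_append(buf, size, len, hex);
	}
	return len;
}
```

Formatting into a caller-supplied buffer in one pass is what makes this approach usable for a single printk() call or a sysfs buffer, in contrast to emitting each name with a separate pr_cont().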
#ifdef CONFIG_DEBUG_VM

void dump_vma(const struct vm_area_struct *vma)
{
2014-10-09 15:28:41 -07:00
	pr_emerg("vma %p start %p end %p\n"
2014-10-09 15:28:34 -07:00
		"next %p prev %p mm %p\n"
		"prot %lx anon_vma %p vm_ops %p\n"
2016-03-15 14:55:59 -07:00
		"pgoff %lx file %p private_data %p\n"
		"flags: %#lx(%pGv)\n",
2014-10-09 15:28:34 -07:00
		vma, (void *)vma->vm_start, (void *)vma->vm_end, vma->vm_next,
		vma->vm_prev, vma->vm_mm,
		(unsigned long)pgprot_val(vma->vm_page_prot),
		vma->anon_vma, vma->vm_ops, vma->vm_pgoff,
2016-03-15 14:55:59 -07:00
		vma->vm_file, vma->vm_private_data,
		vma->vm_flags, &vma->vm_flags);
2014-10-09 15:28:34 -07:00
}
EXPORT_SYMBOL(dump_vma);

2014-10-09 15:28:37 -07:00
void dump_mm(const struct mm_struct *mm)
{
2014-10-09 15:28:41 -07:00
	pr_emerg("mm %p mmap %p seqnum %d task_size %lu\n"
2014-10-09 15:28:37 -07:00
#ifdef CONFIG_MMU
		"get_unmapped_area %p\n"
#endif
		"mmap_base %lu mmap_legacy_base %lu highest_vm_end %lu\n"
mm: account pmd page tables to the process

Dave noticed that an unprivileged process can allocate a significant amount
of memory -- >500 MiB on x86_64 -- and stay unnoticed by the oom-killer and
memory cgroup. The trick is to allocate a lot of PMD page tables. The Linux
kernel doesn't account PMD tables to the process, only PTE.

The use-case below uses a few tricks to allocate a lot of PMD page tables
while keeping VmRSS and VmPTE low. oom_score for the process will be 0.

	#include <errno.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>
	#include <sys/mman.h>
	#include <sys/prctl.h>

	#define PUD_SIZE (1UL << 30)
	#define PMD_SIZE (1UL << 21)

	#define NR_PUD 130000

	int main(void)
	{
		char *addr = NULL;
		unsigned long i;

		/* arg2 = 1 sets the flag; the remaining arguments must be 0 */
		prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0);
		for (i = 0; i < NR_PUD ; i++) {
			addr = mmap(addr + PUD_SIZE, PUD_SIZE, PROT_WRITE|PROT_READ,
					MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
			if (addr == MAP_FAILED) {
				perror("mmap");
				break;
			}
			*addr = 'x';
			munmap(addr, PMD_SIZE);
			/* check the result of the re-mmap itself, not the old addr */
			if (mmap(addr, PMD_SIZE, PROT_WRITE|PROT_READ,
					MAP_ANONYMOUS|MAP_PRIVATE|MAP_FIXED, -1, 0) == MAP_FAILED)
				perror("re-mmap"), exit(1);
		}

		printf("PID %d consumed %lu KiB in PMD page tables\n",
				getpid(), i * 4096 >> 10);
		return pause();
	}

The patch addresses the issue by accounting PMD tables to the process the
same way we account PTE.

The main places where PMD tables are accounted are __pmd_alloc() and
free_pmd_range(). But there are a few corner cases:

 - HugeTLB can share PMD page tables. The patch handles this by accounting
   the table to all processes that share it.

 - x86 PAE pre-allocates a few PMD tables on fork.

 - Architectures with FIRST_USER_ADDRESS > 0. We need to adjust the sanity
   check on exit(2).

Accounting only happens on configurations where the PMD page table level is
present (PMD is not folded). As with nr_ptes, we use a per-mm counter. The
counter value is used to calculate the baseline for the badness score by
the oom-killer.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: David Rientjes <rientjes@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11 15:26:50 -08:00
		"pgd %p mm_users %d mm_count %d nr_ptes %lu nr_pmds %lu map_count %d\n"
2014-10-09 15:28:37 -07:00
		"hiwater_rss %lx hiwater_vm %lx total_vm %lx locked_vm %lx\n"
2016-01-14 15:22:07 -08:00
		"pinned_vm %lx data_vm %lx exec_vm %lx stack_vm %lx\n"
2014-10-09 15:28:37 -07:00
		"start_code %lx end_code %lx start_data %lx end_data %lx\n"
		"start_brk %lx brk %lx start_stack %lx\n"
		"arg_start %lx arg_end %lx env_start %lx env_end %lx\n"
		"binfmt %p flags %lx core_state %p\n"
#ifdef CONFIG_AIO
		"ioctx_table %p\n"
#endif
#ifdef CONFIG_MEMCG
		"owner %p "
#endif
		"exe_file %p\n"
#ifdef CONFIG_MMU_NOTIFIER
		"mmu_notifier_mm %p\n"
#endif
#ifdef CONFIG_NUMA_BALANCING
		"numa_next_scan %lu numa_scan_offset %lu numa_scan_seq %d\n"
#endif
#if defined(CONFIG_NUMA_BALANCING) || defined(CONFIG_COMPACTION)
		"tlb_flush_pending %d\n"
#endif
2016-03-15 14:55:59 -07:00
		"def_flags: %#lx(%pGv)\n",
2014-10-09 15:28:37 -07:00

		mm, mm->mmap, mm->vmacache_seqnum, mm->task_size,
#ifdef CONFIG_MMU
		mm->get_unmapped_area,
#endif
		mm->mmap_base, mm->mmap_legacy_base, mm->highest_vm_end,
		mm->pgd, atomic_read(&mm->mm_users),
		atomic_read(&mm->mm_count),
		atomic_long_read((atomic_long_t *)&mm->nr_ptes),
2015-02-11 15:26:50 -08:00
		mm_nr_pmds((struct mm_struct *)mm),
2014-10-09 15:28:37 -07:00
		mm->map_count,
		mm->hiwater_rss, mm->hiwater_vm, mm->total_vm, mm->locked_vm,
2016-01-14 15:22:07 -08:00
		mm->pinned_vm, mm->data_vm, mm->exec_vm, mm->stack_vm,
2014-10-09 15:28:37 -07:00
		mm->start_code, mm->end_code, mm->start_data, mm->end_data,
		mm->start_brk, mm->brk, mm->start_stack,
		mm->arg_start, mm->arg_end, mm->env_start, mm->env_end,
		mm->binfmt, mm->flags, mm->core_state,
#ifdef CONFIG_AIO
		mm->ioctx_table,
#endif
#ifdef CONFIG_MEMCG
		mm->owner,
#endif
		mm->exe_file,
#ifdef CONFIG_MMU_NOTIFIER
		mm->mmu_notifier_mm,
#endif
#ifdef CONFIG_NUMA_BALANCING
		mm->numa_next_scan, mm->numa_scan_offset, mm->numa_scan_seq,
#endif
#if defined(CONFIG_NUMA_BALANCING) || defined(CONFIG_COMPACTION)
		mm->tlb_flush_pending,
#endif
2016-03-15 14:55:59 -07:00
		mm->def_flags, &mm->def_flags
	);
2014-10-09 15:28:37 -07:00
}

2014-10-09 15:28:34 -07:00
#endif /* CONFIG_DEBUG_VM */
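The "mm: account pmd page tables to the process" commit message quoted in the annotations above puts the unaccounted memory at more than 500 MiB, and its reproducer reports usage as i * 4096 >> 10. The arithmetic behind that figure can be checked with a small sketch. It assumes, as the reproducer does, one 4096-byte PMD table per loop iteration; demo_pmd_tables_kib() and the DEMO_ constants are hypothetical names.

```c
/* Assumed size of one PMD page table, matching the 4096 in the reproducer. */
#define DEMO_PMD_TABLE_BYTES	4096UL
/* Loop count used by the reproducer (its NR_PUD). */
#define DEMO_NR_PUD		130000UL

/* KiB consumed by nr_tables page tables, as the reproducer prints it. */
static unsigned long demo_pmd_tables_kib(unsigned long nr_tables)
{
	return nr_tables * DEMO_PMD_TABLE_BYTES >> 10;
}
```

With all 130000 iterations completed this gives 520000 KiB, or roughly 507 MiB, which is consistent with the ">500 MiB on x86_64" figure in the commit message.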