linux/arch/x86/mm
Youquan Song b6999b1912 thp: add compound tail page _mapcount when mapped
With the 3.2-rc kernel, IOMMU 2M pages in KVM works.  But when I tried
to use IOMMU 1GB pages in KVM, I encountered an oops and the 1GB page
failed to be used.

The root cause is that 1GB page allocation calls gup_huge_pud() while 2M
page calls gup_huge_pmd.  If compound pages are used and the page is a
tail page, gup_huge_pmd() increases _mapcount to record tail page are
mapped while gup_huge_pud does not do that.

So when the mapped page is relesed, it will result in kernel oops
because the page is not marked mapped.

This patch add tail process for compound page in 1GB huge page which
keeps the same process as 2M page.

Reproduce like:
1. Add grub boot option: hugepagesz=1G hugepages=8
2. mount -t hugetlbfs -o pagesize=1G hugetlbfs /dev/hugepages
3. qemu-kvm -m 2048 -hda os-kvm.img -cpu kvm64 -smp 4 -mem-path /dev/hugepages
	-net none -device pci-assign,host=07:00.1

  kernel BUG at mm/swap.c:114!
  invalid opcode: 0000 [#1] SMP
  Call Trace:
    put_page+0x15/0x37
    kvm_release_pfn_clean+0x31/0x36
    kvm_iommu_put_pages+0x94/0xb1
    kvm_iommu_unmap_memslots+0x80/0xb6
    kvm_assign_device+0xba/0x117
    kvm_vm_ioctl_assigned_device+0x301/0xa47
    kvm_vm_ioctl+0x36c/0x3a2
    do_vfs_ioctl+0x49e/0x4e4
    sys_ioctl+0x5a/0x7c
    system_call_fastpath+0x16/0x1b
  RIP  put_compound_page+0xd4/0x168

Signed-off-by: Youquan Song <youquan.song@intel.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-09 07:50:28 -08:00
..
kmemcheck x86: Swap save_stack_trace_regs parameters 2011-06-14 22:48:51 -04:00
amdtopology.c x86, NUMA: Enable CONFIG_AMD_NUMA on 32bit too 2011-05-02 17:24:48 +02:00
dump_pagetables.c x86, mm: Create symbolic index into address_markers array 2010-07-20 16:56:19 -07:00
extable.c x86, 64-bit: Move K8 B step iret fixup to fault entry asm 2009-10-12 18:29:46 +02:00
fault.c Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2011-10-28 05:46:02 -07:00
gup.c thp: add compound tail page _mapcount when mapped 2011-12-09 07:50:28 -08:00
highmem_32.c x86/paravirt: PTE updates in k(un)map_atomic need to be synchronous, regardless of lazy_mmu mode 2011-12-05 17:06:34 +01:00
hugetlbpage.c mm: Convert i_mmap_lock to a mutex 2011-05-25 08:39:18 -07:00
init_32.c x86, mm: Allow ZONE_DMA to be configurable 2011-05-16 14:03:28 -07:00
init_64.c mm: Move definition of MIN_MEMORY_BLOCK_SIZE to a header 2011-07-12 11:08:01 +10:00
init.c x86: Fix S4 regression 2011-10-24 06:55:20 +02:00
iomap_32.c mm: fix race in kunmap_atomic() 2010-10-27 18:03:05 -07:00
ioremap.c ioremap: Delay sanity check until after a successful mapping 2011-04-29 08:02:47 +02:00
kmmio.c x86, kmmio/mmiotrace: Fix double free of kmmio_fault_pages 2010-06-18 11:30:09 +02:00
Makefile x86, NUMA: Rename amdtopology_64.c to amdtopology.c 2011-05-02 17:24:48 +02:00
memblock.c x86, efi: Do not reserve boot services regions within reserved areas 2011-06-18 22:48:49 +02:00
memtest.c x86, memblock: Replace e820_/_early string with memblock_ 2010-08-27 11:13:47 -07:00
mmap.c x86-32, amd: Move va_align definition to unbreak 32-bit build 2011-08-06 11:44:57 -07:00
mmio-mod.c Merge branch 'master' into for-next 2011-09-15 15:08:18 +02:00
numa_32.c x86, mm: s/PAGES_PER_ELEMENT/PAGES_PER_SECTION/ 2011-07-12 21:58:11 -07:00
numa_64.c x86, NUMA: Move NUMA init logic from numa_64.c to numa.c 2011-05-02 14:18:53 +02:00
numa_emulation.c x86, NUMA: Enable emulation on 32bit too 2011-05-02 17:24:48 +02:00
numa_internal.h x86, NUMA: Initialize and use remap allocator from setup_node_bootmem() 2011-05-02 14:18:54 +02:00
numa.c x86, numa: Implement pfn -> nid mapping granularity check 2011-07-12 21:58:29 -07:00
pageattr-test.c x86: Convert vmalloc()+memset() to vzalloc() 2011-05-28 19:53:57 +02:00
pageattr.c x86: Fix common misspellings 2011-03-18 10:39:30 +01:00
pat_internal.h x86, pat: Fix memory leak in free_memtype 2010-05-26 11:26:04 -07:00
pat_rbtree.c rbtree: Undo augmented trees performance damage and regression 2010-07-05 14:43:50 +02:00
pat.c Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-08-06 10:17:52 -07:00
pf_in.c x86: Eliminate various 'set but not used' warnings 2011-05-21 19:10:33 +02:00
pf_in.h x86 mmiotrace: move files into arch/x86/mm/. 2008-05-24 11:25:37 +02:00
pgtable_32.c x86: remove last traces of quicklist usage 2010-05-24 13:33:31 -07:00
pgtable.c x86: Flush TLB if PGD entry is changed in i386 PAE mode 2011-03-18 11:44:01 +01:00
physaddr.c x86: split __phys_addr out into separate file 2009-09-10 11:48:55 -07:00
physaddr.h x86: split __phys_addr out into separate file 2009-09-10 11:48:55 -07:00
setup_nx.c x86, cpu: Only CPU features determine NX capabilities 2010-11-10 15:43:15 -08:00
srat.c x86, NUMA: make srat.c 32bit safe 2011-05-02 14:18:52 +02:00
testmmiotrace.c x86, kmmio/mmiotrace: Fix double free of kmmio_fault_pages 2010-06-18 11:30:09 +02:00
tlb.c x86, tlb, UV: Do small micro-optimization for native_flush_tlb_others() 2011-03-15 08:30:34 +01:00