Commit Graph

401201 Commits

Author SHA1 Message Date
Linus Torvalds
4f794ee8c4 Merge branch 'akpm' (fixes from Andrew Morton)
Merge four more fixes from Andrew Morton.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  lib/scatterlist.c: don't flush_kernel_dcache_page on slab page
  mm: memcg: fix test for child groups
  mm: memcg: lockdep annotation for memcg OOM lock
  mm: memcg: use proper memcg in limit bypass
2013-10-31 16:58:23 -07:00
Ming Lei
3d77b50c58 lib/scatterlist.c: don't flush_kernel_dcache_page on slab page
Commit b1adaf65ba ("[SCSI] block: add sg buffer copy helper
functions") introduces two sg buffer copy helpers, and calls
flush_kernel_dcache_page() on pages in SG list after these pages are
written to.

Unfortunately, the commit may introduce a potential bug:

 - Before sending some SCSI commands, kmalloc() buffer may be passed to
   block layper, so flush_kernel_dcache_page() can see a slab page
   finally

 - According to cachetlb.txt, flush_kernel_dcache_page() is only called
   on "a user page", which surely can't be a slab page.

 - ARCH's implementation of flush_kernel_dcache_page() may use page
   mapping information to do optimization so page_mapping() will see the
   slab page, then VM_BUG_ON() is triggered.

Aaro Koskinen reported the bug on ARM/kirkwood when DEBUG_VM is enabled,
and this patch fixes the bug by adding test of '!PageSlab(miter->page)'
before calling flush_kernel_dcache_page().

Signed-off-by: Ming Lei <ming.lei@canonical.com>
Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Tested-by: Simon Baatz <gmbnomis@gmail.com>
Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Aaro Koskinen <aaro.koskinen@iki.fi>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Tejun Heo <tj@kernel.org>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: <stable@vger.kernel.org>	[3.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-31 16:58:13 -07:00
Johannes Weiner
696ac172ff mm: memcg: fix test for child groups
When memcg code needs to know whether any given memcg has children, it
uses the cgroup child iteration primitives and returns true/false
depending on whether the iteration loop is executed at least once or
not.

Because a cgroup's list of children is RCU protected, these primitives
require the RCU read-lock to be held, which is not the case for all
memcg callers.  This results in the following splat when e.g.  enabling
hierarchy mode:

  WARNING: CPU: 3 PID: 1 at kernel/cgroup.c:3043 css_next_child+0xa3/0x160()
  CPU: 3 PID: 1 Comm: systemd Not tainted 3.12.0-rc5-00117-g83f11a9-dirty #18
  Hardware name: LENOVO 3680B56/3680B56, BIOS 6QET69WW (1.39 ) 04/26/2012
  Call Trace:
    dump_stack+0x54/0x74
    warn_slowpath_common+0x78/0xa0
    warn_slowpath_null+0x1a/0x20
    css_next_child+0xa3/0x160
    mem_cgroup_hierarchy_write+0x5b/0xa0
    cgroup_file_write+0x108/0x2a0
    vfs_write+0xbd/0x1e0
    SyS_write+0x4c/0xa0
    system_call_fastpath+0x16/0x1b

In the memcg case, we only care about children when we are attempting to
modify inheritable attributes interactively.  Racing with deletion could
mean a spurious -EBUSY, no problem.  Racing with addition is handled
just fine as well through the memcg_create_mutex: if the child group is
not on the list after the mutex is acquired, it won't be initialized
from the parent's attributes until after the unlock.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-31 16:58:13 -07:00
Johannes Weiner
0056f4e66a mm: memcg: lockdep annotation for memcg OOM lock
The memcg OOM lock is a mutex-type lock that is open-coded due to
memcg's special needs.  Add annotations for lockdep coverage.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-31 16:58:13 -07:00
Johannes Weiner
3168ecbe1c mm: memcg: use proper memcg in limit bypass
Commit 84235de394 ("fs: buffer: move allocation failure loop into the
allocator") allowed __GFP_NOFAIL allocations to bypass the limit if they
fail to reclaim enough memory for the charge.  But because the main test
case was on a 3.2-based system, the patch missed the fact that on newer
kernels the charge function needs to return root_mem_cgroup when
bypassing the limit, and not NULL.  This will corrupt whatever memory is
at NULL + percpu pointer offset.  Fix this quickly before problems are
reported.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-31 16:58:13 -07:00
Linus Torvalds
358eec1824 vfs: decrapify dput(), fix cache behavior under normal load
We do not want to dirty the dentry->d_flags cacheline in dput() just to
set the DCACHE_REFERENCED flag when it is already set in the common case
anyway.  This way the first cacheline of the dentry (which contains the
RCU lookup information etc) can stay shared among multiple CPU's.

This finishes off some of the details of all the scalability patches
merged during the merge window.

Also don't mark dentry_kill() for inlining, since it's the uncommon path
and inlining it just makes the common path slower due to extra function
entry/exit overhead.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-31 15:43:02 -07:00
Linus Torvalds
0baab4fd6d i915: fix compiler warning
The last i915 drm update brought with it this annoying warning

  drivers/gpu/drm/i915/intel_crt.c: In function ‘intel_crt_get_config’:
  drivers/gpu/drm/i915/intel_crt.c:110:21: warning: unused variable ‘dev’ [-Wunused-variable]
    struct drm_device *dev = encoder->base.dev;
                       ^

introduced by commit 7195a50b5c ("drm/i915: Add HSW CRT output readout
support").

Remove the offending pointless variable.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-31 15:28:23 -07:00
Linus Torvalds
52469b4fcd Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull NUMA balancing memory corruption fixes from Ingo Molnar:
 "So these fixes are definitely not something I'd like to sit on, but as
  I said to Mel at the KS the timing is quite tight, with Linus planning
  v3.12-final within a week.

  Fedora-19 is affected:

   comet:~> grep NUMA_BALANCING /boot/config-3.11.3-201.fc19.x86_64

   CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
   CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y
   CONFIG_NUMA_BALANCING=y

  AFAICS Ubuntu will be affected as well, once it updates the kernel:

   hubble:~> grep NUMA_BALANCING /boot/config-3.8.0-32-generic

   CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
   CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y
   CONFIG_NUMA_BALANCING=y

  These 6 commits are a minimalized set of cherry-picks needed to fix
  the memory corruption bugs.  All commits are fixes, except "mm: numa:
  Sanitize task_numa_fault() callsites" which is a cleanup that made two
  followup fixes simpler.

  I've done targeted testing with just this SHA1 to try to make sure
  there are no cherry-picking artifacts.  The original non-cherry-picked
  set of fixes were exposed to linux-next for a couple of weeks"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  mm: Account for a THP NUMA hinting update as one PTE update
  mm: Close races between THP migration and PMD numa clearing
  mm: numa: Sanitize task_numa_fault() callsites
  mm: Prevent parallel splits during THP migration
  mm: Wait for THP migrations to complete during NUMA hinting faults
  mm: numa: Do not account for a hinting fault if we raced
2013-10-31 15:21:26 -07:00
Linus Torvalds
026f8f612a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
 "A bit later than I would want, but the changes are very minor - a few
  new device IDs for new hardware in existing drivers, fix for battery
  in Wacom devices not be considered system battery and cause emergency
  hibernations, and a couple of other bug fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: ALPS - add support for model found on Dell XT2
  Input: wacom - add support for ISDv4 0x10E sensor
  Input: wacom - add support for ISDv4 0x10F sensor
  Input: wacom - export battery scope
  Input: cm109 - convert high volume dev_err() to dev_err_ratelimited()
  Input: move name/timer init to input_alloc_dev()
  Input: i8042 - i8042_flush fix for a full 8042 buffer
  Input: pxa27x_keypad - fix NULL pointer dereference
2013-10-31 10:38:59 -07:00
Linus Torvalds
e7647027bd Last-minute ACPI and power management fixes for 3.12
- Revert epoll and select commits related to the freezer, introduced
    during the 3.11 cycle, that cause mysterious user space breakage
    to occur during resume from suspend to RAM for multiple users of
    32-bit x86 systems.  Material for 3.11.y stable kernels.
 
  - Revert a recent ACPI-based PCI hotplug (ACPIPHP) commit that was
    part of boot problem fixes for one machine, but turns out to cause
    issues with hotplug on Thunderbolt chains with multiple devices.
    It also turns out to be unnecessary after another fix in the
    same area that went in later.  From Mika Westerberg.
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIcBAABCAAGBQJSclBPAAoJEILEb/54YlRxB0IQAI9Iq9VxxeGyXyK6xtEsZcoK
 PCEkuGBrZf0Ih6dyuijQ3PEQvTLIIGSzJmnkFdpn5k9YFF3GiUg+sIs1wFnry1lA
 LKxhsaRyXMv8J1yoByJYczRZNwtFSNKAC0U78BSLALZGqlw+kbVrcb7sYg1TodUJ
 nYKogS4SphSv3Jjccbw9PSUsW2FHJ6x0wUJ4WXPFQE+mO7VH5wGTmbv6lK5wOrKZ
 JAMW1RxkVffW3NLEIJHZ0sFFjn4Cp0oo6o/bV+rQx5XEHRfLDUSOMezpQim/+qPf
 NVHN1+8ydFL6bjpm+nX2Mtvy9j8x+E1iRp4EVpQOst2JaxpHXTkGWGKm8qxQ/eRO
 ksA0M09neoQKlN8uKcQ1Ank8BpX02ZKrZ3Jx/VwCvLQRk1UYAy5442WsTmnvOGlO
 /WFkhbqYFqAxngCVdiTT1ptdeT0a4VTtIZBXmA3j7qHrbo3RKSFDFDX8/82VeU8W
 GjMlFooQ6p4HgzBxNsPkBgI9d45U9lpRgdmBZK3T95AbFk+az79hm94/c4LUKnfk
 KsTOhhZ+B29LFHS9Dqz0jGkW1mXj3rXNE5Z01VA1qeWxWT6xdGsnLlgbJdvJwyZS
 lHP91ELAmPKnoxd3E36nocRRLAMZ9zq1b4kPJY155ig866kDpUaNqWjojZYBkHdC
 qGfRTBw0/35CEhdHEvgB
 =8S79
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-3.12-late' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI and power management fixes from Rafael J Wysocki:
 "Last-minute ACPI and power management fixes for 3.12

   - Revert epoll and select commits related to the freezer, introduced
     during the 3.11 cycle, that cause mysterious user space breakage to
     occur during resume from suspend to RAM for multiple users of
     32-bit x86 systems.  Material for 3.11.y stable kernels.

   - Revert a recent ACPI-based PCI hotplug (ACPIPHP) commit that was
     part of boot problem fixes for one machine, but turns out to cause
     issues with hotplug on Thunderbolt chains with multiple devices.
     It also turns out to be unnecessary after another fix in the same
     area that went in later.  From Mika Westerberg"

* tag 'pm+acpi-3.12-late' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  Revert "ACPI / hotplug / PCI: Avoid doing too much for spurious notifies"
  Revert "select: use freezable blocking call"
  Revert "epoll: use freezable blocking call"
2013-10-31 10:13:28 -07:00
Yunkang Tang
5beea882e6 Input: ALPS - add support for model found on Dell XT2
This patch adds support for touchpad found on Dell XT2. It's a dual device
with device ID: 73, 00, 14, that comply with "ALPS_PROTO_V2".

Signed-off-by: Yunkang Tang <yunkang.tang@cn.alps.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2013-10-31 00:59:20 -07:00
Dave Airlie
74c85e1357 Merge branch 'drm-fixes-3.12' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
Just a few small fixes for radeon (audio regression fix,
stability fix, and an endian bug noticed by coverity).

* 'drm-fixes-3.12' of git://people.freedesktop.org/~agd5f/linux:
  drm/radeon/dpm: fix incompatible casting on big endian
  drm/radeon: disable bapm on KB
  drm/radeon: use sw CTS/N values for audio on DCE4+
2013-10-31 15:29:10 +10:00
Linus Torvalds
12aee278b5 Merge branch 'akpm' (fixes from Andrew Morton)
Merge three fixes from Andrew Morton.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  memcg: use __this_cpu_sub() to dec stats to avoid incorrect subtrahend casting
  percpu: fix this_cpu_sub() subtrahend casting for unsigneds
  mm/pagewalk.c: fix walk_page_range() access of wrong PTEs
2013-10-30 14:27:10 -07:00
Greg Thelen
5e8cfc3c75 memcg: use __this_cpu_sub() to dec stats to avoid incorrect subtrahend casting
As of commit 3ea67d06e4 ("memcg: add per cgroup writeback pages
accounting") memcg counter errors are possible when moving charged
memory to a different memcg.  Charge movement occurs when processing
writes to memory.force_empty, moving tasks to a memcg with
memcg.move_charge_at_immigrate=1, or memcg deletion.

An example showing error after memory.force_empty:

  $ cd /sys/fs/cgroup/memory
  $ mkdir x
  $ rm /data/tmp/file
  $ (echo $BASHPID >> x/tasks && exec mmap_writer /data/tmp/file 1M) &
  [1] 13600
  $ grep ^mapped x/memory.stat
  mapped_file 1048576
  $ echo 13600 > tasks
  $ echo 1 > x/memory.force_empty
  $ grep ^mapped x/memory.stat
  mapped_file 4503599627370496

mapped_file should end with 0.
  4503599627370496 == 0x10,0000,0000,0000 == 0x100,0000,0000 pages
  1048576          == 0x10,0000           == 0x100 pages

This issue only affects the source memcg on 64 bit machines; the
destination memcg counters are correct.  So the rmdir case is not too
important because such counters are soon disappearing with the entire
memcg.  But the memcg.force_empty and memory.move_charge_at_immigrate=1
cases are larger problems as the bogus counters are visible for the
(possibly long) remaining life of the source memcg.

The problem is due to memcg use of __this_cpu_from(.., -nr_pages), which
is subtly wrong because it subtracts the unsigned int nr_pages (either
-1 or -512 for THP) from a signed long percpu counter.  When
nr_pages=-1, -nr_pages=0xffffffff.  On 64 bit machines stat->count[idx]
is signed 64 bit.  So memcg's attempt to simply decrement a count (e.g.
from 1 to 0) boils down to:

  long count = 1
  unsigned int nr_pages = 1
  count += -nr_pages  /* -nr_pages == 0xffff,ffff */
  count is now 0x1,0000,0000 instead of 0

The fix is to subtract the unsigned page count rather than adding its
negation.  This only works once "percpu: fix this_cpu_sub() subtrahend
casting for unsigneds" is applied to fix this_cpu_sub().

Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-30 14:27:03 -07:00
Greg Thelen
bd09d9a351 percpu: fix this_cpu_sub() subtrahend casting for unsigneds
this_cpu_sub() is implemented as negation and addition.

This patch casts the adjustment to the counter type before negation to
sign extend the adjustment.  This helps in cases where the counter type
is wider than an unsigned adjustment.  An alternative to this patch is
to declare such operations unsupported, but it seemed useful to avoid
surprises.

This patch specifically helps the following example:
  unsigned int delta = 1
  preempt_disable()
  this_cpu_write(long_counter, 0)
  this_cpu_sub(long_counter, delta)
  preempt_enable()

Before this change long_counter on a 64 bit machine ends with value
0xffffffff, rather than 0xffffffffffffffff.  This is because
this_cpu_sub(pcp, delta) boils down to this_cpu_add(pcp, -delta),
which is basically:
  long_counter = 0 + 0xffffffff

Also apply the same cast to:
  __this_cpu_sub()
  __this_cpu_sub_return()
  this_cpu_sub_return()

All percpu_test.ko passes, especially the following cases which
previously failed:

  l -= ui_one;
  __this_cpu_sub(long_counter, ui_one);
  CHECK(l, long_counter, -1);

  l -= ui_one;
  this_cpu_sub(long_counter, ui_one);
  CHECK(l, long_counter, -1);
  CHECK(l, long_counter, 0xffffffffffffffff);

  ul -= ui_one;
  __this_cpu_sub(ulong_counter, ui_one);
  CHECK(ul, ulong_counter, -1);
  CHECK(ul, ulong_counter, 0xffffffffffffffff);

  ul = this_cpu_sub_return(ulong_counter, ui_one);
  CHECK(ul, ulong_counter, 2);

  ul = __this_cpu_sub_return(ulong_counter, ui_one);
  CHECK(ul, ulong_counter, 1);

Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-30 14:27:03 -07:00
Chen LinX
3017f079ef mm/pagewalk.c: fix walk_page_range() access of wrong PTEs
When walk_page_range walk a memory map's page tables, it'll skip
VM_PFNMAP area, then variable 'next' will to assign to vma->vm_end, it
maybe larger than 'end'.  In next loop, 'addr' will be larger than
'next'.  Then in /proc/XXXX/pagemap file reading procedure, the 'addr'
will growing forever in pagemap_pte_range, pte_to_pagemap_entry will
access the wrong pte.

  BUG: Bad page map in process procrank  pte:8437526f pmd:785de067
  addr:9108d000 vm_flags:00200073 anon_vma:f0d99020 mapping:  (null) index:9108d
  CPU: 1 PID: 4974 Comm: procrank Tainted: G    B   W  O 3.10.1+ #1
  Call Trace:
    dump_stack+0x16/0x18
    print_bad_pte+0x114/0x1b0
    vm_normal_page+0x56/0x60
    pagemap_pte_range+0x17a/0x1d0
    walk_page_range+0x19e/0x2c0
    pagemap_read+0x16e/0x200
    vfs_read+0x84/0x150
    SyS_read+0x4a/0x80
    syscall_call+0x7/0xb

Signed-off-by: Liu ShuoX <shuox.liu@intel.com>
Signed-off-by: Chen LinX <linx.z.chen@intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org>	[3.10.x+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-30 14:27:03 -07:00
Russell King
c56b097af2 mm: list_lru: fix almost infinite loop causing effective livelock
I've seen a fair number of issues with kswapd and other processes
appearing to get stuck in v3.12-rc.  Using sysrq-p many times seems to
indicate that it gets stuck somewhere in list_lru_walk_node(), called
from prune_icache_sb() and super_cache_scan().

I never seem to be able to trigger a calltrace for functions above that
point.

So I decided to add the following to super_cache_scan():

    @@ -81,10 +81,14 @@ static unsigned long super_cache_scan(struct shrinker *shrink,
            inodes = list_lru_count_node(&sb->s_inode_lru, sc->nid);
            dentries = list_lru_count_node(&sb->s_dentry_lru, sc->nid);
            total_objects = dentries + inodes + fs_objects + 1;
    +printk("%s:%u: %s: dentries %lu inodes %lu total %lu\n", current->comm, current->pid, __func__, dentries, inodes, total_objects);

            /* proportion the scan between the caches */
            dentries = mult_frac(sc->nr_to_scan, dentries, total_objects);
            inodes = mult_frac(sc->nr_to_scan, inodes, total_objects);
    +printk("%s:%u: %s: dentries %lu inodes %lu\n", current->comm, current->pid, __func__, dentries, inodes);
    +BUG_ON(dentries == 0);
    +BUG_ON(inodes == 0);

            /*
             * prune the dcache first as the icache is pinned by it, then
    @@ -99,7 +103,7 @@ static unsigned long super_cache_scan(struct shrinker *shrink,
                    freed += sb->s_op->free_cached_objects(sb, fs_objects,
                                                           sc->nid);
            }
    -
    +printk("%s:%u: %s: dentries %lu inodes %lu freed %lu\n", current->comm, current->pid, __func__, dentries, inodes, freed);
            drop_super(sb);
            return freed;
     }

and shortly thereafter, having applied some pressure, I got this:

    update-apt-xapi:1616: super_cache_scan: dentries 25632 inodes 2 total 25635
    update-apt-xapi:1616: super_cache_scan: dentries 1023 inodes 0
    ------------[ cut here ]------------
    Kernel BUG at c0101994 [verbose debug info unavailable]
    Internal error: Oops - BUG: 0 [#3] SMP ARM
    Modules linked in: fuse rfcomm bnep bluetooth hid_cypress
    CPU: 0 PID: 1616 Comm: update-apt-xapi Tainted: G      D      3.12.0-rc7+ #154
    task: daea1200 ti: c3bf8000 task.ti: c3bf8000
    PC is at super_cache_scan+0x1c0/0x278
    LR is at trace_hardirqs_on+0x14/0x18
    Process update-apt-xapi (pid: 1616, stack limit = 0xc3bf8240)
    ...
    Backtrace:
      (super_cache_scan) from [<c00cd69c>] (shrink_slab+0x254/0x4c8)
      (shrink_slab) from [<c00d09a0>] (try_to_free_pages+0x3a0/0x5e0)
      (try_to_free_pages) from [<c00c59cc>] (__alloc_pages_nodemask+0x5)
      (__alloc_pages_nodemask) from [<c00e07c0>] (__pte_alloc+0x2c/0x13)
      (__pte_alloc) from [<c00e3a70>] (handle_mm_fault+0x84c/0x914)
      (handle_mm_fault) from [<c001a4cc>] (do_page_fault+0x1f0/0x3bc)
      (do_page_fault) from [<c001a7b0>] (do_translation_fault+0xac/0xb8)
      (do_translation_fault) from [<c000840c>] (do_DataAbort+0x38/0xa0)
      (do_DataAbort) from [<c00133f8>] (__dabt_usr+0x38/0x40)

Notice that we had a very low number of inodes, which were reduced to
zero my mult_frac().

Now, prune_icache_sb() calls list_lru_walk_node() passing that number of
inodes (0) into that as the number of objects to scan:

    long prune_icache_sb(struct super_block *sb, unsigned long nr_to_scan,
                         int nid)
    {
            LIST_HEAD(freeable);
            long freed;

            freed = list_lru_walk_node(&sb->s_inode_lru, nid, inode_lru_isolate,
                                           &freeable, &nr_to_scan);

which does:

    unsigned long
    list_lru_walk_node(struct list_lru *lru, int nid, list_lru_walk_cb isolate,
                       void *cb_arg, unsigned long *nr_to_walk)
    {

            struct list_lru_node    *nlru = &lru->node[nid];
            struct list_head *item, *n;
            unsigned long isolated = 0;

            spin_lock(&nlru->lock);
    restart:
            list_for_each_safe(item, n, &nlru->list) {
                    enum lru_status ret;

                    /*
                     * decrement nr_to_walk first so that we don't livelock if we
                     * get stuck on large numbesr of LRU_RETRY items
                     */
                    if (--(*nr_to_walk) == 0)
                            break;

So, if *nr_to_walk was zero when this function was entered, that means
we're wanting to operate on (~0UL)+1 objects - which might as well be
infinite.

Clearly this is not correct behaviour.  If we think about the behaviour
of this function when *nr_to_walk is 1, then clearly it's wrong - we
decrement first and then test for zero - which results in us doing
nothing at all.  A post-decrement would give the desired behaviour -
we'd try to walk one object and one object only if *nr_to_walk were one.

It also gives the correct behaviour for zero - we exit at this point.

Fixes: 5cedf721a7 ("list_lru: fix broken LRU_RETRY behaviour")
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
[ Modified to make sure we never underflow the count: this function gets
  called in a loop, so the 0 -> ~0ul transition is dangerous  - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-30 12:57:46 -07:00
Linus Torvalds
ced5d6b552 Serial fixes for 3.12-final
Here are 3 tiny fixes that are needed for 3.12-final for some serial
 drivers.  One of them is a revert of a broken patch, and two others are
 fixes for reported bugs.  All of these have been in linux-next for a
 while, I forgot I had not sent them to you yet, my fault.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlJxUPwACgkQMUfUDdst+ymzGgCfSmp/uQAOLnEKJ2WKfElDcbZM
 nhkAnRivp6JzwmFLVEIXORMcmLhyc8Nn
 =FOvb
 -----END PGP SIGNATURE-----

Merge tag 'tty-3.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull serial fixes from Greg KH:
 "Here are 3 tiny fixes that are needed for 3.12-final for some serial
  drivers.

  One of them is a revert of a broken patch, and two others are fixes
  for reported bugs.  All of these have been in linux-next for a while,
  I forgot I had not sent them to you yet, my fault"

(Actually, Greg, you _had_ sent two of the three, so this pulls in just
one actual new fix)

* tag 'tty-3.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  tty/serial: at91: fix uart/usart selection for older products
2013-10-30 12:29:06 -07:00
Linus Torvalds
b8cab70665 Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
 "Mainly Intel regression fixes and quirks, along with a simple one
  liner to fix rendernodes ioctl access (off by default, but testers
  want to test it)"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm: allow DRM_IOCTL_VERSION on render-nodes
  drm/i915: Fix the PPT fdi lane bifurcate state handling on ivb
  drm/i915: No LVDS hardware on Intel D410PT and D425KT
  drm/i915/dp: workaround BIOS eDP bpp clamping issue
  drm/i915: Add HSW CRT output readout support
  drm/i915: Add support for pipe_bpp readout
2013-10-30 12:27:12 -07:00
Linus Torvalds
182b4fd9f3 sound fixes for 3.12-final
A few small HD-audio regression fixes, mostly for stable kernels, too.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJScOAKAAoJEGwxgFQ9KSmknI8P/18upoLMoGn+zJYkfTFPkpXY
 X59FdVi9C/7wb+LQz9geHzDI4npZyLCVp2HUfwxCw12//MtB4D/zAxEC+1eyqjyK
 V45IUyBwly54+xQQdFdT1CkDmx6sW0Q+yfy38+KJR5IDPHSXw6IJg3pNOKJu5/Gh
 YB7FjnUdy2BLaIp1FDGS6NVkDfkSZZSTGvh6e1YXXennf1fhy0AalRx2y7091OhW
 hPjBMCatPeiiaUbTeduYrC0/6NjENNjsM/C2j489R47JtYELihwm2DHV+RxVQMGC
 kPmHXGIvqTZaIgX5mIrAlUfWAojx0W7Bt8I0L768ofg8NqF5x1Nohtnsq8Vmg4jz
 K8C7g/0AIR58bD4Q2eavVEMevlpeAaF79pTwFvShProPrposE+sv/0a38u2VarxS
 fBc8DgnBtZzYbaZRG/0gHEXS2WzOkaP7IMv/0B8F5W+cOQfaJCUUWZPv86ebGIcY
 0y4q0rDfQ58Wq9p3QEr1Z7E38gusmvU1pLteWZ+3jw661uy3prOiJRX+O3XTrBk5
 6ZABBrsMLkiNqxOeNZtFu0Gjdyq4KbNV/LSMUFz0X5E+JGGqZndvl1/AJZM6pppi
 dB28rAYwXD0TF9RvEZsm9uxPKKdFk+lslfTwZP1xJQimHR0WkS6chU1DjYp+2eWL
 7ug1CEQAAFd4ZdEnp+dL
 =HqzC
 -----END PGP SIGNATURE-----

Merge tag 'sound-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "A few small HD-audio regression fixes, mostly for stable kernels, too"

* tag 'sound-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - Fix silent headphone on Thinkpads with AD1984A codec
  ALSA: hda - Add missing initial vmaster hook at build_controls callback
  ALSA: hda - Fix unbalanced runtime PM refcount after S3/S4
2013-10-30 12:26:29 -07:00
Linus Torvalds
96d33b086b Fixes for the 3.12 debugfs problem---removing the duplicate directory
name, and using a better the error code.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJScOswAAoJEBvWZb6bTYbyCO8P/RoJy4EgsEjv9Qxfr+wfHXX0
 QOz/7fqEsFFMDh1Fiyjvg9uNYj5jQO9lzQfX0ea0uc4svIMUBWaAlyQVL/U0igkp
 2DrIoRh1aNa9Lt4RDsGswapreJylGgyhADa2W9r1tlPna7dlaPCEHGxZBR112KJc
 R9t0PMIq7NBGvPGBhiOkjeDVlqWUrEQvxOQW8ZQZw2pBc54DSreY9nq3n1dIhqMp
 gxpAKi3h5YWK8lzFxFSOLKnRqC+5SAUj2Bk8g37ZESJXp7bpmn74DpuK4KiLLokP
 V3rUvHO1p0eZSx0QWuV9QrPV6Q1VVuhLEinWnCdejhXUqk/tCR5EOMKMcHIk2xad
 6ApE7a4UPs9VmXp5Ry7/k6lorOrBuvwArPwJOCpCN7zAN1gEmmfFg3KbFdtbaIgx
 g4hEtqWQ6VULBxyDXCFHYbXGLS22kgWoQCR3ToGqbjzuZmZph49NbLjlt6d7o6w0
 n+UMZqcpLA9eUhq4Yi0IAGe45pizm5rdC4xRpbH7bK7WmKywYw+zf+JSOheTLwtm
 ER0sTmcyMSP68ysnJCLgh4hYf1lxoaQ/8SEdh331VIrThmHYyWGZD4Ol9YKg5xwU
 Yzd3wedxYYIiJ1vFfZykMWUT9zLU3feMDvq2s2maqg+uV/b4uPfZhwnFLBU+PZKC
 cxkXsb/7oZ//tmGF1MEm
 =GwaF
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "Fixes for the 3.12 debugfs problem - removing the duplicate directory
  name, and using a better the error code"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: use a more sensible error number when debugfs directory creation fails
  KVM: Fix modprobe failure for kvm_intel/kvm_amd
2013-10-30 12:25:15 -07:00
Dan Carpenter
a8b33654b1 Staging: sb105x: info leak in mp_get_count()
The icount.reserved[] array isn't initialized so it leaks stack
information to userspace.

Reported-by: Nico Golde <nico@ngolde.de>
Reported-by: Fabian Yamaguchi <fabs@goesec.de>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-30 12:24:50 -07:00
Dan Carpenter
8d1e72250c Staging: bcm: info leak in ioctl
The DevInfo.u32Reserved[] array isn't initialized so it leaks kernel
information to user space.

Reported-by: Nico Golde <nico@ngolde.de>
Reported-by: Fabian Yamaguchi <fabs@goesec.de>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-30 12:24:49 -07:00
Dan Carpenter
b5e2f33986 staging: wlags49_h2: buffer overflow setting station name
We need to check the length parameter before doing the memcpy().  I've
actually changed it to strlcpy() as well so that it's NUL terminated.

You need CAP_NET_ADMIN to trigger these so it's not the end of the
world.

Reported-by: Nico Golde <nico@ngolde.de>
Reported-by: Fabian Yamaguchi <fabs@goesec.de>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-30 12:24:49 -07:00
Dan Carpenter
f856567b93 aacraid: missing capable() check in compat ioctl
In commit d496f94d22 ('[SCSI] aacraid: fix security weakness') we
added a check on CAP_SYS_RAWIO to the ioctl.  The compat ioctls need the
check as well.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-30 12:24:49 -07:00
Dan Carpenter
c2c65cd2e1 staging: ozwpan: prevent overflow in oz_cdev_write()
We need to check "count" so we don't overflow the ei->data buffer.

Reported-by: Nico Golde <nico@ngolde.de>
Reported-by: Fabian Yamaguchi <fabs@goesec.de>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-30 12:24:49 -07:00
Dan Carpenter
201f99f170 uml: check length in exitcode_proc_write()
We don't cap the size of buffer from the user so we could write past the
end of the array here.  Only root can write to this file.

Reported-by: Nico Golde <nico@ngolde.de>
Reported-by: Fabian Yamaguchi <fabs@goesec.de>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-30 12:24:49 -07:00
Mika Westerberg
ab1225901d Revert "ACPI / hotplug / PCI: Avoid doing too much for spurious notifies"
Commit 2dc4128 (ACPI / hotplug / PCI: Avoid doing too much for
spurious notifies) changed the enable_slot() to check return value of
pci_scan_slot() and if it is zero return early from the function. It
means that there were no new devices in this particular slot.

However, if a device appeared deeper in the hierarchy the code now
ignores it causing things like Thunderbolt chaining fail to recognize
new devices.

The problem with Alex Williamson's machine was solved with commit
a47d8c8 (ACPI / hotplug / PCI: Avoid parent bus rescans on spurious
device checks) and hence we should be able to restore the original
functionality that we always rescan on bus check notification.

On a device check notification we still check what acpiphp_rescan_slot()
returns and on zero bail out early.

Fixes: 2dc41281b1 (ACPI / hotplug / PCI: Avoid doing too much for spurious notifies)
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-30 15:28:52 +01:00
Rafael J. Wysocki
59612d1879 Revert "select: use freezable blocking call"
This reverts commit 9745cdb36d (select: use freezable blocking call)
that triggers problems during resume from suspend to RAM on Paul Bolle's
32-bit x86 machines.  Paul says:

  Ever since I tried running (release candidates of) v3.11 on the two
  working i686s I still have lying around I ran into issues on resuming
  from suspend. Reverting 9745cdb36d (select: use freezable blocking
  call) resolves those issues.

  Resuming from suspend on i686 on (release candidates of) v3.11 and
  later triggers issues like:

  traps: systemd[1] general protection ip:b738e490 sp:bf882fc0 error:0 in libc-2.16.so[b731c000+1b0000]

  and

  traps: rtkit-daemon[552] general protection ip:804d6e5 sp:b6cb32f0 error:0 in rtkit-daemon[8048000+d000]

  Once I hit the systemd error I can only get out of the mess that the
  system is at that point by power cycling it.

Since we are reverting another freezer-related change causing similar
problems to happen, this one should be reverted as well.

References: https://lkml.org/lkml/2013/10/29/583
Reported-by: Paul Bolle <pebolle@tiscali.nl>
Fixes: 9745cdb36d (select: use freezable blocking call)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: 3.11+ <stable@vger.kernel.org> # 3.11+
2013-10-30 15:28:35 +01:00
Rafael J. Wysocki
c511851de1 Revert "epoll: use freezable blocking call"
This reverts commit 1c441e9212 (epoll: use freezable blocking call)
which is reported to cause user space memory corruption to happen
after suspend to RAM.

Since it appears to be extremely difficult to root cause this
problem, it is best to revert the offending commit and try to address
the original issue in a better way later.

References: https://bugzilla.kernel.org/show_bug.cgi?id=61781
Reported-by: Natrio <natrio@list.ru>
Reported-by: Jeff Pohlmeyer <yetanothergeek@gmail.com>
Bisected-by: Leo Wolf <jclw@ymail.com>
Fixes: 1c441e9212 (epoll: use freezable blocking call)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: 3.11+ <stable@vger.kernel.org> # 3.11+
2013-10-30 15:27:53 +01:00
Paolo Bonzini
0c8eb04a62 KVM: use a more sensible error number when debugfs directory creation fails
I don't know if this was due to cut and paste, or somebody was really
using a D20 to pick the error code for kvm_init_debugfs as suggested by
Linus (EFAULT is 14, so the possibility cannot be entirely ruled out).

In any case, this patch fixes it.

Reported-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-10-30 12:15:34 +01:00
Tim Gardner
d780a31271 KVM: Fix modprobe failure for kvm_intel/kvm_amd
The x86 specific kvm init creates a new conflicting
debugfs directory which causes modprobe issues
with kvm_intel and kvm_amd. For example,

sudo modprobe kvm_amd
modprobe: ERROR: could not insert 'kvm_amd': Bad address

The simplest fix is to just rename the directory. The following
KVM config options are set:

CONFIG_KVM_GUEST=y
CONFIG_KVM_DEBUG_FS=y
CONFIG_HAVE_KVM=y
CONFIG_HAVE_KVM_IRQCHIP=y
CONFIG_HAVE_KVM_IRQ_ROUTING=y
CONFIG_HAVE_KVM_EVENTFD=y
CONFIG_KVM_APIC_ARCHITECTURE=y
CONFIG_KVM_MMIO=y
CONFIG_KVM_ASYNC_PF=y
CONFIG_HAVE_KVM_MSI=y
CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT=y
CONFIG_KVM=m
CONFIG_KVM_INTEL=m
CONFIG_KVM_AMD=m
CONFIG_KVM_DEVICE_ASSIGNMENT=y

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
[Change debugfs directory name. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-10-30 12:10:42 +01:00
David Herrmann
3d3b78c06c drm: allow DRM_IOCTL_VERSION on render-nodes
DRM_IOCTL_VERSION is a reliable way to get the driver-name and version
information. It's not related to the interface-version (SET_VERSION ioctl)
so we can safely enable it on render-nodes.

Note that gbm uses udev-BUSID to load the correct mesa driver. However,
the VERSION ioctl should be the more reliable way to do this (in case we
add new DRM-bus drivers which have no BUSID or similar).

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-10-30 14:41:56 +10:00
Dave Airlie
2f2632ff6e Merge tag 'drm-intel-fixes-2013-10-29' of git://people.freedesktop.org/~danvet/drm-intel into drm-fixes
Regression and warn fixes for i915.

* tag 'drm-intel-fixes-2013-10-29' of git://people.freedesktop.org/~danvet/drm-intel:
  drm/i915: Fix the PPT fdi lane bifurcate state handling on ivb
  drm/i915: No LVDS hardware on Intel D410PT and D425KT
  drm/i915/dp: workaround BIOS eDP bpp clamping issue
  drm/i915: Add HSW CRT output readout support
  drm/i915: Add support for pipe_bpp readout
2013-10-30 12:30:12 +10:00
Linus Torvalds
7314e613d5 Fix a few incorrectly checked [io_]remap_pfn_range() calls
Nico Golde reports a few straggling uses of [io_]remap_pfn_range() that
really should use the vm_iomap_memory() helper.  This trivially converts
two of them to the helper, and comments about why the third one really
needs to continue to use remap_pfn_range(), and adds the missing size
check.

Reported-by: Nico Golde <nico@ngolde.de>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org.
2013-10-29 10:21:34 -07:00
Linus Torvalds
f9ec2e6f79 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf tooling fixes from Ingo Molnar:
 "This contains five tooling fixes:

   - fix a remaining mmap2 assumption which resulted in perf top output
     breakage
   - fix mmap ring-buffer processing bug that corrupts data
   - fix for a severe python scripting memory leak
   - fix broken (and user-visible) -g option handling
   - fix stdio output

  The diffstat size is larger than what we'd like to see this late :-/"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf tools: Fixup mmap event consumption
  perf top: Split -G and --call-graph
  perf record: Split -g and --call-graph
  perf hists: Add color overhead for stdio output buffer
  perf tools: Fix up /proc/PID/maps parsing
  perf script python: Fix mem leak due to missing Py_DECREFs on dict entries
2013-10-29 08:36:50 -07:00
Linus Torvalds
2a999aa0a1 Kconfig: make KOBJECT_RELEASE debugging require timer debugging
Without the timer debugging, the delayed kobject release will just
result in undebuggable oopses if it triggers any latent bugs.  That
doesn't actually help debugging at all.

So make DEBUG_KOBJECT_RELEASE depend on DEBUG_OBJECTS_TIMERS to avoid
having people enable one without the other.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-29 08:33:36 -07:00
Daniel Vetter
1fbc0d789d drm/i915: Fix the PPT fdi lane bifurcate state handling on ivb
Originally I've thought that this is leftover hw state dirt from the
BIOS. But after way too much helpless flailing around on my part I've
noticed that the actual bug is when we change the state of an already
active pipe.

For example when we change the fdi lines from 2 to 3 without switching
off outputs in-between we'll never see the crucial on->off transition
in the ->modeset_global_resources hook the current logic relies on.

Patch version 2 got this right by instead also checking whether the
pipe is indeed active. But that in turn broke things when pipes have
been turned off through dpms since the bifurcate enabling is done in
the ->crtc_mode_set callback.

To address this issues discussed with Ville in the patch review move
the setting of the bifurcate bit into the ->crtc_enable hook. That way
we won't wreak havoc with this state when userspace puts all other
outputs into dpms off state. This also moves us forward with our
overall goal to unify the modeset and dpms on paths (which we need to
have to allow runtime pm in the dpms off state).

Unfortunately this requires us to move the bifurcate helpers around a
bit.

Also update the commit message, I've misanalyzed the bug rather badly.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70507
Tested-by: Jan-Michael Brummer <jan.brummer@tabos.org>
Cc: stable@vger.kernel.org
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-29 13:52:56 +01:00
Mel Gorman
0255d49184 mm: Account for a THP NUMA hinting update as one PTE update
A THP PMD update is accounted for as 512 pages updated in vmstat.  This is
large difference when estimating the cost of automatic NUMA balancing and
can be misleading when comparing results that had collapsed versus split
THP. This patch addresses the accounting issue.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-10-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 11:38:17 +01:00
Mel Gorman
3f926ab945 mm: Close races between THP migration and PMD numa clearing
THP migration uses the page lock to guard against parallel allocations
but there are cases like this still open

  Task A					Task B
  ---------------------				---------------------
  do_huge_pmd_numa_page				do_huge_pmd_numa_page
  lock_page
  mpol_misplaced == -1
  unlock_page
  goto clear_pmdnuma
						lock_page
						mpol_misplaced == 2
						migrate_misplaced_transhuge
  pmd = pmd_mknonnuma
  set_pmd_at

During hours of testing, one crashed with weird errors and while I have
no direct evidence, I suspect something like the race above happened.
This patch extends the page lock to being held until the pmd_numa is
cleared to prevent migration starting in parallel while the pmd_numa is
being cleared. It also flushes the old pmd entry and orders pagetable
insertion before rmap insertion.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-9-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 11:38:05 +01:00
Mel Gorman
c61109e34f mm: numa: Sanitize task_numa_fault() callsites
There are three callers of task_numa_fault():

 - do_huge_pmd_numa_page():
     Accounts against the current node, not the node where the
     page resides, unless we migrated, in which case it accounts
     against the node we migrated to.

 - do_numa_page():
     Accounts against the current node, not the node where the
     page resides, unless we migrated, in which case it accounts
     against the node we migrated to.

 - do_pmd_numa_page():
     Accounts not at all when the page isn't migrated, otherwise
     accounts against the node we migrated towards.

This seems wrong to me; all three sites should have the same
sementaics, furthermore we should accounts against where the page
really is, we already know where the task is.

So modify all three sites to always account; we did after all receive
the fault; and always account to where the page is after migration,
regardless of success.

They all still differ on when they clear the PTE/PMD; ideally that
would get sorted too.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-8-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 11:37:52 +01:00
Mel Gorman
587fe586f4 mm: Prevent parallel splits during THP migration
THP migrations are serialised by the page lock but on its own that does
not prevent THP splits. If the page is split during THP migration then
the pmd_same checks will prevent page table corruption but the unlock page
and other fix-ups potentially will cause corruption. This patch takes the
anon_vma lock to prevent parallel splits during migration.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-7-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 11:37:39 +01:00
Mel Gorman
42836f5f8b mm: Wait for THP migrations to complete during NUMA hinting faults
The locking for migrating THP is unusual. While normal page migration
prevents parallel accesses using a migration PTE, THP migration relies on
a combination of the page_table_lock, the page lock and the existance of
the NUMA hinting PTE to guarantee safety but there is a bug in the scheme.

If a THP page is currently being migrated and another thread traps a
fault on the same page it checks if the page is misplaced. If it is not,
then pmd_numa is cleared. The problem is that it checks if the page is
misplaced without holding the page lock meaning that the racing thread
can be migrating the THP when the second thread clears the NUMA bit
and faults a stale page.

This patch checks if the page is potentially being migrated and stalls
using the lock_page if it is potentially being migrated before checking
if the page is misplaced or not.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-6-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 11:37:19 +01:00
Mel Gorman
1dd49bfa34 mm: numa: Do not account for a hinting fault if we raced
If another task handled a hinting fault in parallel then do not double
account for it.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-5-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 11:37:05 +01:00
Ingo Molnar
cd65718712 perf/urgent fixes:
. Add color overhead for stdio output buffer, which fixes
   --stdio output being chopped up on the hot (red) entries,
   fix from Jiri Olsa.
 
 . Get 'perf record -g -a sleep 1' working again, removing the
   need for -- separating perf options from the workload, restoring
   ages old behaviour, fix from Jiri Olsa.
   More patches allowing ~/.perfconfig setting up of default
   callchain collecting method ("fp" or "dwarf") left for next
   merge window.
 
 . Fixup mmap event consumption, where we were acking the
   consumption by writing the tail before actually accessing
   the event, which could lead to using overwritten records
   in things like 'perf record --call-graph'.  from Zhouyi Zhou.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJSbrexAAoJENZQFvNTUqpA+MIP/1dMAat/+jAobBvy9s69rJbT
 OV2QxqS5haA6rjbyogY7AiaCM/bSFO9dpLDF31NUTc3s6hiMsHnXnogQxvGGrBzT
 lzj45V9/a8XXl8Wxbpv0z4mI7ppNR3KjJdkBTsuqso5GivzMuG/8/N7ZxYB8kSlL
 vnWLRBKiBa/sDKIhKy5oEqmMMa+kPmwIXGZiKFxWkNMjmBtK80SB7MtPernhLNGf
 wj6EEulphPR1s8VHXqZIoijEvgw2vEXDTFapKd53EV+UPCII13NW7gicgPXyiWrY
 BIRW4Pwl98y9qHEc5gSq5Eotcl9JhIPQVegTSnvx8jJ7x0bYrxmChb/I9eAjWQcj
 YQAKGqUoQGu0PJv8TeIoGsxNMFxeeeMKzf275KKc5g6Les6gIHYKOmW7RPXHIIA7
 DHGYOJ808jMrNFmAkKNmxQGbE7forMFj6CPDgGpGezqTUbGsszpzga2QQgI0QmrY
 l8aPUsirF5cZ67vPk7dnUt58FayfKMdGME2dkU352hjOL1V5EeVdxx46zZsPQNyQ
 evQvkGBmRm6WFi6Sybjc6i4Gkn9F1O5sNDtEDbSQ6aZH5tNnktzLoufPrWwHKV95
 GzEHpCuAeZbagPF6WSxpNTb2xfScPb6PfveGm0gbypwl/KaAr47w8VCWJiuvpSOv
 yV83qEooKW0LUn4JXrH3
 =U+a8
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

 * Add color overhead for stdio output buffer, which fixes
   --stdio output being chopped up on the hot (red) entries,
   fix from Jiri Olsa.

 * Get 'perf record -g -a sleep 1' working again, removing the
   need for -- separating perf options from the workload, restoring
   ages old behaviour, fix from Jiri Olsa.
   More patches allowing ~/.perfconfig setting up of default
   callchain collecting method ("fp" or "dwarf") left for next
   merge window.

 * Fixup mmap event consumption, where we were acking the
   consumption by writing the tail before actually accessing
   the event, which could lead to using overwritten records
   in things like 'perf record --call-graph'. From Zhouyi Zhou.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-10-29 09:06:07 +01:00
Linus Torvalds
c9ca72fc56 Xtensa patchset for v3.12-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJSXuCxAAoJEI9vqH3mFV2sdWwP/11xVQFtzoT01k9P4fZeFCMT
 +dGztMBY6bODMEyeC9raX6sJhuOYNivS2IBFJ4qv9p+kcCplYE0EsB3MfOQ60NdV
 eV2DdnHrPKgIPUutGf7mcZ5bXB8q8HEuEFn0ONsCdNTAxzrcLxaPZU3MeGhSfL0H
 d/RPPmSW8+vLEMWUe58EB2J8Z1DJye4O0pbxskjGA64KrLssKBG3LWBQBdTsihkE
 r2olYYT0j7osJ7loLbyll/CcacQTfJbQT8Xd0Y7MbLYR/G6EVMwb6PkzkFnUPMIQ
 zLwMAOgL8/E250p4ab0Wy8vf9rW+938qR93oZEyO8TRDdF/mYNQb+GK+ZKUrMrxg
 eZ/pOhKQ+xUKtt23xrvbXR3nTnI9AfiJ1nhuO4UmX5WIhyJXDcYYk+rujlEduxGm
 IJ7p5cRWBdqC+iIiSzQJjyW4drIXU6QJ+ureOGo0vBcBJsLRbUFYLt+08Rb60/d1
 zgS8L7DpKRUZlIrbD6/HxyUimJCTGERsvTSDZYCYs1+ri/YKV4a8Ukfsl8bWLbhd
 Le5PyMeusWaM6MHCIOJPys2SpsKGl9UCaVr7Nz1Lr6HIqZJUN61rNVSIcUpufiqt
 hPSYtljmz6746Z6QvNW0OD8+qE5foplCnv94Oqj9tb24gHfINPXkjiaEKhPDICi0
 6cCQj4pCsjjlzXXmn5Fp
 =sopZ
 -----END PGP SIGNATURE-----

Merge tag 'xtensa-next-20131015' of git://github.com/czankel/xtensa-linux

Pull Xtensa patchset from Chris Zankel:
 "The main patch fixes a bug that can cause a kernel panic, and was
  introduced in rc1.  The other two have been discovered by a uclibc
  test and 'coccinelle'"

* tag 'xtensa-next-20131015' of git://github.com/czankel/xtensa-linux:
  xtensa: Cocci spatch "noderef"
  xtensa: don't use alternate signal stack on threads
  xtensa: fix fast_syscall_spill_registers_fixup
2013-10-28 16:58:05 -07:00
Linus Torvalds
5d914a959d SCSI fixes on 20131028
This is a set of four patches that revert functionality introduced in the
 merge window to sg.  The locking changes turned out to introduce this bug:
 
     [  205.372901] [ BUG: lock held when returning to user space! ]
 [...]
     [  205.373285]  #0:  (&sdp->o_sem){.+.+..}, at: [<ffffffff8161e650>] sg_open+0x3a0/0x4d0
 
 The fix is large, so at this late stage we'd like to revert the functionality
 and start again in the next merge window.
 
 Signed-off-by: James Bottomley <JBottomley@Parallels.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQEcBAABAgAGBQJSbneXAAoJEDeqqVYsXL0Mr+AIAJRkPeIzyX59Y16nAlxY8tZj
 MCYFZ9v9TiVEXm2+AM/Z/Y6jVYG3wgkRzZARyOxeNeRsyWFMsm74WQG5AIopwL5B
 liN9tMlcxl3Ofdh9Ww1Os7AaIhm3xmiRTNsNirNxgh5Y3+XH8MMTv5QR97XIt5Ys
 arDJUAJKtYJIcgMbYpDdQ7vb8deNsrwC0mQ2vbdox6tMKn8wIigsyZKI/+55imJP
 uPZjruhFQxhzMzi5+kR8q/Y2hj+2Lo0pSIEWjO2K/xqLzMI/8IkCwx5gYnNdxONE
 Rrj/6VevFd0w3G9EWb8d4HhptyTMFkq4D8XoSxepSFdA1Ti4kVixwXw+PRdEU/I=
 =pSy2
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "This is a set of four patches that revert functionality introduced in
  the merge window to sg.  The locking changes turned out to introduce
  this bug:

      [  205.372901] [ BUG: lock held when returning to user space! ]
   [...]
      [  205.373285]  #0:  (&sdp->o_sem){.+.+..}, at: [<ffffffff8161e650>] sg_open+0x3a0/0x4d0

  The fix is large, so at this late stage we'd like to revert the
  functionality and start again in the next merge window"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  [SCSI] Revert "sg: use rwsem to solve race during exclusive open"
  [SCSI] Revert "sg: no need sg_open_exclusive_lock"
  [SCSI] Revert "sg: checking sdp->detached isn't protected when open"
  [SCSI] Revert "sg: push file descriptor list locking down to per-device locking"
2013-10-28 16:57:13 -07:00
Zhouyi Zhou
8e50d384cc perf tools: Fixup mmap event consumption
The tail position of the event buffer should only be modified after
actually use that event.

If not the event buffer could be invalid before use, and segment fault
occurs when invoking perf top -G.

Signed-off-by: Zhouyi Zhou <yizhouzhou@ict.ac.cn>
Cc: David Ahern <dsahern@gmail.com>
Cc: Zhouyi Zhou <yizhouzhou@ict.ac.cn>
Link: http://lkml.kernel.org/r/1382600613-32177-1-git-send-email-zhouzhouyi@gmail.com
[ Simplified the logic using exit gotos and renamed write_tail method to mmap_consume ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-10-28 16:06:00 -03:00
Jiri Olsa
ae779a6309 perf top: Split -G and --call-graph
Splitting -G and --call-graph for record command, so we could use '-G'
with no option.

The '-G' option now takes NO argument and enables the configured unwind
method, which is currently the frame pointers method.

It will be possible to configure unwind method via config file in
upcoming patches.

All current '-G' arguments is overtaken by --call-graph option.

NOTE: The documentation for top --call-graph option
      was wrongly copied from report command.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: David Ahern <dsahern@gmail.com>
Tested-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: David Ahern <dsahern@gmail.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1382797536-32303-3-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-10-28 16:06:00 -03:00
Jiri Olsa
09b0fd45ff perf record: Split -g and --call-graph
Splitting -g and --call-graph for record command, so we could use '-g'
with no option.

The '-g' option now takes NO argument and enables the configured unwind
method, which is currently the frame pointers method.

It will be possible to configure unwind method via config file in
upcoming patches.

All current '-g' arguments is overtaken by --call-graph option.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: David Ahern <dsahern@gmail.com>
Tested-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: David Ahern <dsahern@gmail.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1382797536-32303-2-git-send-email-jolsa@redhat.com
[ reordered -g/--call-graph on --help and expanded the man page
  according to comments by David Ahern and Namhyung Kim ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-10-28 16:05:59 -03:00