72955 Commits

Author SHA1 Message Date
Naoya Horiguchi
fafaa4264e pagewalk: improve vma handling
Current implementation of page table walker has a fundamental problem in
vma handling, which started when we tried to handle vma(VM_HUGETLB).
Because it's done in pgd loop, considering vma boundary makes code
complicated and bug-prone.

From the users viewpoint, some user checks some vma-related condition to
determine whether the user really does page walk over the vma.

In order to solve these, this patch moves vma check outside pgd loop and
introduce a new callback ->test_walk().

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11 17:06:05 -08:00
Naoya Horiguchi
0b1fbfe500 mm/pagewalk: remove pgd_entry() and pud_entry()
Currently no user of page table walker sets ->pgd_entry() or
->pud_entry(), so checking their existence in each loop is just wasting
CPU cycle.  So let's remove it to reduce overhead.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11 17:06:05 -08:00
Andrea Arcangeli
0664e57ff0 mm: gup: kvm use get_user_pages_unlocked
Use the more generic get_user_pages_unlocked which has the additional
benefit of passing FAULT_FLAG_ALLOW_RETRY at the very first page fault
(which allows the first page fault in an unmapped area to be always able
to block indefinitely by being allowed to release the mmap_sem).

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Andres Lagar-Cavilla <andreslc@google.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Peter Feiner <pfeiner@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11 17:06:05 -08:00
Andrea Arcangeli
0fd71a56f4 mm: gup: add __get_user_pages_unlocked to customize gup_flags
Some callers (like KVM) may want to set the gup_flags like FOLL_HWPOSION
to get a proper -EHWPOSION retval instead of -EFAULT to take a more
appropriate action if get_user_pages runs into a memory failure.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andres Lagar-Cavilla <andreslc@google.com>
Cc: Peter Feiner <pfeiner@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11 17:06:05 -08:00
Andrea Arcangeli
f0818f472d mm: gup: add get_user_pages_locked and get_user_pages_unlocked
FAULT_FOLL_ALLOW_RETRY allows the page fault to drop the mmap_sem for
reading to reduce the mmap_sem contention (for writing), like while
waiting for I/O completion.  The problem is that right now practically no
get_user_pages call uses FAULT_FOLL_ALLOW_RETRY, so we're not leveraging
that nifty feature.

Andres fixed it for the KVM page fault.  However get_user_pages_fast
remains uncovered, and 99% of other get_user_pages aren't using it either
(the only exception being FOLL_NOWAIT in KVM which is really nonblocking
and in fact it doesn't even release the mmap_sem).

So this patchsets extends the optimization Andres did in the KVM page
fault to the whole kernel.  It makes most important places (including
gup_fast) to use FAULT_FOLL_ALLOW_RETRY to reduce the mmap_sem hold times
during I/O.

The only few places that remains uncovered are drivers like v4l and other
exceptions that tends to work on their own memory and they're not working
on random user memory (for example like O_DIRECT that uses gup_fast and is
fully covered by this patch).

A follow up patch should probably also add a printk_once warning to
get_user_pages that should go obsolete and be phased out eventually.  The
"vmas" parameter of get_user_pages makes it fundamentally incompatible
with FAULT_FOLL_ALLOW_RETRY (vmas array becomes meaningless the moment the
mmap_sem is released).

While this is just an optimization, this becomes an absolute requirement
for the userfaultfd feature http://lwn.net/Articles/615086/ .

The userfaultfd allows to block the page fault, and in order to do so I
need to drop the mmap_sem first.  So this patch also ensures that all
memory where userfaultfd could be registered by KVM, the very first fault
(no matter if it is a regular page fault, or a get_user_pages) always has
FAULT_FOLL_ALLOW_RETRY set.  Then the userfaultfd blocks and it is waken
only when the pagetable is already mapped.  The second fault attempt after
the wakeup doesn't need FAULT_FOLL_ALLOW_RETRY, so it's ok to retry
without it.

This patch (of 5):

We can leverage the VM_FAULT_RETRY functionality in the page fault paths
better by using either get_user_pages_locked or get_user_pages_unlocked.

The former allows conversion of get_user_pages invocations that will have
to pass a "&locked" parameter to know if the mmap_sem was dropped during
the call.  Example from:

    down_read(&mm->mmap_sem);
    do_something()
    get_user_pages(tsk, mm, ..., pages, NULL);
    up_read(&mm->mmap_sem);

to:

    int locked = 1;
    down_read(&mm->mmap_sem);
    do_something()
    get_user_pages_locked(tsk, mm, ..., pages, &locked);
    if (locked)
        up_read(&mm->mmap_sem);

The latter is suitable only as a drop in replacement of the form:

    down_read(&mm->mmap_sem);
    get_user_pages(tsk, mm, ..., pages, NULL);
    up_read(&mm->mmap_sem);

into:

    get_user_pages_unlocked(tsk, mm, ..., pages);

Where tsk, mm, the intermediate "..." paramters and "pages" can be any
value as before.  Just the last parameter of get_user_pages (vmas) must be
NULL for get_user_pages_locked|unlocked to be usable (the latter original
form wouldn't have been safe anyway if vmas wasn't null, for the former we
just make it explicit by dropping the parameter).

If vmas is not NULL these two methods cannot be used.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Andres Lagar-Cavilla <andreslc@google.com>
Reviewed-by: Peter Feiner <pfeiner@google.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11 17:06:05 -08:00
Vlastimil Babka
be97a41b29 mm/mempolicy.c: merge alloc_hugepage_vma to alloc_pages_vma
The previous commit ("mm/thp: Allocate transparent hugepages on local
node") introduced alloc_hugepage_vma() to mm/mempolicy.c to perform a
special policy for THP allocations.  The function has the same interface
as alloc_pages_vma(), shares a lot of boilerplate code and a long
comment.

This patch merges the hugepage special case into alloc_pages_vma.  The
extra if condition should be cheap enough price to pay.  We also prevent
a (however unlikely) race with parallel mems_allowed update, which could
make hugepage allocation restart only within the fallback call to
alloc_hugepage_vma() and not reconsider the special rule in
alloc_hugepage_vma().

Also by making sure mpol_cond_put(pol) is always called before actual
allocation attempt, we can use a single exit path within the function.

Also update the comment for missing node parameter and obsolete reference
to mm_sem.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11 17:06:04 -08:00
Aneesh Kumar K.V
077fcf116c mm/thp: allocate transparent hugepages on local node
This make sure that we try to allocate hugepages from local node if
allowed by mempolicy.  If we can't, we fallback to small page allocation
based on mempolicy.  This is based on the observation that allocating
pages on local node is more beneficial than allocating hugepages on remote
node.

With this patch applied we may find transparent huge page allocation
failures if the current node doesn't have enough freee hugepages.  Before
this patch such failures result in us retrying the allocation on other
nodes in the numa node mask.

[akpm@linux-foundation.org: fix comment, add CONFIG_TRANSPARENT_HUGEPAGE dependency]
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11 17:06:04 -08:00
Joonsoo Kim
24e2716f63 mm/compaction: add tracepoint to observe behaviour of compaction defer
Compaction deferring logic is heavy hammer that block the way to the
compaction.  It doesn't consider overall system state, so it could prevent
user from doing compaction falsely.  In other words, even if system has
enough range of memory to compact, compaction would be skipped due to
compaction deferring logic.  This patch add new tracepoint to understand
work of deferring logic.  This will also help to check compaction success
and fail.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11 17:06:04 -08:00
Joonsoo Kim
837d026d56 mm/compaction: more trace to understand when/why compaction start/finish
It is not well analyzed that when/why compaction start/finish or not.
With these new tracepoints, we can know much more about start/finish
reason of compaction.  I can find following bug with these tracepoint.

http://www.spinics.net/lists/linux-mm/msg81582.html

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11 17:06:04 -08:00
Joonsoo Kim
e34d85f0e3 mm/compaction: print current range where compaction work
It'd be useful to know current range where compaction work for detailed
analysis.  With it, we can know pageblock where we actually scan and
isolate, and, how much pages we try in that pageblock and can guess why it
doesn't become freepage with pageblock order roughly.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11 17:06:04 -08:00
Joonsoo Kim
16c4a097a0 mm/compaction: enhance tracepoint output for compaction begin/end
We now have tracepoint for begin event of compaction and it prints start
position of both scanners, but, tracepoint for end event of compaction
doesn't print finish position of both scanners.  It'd be also useful to
know finish position of both scanners so this patch add it.  It will help
to find odd behavior or problem on compaction internal logic.

And mode is added to both begin/end tracepoint output, since according to
mode, compaction behavior is quite different.

And lastly, status format is changed to string rather than status number
for readability.

[akpm@linux-foundation.org: fix sparse warning]
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11 17:06:04 -08:00
Joonsoo Kim
4645f06334 mm/compaction: change tracepoint format from decimal to hexadecimal
To check the range that compaction is working, tracepoint print
start/end pfn of zone and start pfn of both scanner with decimal format.
Since we manage all pages in order of 2 and it is well represented by
hexadecimal, this patch change the tracepoint format from decimal to
hexadecimal.  This would improve readability.  For example, it makes us
easily notice whether current scanner try to compact previously
attempted pageblock or not.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11 17:06:04 -08:00
Kirill A. Shutemov
dc6c9a35b6 mm: account pmd page tables to the process
Dave noticed that unprivileged process can allocate significant amount of
memory -- >500 MiB on x86_64 -- and stay unnoticed by oom-killer and
memory cgroup.  The trick is to allocate a lot of PMD page tables.  Linux
kernel doesn't account PMD tables to the process, only PTE.

The use-cases below use few tricks to allocate a lot of PMD page tables
while keeping VmRSS and VmPTE low.  oom_score for the process will be 0.

	#include <errno.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>
	#include <sys/mman.h>
	#include <sys/prctl.h>

	#define PUD_SIZE (1UL << 30)
	#define PMD_SIZE (1UL << 21)

	#define NR_PUD 130000

	int main(void)
	{
		char *addr = NULL;
		unsigned long i;

		prctl(PR_SET_THP_DISABLE);
		for (i = 0; i < NR_PUD ; i++) {
			addr = mmap(addr + PUD_SIZE, PUD_SIZE, PROT_WRITE|PROT_READ,
					MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
			if (addr == MAP_FAILED) {
				perror("mmap");
				break;
			}
			*addr = 'x';
			munmap(addr, PMD_SIZE);
			mmap(addr, PMD_SIZE, PROT_WRITE|PROT_READ,
					MAP_ANONYMOUS|MAP_PRIVATE|MAP_FIXED, -1, 0);
			if (addr == MAP_FAILED)
				perror("re-mmap"), exit(1);
		}
		printf("PID %d consumed %lu KiB in PMD page tables\n",
				getpid(), i * 4096 >> 10);
		return pause();
	}

The patch addresses the issue by account PMD tables to the process the
same way we account PTE.

The main place where PMD tables is accounted is __pmd_alloc() and
free_pmd_range(). But there're few corner cases:

 - HugeTLB can share PMD page tables. The patch handles by accounting
   the table to all processes who share it.

 - x86 PAE pre-allocates few PMD tables on fork.

 - Architectures with FIRST_USER_ADDRESS > 0. We need to adjust sanity
   check on exit(2).

Accounting only happens on configuration where PMD page table's level is
present (PMD is not folded).  As with nr_ptes we use per-mm counter.  The
counter value is used to calculate baseline for badness score by
oom-killer.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: David Rientjes <rientjes@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11 17:06:04 -08:00
Kirill A. Shutemov
4155b8e0a7 mm, asm-generic: define PUD_SHIFT in <asm-generic/4level-fixup.h>
If an architecure uses <asm-generic/4level-fixup.h>, build fails if we
try to use PUD_SHIFT in generic code:

   In file included from arch/microblaze/include/asm/bug.h:1:0,
                    from include/linux/bug.h:4,
                    from include/linux/thread_info.h:11,
                    from include/asm-generic/preempt.h:4,
                    from arch/microblaze/include/generated/asm/preempt.h:1,
                    from include/linux/preempt.h:18,
                    from include/linux/spinlock.h:50,
                    from include/linux/mmzone.h:7,
                    from include/linux/gfp.h:5,
                    from include/linux/slab.h:14,
                    from mm/mmap.c:12:
   mm/mmap.c: In function 'exit_mmap':
>> mm/mmap.c:2858:46: error: 'PUD_SHIFT' undeclared (first use in this function)
       round_up(FIRST_USER_ADDRESS, PUD_SIZE) >> PUD_SHIFT);
                                                 ^
   include/asm-generic/bug.h:86:25: note: in definition of macro 'WARN_ON'
     int __ret_warn_on = !!(condition);    \
                            ^
   mm/mmap.c:2858:46: note: each undeclared identifier is reported only once for each function it appears in
       round_up(FIRST_USER_ADDRESS, PUD_SIZE) >> PUD_SHIFT);
                                                 ^
   include/asm-generic/bug.h:86:25: note: in definition of macro 'WARN_ON'
     int __ret_warn_on = !!(condition);    \
                            ^
As with <asm-generic/pgtable-nopud.h>, let's define PUD_SHIFT to
PGDIR_SHIFT.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11 17:06:03 -08:00
Michal Hocko
c32b3cbe0d oom, PM: make OOM detection in the freezer path raceless
Commit 5695be142e20 ("OOM, PM: OOM killed task shouldn't escape PM
suspend") has left a race window when OOM killer manages to
note_oom_kill after freeze_processes checks the counter.  The race
window is quite small and really unlikely and partial solution deemed
sufficient at the time of submission.

Tejun wasn't happy about this partial solution though and insisted on a
full solution.  That requires the full OOM and freezer's task freezing
exclusion, though.  This is done by this patch which introduces oom_sem
RW lock and turns oom_killer_disable() into a full OOM barrier.

oom_killer_disabled check is moved from the allocation path to the OOM
level and we take oom_sem for reading for both the check and the whole
OOM invocation.

oom_killer_disable() takes oom_sem for writing so it waits for all
currently running OOM killer invocations.  Then it disable all the further
OOMs by setting oom_killer_disabled and checks for any oom victims.
Victims are counted via mark_tsk_oom_victim resp.  unmark_oom_victim.  The
last victim wakes up all waiters enqueued by oom_killer_disable().
Therefore this function acts as the full OOM barrier.

The page fault path is covered now as well although it was assumed to be
safe before.  As per Tejun, "We used to have freezing points deep in file
system code which may be reacheable from page fault." so it would be
better and more robust to not rely on freezing points here.  Same applies
to the memcg OOM killer.

out_of_memory tells the caller whether the OOM was allowed to trigger and
the callers are supposed to handle the situation.  The page allocation
path simply fails the allocation same as before.  The page fault path will
retry the fault (more on that later) and Sysrq OOM trigger will simply
complain to the log.

Normally there wouldn't be any unfrozen user tasks after
try_to_freeze_tasks so the function will not block. But if there was an
OOM killer racing with try_to_freeze_tasks and the OOM victim didn't
finish yet then we have to wait for it. This should complete in a finite
time, though, because

	- the victim cannot loop in the page fault handler (it would die
	  on the way out from the exception)
	- it cannot loop in the page allocator because all the further
	  allocation would fail and __GFP_NOFAIL allocations are not
	  acceptable at this stage
	- it shouldn't be blocked on any locks held by frozen tasks
	  (try_to_freeze expects lockless context) and kernel threads and
	  work queues are not frozen yet

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Suggested-by: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11 17:06:03 -08:00
Michal Hocko
49550b6055 oom: add helpers for setting and clearing TIF_MEMDIE
This patchset addresses a race which was described in the changelog for
5695be142e20 ("OOM, PM: OOM killed task shouldn't escape PM suspend"):

: PM freezer relies on having all tasks frozen by the time devices are
: getting frozen so that no task will touch them while they are getting
: frozen.  But OOM killer is allowed to kill an already frozen task in order
: to handle OOM situtation.  In order to protect from late wake ups OOM
: killer is disabled after all tasks are frozen.  This, however, still keeps
: a window open when a killed task didn't manage to die by the time
: freeze_processes finishes.

The original patch hasn't closed the race window completely because that
would require a more complex solution as it can be seen by this patchset.

The primary motivation was to close the race condition between OOM killer
and PM freezer _completely_.  As Tejun pointed out, even though the race
condition is unlikely the harder it would be to debug weird bugs deep in
the PM freezer when the debugging options are reduced considerably.  I can
only speculate what might happen when a task is still runnable
unexpectedly.

On a plus side and as a side effect the oom enable/disable has a better
(full barrier) semantic without polluting hot paths.

I have tested the series in KVM with 100M RAM:
- many small tasks (20M anon mmap) which are triggering OOM continually
- s2ram which resumes automatically is triggered in a loop
	echo processors > /sys/power/pm_test
	while true
	do
		echo mem > /sys/power/state
		sleep 1s
	done
- simple module which allocates and frees 20M in 8K chunks. If it sees
  freezing(current) then it tries another round of allocation before calling
  try_to_freeze
- debugging messages of PM stages and OOM killer enable/disable/fail added
  and unmark_oom_victim is delayed by 1s after it clears TIF_MEMDIE and before
  it wakes up waiters.
- rebased on top of the current mmotm which means some necessary updates
  in mm/oom_kill.c. mark_tsk_oom_victim is now called under task_lock but
  I think this should be OK because __thaw_task shouldn't interfere with any
  locking down wake_up_process. Oleg?

As expected there are no OOM killed tasks after oom is disabled and
allocations requested by the kernel thread are failing after all the tasks
are frozen and OOM disabled.  I wasn't able to catch a race where
oom_killer_disable would really have to wait but I kinda expected the race
is really unlikely.

[  242.609330] Killed process 2992 (mem_eater) total-vm:24412kB, anon-rss:2164kB, file-rss:4kB
[  243.628071] Unmarking 2992 OOM victim. oom_victims: 1
[  243.636072] (elapsed 2.837 seconds) done.
[  243.641985] Trying to disable OOM killer
[  243.643032] Waiting for concurent OOM victims
[  243.644342] OOM killer disabled
[  243.645447] Freezing remaining freezable tasks ... (elapsed 0.005 seconds) done.
[  243.652983] Suspending console(s) (use no_console_suspend to debug)
[  243.903299] kmem_eater: page allocation failure: order:1, mode:0x204010
[...]
[  243.992600] PM: suspend of devices complete after 336.667 msecs
[  243.993264] PM: late suspend of devices complete after 0.660 msecs
[  243.994713] PM: noirq suspend of devices complete after 1.446 msecs
[  243.994717] ACPI: Preparing to enter system sleep state S3
[  243.994795] PM: Saving platform NVS memory
[  243.994796] Disabling non-boot CPUs ...

The first 2 patches are simple cleanups for OOM.  They should go in
regardless the rest IMO.

Patches 3 and 4 are trivial printk -> pr_info conversion and they should
go in ditto.

The main patch is the last one and I would appreciate acks from Tejun and
Rafael.  I think the OOM part should be OK (except for __thaw_task vs.
task_lock where a look from Oleg would appreciated) but I am not so sure I
haven't screwed anything in the freezer code.  I have found several
surprises there.

This patch (of 5):

This patch is just a preparatory and it doesn't introduce any functional
change.

Note:
I am utterly unhappy about lowmemory killer abusing TIF_MEMDIE just to
wait for the oom victim and to prevent from new killing. This is
just a side effect of the flag. The primary meaning is to give the oom
victim access to the memory reserves and that shouldn't be necessary
here.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11 17:06:03 -08:00
Johannes Weiner
241994ed86 mm: memcontrol: default hierarchy interface for memory
Introduce the basic control files to account, partition, and limit
memory using cgroups in default hierarchy mode.

This interface versioning allows us to address fundamental design
issues in the existing memory cgroup interface, further explained
below.  The old interface will be maintained indefinitely, but a
clearer model and improved workload performance should encourage
existing users to switch over to the new one eventually.

The control files are thus:

  - memory.current shows the current consumption of the cgroup and its
    descendants, in bytes.

  - memory.low configures the lower end of the cgroup's expected
    memory consumption range.  The kernel considers memory below that
    boundary to be a reserve - the minimum that the workload needs in
    order to make forward progress - and generally avoids reclaiming
    it, unless there is an imminent risk of entering an OOM situation.

  - memory.high configures the upper end of the cgroup's expected
    memory consumption range.  A cgroup whose consumption grows beyond
    this threshold is forced into direct reclaim, to work off the
    excess and to throttle new allocations heavily, but is generally
    allowed to continue and the OOM killer is not invoked.

  - memory.max configures the hard maximum amount of memory that the
    cgroup is allowed to consume before the OOM killer is invoked.

  - memory.events shows event counters that indicate how often the
    cgroup was reclaimed while below memory.low, how often it was
    forced to reclaim excess beyond memory.high, how often it hit
    memory.max, and how often it entered OOM due to memory.max.  This
    allows users to identify configuration problems when observing a
    degradation in workload performance.  An overcommitted system will
    have an increased rate of low boundary breaches, whereas increased
    rates of high limit breaches, maximum hits, or even OOM situations
    will indicate internally overcommitted cgroups.

For existing users of memory cgroups, the following deviations from
the current interface are worth pointing out and explaining:

  - The original lower boundary, the soft limit, is defined as a limit
    that is per default unset.  As a result, the set of cgroups that
    global reclaim prefers is opt-in, rather than opt-out.  The costs
    for optimizing these mostly negative lookups are so high that the
    implementation, despite its enormous size, does not even provide
    the basic desirable behavior.  First off, the soft limit has no
    hierarchical meaning.  All configured groups are organized in a
    global rbtree and treated like equal peers, regardless where they
    are located in the hierarchy.  This makes subtree delegation
    impossible.  Second, the soft limit reclaim pass is so aggressive
    that it not just introduces high allocation latencies into the
    system, but also impacts system performance due to overreclaim, to
    the point where the feature becomes self-defeating.

    The memory.low boundary on the other hand is a top-down allocated
    reserve.  A cgroup enjoys reclaim protection when it and all its
    ancestors are below their low boundaries, which makes delegation
    of subtrees possible.  Secondly, new cgroups have no reserve per
    default and in the common case most cgroups are eligible for the
    preferred reclaim pass.  This allows the new low boundary to be
    efficiently implemented with just a minor addition to the generic
    reclaim code, without the need for out-of-band data structures and
    reclaim passes.  Because the generic reclaim code considers all
    cgroups except for the ones running low in the preferred first
    reclaim pass, overreclaim of individual groups is eliminated as
    well, resulting in much better overall workload performance.

  - The original high boundary, the hard limit, is defined as a strict
    limit that can not budge, even if the OOM killer has to be called.
    But this generally goes against the goal of making the most out of
    the available memory.  The memory consumption of workloads varies
    during runtime, and that requires users to overcommit.  But doing
    that with a strict upper limit requires either a fairly accurate
    prediction of the working set size or adding slack to the limit.
    Since working set size estimation is hard and error prone, and
    getting it wrong results in OOM kills, most users tend to err on
    the side of a looser limit and end up wasting precious resources.

    The memory.high boundary on the other hand can be set much more
    conservatively.  When hit, it throttles allocations by forcing
    them into direct reclaim to work off the excess, but it never
    invokes the OOM killer.  As a result, a high boundary that is
    chosen too aggressively will not terminate the processes, but
    instead it will lead to gradual performance degradation.  The user
    can monitor this and make corrections until the minimal memory
    footprint that still gives acceptable performance is found.

    In extreme cases, with many concurrent allocations and a complete
    breakdown of reclaim progress within the group, the high boundary
    can be exceeded.  But even then it's mostly better to satisfy the
    allocation from the slack available in other groups or the rest of
    the system than killing the group.  Otherwise, memory.max is there
    to limit this type of spillover and ultimately contain buggy or
    even malicious applications.

  - The original control file names are unwieldy and inconsistent in
    many different ways.  For example, the upper boundary hit count is
    exported in the memory.failcnt file, but an OOM event count has to
    be manually counted by listening to memory.oom_control events, and
    lower boundary / soft limit events have to be counted by first
    setting a threshold for that value and then counting those events.
    Also, usage and limit files encode their units in the filename.
    That makes the filenames very long, even though this is not
    information that a user needs to be reminded of every time they
    type out those names.

    To address these naming issues, as well as to signal clearly that
    the new interface carries a new configuration model, the naming
    conventions in it necessarily differ from the old interface.

  - The original limit files indicate the state of an unset limit with
    a very high number, and a configured limit can be unset by echoing
    -1 into those files.  But that very high number is implementation
    and architecture dependent and not very descriptive.  And while -1
    can be understood as an underflow into the highest possible value,
    -2 or -10M etc. do not work, so it's not inconsistent.

    memory.low, memory.high, and memory.max will use the string
    "infinity" to indicate and set the highest possible value.

[akpm@linux-foundation.org: use seq_puts() for basic strings]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Greg Thelen <gthelen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11 17:06:02 -08:00
Johannes Weiner
650c5e5654 mm: page_counter: pull "-1" handling out of page_counter_memparse()
The unified hierarchy interface for memory cgroups will no longer use "-1"
to mean maximum possible resource value.  In preparation for this, make
the string an argument and let the caller supply it.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Greg Thelen <gthelen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11 17:06:02 -08:00
Vladimir Davydov
90cbc25088 vmscan: force scan offline memory cgroups
Since commit b2052564e66d ("mm: memcontrol: continue cache reclaim from
offlined groups") pages charged to a memory cgroup are not reparented when
the cgroup is removed.  Instead, they are supposed to be reclaimed in a
regular way, along with pages accounted to online memory cgroups.

However, an lruvec of an offline memory cgroup will sooner or later get so
small that it will be scanned only at low scan priorities (see
get_scan_count()).  Therefore, if there are enough reclaimable pages in
big lruvecs, pages accounted to offline memory cgroups will never be
scanned at all, wasting memory.

Fix this by unconditionally forcing scanning dead lruvecs from kswapd.

[akpm@linux-foundation.org: fix build]
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11 17:06:02 -08:00
Vlastimil Babka
05891fb065 mm: microoptimize zonelist operations
next_zones_zonelist() returns a zoneref pointer, as well as a zone pointer
via extra parameter.  Since the latter can be trivially obtained by
dereferencing the former, the overhead of the extra parameter is
unjustified.

This patch thus removes the zone parameter from next_zones_zonelist().
Both callers happen to be in the same header file, so it's simple to add
the zoneref dereference inline.  We save some bytes of code size.

add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-105 (-105)
function                                     old     new   delta
nr_free_zone_pages                           129     115     -14
__alloc_pages_nodemask                      2300    2285     -15
get_page_from_freelist                      2652    2576     -76

add/remove: 0/0 grow/shrink: 1/0 up/down: 10/0 (10)
function                                     old     new   delta
try_to_compact_pages                         569     579     +10

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11 17:06:02 -08:00
Vlastimil Babka
1a6d53a105 mm: reduce try_to_compact_pages parameters
Expand the usage of the struct alloc_context introduced in the previous
patch also for calling try_to_compact_pages(), to reduce the number of its
parameters.  Since the function is in different compilation unit, we need
to move alloc_context definition in the shared mm/internal.h header.

With this change we get simpler code and small savings of code size and stack
usage:

add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-27 (-27)
function                                     old     new   delta
__alloc_pages_direct_compact                 283     256     -27
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-13 (-13)
function                                     old     new   delta
try_to_compact_pages                         582     569     -13

Stack usage of __alloc_pages_direct_compact goes from 24 to none (per
scripts/checkstack.pl).

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11 17:06:02 -08:00
Naoya Horiguchi
e66f17ff71 mm/hugetlb: take page table lock in follow_huge_pmd()
We have a race condition between move_pages() and freeing hugepages, where
move_pages() calls follow_page(FOLL_GET) for hugepages internally and
tries to get its refcount without preventing concurrent freeing.  This
race crashes the kernel, so this patch fixes it by moving FOLL_GET code
for hugepages into follow_huge_pmd() with taking the page table lock.

This patch intentionally removes page==NULL check after pte_page.
This is justified because pte_page() never returns NULL for any
architectures or configurations.

This patch changes the behavior of follow_huge_pmd() for tail pages and
then tail pages can be pinned/returned.  So the caller must be changed to
properly handle the returned tail pages.

We could have a choice to add the similar locking to
follow_huge_(addr|pud) for consistency, but it's not necessary because
currently these functions don't support FOLL_GET flag, so let's leave it
for future development.

Here is the reproducer:

  $ cat movepages.c
  #include <stdio.h>
  #include <stdlib.h>
  #include <numaif.h>

  #define ADDR_INPUT      0x700000000000UL
  #define HPS             0x200000
  #define PS              0x1000

  int main(int argc, char *argv[]) {
          int i;
          int nr_hp = strtol(argv[1], NULL, 0);
          int nr_p  = nr_hp * HPS / PS;
          int ret;
          void **addrs;
          int *status;
          int *nodes;
          pid_t pid;

          pid = strtol(argv[2], NULL, 0);
          addrs  = malloc(sizeof(char *) * nr_p + 1);
          status = malloc(sizeof(char *) * nr_p + 1);
          nodes  = malloc(sizeof(char *) * nr_p + 1);

          while (1) {
                  for (i = 0; i < nr_p; i++) {
                          addrs[i] = (void *)ADDR_INPUT + i * PS;
                          nodes[i] = 1;
                          status[i] = 0;
                  }
                  ret = numa_move_pages(pid, nr_p, addrs, nodes, status,
                                        MPOL_MF_MOVE_ALL);
                  if (ret == -1)
                          err("move_pages");

                  for (i = 0; i < nr_p; i++) {
                          addrs[i] = (void *)ADDR_INPUT + i * PS;
                          nodes[i] = 0;
                          status[i] = 0;
                  }
                  ret = numa_move_pages(pid, nr_p, addrs, nodes, status,
                                        MPOL_MF_MOVE_ALL);
                  if (ret == -1)
                          err("move_pages");
          }
          return 0;
  }

  $ cat hugepage.c
  #include <stdio.h>
  #include <sys/mman.h>
  #include <string.h>

  #define ADDR_INPUT      0x700000000000UL
  #define HPS             0x200000

  int main(int argc, char *argv[]) {
          int nr_hp = strtol(argv[1], NULL, 0);
          char *p;

          while (1) {
                  p = mmap((void *)ADDR_INPUT, nr_hp * HPS, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
                  if (p != (void *)ADDR_INPUT) {
                          perror("mmap");
                          break;
                  }
                  memset(p, 0, nr_hp * HPS);
                  munmap(p, nr_hp * HPS);
          }
  }

  $ sysctl vm.nr_hugepages=40
  $ ./hugepage 10 &
  $ ./movepages 10 $(pgrep -f hugepage)

Fixes: e632a938d914 ("mm: migrate: add hugepage migration code to move_pages()")
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reported-by: Hugh Dickins <hughd@google.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: <stable@vger.kernel.org>	[3.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11 17:06:01 -08:00
Baoquan He
44628d9755 mm: fix typo of MIGRATE_RESERVE in comment
Found it when I want to jump to the definition of MIGRATE_RESERVE ctags.

Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11 17:06:01 -08:00
Johannes Weiner
6de226191d mm: memcontrol: track move_lock state internally
The complexity of memcg page stat synchronization is currently leaking
into the callsites, forcing them to keep track of the move_lock state and
the IRQ flags.  Simplify the API by tracking it in the memcg.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11 17:06:00 -08:00
Vladimir Davydov
93aa7d9524 swap: remove unused mem_cgroup_uncharge_swapcache declaration
The body of this function was removed by commit 0a31bc97c80c ("mm:
memcontrol: rewrite uncharge API").

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11 17:06:00 -08:00
Wang, Yalin
56873f43ab mm:add KPF_ZERO_PAGE flag for /proc/kpageflags
Add KPF_ZERO_PAGE flag for zero_page, so that userspace processes can
detect zero_page in /proc/kpageflags, and then do memory analysis more
accurately.

Signed-off-by: Yalin Wang <yalin.wang@sonymobile.com>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11 17:06:00 -08:00
Wang, Yalin
1d148e218a mm: add VM_BUG_ON_PAGE() to page_mapcount()
Add VM_BUG_ON_PAGE() for slab pages.  _mapcount is an union with slab
struct in struct page, so we must avoid accessing _mapcount if this page
is a slab page.  Also remove the unneeded bracket.

Signed-off-by: Yalin Wang <yalin.wang@sonymobile.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11 17:06:00 -08:00
Kirill A. Shutemov
e4b294c2d8 mm: add fields for compound destructor and order into struct page
Currently, we use lru.next/lru.prev plus cast to access or set
destructor and order of compound page.

Let's replace it with explicit fields in struct page.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11 17:06:00 -08:00
Jaegeuk Kim
29e7043f40 f2fs: fix sparse warnings
This patch resolves the following warnings.

include/trace/events/f2fs.h:150:1: warning: expression using sizeof bool
include/trace/events/f2fs.h:180:1: warning: expression using sizeof bool
include/trace/events/f2fs.h:990:1: warning: expression using sizeof bool
include/trace/events/f2fs.h:990:1: warning: expression using sizeof bool
include/trace/events/f2fs.h:150:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1)
include/trace/events/f2fs.h:180:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1)
include/trace/events/f2fs.h:990:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1)
include/trace/events/f2fs.h:990:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1)

fs/f2fs/checkpoint.c:27:19: warning: symbol 'inode_entry_slab' was not declared. Should it be static?
fs/f2fs/checkpoint.c:577:15: warning: cast to restricted __le32
fs/f2fs/checkpoint.c:592:15: warning: cast to restricted __le32

fs/f2fs/trace.c:19:1: warning: symbol 'pids' was not declared. Should it be static?
fs/f2fs/trace.c:21:21: warning: symbol 'last_io' was not declared. Should it be static?

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:49 -08:00
Jaegeuk Kim
f7ef9b83b5 f2fs: introduce macros to convert bytes and blocks in f2fs
This patch adds two macros for transition between byte and block offsets.
Currently, f2fs only supports 4KB blocks, so use the default size for now.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:48 -08:00
Jaegeuk Kim
119ee91445 f2fs: split UMOUNT and FASTBOOT flags
This patch adds FASTBOOT flag into checkpoint as follows.

 - CP_UMOUNT_FLAG is set when system is umounted.
 - CP_FASTBOOT_FLAG is set when intermediate checkpoint having node summaries
   was done.

So, if you get CP_UMOUNT_FLAG from checkpoint, the system was umounted cleanly.
Instead, if there was sudden-power-off, you can get CP_FASTBOOT_FLAG or nothing.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:41 -08:00
Chao Yu
caf0047e7e f2fs: merge flags in struct f2fs_sb_info
Currently, there are several variables with Boolean type as below:

struct f2fs_sb_info {
...
	int s_dirty;
	bool need_fsck;
	bool s_closing;
...
	bool por_doing;
...
}

For this there are some issues:
1. there are some space of f2fs_sb_info is wasted due to aligning after Boolean
   type variables by compiler.
2. if we continuously add new flag into f2fs_sb_info, structure will be messed
   up.

So in this patch, we try to:
1. switch s_dirty to Boolean type variable since it has two status 0/1.
2. merge s_dirty/need_fsck/s_closing/por_doing variables into s_flag.
3. introduce an enum type which can indicate different states of sbi.
4. use new introduced universal interfaces is_sbi_flag_set/{set,clear}_sbi_flag
   to operate flags for sbi.

After that, above issues will be fixed.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-02-11 17:04:38 -08:00
Tom Herbert
fe881ef11c gue: Use checksum partial with remote checksum offload
Change remote checksum handling to set checksum partial as default
behavior. Added an iflink parameter to configure not using
checksum partial (calling csum_partial to update checksum).

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-11 15:12:13 -08:00
Tom Herbert
0ace2ca89c vxlan: Use checksum partial with remote checksum offload
Change remote checksum handling to set checksum partial as default
behavior. Added an iflink parameter to configure not using
checksum partial (calling csum_partial to update checksum).

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-11 15:12:13 -08:00
Tom Herbert
15e2396d4e net: Infrastructure for CHECKSUM_PARTIAL with remote checsum offload
This patch adds infrastructure so that remote checksum offload can
set CHECKSUM_PARTIAL instead of calling csum_partial and writing
the modfied checksum field.

Add skb_remcsum_adjust_partial function to set an skb for using
CHECKSUM_PARTIAL with remote checksum offload.  Changed
skb_remcsum_process and skb_gro_remcsum_process to take a boolean
argument to indicate if checksum partial can be set or the
checksum needs to be modified using the normal algorithm.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-11 15:12:12 -08:00
Tom Herbert
baa32ff428 net: Use more bit fields in napi_gro_cb
This patch moves the free and same_flow fields to be bit fields
(2 and 1 bit sized respectively). This frees up some space for u16's.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-11 15:12:11 -08:00
Tom Herbert
6edec0e61b net: Clarify meaning of CHECKSUM_PARTIAL for receive path
The current meaning of CHECKSUM_PARTIAL for validating checksums
is that _all_ checksums in the packet are considered valid.
However, in the manner that CHECKSUM_PARTIAL is set only the checksum
at csum_start+csum_offset and any preceding checksums may
be considered valid. If there are checksums in the packet after
csum_offset it is possible they have not been verfied.

This patch changes CHECKSUM_PARTIAL logic in skb_csum_unnecessary and
__skb_gro_checksum_validate_needed to only considered checksums
referring to csum_start and any preceding checksums (with starting
offset before csum_start) to be verified.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-11 15:12:10 -08:00
Tom Herbert
26c4f7da3e net: Fix remcsum in GRO path to not change packet
Remote checksum offload processing is currently the same for both
the GRO and non-GRO path. When the remote checksum offload option
is encountered, the checksum field referred to is modified in
the packet. So in the GRO case, the packet is modified in the
GRO path and then the operation is skipped when the packet goes
through the normal path based on skb->remcsum_offload. There is
a problem in that the packet may be modified in the GRO path, but
then forwarded off host still containing the remote checksum option.
A remote host will again perform RCO but now the checksum verification
will fail since GRO RCO already modified the checksum.

To fix this, we ensure that GRO restores a packet to it's original
state before returning. In this model, when GRO processes a remote
checksum option it still changes the checksum per the algorithm
but on return from lower layer processing the checksum is restored
to its original value.

In this patch we add define gro_remcsum structure which is passed
to skb_gro_remcsum_process to save offset and delta for the checksum
being changed. After lower layer processing, skb_gro_remcsum_cleanup
is called to restore the checksum before returning from GRO.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-11 15:12:09 -08:00
Joe Perches
673e2baaa6 treewide: Remove unnecessary SSB_DEVTABLE_END macro
Use the normal {} instead of a macro to terminate an array.

Remove the macro too.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-11 14:38:29 -08:00
Joe Perches
f7219b527b treewide: Remove unnecessary BCMA_CORETABLE_END macro
Use the normal {} instead of a macro to terminate an array.

Remove the macro too.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-11 14:38:28 -08:00
Paul Moore
04f81f0154 cipso: don't use IPCB() to locate the CIPSO IP option
Using the IPCB() macro to get the IPv4 options is convenient, but
unfortunately NetLabel often needs to examine the CIPSO option outside
of the scope of the IP layer in the stack.  While historically IPCB()
worked above the IP layer, due to the inclusion of the inet_skb_param
struct at the head of the {tcp,udp}_skb_cb structs, recent commit
971f10ec ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
reordered the tcp_skb_cb struct and invalidated this IPCB() trick.

This patch fixes the problem by creating a new function,
cipso_v4_optptr(), which locates the CIPSO option inside the IP header
without calling IPCB().  Unfortunately, this isn't as fast as a simple
lookup so some additional tweaks were made to limit the use of this
new function.

Cc: <stable@vger.kernel.org> # 3.18
Reported-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Tested-by: Casey Schaufler <casey@schaufler-ca.com>
2015-02-11 14:46:37 -05:00
Linus Torvalds
ce01e871a1 This is the bulk of pin control changes for the v3.20 cycle:
- Framework changes and enhancements:
   - Passing -DDEBUG recursively to subdir drivers so we get
     debug messages properly turned on.
   - Infer map type from DT property in the groups parsing code
     in the generic pinconfig code.
   - Support for custom parameter passing in generic pin config.
     This is used when you are using the generic pin config, but
     want to add a few custom properties that no other driver
     will use.
 
 - New drivers:
   - Driver for the Xilinx Zynq
   - Driver for the AmLogic Meson SoCs
 
 - New features in drivers:
   - Sleep support (suspend/resume) for the Cherryview driver
   - mvebeu a38x can now mux a UART on pins MPP19 and MPP20
   - Migrated the qualcomm driver to generic pin config handling
     of extended config options in the core code.
   - Support BUS1 and AUDIO in the Exynos pin controller.
   - Add some missing functions in the sun6i driver.
   - Add support for the A31S variant in the sun6i driver.
   - EMEv2 support in the Renesas PFC driver.
   - Ass support for Qualcomm MSM8916 in the qcom driver.
 
 - Deleted features
   - Drop support for the SiRF Marco that was never released to
     the market.
   - Drop SH7372 support as the support for this platform is
     removed from the kernel.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJU2w2GAAoJEEEQszewGV1z7kYQAKw4Y4SY9Mq9O97GBq0JWvzv
 uLK4P8NvegHkgX0IDc/etAtHzBN6L+4axh7rDAsaDhug+42CbZxVZjXCfLGFClZP
 kJ/gz4II28AGWiP7TZPNHspIJYgKUdWcVfg0cTZwpM22/AEdBAo9HQ2a/FltvrCn
 eYwzFlAOKUUygmDdbfXHk5Z+ndrJw0ahLjXn8zjBe1HkD2QVaigM9ecA2aQHiG9a
 QvABUJ2qVVs9rqTIxoVzSIGTLeLzrv8cezDLQhZ4KaEasAkxtWKM4kYQSMx/PoTB
 mg+FZ5B8IXqlksnSljT+wOcSP1nmtRdjnED/MpsSLbo9RfJgHkA4Lu4Q8iqt7rZL
 +k/kKi3+p9pTE2pIi56nSpHnfgF8JHgdRAYIXBea5Ug0YnBp83/5jLrU7Fmcjr7s
 l0PH0Fk0iRFRdfn6crcs+SLrhQKtuP+Douwg+3ujVOQiKIW6m+b161GwEVYkvWlq
 1JRWPSjncpsmyg5O8dEwZDwgtzPU65UMEsLgRk9wMNJYw0TqGPugEy4+2rBdJWLy
 CYzJo2At9OcHbB2rT8UKwtErQF85IcWmfnMyfo2PANTLGaj5EFruhmSc2J0m7kOe
 ExPXtWOWxGCtWG53ZDeJUYBg9ySMyleY10LYBP9fPQnotvNB3vfyAkBwV74fnBXs
 ijxO6/Uamd4Rrs5LDRBL
 =Nbzh
 -----END PGP SIGNATURE-----

Merge tag 'pinctrl-v3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pincontrol updates from Linus Walleij:
 :This is the bulk of pin control changes for the v3.20 cycle:

  Framework changes and enhancements:
   - Passing -DDEBUG recursively to subdir drivers so we get debug
     messages properly turned on.
   - Infer map type from DT property in the groups parsing code in the
     generic pinconfig code.
   - Support for custom parameter passing in generic pin config.  This
     is used when you are using the generic pin config, but want to add
     a few custom properties that no other driver will use.

  New drivers:
   - Driver for the Xilinx Zynq
   - Driver for the AmLogic Meson SoCs

  New features in drivers:
   - Sleep support (suspend/resume) for the Cherryview driver
   - mvebeu a38x can now mux a UART on pins MPP19 and MPP20
   - Migrated the qualcomm driver to generic pin config handling of
     extended config options in the core code.
   - Support BUS1 and AUDIO in the Exynos pin controller.
   - Add some missing functions in the sun6i driver.
   - Add support for the A31S variant in the sun6i driver.
   - EMEv2 support in the Renesas PFC driver.
   - Add support for Qualcomm MSM8916 in the qcom driver.

  Deleted features
   - Drop support for the SiRF Marco that was never released to the
     market.
   - Drop SH7372 support as the support for this platform is removed
     from the kernel"

* tag 'pinctrl-v3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (40 commits)
  sh-pfc: emev2 - Fix mangled author name
  pinctrl: cherryview: Configure HiZ pins to be input when requested as GPIOs
  pinctrl: imx25: fix numbering for pins
  pinctrl: pinctrl-imx: don't use invalid value of conf_reg
  pinctrl: qcom: delete pin_config_get/set pinconf operations
  pinctrl: qcom: Add msm8916 pinctrl driver
  DT: pinctrl: Document Qualcomm MSM8916 pinctrl binding
  pinctrl: qcom: increase variable size for register offsets
  pinctrl: hide PCONFDUMP in #ifdef
  pinctrl: rockchip: Only mask interrupts; never disable
  pinctrl: zynq: Fix usb0 pins
  pinctrl: sh-pfc: sh7372: Remove DT binding documentation
  pinctrl: sh-pfc: sh7372: Remove PFC support
  sh-pfc: Add emev2 pinmux support
  sh-pfc: add macro to define pinmux without function
  pinctrl: add driver for Amlogic Meson SoCs
  staging: drivers: pinctrl: Fixed checkpatch.pl warnings
  pinctrl: exynos: Add AUDIO pin controller for exynos7
  sh-pfc: r8a7790: add MLB+ pin group
  sh-pfc: r8a7791: add MLB+ pin group
  ...
2015-02-11 11:23:13 -08:00
Linus Torvalds
a1df7efeda This is the GPIO bulk changes for the v3.20 series:
- GPIOLIB core changes:
   - Create and use of_mm_gpiochip_remove() for removing
     memory-mapped OF GPIO chips
   - GPIO MMIO library suppports bgpio_set_multiple for
     switching several lines at once, a feature merged in
     the last cycle.
 - New drivers:
   - New driver for the APM X-gene standby GPIO controller
   - New driver for the Fujitsu MB86S7x GPIO controller
 - Cleanups:
   - Moved rcar driver to use gpiolib irqchip
   - Moxart converted to the GPIO MMIO library
   - GE driver converted to GPIO MMIO library
   - Move sx150x to irqdomain
   - Move max732x to irqdomain
   - Move vx855 to use managed resources
   - Move dwapb to use managed resources
   - Clean tc3589x from platform data
   - Clean stmpe driver to use device tree only probe
 - New subtypes:
   - sx1506 support in the sx150x driver
   - Quark 1000 SoC support in the SCH driver
   - Support X86 in the Xilinx driver
   - Support PXA1928 in the PXA driver
 - Extended drivers:
   - max732x supports device tree probe
   - sx150x supports device tree probe
 - Various minor cleanups and bug fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJU2W1HAAoJEEEQszewGV1zuRkQAKBukQwx46zQuFH8oalSuxmy
 H2/ESMnDmRlBI0zX+dAGDg4HlkO9VwIrdcyzMMe7uO8EFUu9d4/mu2E1f8cY2mxu
 kwUSntVaYGVPVj/Vok3kzq1wW/pUQ9E2iMbNeDVZJH/cqEvylPPa2LZ3hZVri57J
 orJzaYtZWf640y3McGTUDwIQokgxxdMMyWfm26P1iZkByjofUaYRHS1NIxhKVSVC
 Y6uA/Ivvh56ezlPQykc7m6YEjoUS91AMllJca1A3KF3+qvQ3Hnc+i9mB7hAQbJ8L
 6Iv2qCD4TgtWT5YXgP0eZsfSV/19kvlVlGX7QQT6o7uIFOLVNOmX9FJAc4n3SGNI
 HbxsDdlF9XHVFh0XyKcBQfg0uUmRWUDQ/LsXn9KL6Yrpyx3QZGH/PTYl/XoCj1IO
 IL6Bw51FCBXvCvLP5alXMouosAxrc19YpljcAuAMmjVTXe6RoULOZJs+lhTHMhvj
 S6FWr6l0XDr4yKb0SXTOGI4hvRJvL8SWnIt5ezZrNrTKLsklL3Bi/o3BoKzzxmqS
 6l/rxvcVUov455IrfqJ0ZEM6ZxC9/cvGec0RFyglopNSlKeTQsgWRNa6Pw2xZVk8
 orT+G3L4jZ1+0hcjcSwfvHng5Nv6DuhIfUNPbLF/ZzmJcBPqGc/DuhGkFqOrBaXg
 ey7Lua6+l+8zcDtugRRG
 =yWrB
 -----END PGP SIGNATURE-----

Merge tag 'gpio-v3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull GPIO changes from Linus Walleij:
 "This is the GPIO bulk changes for the v3.20 series:

  GPIOLIB core changes:
   - Create and use of_mm_gpiochip_remove() for removing memory-mapped
     OF GPIO chips
   - GPIO MMIO library suppports bgpio_set_multiple for switching
     several lines at once, a feature merged in the last cycle.

  New drivers:
   - New driver for the APM X-gene standby GPIO controller
   - New driver for the Fujitsu MB86S7x GPIO controller

  Cleanups:
   - Moved rcar driver to use gpiolib irqchip
   - Moxart converted to the GPIO MMIO library
   - GE driver converted to GPIO MMIO library
   - Move sx150x to irqdomain
   - Move max732x to irqdomain
   - Move vx855 to use managed resources
   - Move dwapb to use managed resources
   - Clean tc3589x from platform data
   - Clean stmpe driver to use device tree only probe

  New subtypes:
   - sx1506 support in the sx150x driver
   - Quark 1000 SoC support in the SCH driver
   - Support X86 in the Xilinx driver
   - Support PXA1928 in the PXA driver

  Extended drivers:
   - max732x supports device tree probe
   - sx150x supports device tree probe

  Various minor cleanups and bug fixes"

* tag 'gpio-v3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (61 commits)
  gpio: kconfig: replace PPC_OF with PPC
  gpio: pxa: add PXA1928 gpio type support
  dt/bindings: gpio: add compatible string for marvell,pxa1928-gpio
  gpio: pxa: remove mach IRQ includes
  gpio: max732x: use an inline function for container cast
  gpio: use sizeof() instead of hardcoded values
  gpio: max732x: add set_multiple function
  gpio: sch: Consolidate similar algorithms
  gpio: tz1090-pdc: Use resource_size to fix off-by-one resource size calculation
  gpio: ge: Convert to use devm_kstrdup
  gpio: correctly use const char * const
  gpio: sx150x: fixup OF support
  gpio: mpc8xxx: Use of_mm_gpiochip_remove
  gpio: Add Fujitsu MB86S7x GPIO driver
  gpio: mpc8xxx: Convert to platform device interface.
  gpio: zevio: Use of_mm_gpiochip_remove
  gpio: gpio-mm-lantiq: Use of_mm_gpiochip_remove
  gpio: gpio-mm-lantiq: Use of_property_read_u32
  gpio: gpio-mm-lantiq: Do not replicate code
  gpio :gpio-mm-lantiq: Use devm_kzalloc
  ...
2015-02-11 11:17:34 -08:00
Linus Torvalds
aa7ed01f93 MMC core:
- Support for MMC power sequences.
 - SDIO function devicetree subnode parsing.
 - Refactor the hardware reset routines and enable it for SD cards.
 - Various code quality improvements, especially for slot-gpio.
 
 MMC host:
 - dw_mmc: Various fixes and cleanups.
 - dw_mmc: Convert to mmc_send_tuning().
 - moxart: Fix probe logic.
 - sdhci: Various fixes and cleanups
 - sdhci: Asynchronous request handling support.
 - sdhci-pxav3: Various fixes and cleanups.
 - sdhci-tegra: Fixes for T114, T124 and T132.
 - rtsx: Various fixes and cleanups.
 - rtsx: Support for SDIO.
 - sdhi/tmio: Refactor and cleanup of header files.
 - omap_hsmmc: Use slot-gpio and common MMC DT parser.
 - Make all hosts to deal with errors from mmc_of_parse().
 - sunxi: Various fixes and cleanups.
 - sdhci: Support for Fujitsu SDHCI controller f_sdh30.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJU2b9MAAoJEP4mhCVzWIwp678P/2Hjoo17FDnCQT2qXCRWmMmx
 98n7mrkPw20cVm6dlXyVxHFxrgRWan1eATiu1vBdnNmXkeUmThMbuGpATDi40fIT
 C2g9wPDM1/naJ+Qg8mPGy0vEDQYHEzxHHlAyfOaeXdhxhll1iHqhk+Jb6cFQN5DP
 /CvNmuL/7m9uuFhHlGJnqSNMyenLAFFXthIiVJrQeZeYq9NZ1ZZfW7+esHDmu2lP
 EFkrZf+xYFmFWAqccyTR58QZsYKlDv4NS/0UMU941DkO7x7R8ZsQG8xFu9bIN5Wn
 EJfgP7EfEXHlD5a1/QQ918IT1ifxhPGiCbBXpdfAUt7Xte6zYyASpTyAm8v7vT2I
 2hot1T1BZgADALE2EHAP4kzK49ipfhQmlVZgFeYVsTpPKk8Nvczio7Y3LYlzNmBo
 V0jaTUTtU7u7ICtGbo7OqOybW/Sm5E00xsq22txIXObURa7bPbZ4CnxJpstSaU2Z
 nweZaa79HaHZE7xyUNh9kAbxfGC0pOT0oPoPYcTxcpk2vva+atULEYnLEHUULrgs
 D4+m8tnbuwoZoGanlMKqgPXP8Xkau/meEdz4WaYrXQEIafrVIR2/kcXGQjhD8ucO
 VkjUaZDKxNXTkwOzM/siOxJwj75Ka6GDHM7JGx4F30QHqgRTtg2wzInU9nsViuiA
 02698dNk9CdP3JirDtbm
 =ojsj
 -----END PGP SIGNATURE-----

Merge tag 'mmc-v3.20-1' of git://git.linaro.org/people/ulf.hansson/mmc

Pull MMC updates from Ulf Hansson:
 "MMC core:
   - Support for MMC power sequences.
   - SDIO function devicetree subnode parsing.
   - Refactor the hardware reset routines and enable it for SD cards.
   - Various code quality improvements, especially for slot-gpio.

  MMC host:
   - dw_mmc: Various fixes and cleanups.
   - dw_mmc: Convert to mmc_send_tuning().
   - moxart: Fix probe logic.
   - sdhci: Various fixes and cleanups
   - sdhci: Asynchronous request handling support.
   - sdhci-pxav3: Various fixes and cleanups.
   - sdhci-tegra: Fixes for T114, T124 and T132.
   - rtsx: Various fixes and cleanups.
   - rtsx: Support for SDIO.
   - sdhi/tmio: Refactor and cleanup of header files.
   - omap_hsmmc: Use slot-gpio and common MMC DT parser.
   - Make all hosts to deal with errors from mmc_of_parse().
   - sunxi: Various fixes and cleanups.
   - sdhci: Support for Fujitsu SDHCI controller f_sdh30"

* tag 'mmc-v3.20-1' of git://git.linaro.org/people/ulf.hansson/mmc: (117 commits)
  mmc: sdhci-s3c: solve problem with sleeping in atomic context
  mmc: pwrseq: add driver for emmc hardware reset
  mmc: moxart: fix probe logic
  mmc: core: Invoke mmc_pwrseq_post_power_on() prior MMC_POWER_ON state
  mmc: pwrseq_simple: Add optional reference clock support
  mmc: pwrseq: Document optional clock for the simple power sequence
  mmc: pwrseq_simple: Extend to support more pins
  mmc: pwrseq: Document that simple sequence support more than one GPIO
  mmc: Add hardware dependencies for sdhci-pxav3 and sdhci-pxav2
  mmc: sdhci-pxav3: Modify clock settings for the SDR50 and DDR50 modes
  mmc: sdhci-pxav3: Extend binding with SDIO3 conf reg for the Armada 38x
  mmc: sdhci-pxav3: Fix Armada 38x controller's caps according to erratum ERR-7878951
  mmc: sdhci-pxav3: Fix SDR50 and DDR50 capabilities for the Armada 38x flavor
  mmc: sdhci: switch voltage before sdhci_set_ios in runtime resume
  mmc: tegra: Write xfer_mode, CMD regs in together
  mmc: Resolve BKOPS compatability issue
  mmc: sdhci-pxav3: fix setting of pdata->clk_delay_cycles
  mmc: dw_mmc: rockchip: remove incorrect __exit_p()
  mmc: dw_mmc: exynos: remove incorrect __exit_p()
  mmc: Fix menuconfig alignment of MMC_SDHCI_* options
  ...
2015-02-11 10:56:48 -08:00
Linus Torvalds
540a7c5061 SCSI misc on 20150209
This is the usual grab bag of driver updates (hpsa, storvsc, mp2sas,
 megaraid_sas, ses) plus an assortment of minor updates.  There's also an
 update to ufs which adds new phy drivers and finally a new logging
 infrastructure for SCSI.
 
 Signed-off-by: James Bottomley <JBottomley@Parallels.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABAgAGBQJU2Ty5AAoJEDeqqVYsXL0M9rAH/1xNpAxXuxQq+dW5Z+uOaX60
 5RRIu7/xA1HEfzkT5FTHrolmogDjVqawu4PZS66iHDeo05RBVUlbTA8qCK+MlRcN
 U6s0cLEw59eH3EaCfOGuYp/MnbhuV0eNxe0btmqJIQwuW3+gwZKGJdOq6LS2YasJ
 k/DyIBVmkJAVsN56vm9q2vbtcZp+Bg+ngqBS+SC4TF7vV1WCtFmS6yaUf62PYW3D
 +Irx37qHZntDR5wdw3dsuKDi5U8bl6myPjaVLnVJqg/WIF9RlCkjk5xpWT99AmVO
 NmtYQxLLBlAQ5K+sIlBUwxZe+8q1l+Aj4TTmJHAfFtyfp25s7JR9I6/QtOyC5Kw=
 =odol
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull first round of SCSI updates from James Bottomley:
 "This is the usual grab bag of driver updates (hpsa, storvsc, mp2sas,
  megaraid_sas, ses) plus an assortment of minor updates.

  There's also an update to ufs which adds new phy drivers and finally a
  new logging infrastructure for SCSI"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (114 commits)
  scsi_logging: return void for dev_printk() functions
  scsi: print single-character strings with seq_putc
  scsi: merge consecutive seq_puts calls
  scsi: replace seq_printf with seq_puts
  aha152x: replace seq_printf with seq_puts
  advansys: replace seq_printf with seq_puts
  scsi: remove SPRINTF macro
  sg: remove an unused variable
  hpsa: Use local workqueues instead of system workqueues
  hpsa: add in P840ar controller model name
  hpsa: add in gen9 controller model names
  hpsa: detect and report failures changing controller transport modes
  hpsa: shorten the wait for the CISS doorbell mode change ack
  hpsa: refactor duplicated scan completion code into a new routine
  hpsa: move SG descriptor set-up out of hpsa_scatter_gather()
  hpsa: do not use function pointers in fast path command submission
  hpsa: print CDBs instead of kernel virtual addresses for uncommon errors
  hpsa: do not use a void pointer for scsi_cmd field of struct CommandList
  hpsa: return failed from device reset/abort handlers
  hpsa: check for ctlr lockup after command allocation in main io path
  ...
2015-02-11 10:28:45 -08:00
Christoph Hellwig
d427e3c82e block: remove unused function blk_bio_map_sg
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-02-11 11:24:14 -07:00
Linus Torvalds
718749d562 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
 "The first round of updates for the input subsystem.

  A few new drivers (power button handler for AXP20x PMIC, tps65218
  power button driver, sun4i keys driver, regulator haptic driver, NI
  Ettus Research USRP E3x0 button, Alwinner A10/A20 PS/2 controller).

  Updates to Synaptics and ALPS touchpad drivers (with more to come
  later), brand new Focaltech PS/2 support, update to Cypress driver to
  handle Gen5 (in addition to Gen3) devices, and number of other fixups
  to various drivers as well as input core"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (54 commits)
  Input: elan_i2c - fix wrong %p extension
  Input: evdev - do not queue SYN_DROPPED if queue is empty
  Input: gscps2 - fix MODULE_DEVICE_TABLE invocation
  Input: synaptics - use dmax in input_mt_assign_slots
  Input: pxa27x_keypad - remove unnecessary ARM includes
  Input: ti_am335x_tsc - replace delta filtering with median filtering
  ARM: dts: AM335x: Make charge delay a DT parameter for TSC
  Input: ti_am335x_tsc - read charge delay from DT
  Input: ti_am335x_tsc - remove udelay in interrupt handler
  Input: ti_am335x_tsc - interchange touchscreen and ADC steps
  Input: MT - add support for balanced slot assignment
  Input: drv2667 - remove wrong and unneeded drv2667-haptics modalias
  Input: drv260x - remove wrong and unneeded drv260x-haptics modalias
  Input: cap11xx - remove wrong and unneeded cap11xx modalias
  Input: sun4i-ts - add support for touchpanel controller on A31
  Input: serio - add support for Alwinner A10/A20 PS/2 controller
  Input: gtco - use sign_extend32() for sign extension
  Input: elan_i2c - verify firmware signature applying it
  Input: elantech - remove stale comment from Kconfig
  Input: cyapa - off by one in cyapa_update_fw_store()
  ...
2015-02-11 09:32:08 -08:00
Linus Torvalds
e0c8453769 fbdev changes for v3.20
* omapdss: add DRA7xxx SoC support
 * fbdev: support DMT (Display Monitor Timing) calculation
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJU2yD3AAoJEPo9qoy8lh71mWQQAIakYyfFAYFnGOZU7vj9zxlj
 //UYaAWjjcksRd31hSBjGT/rQCmM/vM159W7RmIiJfqlw+hBIaHzWC3Wt9+4E3qt
 1p/eO/QdwRoOAixrY2WQhC1O70PldDIO75rw85EjxlISkw0gmEKeG2eSiYFVvPfI
 2afNj4gOkP1KUOZOTABMc0H+BMJo/EVQ34MJx8JNFGHRynGaDx7O44/0G8k/kfnk
 /tEit0iS4T7oF2Rz89fxFZxzoAtDmtR+ftFSkm42/2pmlmHXeh5Sn2Nxz3Kt6P0J
 bwvGXt7Q9VkKSB257wZ06tVER18JUNo6hOzEKZDYpfteDSX3pREMiNHi/EnDBLe+
 eXQ4GGozh50MfBYUnIYZ30vG8iY3oGzSPTENVfyMT6knVzTe2fbnu6vco231upBB
 DKak4+vqZk7ODC+PO3S3IjoxvpRziEiwbr4X7gk8CCU+5S8lwGZ1hAH91sUbiHVd
 p14wfMke5/RkgAF4USwbeyKxA/tNJosbrrKQW+9zpTAZL2iPR9g/6NM689LiEGpL
 uzM0Va0RxaFnqNDbbh4iFUEDcMD8/riRI6Tqa/QWtZvYVD+R/cdr4G/6aV8zG6gL
 B+yWPJxBOGOU3cuONWSC2jcUaT9v+AupV5oxRKcmmXNhByQ77g1ncX+TAPiv++ni
 1PMCAO2IIBt0GgY4SkfK
 =FBW1
 -----END PGP SIGNATURE-----

Merge tag 'fbdev-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux

Pull fbdev changes from Tomi Valkeinen:

 - omapdss: add DRA7xxx SoC support

 - fbdev: support DMT (Display Monitor Timing) calculation

* tag 'fbdev-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux: (40 commits)
  omapfb: Return error code when applying overlay settings fails
  OMAPDSS: DPI: DRA7xx support
  OMAPDSS: HDMI: Add DRA7xx support
  OMAPDSS: DISPC: program dispc polarities to control module
  OMAPDSS: DISPC: Add DRA7xx support
  OMAPDSS: Add Video PLLs for DRA7xx
  OMAPDSS: Add functions for external control of PLL
  OMAPDSS: DSS: Add DRA7xx base support
  Doc/DT: Add DT binding doc for DRA7xx DSS
  OMAPDSS: add define for DRA7xx HW version
  OMAPDSS: encoder-tpd12s015: Fix race issue with LS_OE
  OMAPDSS: OMAP5: fix digit output's allowed mgrs
  OMAPDSS: constify port arrays
  OMAPDSS: PLL: add dss_pll_wait_reset_done()
  OMAPDSS: Add enum dss_pll_id
  video: fbdev: fix sys_copyarea
  video/mmpfb: allow modular build
  fb: via: turn gpiolib and i2c selects into dependencies
  fbdev: ssd1307fb: return proper error code if write command fails
  fbdev: fix CVT vertical front and back porch values
  ...
2015-02-11 09:24:30 -08:00
Linus Torvalds
a323ae93a7 sound updates for 3.20-rc1
In this batch, you can find lots of cleanups through the whole
 subsystem, as our good New Year's resolution.  Lots of LOCs and
 commits are about LINE6 driver that was promoted finally from staging
 tree, and as usual, there've been widely spread ASoC changes.
 
 Here some highlights:
 
 ALSA core changes
   - Embedding struct device into ALSA core structures
   - sequencer core cleanups / fixes
   - PCM msbits constraints cleanups / fixes
   - New SNDRV_PCM_TRIGGER_DRAIN command
   - PCM kerneldoc fixes, header cleanups
   - PCM code cleanups using more standard codes
   - Control notification ID fixes
 
 Driver cleanups
   - Cleanups of PCI PM callbacks
   - Timer helper usages cleanups
   - Simplification (e.g. argument reduction) of many driver codes
 
 HD-audio
   - Hotkey and LED support on HP laptops with Realtek codecs
   - Dock station support on HP laptops
   - Toshiba Satellite S50D fixup
   - Enhanced wallclock timestamp handling for HD-audio
   - Componentization to simplify the linkage between i915 and hd-audio
     drivers for Intel HDMI/DP
 
 USB-audio
   - Akai MPC Element support
   - Enhanced timestamp handling
 
 ASoC
   - Lots of refactoringin ASoC core, moving drivers to more data
     driven initialization and rationalizing a lot of DAPM usage
   - Much improved handling of CDCLK clocks on Samsung I2S controllers
   - Lots of driver specific cleanups and feature improvements
   - CODEC support for TI PCM514x and TLV320AIC3104 devices
   - Board support for Tegra systems with Realtek RT5677
   - New driver for Maxim max98357a
   - More enhancements / fixes for Intel SST driver
 
 Others
   - Promotion of LINE6 driver from staging along with lots of rewrites
     and cleanups
   - DT support for old non-ASoC atmel driver
   - oxygen cleanups, XIO2001 init, Studio Evolution SE6x support
   - Emu8000 DRAM size detection fix on ISA(!!) AWE64 boards
   - A few more ak411x fixes for ice1724 boards
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJU2ySIAAoJEGwxgFQ9KSmk/YMP/1v0r60aDt6VxlTbadt008R/
 jTEIzD4oGEMkhFQdlmN8MegZlx+05vxQUCGrVy8PLelhfy/mnj6z/iUt9ohE1PqK
 530eVr5FnlAbHs1JzP8Tm8Xbbtk8RXt5uvgohJvt7HBrc0Def9N/w37fUQ0ytO+s
 Ot/0Xm8BNsdJ90DfMVLc0Ok9cAFn4Z70gylE/PuGxbBBzxQh8PYPXtJ6Q/s5lKLV
 QC7VitJa0H7vsFYb+Ve7GU4cKMTt8uEPw8CdnQbDwb63ia93iWJJrlqKVUWYF2Gu
 K+mX5Igdb88ToXbMPrLKXe73IfFcdpWNTbj8IAv+Rp9fArylzz+3GAYmrqTAdare
 JEE5qAZTtJZEeD2vgNCnA4JpSbRzL0bHrEow21LnPONq3V9FB044NAeMSx3dI4j1
 fk+SnqrpJMtlCtgj2PuWzIcqRiJ25F/Qax/xFeZHo7FwLIBF7z5pLu9DP4CfUSXj
 fDEcB9aNF2VirJkQdbhHaPqTYVf2rHQ/ebDpDHBwkqFe865IHlJ8g8MrHnAFInKN
 jQlSTOqi9V3two53U1JIKcB6QcBH3vh60w2JsWsQadsr45YYQ/bvBHGYNpQ00C3U
 rbDBANhAHaF/hFncNnOQDsH65FqHj/ZlBQRhzX0LqxN4K0DM1FqGcLf2k6u/pzZU
 09+QlcIOOzN8lbvHR8Qx
 =/84r
 -----END PGP SIGNATURE-----

Merge tag 'sound-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound updates from Takashi Iwai:
 "In this batch, you can find lots of cleanups through the whole
  subsystem, as our good New Year's resolution.  Lots of LOCs and
  commits are about LINE6 driver that was promoted finally from staging
  tree, and as usual, there've been widely spread ASoC changes.

  Here some highlights:

  ALSA core changes
   - Embedding struct device into ALSA core structures
   - sequencer core cleanups / fixes
   - PCM msbits constraints cleanups / fixes
   - New SNDRV_PCM_TRIGGER_DRAIN command
   - PCM kerneldoc fixes, header cleanups
   - PCM code cleanups using more standard codes
   - Control notification ID fixes

  Driver cleanups
   - Cleanups of PCI PM callbacks
   - Timer helper usages cleanups
   - Simplification (e.g. argument reduction) of many driver codes

  HD-audio
   - Hotkey and LED support on HP laptops with Realtek codecs
   - Dock station support on HP laptops
   - Toshiba Satellite S50D fixup
   - Enhanced wallclock timestamp handling for HD-audio
   - Componentization to simplify the linkage between i915 and hd-audio
     drivers for Intel HDMI/DP

  USB-audio
   - Akai MPC Element support
   - Enhanced timestamp handling

  ASoC
   - Lots of refactoringin ASoC core, moving drivers to more data driven
     initialization and rationalizing a lot of DAPM usage
   - Much improved handling of CDCLK clocks on Samsung I2S controllers
   - Lots of driver specific cleanups and feature improvements
   - CODEC support for TI PCM514x and TLV320AIC3104 devices
   - Board support for Tegra systems with Realtek RT5677
   - New driver for Maxim max98357a
   - More enhancements / fixes for Intel SST driver

  Others
   - Promotion of LINE6 driver from staging along with lots of rewrites
     and cleanups
   - DT support for old non-ASoC atmel driver
   - oxygen cleanups, XIO2001 init, Studio Evolution SE6x support
   - Emu8000 DRAM size detection fix on ISA(!!) AWE64 boards
   - A few more ak411x fixes for ice1724 boards"

* tag 'sound-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (542 commits)
  ALSA: line6: toneport: Use explicit type for firmware version
  ALSA: line6: Use explicit type for serial number
  ALSA: line6: Return EIO if read/write not successful
  ALSA: line6: Return error if device not responding
  ALSA: line6: Add delay before reading status
  ASoC: Intel: Clean data after SST fw fetch
  ALSA: hda - Add docking station support for another HP machine
  ALSA: control: fix failure to return new numerical ID in 'replace' event data
  ALSA: usb: update trigger timestamp on first non-zero URB submitted
  ALSA: hda: read trigger_timestamp immediately after starting DMA
  ALSA: pcm: allow for trigger_tstamp snapshot in .trigger
  ALSA: pcm: don't override timestamp unconditionally
  ALSA: off by one bug in snd_riptide_joystick_probe()
  ASoC: rt5670: Set use_single_rw flag for regmap
  ASoC: rt286: Add rt288 codec support
  ASoC: max98357a: Fix build in !CONFIG_OF case
  ASoC: Intel: fix platform_no_drv_owner.cocci warnings
  ARM: dts: Switch Odroid X2/U2 to simple-audio-card
  ARM: dts: Exynos4 and Odroid X2/U3 sound device nodes update
  ALSA: control: fix failure to return numerical ID in 'add' event
  ...
2015-02-11 08:51:59 -08:00
Linus Torvalds
3e63430a5c media updates for v3.20-rc1
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJU2NQQAAoJEAhfPr2O5OEV5BgQAIja/XsIgpeNhfN8kJ3GrdhL
 Z+QRTcHNc6AWGm1dkI+YTl4B38/xLlmxhUYPKsDl19N7n1oKkqdUxYtLe1mLdecW
 dvqMXMVBKQSCgyDP5sgZNHKlavEX1ZPTTtkrY8zYWaXbkcf4dOZyisbNQrmFdO3T
 wt4zwaO8+ziCEYbotLsaI1VpEDKFZV6AVhKnLsWxc4ZoCnAqJbmA31jtANxrQ0tw
 UgXRjJmf1uWrS+MWM5xFDi+v+FmZiUAHMJ5iksqWhp2pKj41geIqy7lAueytEN+Q
 vQHZ9cfhnoF/7VrqDtqq5CaJZPKfA80PSxml9mbjc4wytvWLevoc4UxFtU+lohOf
 YbM3nB5J3nAcq0bNF/cSpuYUoiGnK86FazuM6YAQy2CaucrVKALKHHmziWbK6gBv
 1yA4qnDuRYKps3SQSQQKuNlv8dmcVTD/sVhf8EIx62son6xxeXf21nas61lw8k5P
 lrUVH9nJxkwTkRJ7wMjlAZeh0pTyB/Ag1bSn81myziv0r4AsNyWJT5qxN8szmZDe
 nXGIdQ1h5JkMQ0kCfhhLqgdIUwhx7dMXIlXcCfR/8a9uYm4StegPNCEZDybIi6co
 8Ok3rPYt15PlrCyfMjXFOG/TYi/cZ/xIbffLbSFMOqnCUZElaA7RNpOnswNc9fc6
 2WsY54Lb4ftC4bQ7hM90
 =VH6m
 -----END PGP SIGNATURE-----

Merge tag 'media/v3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media updates from Mauro Carvalho Chehab:

 - Some documentation updates and a few new pixel formats

 - Stop btcx-risc abuse by cx88 and move it to bt8xx driver

 - New platform driver: am437x

 - New webcam driver: toptek

 - New remote controller hardware protocols added to img-ir driver

 - Removal of a few very old drivers that relies on old kABIs and are
   for very hard to find hardware: parallel port webcam drivers
   (bw-qcam, c-cam, pms and w9966), tlg2300, Video In/Out for SGI (vino)

 - Removal of the USB Telegent driver (tlg2300).  The company that
   developed this driver has long gone and the hardware is hard to find.
   As it relies on a legacy set of kABI symbols and nobody seems to care
   about it, remove it.

 - several improvements at rtl2832 driver

 - conversion on cx28521 and au0828 to use videobuf2 (VB2)

 - several improvements, fixups and board additions

* tag 'media/v3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (321 commits)
  [media] dvb_net: Convert local hex dump to print_hex_dump_debug
  [media] dvb_net: Use standard debugging facilities
  [media] dvb_net: Use vsprintf %pM extension to print Ethernet addresses
  [media] staging: lirc_serial: adjust boolean assignments
  [media] stb0899: use sign_extend32() for sign extension
  [media] si2168: add support for 1.7MHz bandwidth
  [media] si2168: return error if set_frontend is called with invalid parameters
  [media] lirc_dev: avoid potential null-dereference
  [media] mn88472: simplify bandwidth registers setting code
  [media] dvb: tc90522: re-add symbol-rate report
  [media] lmedm04: add read snr, signal strength and ber call backs
  [media] lmedm04: Create frontend call back for read status
  [media] lmedm04: create frontend callbacks for signal/snr/ber/ucblocks
  [media] lmedm04: Fix usb_submit_urb BOGUS urb xfer, pipe 1 != type 3 in interrupt urb
  [media] lmedm04: Increase Interupt due time to 200 msec
  [media] cx88-dvb: whitespace cleanup
  [media] rtl28xxu: properly initialize pdata
  [media] rtl2832: declare functions as static
  [media] rtl2830: declare functions as static
  [media] rtl2832_sdr: add kernel-doc comments for platform_data
  ...
2015-02-11 08:45:40 -08:00