If we're in FIPS mode, we should panic if we fail to verify the signature on a
module or we're asked to load an unsigned module in signature enforcing mode.
Possibly FIPS mode should automatically enable enforcing mode.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We do a very simple search for a particular string appended to the module
(which is cache-hot and about to be SHA'd anyway). There's both a config
option and a boot parameter which control whether we accept or fail with
unsigned modules and modules that are signed with an unknown key.
If module signing is enabled, the kernel will be tainted if a module is
loaded that is unsigned or has a signature for which we don't have the
key.
(Useful feedback and tweaks by David Howells <dhowells@redhat.com>)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Add a crypto key parser for binary (DER) encoded X.509 certificates. The
certificate is parsed and, if possible, the signature is verified.
An X.509 key can be added like this:
# keyctl padd crypto bar @s </tmp/x509.cert
15768135
and displayed like this:
# cat /proc/keys
00f09a47 I--Q--- 1 perm 39390000 0 0 asymmetri bar: X509.RSA e9fd6d08 []
Note that this only works with binary certificates. PEM encoded certificates
are ignored by the parser.
Note also that the X.509 key ID is not congruent with the PGP key ID, but for
the moment, they will match.
If a NULL or "" name is given to add_key(), then the parser will generate a key
description from the CertificateSerialNumber and Name fields of the
TBSCertificate:
00aefc4e I--Q--- 1 perm 39390000 0 0 asymmetri bfbc0cd76d050ea4:/C=GB/L=Cambridge/O=Red Hat/CN=kernel key: X509.RSA 0c688c7b []
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Provide a function to read raw data of a predetermined size into an MPI rather
than expecting the size to be encoded within the data. The data is assumed to
represent an unsigned integer, and the resulting MPI will be positive.
The function looks like this:
MPI mpi_read_raw_data(const void *, size_t);
This is useful for reading ASN.1 integer primitives where the length is encoded
in the ASN.1 metadata.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Add an ASN.1 BER/DER/CER decoder. This uses the bytecode from the ASN.1
compiler in the previous patch to inform it as to what to expect to find in the
encoded byte stream. The output from the compiler also tells it what functions
to call on what tags, thus allowing the caller to retrieve information.
The decoder is called as follows:
int asn1_decoder(const struct asn1_decoder *decoder,
void *context,
const unsigned char *data,
size_t datalen);
The decoder argument points to the bytecode from the ASN.1 compiler. context
is the caller's context and is passed to the action functions. data and
datalen define the byte stream to be decoded.
Note that the decoder is currently limited to datalen being less than 64K.
This reduces the amount of stack space used by the decoder because ASN.1 is a
nested construct. Similarly, the decoder is limited to a maximum of 10 levels
of constructed data outside of a leaf node also in an effort to keep stack
usage down.
These restrictions can be raised if necessary.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Add a simple ASN.1 grammar compiler. This produces a bytecode output that can
be fed to a decoder to inform the decoder how to interpret the ASN.1 stream it
is trying to parse.
Action functions can be specified in the grammar by interpolating:
({ foo })
after a type, for example:
SubjectPublicKeyInfo ::= SEQUENCE {
algorithm AlgorithmIdentifier,
subjectPublicKey BIT STRING ({ do_key_data })
}
The decoder is expected to call these after matching this type and parsing the
contents if it is a constructed type.
The grammar compiler does not currently support the SET type (though it does
support SET OF) as I can't see a good way of tracking which members have been
encountered yet without using up extra stack space.
Currently, the grammar compiler will fail if more than 256 bytes of bytecode
would be produced or more than 256 actions have been specified as it uses
8-bit jump values and action indices to keep space usage down.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Add a pair of utility functions to render OIDs as strings. The first takes an
encoded OID and turns it into a "a.b.c.d" form string:
int sprint_oid(const void *data, size_t datasize,
char *buffer, size_t bufsize);
The second takes an OID enum index and calls the first on the data held
therein:
int sprint_OID(enum OID oid, char *buffer, size_t bufsize);
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Implement a simple static OID registry that allows the mapping of an encoded
OID to an enum value for ease of use.
The OID registry index enum appears in the:
linux/oid_registry.h
header file. A script generates the registry from lines in the header file
that look like:
<sp*>OID_foo,<sp*>/*<sp*>1.2.3.4<sp*>*/
The actual OID is taken to be represented by the numbers with interpolated
dots in the comment.
All other lines in the header are ignored.
The registry is queries by calling:
OID look_up_oid(const void *data, size_t datasize);
This returns a number from the registry enum representing the OID if found or
OID__NR if not.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gpg can produce a signature file where length of signature is less than the
modulus size because the amount of space an MPI takes up is kept as low as
possible by discarding leading zeros. This regularly happens for several
modules during the build.
Fix it by relaxing check in RSA verification code.
Thanks to Tomas Mraz and Miloslav Trmac for help.
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Implement RSA public key cryptography [PKCS#1 / RFC3447]. At this time, only
the signature verification algorithm is supported. This uses the asymmetric
public key subtype to hold its key data.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reinstate and export mpi_cmp() and mpi_cmp_ui() from the MPI library for use by
RSA signature verification as per RFC3447 section 5.2.2 step 1.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Provide signature verification using an asymmetric-type key to indicate the
public key to be used.
The API is a single function that can be found in crypto/public_key.h:
int verify_signature(const struct key *key,
const struct public_key_signature *sig)
The first argument is the appropriate key to be used and the second argument
is the parsed signature data:
struct public_key_signature {
u8 *digest;
u16 digest_size;
enum pkey_hash_algo pkey_hash_algo : 8;
union {
MPI mpi[2];
struct {
MPI s; /* m^d mod n */
} rsa;
struct {
MPI r;
MPI s;
} dsa;
};
};
This should be filled in prior to calling the function. The hash algorithm
should already have been called and the hash finalised and the output should
be in a buffer pointed to by the 'digest' member.
Any extra data to be added to the hash by the hash format (eg. PGP) should
have been added by the caller prior to finalising the hash.
It is assumed that the signature is made up of a number of MPI values. If an
algorithm becomes available for which this is not the case, the above structure
will have to change.
It is also assumed that it will have been checked that the signature algorithm
matches the key algorithm.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Add a subtype for supporting asymmetric public-key encryption algorithms such
as DSA (FIPS-186) and RSA (PKCS#1 / RFC1337).
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The instantiation data passed to the asymmetric key type are expected to be
formatted in some way, and there are several possible standard ways to format
the data.
The two obvious standards are OpenPGP keys and X.509 certificates. The latter
is especially useful when dealing with UEFI, and the former might be useful
when dealing with, say, eCryptfs.
Further, it might be desirable to provide formatted blobs that indicate
hardware is to be accessed to retrieve the keys or that the keys live
unretrievably in a hardware store, but that the keys can be used by means of
the hardware.
From userspace, the keys can be loaded using the keyctl command, for example,
an X.509 binary certificate:
keyctl padd asymmetric foo @s <dhowells.pem
or a PGP key:
keyctl padd asymmetric bar @s <dhowells.pub
or a pointer into the contents of the TPM:
keyctl add asymmetric zebra "TPM:04982390582905f8" @s
Inside the kernel, pluggable parsers register themselves and then get to
examine the payload data to see if they can handle it. If they can, they get
to:
(1) Propose a name for the key, to be used it the name is "" or NULL.
(2) Specify the key subtype.
(3) Provide the data for the subtype.
The key type asks the parser to do its stuff before a key is allocated and thus
before the name is set. If successful, the parser stores the suggested data
into the key_preparsed_payload struct, which will be either used (if the key is
successfully created and instantiated or updated) or discarded.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Create a key type that can be used to represent an asymmetric key type for use
in appropriate cryptographic operations, such as encryption, decryption,
signature generation and signature verification.
The key type is "asymmetric" and can provide access to a variety of
cryptographic algorithms.
Possibly, this would be better as "public_key" - but that has the disadvantage
that "public key" is an overloaded term.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In-source documentation for the asymmetric key type. This will be located in:
Documentation/crypto/asymmetric-keys.txt
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Provide count_leading/trailing_zeros() macros based on extant arch bit scanning
functions rather than reimplementing from scratch in MPILIB.
Whilst we're at it, turn count_foo_zeros(n, x) into n = count_foo_zeros(x).
Also move the definition to asm-generic as other people may be interested in
using it.
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Cc: Arnd Bergmann <arnd@arndb.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Give the key type the opportunity to preparse the payload prior to the
instantiation and update routines being called. This is done with the
provision of two new key type operations:
int (*preparse)(struct key_preparsed_payload *prep);
void (*free_preparse)(struct key_preparsed_payload *prep);
If the first operation is present, then it is called before key creation (in
the add/update case) or before the key semaphore is taken (in the update and
instantiate cases). The second operation is called to clean up if the first
was called.
preparse() is given the opportunity to fill in the following structure:
struct key_preparsed_payload {
char *description;
void *type_data[2];
void *payload;
const void *data;
size_t datalen;
size_t quotalen;
};
Before the preparser is called, the first three fields will have been cleared,
the payload pointer and size will be stored in data and datalen and the default
quota size from the key_type struct will be stored into quotalen.
The preparser may parse the payload in any way it likes and may store data in
the type_data[] and payload fields for use by the instantiate() and update()
ops.
The preparser may also propose a description for the key by attaching it as a
string to the description field. This can be used by passing a NULL or ""
description to the add_key() system call or the key_create_or_update()
function. This cannot work with request_key() as that required the description
to tell the upcall about the key to be created.
This, for example permits keys that store PGP public keys to generate their own
name from the user ID and public key fingerprint in the key.
The instantiate() and update() operations are then modified to look like this:
int (*instantiate)(struct key *key, struct key_preparsed_payload *prep);
int (*update)(struct key *key, struct key_preparsed_payload *prep);
and the new payload data is passed in *prep, whether or not it was preparsed.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The original module-init-tools module loader used a fnctl lock on the
.ko file to avoid attempts to simultaneously load a module.
Unfortunately, you can't get an exclusive fcntl lock on a read-only
fd, making this not work for read-only mounted filesystems.
module-init-tools has a hacky sleep-and-loop for this now.
It's not that hard to wait in the kernel, and only return -EEXIST once
the first module has finished loading (or continue loading the module
if the first one failed to initialize for some reason). It's also
consistent with what we do for dependent modules which are still loading.
Suggested-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We use resolve_symbol_wait(), which blocks if the module containing
the symbol is still loading. However:
1) The module_wq we use is only woken after calling the modules' init
function, but there are other failure paths after the module is
placed in the linked list where we need to do the same thing.
2) wake_up() only wakes one waiter, and our waitqueue is shared by all
modules, so we need to wake them all.
3) wake_up_all() doesn't imply a memory barrier: I feel happier calling
it after we've grabbed and dropped the module_mutex, not just after
the state assignment.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Use the mapping of Elf_[SPE]hdr, Elf_Addr, Elf_Sym, Elf_Dyn, Elf_Rel/Rela,
ELF_R_TYPE() and ELF_R_SYM() to either the 32-bit version or the 64-bit version
into asm-generic/module.h for all arches bar MIPS.
Also, use the generic definition mod_arch_specific where possible.
To this end, I've defined three new config bools:
(*) HAVE_MOD_ARCH_SPECIFIC
Arches define this if they don't want to use the empty generic
mod_arch_specific struct.
(*) MODULES_USE_ELF_RELA
Arches define this if their modules can contain RELA records. This causes
the Elf_Rela mapping to be emitted and allows apply_relocate_add() to be
defined by the arch rather than have the core emit an error message.
(*) MODULES_USE_ELF_REL
Arches define this if their modules can contain REL records. This causes
the Elf_Rel mapping to be emitted and allows apply_relocate() to be
defined by the arch rather than have the core emit an error message.
Note that it is possible to allow both REL and RELA records: m68k and mips are
two arches that do this.
With this, some arch asm/module.h files can be deleted entirely and replaced
with a generic-y marker in the arch Kbuild file.
Additionally, I have removed the bits from m32r and score that handle the
unsupported type of relocation record as that's now handled centrally.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fixes build failure introduced by "Make most arch asm/module.h files use
asm-generic/module.h" by moving all the RELA processing code to a
separate file to be used only for RELA processing on 64-bit kernels.
CC arch/mips/kernel/module.o
arch/mips/kernel/module.c:250:14: error: 'reloc_handlers_rela' defined but not
used [-Werror=unused-variable]
cc1: all warnings being treated as errors
make[6]: *** [arch/mips/kernel/module.o] Error 1
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cloudlinux have a product called lve that includes a kernel module. This
was previously GPLed but is now under a proprietary license, but the
module continues to declare MODULE_LICENSE("GPL") and makes use of some
EXPORT_SYMBOL_GPL symbols. Forcibly taint it in order to avoid this.
Signed-off-by: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Alex Lyashkov <umka@cloudlinux.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org
IBM reported a soft lockup after applying the fix for the rename_lock
deadlock. Commit c83ce989cb ("VFS: Fix the nfs sillyrename regression
in kernel 2.6.38") was found to be the culprit.
The nfs sillyrename fix used DCACHE_DISCONNECTED to indicate that the
dentry was killed. This flag can be set on non-killed dentries too,
which results in infinite retries when trying to traverse the dentry
tree.
This patch introduces a separate flag: DCACHE_DENTRY_KILLED, which is
only set in d_kill() and makes try_to_ascend() test only this flag.
IBM reported successful test results with this patch.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull another workqueue fix from Tejun Heo:
"Unfortunately, yet another late fix. This too is discovered and fixed
by Lai. This bug was introduced during this merge window by commit
25511a4776 ("workqueue: reimplement CPU online rebinding to handle
idle workers") which started using WORKER_REBIND flag for idle rebind
too.
The bug is relatively easy to trigger if the CPU rapidly goes through
off, on and then off (and stay off). The fix is on the safer side.
This hasn't been on linux-next yet but I'm pushing early so that it
can get more exposure before v3.6 release."
* 'for-3.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: always clear WORKER_REBIND in busy_worker_rebind_fn()
Merge fixes from Andrew Morton:
"13 patches. 12 are fixes and one is a little preparatory thing for
Andi."
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (13 commits)
memory hotplug: fix section info double registration bug
mm/page_alloc: fix the page address of higher page's buddy calculation
drivers/rtc/rtc-twl.c: ensure all interrupts are disabled during probe
compiler.h: add __visible
pid-namespace: limit value of ns_last_pid to (0, max_pid)
include/net/sock.h: squelch compiler warning in sk_rmem_schedule()
slub: consider pfmemalloc_match() in get_partial_node()
slab: fix starting index for finding another object
slab: do ClearSlabPfmemalloc() for all pages of slab
nbd: clear waiting_queue on shutdown
MAINTAINERS: fix TXT maintainer list and source repo path
mm/ia64: fix a memory block size bug
memory hotplug: reset pgdat->kswapd to NULL if creating kernel thread fails
There may be a bug when registering section info. For example, on my
Itanium platform, the pfn range of node0 includes the other nodes, so
other nodes' section info will be double registered, and memmap's page
count will equal to 3.
node0: start_pfn=0x100, spanned_pfn=0x20fb00, present_pfn=0x7f8a3, => 0x000100-0x20fc00
node1: start_pfn=0x80000, spanned_pfn=0x80000, present_pfn=0x80000, => 0x080000-0x100000
node2: start_pfn=0x100000, spanned_pfn=0x80000, present_pfn=0x80000, => 0x100000-0x180000
node3: start_pfn=0x180000, spanned_pfn=0x80000, present_pfn=0x80000, => 0x180000-0x200000
free_all_bootmem_node()
register_page_bootmem_info_node()
register_page_bootmem_info_section()
When hot remove memory, we can't free the memmap's page because
page_count() is 2 after put_page_bootmem().
sparse_remove_one_section()
free_section_usemap()
free_map_bootmem()
put_page_bootmem()
[akpm@linux-foundation.org: add code comment]
Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The heuristic method for buddy has been introduced since commit
43506fad21 ("mm/page_alloc.c: simplify calculation of combined index
of adjacent buddy lists"). But the page address of higher page's buddy
was wrongly calculated, which will lead page_is_buddy to fail for ever.
IOW, the heuristic method would be disabled with the wrong page address
of higher page's buddy.
Calculating the page address of higher page's buddy should be based
higher_page with the offset between index of higher page and index of
higher page's buddy.
Signed-off-by: Haifeng Li <omycle@gmail.com>
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: KyongHo Cho <pullip.cho@samsung.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: <stable@vger.kernel.org> [2.6.38+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On some platforms, bootloaders are known to do some interesting RTC
programming. Without going into the obscurities as to why this may be
the case, suffice it to say the the driver should not make any
assumptions about the state of the RTC when the driver loads. In
particular, the driver probe should be sure that all interrupts are
disabled until otherwise programmed.
This was discovered when finding bursty I2C traffic every second on
Overo platforms. This I2C overhead was keeping the SoC from hitting
deep power states. The cause was found to be the RTC firing every
second on the I2C-connected TWL PMIC.
Special thanks to Felipe Balbi for suggesting to look for a rogue driver
as the source of the I2C traffic rather than the I2C driver itself.
Special thanks to Steve Sakoman for helping track down the source of the
continuous RTC interrups on the Overo boards.
Signed-off-by: Kevin Hilman <khilman@ti.com>
Cc: Felipe Balbi <balbi@ti.com>
Tested-by: Steve Sakoman <steve@sakoman.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Tested-by: Shubhrajyoti Datta <omaplinuxkernel@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
gcc 4.6+ has support for a externally_visible attribute that prevents the
optimizer from optimizing unused symbols away. Add a __visible macro to
use it with that compiler version or later.
This is used (at least) by the "Link Time Optimization" patchset.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The kernel doesn't check the pid for negative values, so if you try to
write -2 to /proc/sys/kernel/ns_last_pid, you will get a kernel panic.
The crash happens because the next pid is -1, and alloc_pidmap() will
try to access to a nonexistent pidmap.
map = &pid_ns->pidmap[pid/BITS_PER_PAGE];
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This warning:
In file included from linux/include/linux/tcp.h:227:0,
from linux/include/linux/ipv6.h:221,
from linux/include/net/ipv6.h:16,
from linux/include/linux/sunrpc/clnt.h:26,
from linux/net/sunrpc/stats.c:22:
linux/include/net/sock.h: In function `sk_rmem_schedule':
linux/nfs-2.6/include/net/sock.h:1339:13: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
is seen with gcc (GCC) 4.6.3 20120306 (Red Hat 4.6.3-2) using the
-Wextra option.
Commit c76562b670 ("netvm: prevent a stream-specific deadlock")
accidentally replaced the "size" parameter of sk_rmem_schedule() with an
unsigned int. This changes the semantics of the comparison in the
return statement.
In sk_wmem_schedule we have syntactically the same comparison, but
"size" is a signed integer. In addition, __sk_mem_schedule() takes a
signed integer for its "size" parameter, so there is an implicit type
conversion in sk_rmem_schedule() anyway.
Revert the "size" parameter back to a signed integer so that the
semantics of the expressions in both sk_[rw]mem_schedule() are exactly
the same.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: David Miller <davem@davemloft.net>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
get_partial() is currently not checking pfmemalloc_match() meaning that
it is possible for pfmemalloc pages to leak to non-pfmemalloc users.
This is a problem in the following situation. Assume that there is a
request from normal allocation and there are no objects in the per-cpu
cache and no node-partial slab.
In this case, slab_alloc enters the slow path and new_slab_objects() is
called which may return a PFMEMALLOC page. As the current user is not
allowed to access PFMEMALLOC page, deactivate_slab() is called
([5091b74a: mm: slub: optimise the SLUB fast path to avoid pfmemalloc
checks]) and returns an object from PFMEMALLOC page.
Next time, when we get another request from normal allocation,
slab_alloc() enters the slow-path and calls new_slab_objects(). In
new_slab_objects(), we call get_partial() and get a partial slab which
was just deactivated but is a pfmemalloc page. We extract one object
from it and re-deactivate.
"deactivate -> re-get in get_partial -> re-deactivate" occures repeatedly.
As a result, access to PFMEMALLOC page is not properly restricted and it
can cause a performance degradation due to frequent deactivation.
deactivation frequently.
This patch changes get_partial_node() to take pfmemalloc_match() into
account and prevents the "deactivate -> re-get in get_partial()
scenario. Instead, new_slab() is called.
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: David Miller <davem@davemloft.net>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In array cache, there is a object at index 0, check it.
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: David Miller <davem@davemloft.net>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Right now, we call ClearSlabPfmemalloc() for first page of slab when we
clear SlabPfmemalloc flag. This is fine for most swap-over-network use
cases as it is expected that order-0 pages are in use. Unfortunately it
is possible that that __ac_put_obj() checks SlabPfmemalloc on a tail
page and while this is harmless, it is sloppy. This patch ensures that
the head page is always used.
This problem was originally identified by Joonsoo Kim.
[js1304@gmail.com: Original implementation and problem identification]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: David Miller <davem@davemloft.net>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix a serious but uncommon bug in nbd which occurs when there is heavy
I/O going to the nbd device while, at the same time, a failure (server,
network) or manual disconnect of the nbd connection occurs.
There is a small window between the time that the nbd_thread is stopped
and the socket is shutdown where requests can continue to be queued to
nbd's internal waiting_queue. When this happens, those requests are
never completed or freed.
The fix is to clear the waiting_queue on shutdown of the nbd device, in
the same way that the nbd request queue (queue_head) is already being
cleared.
Signed-off-by: Paul Clements <paul.clements@steeleye.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Gang Wei <gang.wei@intel.com>
Cc: Richard L Maliszewski <richard.l.maliszewski@intel.com>
Cc: Gang Wei <gang.wei@intel.com>
Cc: Shane Wang <shane.wang@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I found following definition in include/linux/memory.h, in my IA64
platform, SECTION_SIZE_BITS is equal to 32, and MIN_MEMORY_BLOCK_SIZE
will be 0.
#define MIN_MEMORY_BLOCK_SIZE (1 << SECTION_SIZE_BITS)
Because MIN_MEMORY_BLOCK_SIZE is int type and length of 32bits,
so MIN_MEMORY_BLOCK_SIZE(1 << 32) will will equal to 0.
Actually when SECTION_SIZE_BITS >= 31, MIN_MEMORY_BLOCK_SIZE will be wrong.
This will cause wrong system memory infomation in sysfs.
I think it should be:
#define MIN_MEMORY_BLOCK_SIZE (1UL << SECTION_SIZE_BITS)
And "echo offline > memory0/state" will cause following call trace:
kernel BUG at mm/memory_hotplug.c:885!
sh[6455]: bugcheck! 0 [1]
Pid: 6455, CPU 0, comm: sh
psr : 0000101008526030 ifs : 8000000000000fa4 ip : [<a0000001008c40f0>] Not tainted (3.6.0-rc1)
ip is at offline_pages+0x210/0xee0
Call Trace:
show_stack+0x80/0xa0
show_regs+0x640/0x920
die+0x190/0x2c0
die_if_kernel+0x50/0x80
ia64_bad_break+0x3d0/0x6e0
ia64_native_leave_kernel+0x0/0x270
offline_pages+0x210/0xee0
alloc_pages_current+0x180/0x2a0
Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If kthread_run() fails, pgdat->kswapd contains errno. When we stop this
thread, we only check whether pgdat->kswapd is NULL and access it. If
it contains errno, it will cause page fault. Reset pgdat->kswapd to
NULL when creating kernel thread fails can avoid this problem.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Couple more IPoIB fixes for regressions introduced by path database conversion
- Minor other fixes to low-level drivers (cxgb4, mlx4, qib, ocrdma)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABCAAGBQJQV0eiAAoJEENa44ZhAt0hTScP/3+Y8v4w3Nyop7iNj9uvnsCn
PhgfY9XUp9d0zOHM7XK+pTGLJheLLiyXfcCFTwgoC6tIyj7J1ulhYv+RBr7GbS+e
K7aG7QtLDyM17ETgOBVIAvkz1CFKwdI843xRQKo0/oVMQJ9XLHg0nvzd+XaxZuud
C+7Lfl5RH9Bl1tHJuOZmdyJinSQyLO/uV+tqwL5upF5YMTCJ2XsmVa1HQKhOAkBE
bH4zXtFViNEKVuKXkpGc1yJFKZiwysirluYXedDkkEdr2jf9Mysw5JvZKVUOCh+4
qYMjEAWQaUBia/tFUs9k6iCwE8ws7tRC4OSisb3cVzCUaqkfmx1Sp7vAQcDqx4xB
r2dy4oc+vEFdrmjerOVEgmdgTwnDrWMjUlPuGyRX9935I7ybKA7NQPGlImKU8oiq
8vyptsTN6r7AYc7kw6aYaE9kqGPnST7I3cZg71lHfwCuHKMNYfkxd1Mqxx/W6rMZ
G1VnstVdioQK2HvcFIeArXQ6H6QjUAY92WMoyhIsrU4KwEx2Zpr0pfKWCD0eIOuB
f4s9v0f5/6DqSwJ9uhhpAW/5bmV+RUCDjm8Ep9i70bGk7SliwT0MNPhHbaMfm7aU
c+awy3HWpWKGCtod9zUzJH/4NsHzpggdHUlyAOuU5nqsh9GarYb0cvLjTNfsfdyV
haQtMR+fSUK7Durd9MQl
=nqvR
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull InfiniBand/RDMA fixes from Roland Dreier:
- A couple more IPoIB fixes for regressions introduced by path database
conversion
- Minor other fixes to low-level drivers (cxgb4, mlx4, qib, ocrdma)
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/qib: Fix failure of compliance test C14-024#06_LocalPortNum
RDMA/ocrdma: Fix CQE expansion of unsignaled WQE
mlx4_core: Fix integer overflows so 8TBs of memory registration works
IPoIB: Fix AB-BA deadlock when deleting neighbours
IPoIB: Fix memory leak in the neigh table deletion flow
RDMA/cxgb4: Move dereference below NULL test
The unregister_sysctl_table() function hangs if all references to its
ctl_table_header structure are not dropped.
This can happen sometimes because of a leak in proc_sys_lookup():
proc_sys_lookup() gets a reference to the table via lookup_entry(), but
it does not release it when a subsequent call to sysctl_follow_link()
fails.
This patch fixes this leak by making sure the reference is always
dropped on return.
See also commit 076c3eed2c ("sysctl: Rewrite proc_sys_lookup
introducing find_entry and lookup_entry") which reorganized this code in
3.4.
Tested in Linux 3.4.4.
Signed-off-by: Francesco Ruggeri <fruggeri@aristanetworks.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- A tps65217 build error fix.
- A lcp_ich regression fix caused by the MFD driver failing to initialize the
watchdog sub device due to ACPI conflicts.
- 2 MAX77693 interrupt handling bug fixes.
- An MFD core fix, adding an IRQ domain argument to the MFD device addition
API in order to prevent silent and potentially harmful remapping behaviour
changes for drivers supporting non-DT platforms.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJQVQpwAAoJEIqAPN1PVmxKpnQP/iDPakfGT0m00nxk9VvuXfaA
SlNxhhM1ZPxlZcBrVFULZ+dCVuUaBYIfwoT4Wwl2yZsmp8uYb62Rd6LGXDnjO+FH
zKMg3b/KDHq/jicYxqFy4bVq3I850jTLN4Hti6hArn8HM6gxZJQxM810Poxt6M0T
odud6oHSevGpCoL7GU4O+gmx1wKSDRrw6TZPokJDfXaQcFPJEt0YGGtI2zAO1sV+
iBPQE/bI3bF5r/kowbh9ro5WybGNyFrwiYtaTzJWntXoPKN1lmr0F2F2NaCEo1xj
2vJPFSCK3qC9Ft7sPQEBYxrSXzyzgu7G8HGgkcBG5SaSo+W3awe9DEaHfk5E7sdr
iyc01kh4FmGlMAvHG/zcK3YvchR7GJdRI7BrR+MTQ03ZUv/ZrmTYFAHv5fAzBQ+N
6ZUoC/1bqPKpfeFPOKYpDeYJVnFBWLYr+t7McTqqg+kpxUuYnQ0HyJVjDh01ZVQT
AtCjjW7R+Ka44B+xfMOartPZMqZZEHSJ0UjacyEyzMpAAY14eUDMtTqs/jtNp+8Z
B0bZiA8ZhUmNVH7TZoT/u57FgEW34JnM8oH6jLVN+yjInpuugIvqpZn0TA90GL7V
Fh2DALMgUN5z/TXduGSsSPg1hvXFPOujaS+4o78e8PWiVQd5XUk/mc8OOS/mrRwj
+mILnzxJcp8iYrYqESbt
=nP/y
-----END PGP SIGNATURE-----
Merge tag 'mfd-for-linus-3.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6
Pull mfd fixes from Samuel Ortiz:
"This is the remaining MFD fixes for 3.6, with 5 pending fixes:
- A tps65217 build error fix.
- A lcp_ich regression fix caused by the MFD driver failing to
initialize the watchdog sub device due to ACPI conflicts.
- 2 MAX77693 interrupt handling bug fixes.
- An MFD core fix, adding an IRQ domain argument to the MFD device
addition API in order to prevent silent and potentially harmful
remapping behaviour changes for drivers supporting non-DT
platforms."
* tag 'mfd-for-linus-3.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6:
mfd: MAX77693: Fix NULL pointer error when initializing irqs
mfd: MAX77693: Fix interrupt handling bug
mfd: core: Push irqdomain mapping out into devices
mfd: lpc_ich: Fix a 3.5 kernel regression for iTCO_wdt driver
mfd: Move tps65217 regulator plat data handling to regulator
minor and touch only new drivers so I think these are still safe for
merging.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIcBAABAgAGBQJQVcWAAAoJEN0jrNd/PrOhmtEP/jy9tG/aP15UM9lPacVNz89V
LaO0jYRBZtS8A8Gi5smdjQj6nQ+ir7P06jF5YyRzA1gNkUU2pDCrdIs8Sf0KJkL0
sJ5dRHEYuK3MNdwK2ajR6LkpFtMtGXpQtoyilwvRZWNgWCo7LMBK7k2xGYPfyF6r
yc8/OuOyZzC83p54prW8JCun7XShgel4KYoCvYWnYzmZZ5ct0jARCgl49eJagul3
I3nlvGfDYI4nZd9KSuWiKIor8lyW7Sk2WKfl7g9EIX+EKzsxFToQ/sRiT5mGtrTG
gzGp31DJGVaCpdhk2/2AkjCxkh/JYNo3YqBXmQRMpGj7nkHOpQ6tz4nz1DGmJU9f
c5ZY9rW8AUXTP0vQcW5PbxE0xnOZOVDCgkX2ZQuE7av34QPjVDsl/zDI9amx/EMC
imyoZPk/lssVy3fuXRG4uxjmyULqk/TMnrx6pNcPOxeFr9JKs3NJVELFd6z0enrt
6AmaaRol+XiF9IGRzai6CwwIDN8hYd9qp6e1U0bkk7bmo1D05OsUodClxX17Rr/S
kGH6hAhhE4miWUJ2pYyfJOC1R7Tbre1G2H2wlv5jjVzNVrzasU/8NYJ8G0dYuj9z
LCExF0QBCfZlHjW3Jn7tpLhFY0S0pSEViGckdb1kLoY3qXGHCnJhQdVon5ceWS/v
0y7277cvE9R11TJiI2aF
=tCsJ
-----END PGP SIGNATURE-----
Merge tag 'for-3.6-rc6' of git://gitorious.org/linux-pwm/linux-pwm
Pull pwm fixes from Thierry Reding:
"While this comes a bit later than I had wished, both patches are
rather minor and touch only new drivers so I think these are still
safe for merging."
* tag 'for-3.6-rc6' of git://gitorious.org/linux-pwm/linux-pwm:
pwm: pwm-tiehrpwm: Fix conflicting channel period setting
pwm: pwm-tiecap: Disable APWM mode after configure
Pull scsi target fixes from Nicholas Bellinger:
"Here is the current set of target-pending fixes headed for v3.6-final
The main parts of this series include bug-fixes from Paolo Bonzini to
address an use-after-free bug in pSCSI sense exception handling, along
with addressing some long-standing bugs wrt the handling of zero-
length SCSI CDB payloads also specific to pSCSI pass-through device
backends."
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
target: go through normal processing for zero-length REQUEST_SENSE
target: support zero allocation length in REQUEST SENSE
target: support zero-size allocation lengths in transport_kmap_data_sg
target: fail REPORT LUNS with less than 16 bytes of payload
target: report too-small parameter lists everywhere
target: go through normal processing for zero-length PSCSI commands
target: fix use-after-free with PSCSI sense data
target: simplify code around transport_get_sense_data
target: move transport_get_sense_data
target: Check idr_get_new return value in iscsi_login_zero_tsih_s1
target: Fix ->data_length re-assignment bug with SCSI overflow
* Three ACPI device power management fixes related to checking and
setting device power states.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iQIcBAABAgAGBQJQVN5wAAoJEKhOf7ml8uNsUEUQAJN7kQI4x/3BRw+JPXsuQcZ+
HKoAT23eKuPpkuBgfnyacB0NMVB8VVLoYqFq/L282k2mkuO/dJ1NVruAOBikuApH
U/bf7Rct2gFI0AkPBAx+itGnNx2xxALd0NbpZLgccKxgmbc5TT5BtzYK9uFSNSuQ
y0QzKzGYrlfE7PV1439eQvHn9nffpeJHUoZVJtZlHOPuGlX12SpOEIHnaHnFH1rF
XY58zYFPDCE0qaWR9r4i/bcgdcatDP30JHSoULtRDse46En2ebGlyuiGZrbs8HZ1
PzFzjGd3X6vbycX+gpEGuFMzKAyUjFhChZ8DGGlTQ7H3mxDBBtNIDiqIdKw27DUJ
LvwpOk8LPAcgkO0zvoUf3Won92HcXS2TPC6/AjosOYXi9n9gZFzCVuKTnvUDcEP3
AI1+MVGELEJowIYkx8Thafcr0ipEsMXMb2GQfnSuotdL6V/pzlMZsaJneMp34W3v
P/3CkOj+JEH3K6Ode4AoRQB+pwLBLPBHAN/sNTUtNLDUrkNYmNdDHiam7ZyG6ffh
2Sxrc9N1e3Q0Ogy1h+vMr/ZjUbiPyCl5YlsC+Iasm0AuvioN00IFNMEcSlRzS6X8
HhjNVMUIOoIwU4ZBm3lnTnMh2DnlRtJDH9NHuW9qR/g/7cHn3oiBEGZ5GOrA5Q+L
3QkosGc/V7tMPuA+tZiO
=TTmm
-----END PGP SIGNATURE-----
Merge tag 'pm-for-3.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael J. Wysocki:
"Three ACPI device power management fixes related to checking and
setting device power states."
* tag 'pm-for-3.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / PM: Use KERN_DEBUG when no power resources are found
ACPI / PM: Fix resource_lock dead lock in acpi_power_on_device
ACPI / PM: Infer parent power state from child if unknown, v2
Pull a btrfs revert from Chris Mason:
"My for-linus branch has one revert in the new quota code.
We're building up more fixes at etc for the next merge window, but I'm
keeping them out unless they are bigger regressions or have a huge
impact."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Revert "Btrfs: fix some error codes in btrfs_qgroup_inherit()"
Yet more (a bunch of) small fixes that slipped from the previous
pull request. Most of commits are pending ASoC fixes, all of which
are fairly trivial commits.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIcBAABAgAGBQJQVGAHAAoJEGwxgFQ9KSmkyW8P/idw2CiiB2bdU3SwDfg6AXJE
CmvpTHtxAVtYvejq/WIFP9NkUAwZZePg+DQoVTOYmzdrZjNyCIW0rq7TG2PCo4Dx
v6Ek9MITrWbT8dil75SF0JeqglWznQFNkUinBCVIZPEzvpvTmjbnQNvja9iVQ41G
LpWbB0KPTNw88cILnH8YTO0tlvPFhOTx4ZRMZpq26q7nmph5abSjLlkKmYMa59sp
lbq8P9y2HRSLM7YR5WAV7ydg3L+euFe7ppbCqnp0l0mmhYjj3/ltI/wxkGIWNfRN
mSAW3ZM2Xz0ZO0NLuLEMcgoCZAoHy3KUMOJqt+DKe91Vn7DpBU/xWrcwU+wT7I3v
9O4vM6C4h89xxB41n1AejUQivPHIyT1ZmfSRByB5t5l2KwUI2VDD2p7VNHvY6FWF
JkbYfb2c1VB3sUZKDv0dKDfZDsc5ddLVSnujoRjApel9ghVI7wDZr5ZsLPMW8z/Y
6wJ5PsBAf1iPc+CS05mQXrLc8LQB3u3bR7xTEt9yVsj8lQIXmN6W6Vm0N0hut6Vs
snDpKHD0AQ9LjQZysUsX45qPPiSX6PlX2wEFyA49C1ahBKUJ0Nh7wmqvvF9/GA2R
kqK652uM7Mworw26eYrNfbyL2/DFrPea67lks1tqW3s1o7NQ9A1gNmrF0ZIIbaTt
zY0D01eyFWkIeeKqKtoU
=fIXH
-----END PGP SIGNATURE-----
Merge tag 'sound-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull more sound fixes from Takashi Iwai:
"Yet more (a bunch of) small fixes that slipped from the previous pull
request. Most of commits are pending ASoC fixes, all of which are
fairly trivial commits."
* tag 'sound-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ASoC: wm8904: correct the index
ALSA: hda - Yet another position_fix quirk for ASUS machines
ASoC: tegra: fix maxburst settings in dmaengine code
ASoC: samsung dma - Don't indicate support for pause/resume.
ASoC: mc13783: Remove mono support
ASoC: arizona: Fix typo in 44.1kHz rates
ASoC: spear: correct the check for NULL dma_buffer pointer
sound: tegra_alc5632: remove HP detect GPIO inversion
ASoC: atmel-ssc: include linux/io.h for raw io
ASoC: dapm: Don't force card bias level to be updated
ASoC: dapm: Make sure we update the bias level for CODECs with no op
ASoC: am3517evm: fix error return code
ASoC: ux500_msp_i2s: better use devm functions and fix error return code
ASoC: imx-sgtl5000: fix error return code