Commit Graph

2282 Commits

Author SHA1 Message Date
David Howells
9c5e45df21 KEYS: Fix searching of nested keyrings
If a keyring contains more than 16 keyrings (the capacity of a single node in
the associative array) then those keyrings are split over multiple nodes
arranged as a tree.

If search_nested_keyrings() is called to search the keyring then it will
attempt to manually walk over just the 0 branch of the associative array tree
where all the keyring links are stored.  This works provided the key is found
before the algorithm steps from one node containing keyrings to a child node
or if there are sufficiently few keyring links that the keyrings are all in
one node.

However, if the algorithm does need to step from a node to a child node, it
doesn't change the node pointer unless a shortcut also gets transited.  This
means that the algorithm will keep scanning the same node over and over again
without terminating and without returning.

To fix this, move the internal-pointer-to-node translation from inside the
shortcut transit handler so that it applies it to node arrival as well.

This can be tested by:

	r=`keyctl newring sandbox @s`
	for ((i=0; i<=16; i++)); do keyctl newring ring$i $r; done
	for ((i=0; i<=16; i++)); do keyctl add user a$i a %:ring$i; done
	for ((i=0; i<=16; i++)); do keyctl search $r user a$i; done
	for ((i=17; i<=20; i++)); do keyctl search $r user a$i; done

The searches should all complete successfully (or with an error for 17-20),
but instead one or more of them will hang.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Stephen Gallagher <sgallagh@redhat.com>
2013-12-02 11:24:19 +00:00
David Howells
23fd78d764 KEYS: Fix multiple key add into associative array
If sufficient keys (or keyrings) are added into a keyring such that a node in
the associative array's tree overflows (each node has a capacity N, currently
16) and such that all N+1 keys have the same index key segment for that level
of the tree (the level'th nibble of the index key), then assoc_array_insert()
calls ops->diff_objects() to indicate at which bit position the two index keys
vary.

However, __key_link_begin() passes a NULL object to assoc_array_insert() with
the intention of supplying the correct pointer later before we commit the
change.  This means that keyring_diff_objects() is given a NULL pointer as one
of its arguments which it does not expect.  This results in an oops like the
attached.

With the previous patch to fix the keyring hash function, this can be forced
much more easily by creating a keyring and only adding keyrings to it.  Add any
other sort of key and a different insertion path is taken - all 16+1 objects
must want to cluster in the same node slot.

This can be tested by:

	r=`keyctl newring sandbox @s`
	for ((i=0; i<=16; i++)); do keyctl newring ring$i $r; done

This should work fine, but oopses when the 17th keyring is added.

Since ops->diff_objects() is always called with the first pointer pointing to
the object to be inserted (ie. the NULL pointer), we can fix the problem by
changing the to-be-inserted object pointer to point to the index key passed
into assoc_array_insert() instead.

Whilst we're at it, we also switch the arguments so that they are the same as
for ->compare_object().

BUG: unable to handle kernel NULL pointer dereference at 0000000000000088
IP: [<ffffffff81191ee4>] hash_key_type_and_desc+0x18/0xb0
...
RIP: 0010:[<ffffffff81191ee4>] hash_key_type_and_desc+0x18/0xb0
...
Call Trace:
 [<ffffffff81191f9d>] keyring_diff_objects+0x21/0xd2
 [<ffffffff811f09ef>] assoc_array_insert+0x3b6/0x908
 [<ffffffff811929a7>] __key_link_begin+0x78/0xe5
 [<ffffffff81191a2e>] key_create_or_update+0x17d/0x36a
 [<ffffffff81192e0a>] SyS_add_key+0x123/0x183
 [<ffffffff81400ddb>] tracesys+0xdd/0xe2

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Stephen Gallagher <sgallagh@redhat.com>
2013-12-02 11:24:18 +00:00
David Howells
d54e58b7f0 KEYS: Fix the keyring hash function
The keyring hash function (used by the associative array) is supposed to clear
the bottommost nibble of the index key (where the hash value resides) for
keyrings and make sure it is non-zero for non-keyrings.  This is done to make
keyrings cluster together on one branch of the tree separately to other keys.

Unfortunately, the wrong mask is used, so only the bottom two bits are
examined and cleared and not the whole bottom nibble.  This means that keys
and keyrings can still be successfully searched for under most circumstances
as the hash is consistent in its miscalculation, but if a keyring's
associative array bottom node gets filled up then approx 75% of the keyrings
will not be put into the 0 branch.

The consequence of this is that a key in a keyring linked to by another
keyring, ie.

	keyring A -> keyring B -> key

may not be found if the search starts at keyring A and then descends into
keyring B because search_nested_keyrings() only searches up the 0 branch (as it
"knows" all keyrings must be there and not elsewhere in the tree).

The fix is to use the right mask.

This can be tested with:

	r=`keyctl newring sandbox @s`
	for ((i=0; i<=16; i++)); do keyctl newring ring$i $r; done
	for ((i=0; i<=16; i++)); do keyctl add user a$i a %:ring$i; done
	for ((i=0; i<=16; i++)); do keyctl search $r user a$i; done

This creates a sandbox keyring, then creates 17 keyrings therein (labelled
ring0..ring16).  This causes the root node of the sandbox's associative array
to overflow and for the tree to have extra nodes inserted.

Each keyring then is given a user key (labelled aN for ringN) for us to search
for.

We then search for the user keys we added, starting from the sandbox.  If
working correctly, it should return the same ordered list of key IDs as
for...keyctl add... did.  Without this patch, it reports ENOKEY "Required key
not available" for some of the keys.  Just which keys get this depends as the
kernel pointer to the key type forms part of the hash function.

Reported-by: Nalin Dahyabhai <nalin@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Stephen Gallagher <sgallagh@redhat.com>
2013-12-02 11:24:18 +00:00
David Howells
2480f57fb3 KEYS: Pre-clear struct key on allocation
The second word of key->payload does not get initialised in key_alloc(), but
the big_key type is relying on it having been cleared.  The problem comes when
big_key fails to instantiate a large key and doesn't then set the payload.  The
big_key_destroy() op is called from the garbage collector and this assumes that
the dentry pointer stored in the second word will be NULL if instantiation did
not complete.

Therefore just pre-clear the entire struct key on allocation rather than trying
to be clever and only initialising to 0 only those bits that aren't otherwise
initialised.

The lack of initialisation can lead to a bug report like the following if
big_key failed to initialise its file:

	general protection fault: 0000 [#1] SMP
	Modules linked in: ...
	CPU: 0 PID: 51 Comm: kworker/0:1 Not tainted 3.10.0-53.el7.x86_64 #1
	Hardware name: Dell Inc. PowerEdge 1955/0HC513, BIOS 1.4.4 12/09/2008
	Workqueue: events key_garbage_collector
	task: ffff8801294f5680 ti: ffff8801296e2000 task.ti: ffff8801296e2000
	RIP: 0010:[<ffffffff811b4a51>] dput+0x21/0x2d0
	...
	Call Trace:
	 [<ffffffff811a7b06>] path_put+0x16/0x30
	 [<ffffffff81235604>] big_key_destroy+0x44/0x60
	 [<ffffffff8122dc4b>] key_gc_unused_keys.constprop.2+0x5b/0xe0
	 [<ffffffff8122df2f>] key_garbage_collector+0x1df/0x3c0
	 [<ffffffff8107759b>] process_one_work+0x17b/0x460
	 [<ffffffff8107834b>] worker_thread+0x11b/0x400
	 [<ffffffff81078230>] ? rescuer_thread+0x3e0/0x3e0
	 [<ffffffff8107eb00>] kthread+0xc0/0xd0
	 [<ffffffff8107ea40>] ? kthread_create_on_node+0x110/0x110
	 [<ffffffff815c4bec>] ret_from_fork+0x7c/0xb0
	 [<ffffffff8107ea40>] ? kthread_create_on_node+0x110/0x110

Reported-by: Patrik Kis <pkis@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Stephen Gallagher <sgallagh@redhat.com>
2013-12-02 11:24:18 +00:00
Roberto Sassu
af91706d5d ima: store address of template_fmt_copy in a pointer before calling strsep
This patch stores the address of the 'template_fmt_copy' variable in a new
variable, called 'template_fmt_ptr', so that the latter is passed as an
argument of strsep() instead of the former. This modification is needed
in order to correctly free the memory area referenced by
'template_fmt_copy' (strsep() modifies the pointer of the passed string).

Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2013-11-30 13:09:53 +11:00
Paul Moore
dd0a11815a Linux 3.12
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQEcBAABAgAGBQJSdt9HAAoJEHm+PkMAQRiGnzEH/345Keg5dp+oKACnokBfzOtp
 V0p3g5EBsGtzEVnV+1B96trczDUtWdDFFr5GfGSj565NBQpFyc+iZC1mC99RDJCs
 WUquGFqlLMK2aV0SbKwCO4K1rJ5A0TRVj0ZRJOUJUY7jwNf5Qahny0WBVjO/8qAY
 UvJK1rktBClhKdH53YtpDHHgXBeZ2LOrzt1fQ/AMpujGbZauGvnLdNOli5r2kCFK
 jzoOgFLvX+PHU/5/d4/QyJPeQNPva5hjk5Ho9UuSJYhnFtPO3EkD4XZLcpcbNEJb
 LqBvbnZWm6CS435lfU1l93RqQa5xMO9ITk0oe4h69syTSHwWk9aJ+ZTc/4Up+t8=
 =57MC
 -----END PGP SIGNATURE-----

Merge tag 'v3.12'

Linux 3.12
2013-11-26 17:32:55 -05:00
Geyslan G. Bem
8e645c345a selinux: fix possible memory leak
Free 'ctx_str' when necessary.

Signed-off-by: Geyslan G. Bem <geyslan@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
2013-11-25 17:00:33 -05:00
Roberto Sassu
dbc335d2dc ima: make a copy of template_fmt in template_desc_init_fields()
This patch makes a copy of the 'template_fmt' function argument so that
the latter will not be modified by strsep(), which does the splitting by
replacing the given separator with '\0'.

 IMA: No TPM chip found, activating TPM-bypass!
 Unable to handle kernel pointer dereference at virtual kernel address 0000000000842000
 Oops: 0004 [#1] SMP
 Modules linked in:
 CPU: 3 PID: 1 Comm: swapper/0 Not tainted 3.12.0-rc2-00098-g3ce1217d6cd5 #17
 task: 000000003ffa0000 ti: 000000003ff84000 task.ti: 000000003ff84000
 Krnl PSW : 0704e00180000000 000000000044bf88 (strsep+0x7c/0xa0)
            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 EA:3
 Krnl GPRS: 000000000000007c 000000000000007c 000000003ff87d90 0000000000821fd8
            0000000000000000 000000000000007c 0000000000aa37e0 0000000000aa9008
            0000000000000051 0000000000a114d8 0000000100000002 0000000000842bde
            0000000000842bdf 00000000006f97f0 000000000040062c 000000003ff87cf0
 Krnl Code: 000000000044bf7c: a7f4000a           brc     15,44bf90
            000000000044bf80: b90200cc           ltgr    %r12,%r12
           #000000000044bf84: a7840006           brc     8,44bf90
           >000000000044bf88: 9200c000           mvi     0(%r12),0
            000000000044bf8c: 41c0c001           la      %r12,1(%r12)
            000000000044bf90: e3c020000024       stg     %r12,0(%r2)
            000000000044bf96: b904002b           lgr     %r2,%r11
            000000000044bf9a: ebbcf0700004       lmg     %r11,%r12,112(%r15)
 Call Trace:
 ([<00000000004005fe>] ima_init_template+0xa2/0x1bc)
  [<0000000000a7c896>] ima_init+0x7a/0xa8
  [<0000000000a7c938>] init_ima+0x24/0x40
  [<00000000001000e8>] do_one_initcall+0x68/0x128
  [<0000000000a4eb56>] kernel_init_freeable+0x20a/0x2b4
  [<00000000006a1ff4>] kernel_init+0x30/0x178
  [<00000000006b69fe>] kernel_thread_starter+0x6/0xc
  [<00000000006b69f8>] kernel_thread_starter+0x0/0xc
 Last Breaking-Event-Address:
  [<000000000044bf42>] strsep+0x36/0xa0

Fixes commit: adf53a7 ima: new templates management mechanism

Changelog v1:
- make template_fmt 'const char *' (reported-by James Morris)
- fix kstrdup memory leak (reported-by James Morris)

Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2013-11-25 15:05:33 -05:00
Roberto Sassu
3e8e5503a3 ima: do not send field length to userspace for digest of ima template
This patch defines a new value for the 'ima_show_type' enumerator
(IMA_SHOW_BINARY_NO_FIELD_LEN) to prevent that the field length
is transmitted through the 'binary_runtime_measurements' interface
for the digest field of the 'ima' template.

Fixes commit: 3ce1217 ima: define template fields library and new helpers

Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2013-11-25 07:31:14 -05:00
Roberto Sassu
b6f8f16f41 ima: do not include field length in template digest calc for ima template
To maintain compatibility with userspace tools, the field length must not
be included in the template digest calculation for the 'ima' template.

Fixes commit: a71dc65 ima: switch to new template management mechanism

Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2013-11-25 07:26:28 -05:00
Linus Torvalds
34ef7bd382 Revert "ima: define '_ima' as a builtin 'trusted' keyring"
This reverts commit 217091dd7a, which
caused the following build error:

  security/integrity/digsig.c:70:5: error: redefinition of ‘integrity_init_keyring’
  security/integrity/integrity.h:149:12: note: previous definition of ‘integrity_init_keyring’ w
  security/integrity/integrity.h:149:12: warning: ‘integrity_init_keyring’ defined but not used

reported by Krzysztof Kolasa. Mimi says:

 "I made the classic mistake of requesting this patch to be upstreamed
  at the last second, rather than waiting until the next open window.

  At this point, the best course would probably be to revert the two
  commits and fix them for the next open window"

Reported-by: Krzysztof Kolasa <kkolasa@winsoft.pl>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-23 16:36:35 -08:00
Eric Paris
fc582aef7d Linux 3.12
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQEcBAABAgAGBQJSdt9HAAoJEHm+PkMAQRiGnzEH/345Keg5dp+oKACnokBfzOtp
 V0p3g5EBsGtzEVnV+1B96trczDUtWdDFFr5GfGSj565NBQpFyc+iZC1mC99RDJCs
 WUquGFqlLMK2aV0SbKwCO4K1rJ5A0TRVj0ZRJOUJUY7jwNf5Qahny0WBVjO/8qAY
 UvJK1rktBClhKdH53YtpDHHgXBeZ2LOrzt1fQ/AMpujGbZauGvnLdNOli5r2kCFK
 jzoOgFLvX+PHU/5/d4/QyJPeQNPva5hjk5Ho9UuSJYhnFtPO3EkD4XZLcpcbNEJb
 LqBvbnZWm6CS435lfU1l93RqQa5xMO9ITk0oe4h69syTSHwWk9aJ+ZTc/4Up+t8=
 =57MC
 -----END PGP SIGNATURE-----

Merge tag 'v3.12'

Linux 3.12

Conflicts:
	fs/exec.c
2013-11-22 18:57:54 -05:00
Linus Torvalds
78dc53c422 Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
 "In this patchset, we finally get an SELinux update, with Paul Moore
  taking over as maintainer of that code.

  Also a significant update for the Keys subsystem, as well as
  maintenance updates to Smack, IMA, TPM, and Apparmor"

and since I wanted to know more about the updates to key handling,
here's the explanation from David Howells on that:

 "Okay.  There are a number of separate bits.  I'll go over the big bits
  and the odd important other bit, most of the smaller bits are just
  fixes and cleanups.  If you want the small bits accounting for, I can
  do that too.

   (1) Keyring capacity expansion.

        KEYS: Consolidate the concept of an 'index key' for key access
        KEYS: Introduce a search context structure
        KEYS: Search for auth-key by name rather than target key ID
        Add a generic associative array implementation.
        KEYS: Expand the capacity of a keyring

     Several of the patches are providing an expansion of the capacity of a
     keyring.  Currently, the maximum size of a keyring payload is one page.
     Subtract a small header and then divide up into pointers, that only gives
     you ~500 pointers on an x86_64 box.  However, since the NFS idmapper uses
     a keyring to store ID mapping data, that has proven to be insufficient to
     the cause.

     Whatever data structure I use to handle the keyring payload, it can only
     store pointers to keys, not the keys themselves because several keyrings
     may point to a single key.  This precludes inserting, say, and rb_node
     struct into the key struct for this purpose.

     I could make an rbtree of records such that each record has an rb_node
     and a key pointer, but that would use four words of space per key stored
     in the keyring.  It would, however, be able to use much existing code.

     I selected instead a non-rebalancing radix-tree type approach as that
     could have a better space-used/key-pointer ratio.  I could have used the
     radix tree implementation that we already have and insert keys into it by
     their serial numbers, but that means any sort of search must iterate over
     the whole radix tree.  Further, its nodes are a bit on the capacious side
     for what I want - especially given that key serial numbers are randomly
     allocated, thus leaving a lot of empty space in the tree.

     So what I have is an associative array that internally is a radix-tree
     with 16 pointers per node where the index key is constructed from the key
     type pointer and the key description.  This means that an exact lookup by
     type+description is very fast as this tells us how to navigate directly to
     the target key.

     I made the data structure general in lib/assoc_array.c as far as it is
     concerned, its index key is just a sequence of bits that leads to a
     pointer.  It's possible that someone else will be able to make use of it
     also.  FS-Cache might, for example.

   (2) Mark keys as 'trusted' and keyrings as 'trusted only'.

        KEYS: verify a certificate is signed by a 'trusted' key
        KEYS: Make the system 'trusted' keyring viewable by userspace
        KEYS: Add a 'trusted' flag and a 'trusted only' flag
        KEYS: Separate the kernel signature checking keyring from module signing

     These patches allow keys carrying asymmetric public keys to be marked as
     being 'trusted' and allow keyrings to be marked as only permitting the
     addition or linkage of trusted keys.

     Keys loaded from hardware during kernel boot or compiled into the kernel
     during build are marked as being trusted automatically.  New keys can be
     loaded at runtime with add_key().  They are checked against the system
     keyring contents and if their signatures can be validated with keys that
     are already marked trusted, then they are marked trusted also and can
     thus be added into the master keyring.

     Patches from Mimi Zohar make this usable with the IMA keyrings also.

   (3) Remove the date checks on the key used to validate a module signature.

        X.509: Remove certificate date checks

     It's not reasonable to reject a signature just because the key that it was
     generated with is no longer valid datewise - especially if the kernel
     hasn't yet managed to set the system clock when the first module is
     loaded - so just remove those checks.

   (4) Make it simpler to deal with additional X.509 being loaded into the kernel.

        KEYS: Load *.x509 files into kernel keyring
        KEYS: Have make canonicalise the paths of the X.509 certs better to deduplicate

     The builder of the kernel now just places files with the extension ".x509"
     into the kernel source or build trees and they're concatenated by the
     kernel build and stuffed into the appropriate section.

   (5) Add support for userspace kerberos to use keyrings.

        KEYS: Add per-user_namespace registers for persistent per-UID kerberos caches
        KEYS: Implement a big key type that can save to tmpfs

     Fedora went to, by default, storing kerberos tickets and tokens in tmpfs.
     We looked at storing it in keyrings instead as that confers certain
     advantages such as tickets being automatically deleted after a certain
     amount of time and the ability for the kernel to get at these tokens more
     easily.

     To make this work, two things were needed:

     (a) A way for the tickets to persist beyond the lifetime of all a user's
         sessions so that cron-driven processes can still use them.

         The problem is that a user's session keyrings are deleted when the
         session that spawned them logs out and the user's user keyring is
         deleted when the UID is deleted (typically when the last log out
         happens), so neither of these places is suitable.

         I've added a system keyring into which a 'persistent' keyring is
         created for each UID on request.  Each time a user requests their
         persistent keyring, the expiry time on it is set anew.  If the user
         doesn't ask for it for, say, three days, the keyring is automatically
         expired and garbage collected using the existing gc.  All the kerberos
         tokens it held are then also gc'd.

     (b) A key type that can hold really big tickets (up to 1MB in size).

         The problem is that Active Directory can return huge tickets with lots
         of auxiliary data attached.  We don't, however, want to eat up huge
         tracts of unswappable kernel space for this, so if the ticket is
         greater than a certain size, we create a swappable shmem file and dump
         the contents in there and just live with the fact we then have an
         inode and a dentry overhead.  If the ticket is smaller than that, we
         slap it in a kmalloc()'d buffer"

* 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (121 commits)
  KEYS: Fix keyring content gc scanner
  KEYS: Fix error handling in big_key instantiation
  KEYS: Fix UID check in keyctl_get_persistent()
  KEYS: The RSA public key algorithm needs to select MPILIB
  ima: define '_ima' as a builtin 'trusted' keyring
  ima: extend the measurement list to include the file signature
  kernel/system_certificate.S: use real contents instead of macro GLOBAL()
  KEYS: fix error return code in big_key_instantiate()
  KEYS: Fix keyring quota misaccounting on key replacement and unlink
  KEYS: Fix a race between negating a key and reading the error set
  KEYS: Make BIG_KEYS boolean
  apparmor: remove the "task" arg from may_change_ptraced_domain()
  apparmor: remove parent task info from audit logging
  apparmor: remove tsk field from the apparmor_audit_struct
  apparmor: fix capability to not use the current task, during reporting
  Smack: Ptrace access check mode
  ima: provide hash algo info in the xattr
  ima: enable support for larger default filedata hash algorithms
  ima: define kernel parameter 'ima_template=' to change configured default
  ima: add Kconfig default measurement list template
  ...
2013-11-21 19:46:00 -08:00
Linus Torvalds
3eaded86ac Merge git://git.infradead.org/users/eparis/audit
Pull audit updates from Eric Paris:
 "Nothing amazing.  Formatting, small bug fixes, couple of fixes where
  we didn't get records due to some old VFS changes, and a change to how
  we collect execve info..."

Fixed conflict in fs/exec.c as per Eric and linux-next.

* git://git.infradead.org/users/eparis/audit: (28 commits)
  audit: fix type of sessionid in audit_set_loginuid()
  audit: call audit_bprm() only once to add AUDIT_EXECVE information
  audit: move audit_aux_data_execve contents into audit_context union
  audit: remove unused envc member of audit_aux_data_execve
  audit: Kill the unused struct audit_aux_data_capset
  audit: do not reject all AUDIT_INODE filter types
  audit: suppress stock memalloc failure warnings since already managed
  audit: log the audit_names record type
  audit: add child record before the create to handle case where create fails
  audit: use given values in tty_audit enable api
  audit: use nlmsg_len() to get message payload length
  audit: use memset instead of trying to initialize field by field
  audit: fix info leak in AUDIT_GET requests
  audit: update AUDIT_INODE filter rule to comparator function
  audit: audit feature to set loginuid immutable
  audit: audit feature to only allow unsetting the loginuid
  audit: allow unsetting the loginuid (with priv)
  audit: remove CONFIG_AUDIT_LOGINUID_IMMUTABLE
  audit: loginuid functions coding style
  selinux: apply selinux checks on new audit message types
  ...
2013-11-21 19:18:14 -08:00
Tim Gardner
b5495b4217 SELinux: security_load_policy: Silence frame-larger-than warning
Dynamically allocate a couple of the larger stack variables in order to
reduce the stack footprint below 1024. gcc-4.8

security/selinux/ss/services.c: In function 'security_load_policy':
security/selinux/ss/services.c:1964:1: warning: the frame size of 1104 bytes is larger than 1024 bytes [-Wframe-larger-than=]
 }

Also silence a couple of checkpatch warnings at the same time.

WARNING: sizeof policydb should be sizeof(policydb)
+	memcpy(oldpolicydb, &policydb, sizeof policydb);

WARNING: sizeof policydb should be sizeof(policydb)
+	memcpy(&policydb, newpolicydb, sizeof policydb);

Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
2013-11-19 17:35:18 -05:00
Richard Haines
a660bec1d8 SELinux: Update policy version to support constraints info
Update the policy version (POLICYDB_VERSION_CONSTRAINT_NAMES) to allow
holding of policy source info for constraints.

Signed-off-by: Richard Haines <richard_c_haines@btinternet.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
2013-11-19 17:34:23 -05:00
David Howells
62fe318256 KEYS: Fix keyring content gc scanner
Key pointers stored in the keyring are marked in bit 1 to indicate if they
point to a keyring.  We need to strip off this bit before using the pointer
when iterating over the keyring for the purpose of looking for links to garbage
collect.

This means that expirable keyrings aren't correctly expiring because the
checker is seeing their key pointer with 2 added to it.

Since the fix for this involves knowing about the internals of the keyring,
key_gc_keyring() is moved to keyring.c and merged into keyring_gc().

This can be tested by:

	echo 2 >/proc/sys/kernel/keys/gc_delay
	keyctl timeout `keyctl add keyring qwerty "" @s` 2
	cat /proc/keys
	sleep 5; cat /proc/keys

which should see a keyring called "qwerty" appear in the session keyring and
then disappear after it expires, and:

	echo 2 >/proc/sys/kernel/keys/gc_delay
	a=`keyctl get_persistent @s`
	b=`keyctl add keyring 0 "" $a`
	keyctl add user a a $b
	keyctl timeout $b 2
	cat /proc/keys
	sleep 5; cat /proc/keys

which should see a keyring called "0" with a key called "a" in it appear in the
user's persistent keyring (which will be attached to the session keyring) and
then both the "0" keyring and the "a" key should disappear when the "0" keyring
expires.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Simo Sorce <simo@redhat.com>
2013-11-14 14:09:53 +00:00
David Howells
97826c821e KEYS: Fix error handling in big_key instantiation
In the big_key_instantiate() function we return 0 if kernel_write() returns us
an error rather than returning an error.  This can potentially lead to
dentry_open() giving a BUG when called from big_key_read() with an unset
tmpfile path.

	------------[ cut here ]------------
	kernel BUG at fs/open.c:798!
	...
	RIP: 0010:[<ffffffff8119bbd1>] dentry_open+0xd1/0xe0
	...
	Call Trace:
	 [<ffffffff812350c5>] big_key_read+0x55/0x100
	 [<ffffffff81231084>] keyctl_read_key+0xb4/0xe0
	 [<ffffffff81231e58>] SyS_keyctl+0xf8/0x1d0
	 [<ffffffff815bb799>] system_call_fastpath+0x16/0x1b


Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Stephen Gallagher <sgallagh@redhat.com>
2013-11-13 16:51:06 +00:00
Linus Torvalds
42a2d923cc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) The addition of nftables.  No longer will we need protocol aware
    firewall filtering modules, it can all live in userspace.

    At the core of nftables is a, for lack of a better term, virtual
    machine that executes byte codes to inspect packet or metadata
    (arriving interface index, etc.) and make verdict decisions.

    Besides support for loading packet contents and comparing them, the
    interpreter supports lookups in various datastructures as
    fundamental operations.  For example sets are supports, and
    therefore one could create a set of whitelist IP address entries
    which have ACCEPT verdicts attached to them, and use the appropriate
    byte codes to do such lookups.

    Since the interpreted code is composed in userspace, userspace can
    do things like optimize things before giving it to the kernel.

    Another major improvement is the capability of atomically updating
    portions of the ruleset.  In the existing netfilter implementation,
    one has to update the entire rule set in order to make a change and
    this is very expensive.

    Userspace tools exist to create nftables rules using existing
    netfilter rule sets, but both kernel implementations will need to
    co-exist for quite some time as we transition from the old to the
    new stuff.

    Kudos to Patrick McHardy, Pablo Neira Ayuso, and others who have
    worked so hard on this.

 2) Daniel Borkmann and Hannes Frederic Sowa made several improvements
    to our pseudo-random number generator, mostly used for things like
    UDP port randomization and netfitler, amongst other things.

    In particular the taus88 generater is updated to taus113, and test
    cases are added.

 3) Support 64-bit rates in HTB and TBF schedulers, from Eric Dumazet
    and Yang Yingliang.

 4) Add support for new 577xx tigon3 chips to tg3 driver, from Nithin
    Sujir.

 5) Fix two fatal flaws in TCP dynamic right sizing, from Eric Dumazet,
    Neal Cardwell, and Yuchung Cheng.

 6) Allow IP_TOS and IP_TTL to be specified in sendmsg() ancillary
    control message data, much like other socket option attributes.
    From Francesco Fusco.

 7) Allow applications to specify a cap on the rate computed
    automatically by the kernel for pacing flows, via a new
    SO_MAX_PACING_RATE socket option.  From Eric Dumazet.

 8) Make the initial autotuned send buffer sizing in TCP more closely
    reflect actual needs, from Eric Dumazet.

 9) Currently early socket demux only happens for TCP sockets, but we
    can do it for connected UDP sockets too.  Implementation from Shawn
    Bohrer.

10) Refactor inet socket demux with the goal of improving hash demux
    performance for listening sockets.  With the main goals being able
    to use RCU lookups on even request sockets, and eliminating the
    listening lock contention.  From Eric Dumazet.

11) The bonding layer has many demuxes in it's fast path, and an RCU
    conversion was started back in 3.11, several changes here extend the
    RCU usage to even more locations.  From Ding Tianhong and Wang
    Yufen, based upon suggestions by Nikolay Aleksandrov and Veaceslav
    Falico.

12) Allow stackability of segmentation offloads to, in particular, allow
    segmentation offloading over tunnels.  From Eric Dumazet.

13) Significantly improve the handling of secret keys we input into the
    various hash functions in the inet hashtables, TCP fast open, as
    well as syncookies.  From Hannes Frederic Sowa.  The key fundamental
    operation is "net_get_random_once()" which uses static keys.

    Hannes even extended this to ipv4/ipv6 fragmentation handling and
    our generic flow dissector.

14) The generic driver layer takes care now to set the driver data to
    NULL on device removal, so it's no longer necessary for drivers to
    explicitly set it to NULL any more.  Many drivers have been cleaned
    up in this way, from Jingoo Han.

15) Add a BPF based packet scheduler classifier, from Daniel Borkmann.

16) Improve CRC32 interfaces and generic SKB checksum iterators so that
    SCTP's checksumming can more cleanly be handled.  Also from Daniel
    Borkmann.

17) Add a new PMTU discovery mode, IP_PMTUDISC_INTERFACE, which forces
    using the interface MTU value.  This helps avoid PMTU attacks,
    particularly on DNS servers.  From Hannes Frederic Sowa.

18) Use generic XPS for transmit queue steering rather than internal
    (re-)implementation in virtio-net.  From Jason Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits)
  random32: add test cases for taus113 implementation
  random32: upgrade taus88 generator to taus113 from errata paper
  random32: move rnd_state to linux/random.h
  random32: add prandom_reseed_late() and call when nonblocking pool becomes initialized
  random32: add periodic reseeding
  random32: fix off-by-one in seeding requirement
  PHY: Add RTL8201CP phy_driver to realtek
  xtsonic: add missing platform_set_drvdata() in xtsonic_probe()
  macmace: add missing platform_set_drvdata() in mace_probe()
  ethernet/arc/arc_emac: add missing platform_set_drvdata() in arc_emac_probe()
  ipv6: protect for_each_sk_fl_rcu in mem_check with rcu_read_lock_bh
  vlan: Implement vlan_dev_get_egress_qos_mask as an inline.
  ixgbe: add warning when max_vfs is out of range.
  igb: Update link modes display in ethtool
  netfilter: push reasm skb through instead of original frag skbs
  ip6_output: fragment outgoing reassembled skb properly
  MAINTAINERS: mv643xx_eth: take over maintainership from Lennart
  net_sched: tbf: support of 64bit rates
  ixgbe: deleting dfwd stations out of order can cause null ptr deref
  ixgbe: fix build err, num_rx_queues is only available with CONFIG_RPS
  ...
2013-11-13 17:40:34 +09:00
Linus Torvalds
a998646456 Merge branch 'for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup changes from Tejun Heo:
 "Not too much activity this time around.  css_id is finally killed and
  a minor update to device_cgroup"

* 'for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  device_cgroup: remove can_attach
  cgroup: kill css_id
  memcg: stop using css id
  memcg: fail to create cgroup if the cgroup id is too big
  memcg: convert to use cgroup id
  memcg: convert to use cgroup_is_descendant()
2013-11-13 15:21:53 +09:00
Paul Moore
94851b18d4 Linux 3.12
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQEcBAABAgAGBQJSdt9HAAoJEHm+PkMAQRiGnzEH/345Keg5dp+oKACnokBfzOtp
 V0p3g5EBsGtzEVnV+1B96trczDUtWdDFFr5GfGSj565NBQpFyc+iZC1mC99RDJCs
 WUquGFqlLMK2aV0SbKwCO4K1rJ5A0TRVj0ZRJOUJUY7jwNf5Qahny0WBVjO/8qAY
 UvJK1rktBClhKdH53YtpDHHgXBeZ2LOrzt1fQ/AMpujGbZauGvnLdNOli5r2kCFK
 jzoOgFLvX+PHU/5/d4/QyJPeQNPva5hjk5Ho9UuSJYhnFtPO3EkD4XZLcpcbNEJb
 LqBvbnZWm6CS435lfU1l93RqQa5xMO9ITk0oe4h69syTSHwWk9aJ+ZTc/4Up+t8=
 =57MC
 -----END PGP SIGNATURE-----

Merge tag 'v3.12'

Linux 3.12
2013-11-08 13:56:38 -05:00
David Howells
fbf8c53f1a KEYS: Fix UID check in keyctl_get_persistent()
If the UID is specified by userspace when calling the KEYCTL_GET_PERSISTENT
function and the process does not have the CAP_SETUID capability, then the
function will return -EPERM if the current process's uid, suid, euid and fsuid
all match the requested UID.  This is incorrect.

Fix it such that when a non-privileged caller requests a persistent keyring by
a specific UID they can only request their own (ie. the specified UID matches
either then process's UID or the process's EUID).

This can be tested by logging in as the user and doing:

	keyctl get_persistent @p
	keyctl get_persistent @p `id -u`
	keyctl get_persistent @p 0

The first two should successfully print the same key ID.  The third should do
the same if called by UID 0 or indicate Operation Not Permitted otherwise.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Stephen Gallagher <sgallagh@redhat.com>
2013-11-06 14:01:51 +00:00
Richard Guy Briggs
a20b62bdf7 audit: suppress stock memalloc failure warnings since already managed
Supress the stock memory allocation failure warnings for audit buffers
since audit alreay takes care of memory allocation failure warnings, including
rate-limiting, in audit_log_start().

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:09:11 -05:00
Eric Paris
b805b198dc selinux: apply selinux checks on new audit message types
We use the read check to get the feature set (like AUDIT_GET) and the
write check to set the features (like AUDIT_SET).

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:07:35 -05:00
Mimi Zohar
217091dd7a ima: define '_ima' as a builtin 'trusted' keyring
Require all keys added to the IMA keyring be signed by an
existing trusted key on the system trusted keyring.

Changelog:
- define stub integrity_init_keyring() function (reported-by Fengguang Wu)
- differentiate between regular and trusted keyring names.
- replace printk with pr_info (D. Kasatkin)

Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
2013-10-31 20:20:48 -04:00
Mimi Zohar
bcbc9b0cf6 ima: extend the measurement list to include the file signature
This patch defines a new template called 'ima-sig', which includes
the file signature in the template data, in addition to the file's
digest and pathname.

A template is composed of a set of fields.  Associated with each
field is an initialization and display function.  This patch defines
a new template field called 'sig', the initialization function
ima_eventsig_init(), and the display function ima_show_template_sig().

This patch modifies the .field_init() function definition to include
the 'security.ima' extended attribute and length.

Changelog:
- remove unused code (Dmitry Kasatkin)
- avoid calling ima_write_template_field_data() unnecesarily (Roberto Sassu)
- rename DATA_FMT_SIG to DATA_FMT_HEX
- cleanup ima_eventsig_init() based on Roberto's comments

Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
2013-10-31 20:19:35 -04:00
James Morris
42a20ba5c9 Merge branch 'keys-devel' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs into ra-next 2013-10-31 09:46:36 +11:00
Wei Yongjun
d2b8697024 KEYS: fix error return code in big_key_instantiate()
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David Howells <dhowells@redhat.com>
2013-10-30 12:54:29 +00:00
David Howells
034faeb9ef KEYS: Fix keyring quota misaccounting on key replacement and unlink
If a key is displaced from a keyring by a matching one, then four more bytes
of quota are allocated to the keyring - despite the fact that the keyring does
not change in size.

Further, when a key is unlinked from a keyring, the four bytes of quota
allocated the link isn't recovered and returned to the user's pool.

The first can be tested by repeating:

	keyctl add big_key a fred @s
	cat /proc/key-users

(Don't put it in a shell loop otherwise the garbage collector won't have time
to clear the displaced keys, thus affecting the result).

This was causing the kerberos keyring to run out of room fairly quickly.

The second can be tested by:

	cat /proc/key-users
	a=`keyctl add user a a @s`
	cat /proc/key-users
	keyctl unlink $a
	sleep 1 # Give RCU a chance to delete the key
	cat /proc/key-users

assuming no system activity that otherwise adds/removes keys, the amount of
key data allocated should go up (say 40/20000 -> 47/20000) and then return to
the original value at the end.

Reported-by: Stephen Gallagher <sgallagh@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2013-10-30 11:15:24 +00:00
David Howells
74792b0001 KEYS: Fix a race between negating a key and reading the error set
key_reject_and_link() marking a key as negative and setting the error with
which it was negated races with keyring searches and other things that read
that error.

The fix is to switch the order in which the assignments are done in
key_reject_and_link() and to use memory barriers.

Kudos to Dave Wysochanski <dwysocha@redhat.com> and Scott Mayhew
<smayhew@redhat.com> for tracking this down.

This may be the cause of:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000070
IP: [<ffffffff81219011>] wait_for_key_construction+0x31/0x80
PGD c6b2c3067 PUD c59879067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
CPU 0
Modules linked in: ...

Pid: 13359, comm: amqzxma0 Not tainted 2.6.32-358.20.1.el6.x86_64 #1 IBM System x3650 M3 -[7945PSJ]-/00J6159
RIP: 0010:[<ffffffff81219011>] wait_for_key_construction+0x31/0x80
RSP: 0018:ffff880c6ab33758  EFLAGS: 00010246
RAX: ffffffff81219080 RBX: 0000000000000000 RCX: 0000000000000002
RDX: ffffffff81219060 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff880c6ab33768 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffff880adfcbce40
R13: ffffffffa03afb84 R14: ffff880adfcbce40 R15: ffff880adfcbce43
FS:  00007f29b8042700(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000070 CR3: 0000000c613dc000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process amqzxma0 (pid: 13359, threadinfo ffff880c6ab32000, task ffff880c610deae0)
Stack:
 ffff880adfcbce40 0000000000000000 ffff880c6ab337b8 ffffffff81219695
<d> 0000000000000000 ffff880a000000d0 ffff880c6ab337a8 000000000000000f
<d> ffffffffa03afb93 000000000000000f ffff88186c7882c0 0000000000000014
Call Trace:
 [<ffffffff81219695>] request_key+0x65/0xa0
 [<ffffffffa03a0885>] nfs_idmap_request_key+0xc5/0x170 [nfs]
 [<ffffffffa03a0eb4>] nfs_idmap_lookup_id+0x34/0x80 [nfs]
 [<ffffffffa03a1255>] nfs_map_group_to_gid+0x75/0xa0 [nfs]
 [<ffffffffa039a9ad>] decode_getfattr_attrs+0xbdd/0xfb0 [nfs]
 [<ffffffff81057310>] ? __dequeue_entity+0x30/0x50
 [<ffffffff8100988e>] ? __switch_to+0x26e/0x320
 [<ffffffffa039ae03>] decode_getfattr+0x83/0xe0 [nfs]
 [<ffffffffa039b610>] ? nfs4_xdr_dec_getattr+0x0/0xa0 [nfs]
 [<ffffffffa039b69f>] nfs4_xdr_dec_getattr+0x8f/0xa0 [nfs]
 [<ffffffffa02dada4>] rpcauth_unwrap_resp+0x84/0xb0 [sunrpc]
 [<ffffffffa039b610>] ? nfs4_xdr_dec_getattr+0x0/0xa0 [nfs]
 [<ffffffffa02cf923>] call_decode+0x1b3/0x800 [sunrpc]
 [<ffffffff81096de0>] ? wake_bit_function+0x0/0x50
 [<ffffffffa02cf770>] ? call_decode+0x0/0x800 [sunrpc]
 [<ffffffffa02d99a7>] __rpc_execute+0x77/0x350 [sunrpc]
 [<ffffffff81096c67>] ? bit_waitqueue+0x17/0xd0
 [<ffffffffa02d9ce1>] rpc_execute+0x61/0xa0 [sunrpc]
 [<ffffffffa02d03a5>] rpc_run_task+0x75/0x90 [sunrpc]
 [<ffffffffa02d04c2>] rpc_call_sync+0x42/0x70 [sunrpc]
 [<ffffffffa038ff80>] _nfs4_call_sync+0x30/0x40 [nfs]
 [<ffffffffa038836c>] _nfs4_proc_getattr+0xac/0xc0 [nfs]
 [<ffffffff810aac87>] ? futex_wait+0x227/0x380
 [<ffffffffa038b856>] nfs4_proc_getattr+0x56/0x80 [nfs]
 [<ffffffffa0371403>] __nfs_revalidate_inode+0xe3/0x220 [nfs]
 [<ffffffffa037158e>] nfs_revalidate_mapping+0x4e/0x170 [nfs]
 [<ffffffffa036f147>] nfs_file_read+0x77/0x130 [nfs]
 [<ffffffff811811aa>] do_sync_read+0xfa/0x140
 [<ffffffff81096da0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8100bb8e>] ? apic_timer_interrupt+0xe/0x20
 [<ffffffff8100b9ce>] ? common_interrupt+0xe/0x13
 [<ffffffff81228ffb>] ? selinux_file_permission+0xfb/0x150
 [<ffffffff8121bed6>] ? security_file_permission+0x16/0x20
 [<ffffffff81181a95>] vfs_read+0xb5/0x1a0
 [<ffffffff81181bd1>] sys_read+0x51/0x90
 [<ffffffff810dc685>] ? __audit_syscall_exit+0x265/0x290
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Dave Wysochanski <dwysocha@redhat.com>
cc: Scott Mayhew <smayhew@redhat.com>
2013-10-30 11:15:24 +00:00
Josh Boyer
2eaf6b5dca KEYS: Make BIG_KEYS boolean
Having the big_keys functionality as a module is very marginally useful.
The userspace code that would use this functionality will get odd error
messages from the keys layer if the module isn't loaded.  The code itself
is fairly small, so just have this as a boolean option and not a tristate.

Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: David Howells <dhowells@redhat.com>
2013-10-30 11:15:23 +00:00
Oleg Nesterov
51775fe736 apparmor: remove the "task" arg from may_change_ptraced_domain()
Unless task == current ptrace_parent(task) is not safe even under
rcu_read_lock() and most of the current users are not right.

So may_change_ptraced_domain(task) looks wrong as well. However it
is always called with task == current so the code is actually fine.
Remove this argument to make this fact clear.

Note: perhaps we should simply kill ptrace_parent(), it buys almost
nothing. And it is obviously racy, perhaps this should be fixed.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
2013-10-29 21:34:18 -07:00
John Johansen
4a7fc3018f apparmor: remove parent task info from audit logging
The reporting of the parent task info is a vestage from old versions of
apparmor. The need for this information was removed by unique null-
profiles before apparmor was upstreamed so remove this info from logging.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2013-10-29 21:34:04 -07:00
John Johansen
61e3fb8aca apparmor: remove tsk field from the apparmor_audit_struct
Now that aa_capabile no longer sets the task field it can be removed
and the lsm_audit version of the field can be used.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2013-10-29 21:33:52 -07:00
John Johansen
dd0c6e86f6 apparmor: fix capability to not use the current task, during reporting
Mediation is based off of the cred but auditing includes the current
task which may not be related to the actual request.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2013-10-29 21:33:37 -07:00
James Morris
50b719f811 Merge branch 'smack-for-3.13' of git://git.gitorious.org/smack-next/kernel into ra-next 2013-10-30 14:07:10 +11:00
Casey Schaufler
b5dfd8075b Smack: Ptrace access check mode
When the ptrace security hooks were split the addition of
a mode parameter was not taken advantage of in the Smack
ptrace access check. This changes the access check from
always looking for read and write access to using the
passed mode. This will make use of /proc much happier.

Targeted for git://git.gitorious.org/smack-next/kernel.git

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2013-10-28 10:23:36 -07:00
Dmitry Kasatkin
3ea7a56067 ima: provide hash algo info in the xattr
All files labeled with 'security.ima' hashes, are hashed using the
same hash algorithm.  Changing from one hash algorithm to another,
requires relabeling the filesystem.  This patch defines a new xattr
type, which includes the hash algorithm, permitting different files
to be hashed with different algorithms.

Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2013-10-26 21:32:55 -04:00
Mimi Zohar
e7a2ad7eb6 ima: enable support for larger default filedata hash algorithms
The IMA measurement list contains two hashes - a template data hash
and a filedata hash.  The template data hash is committed to the TPM,
which is limited, by the TPM v1.2 specification, to 20 bytes.  The
filedata hash is defined as 20 bytes as well.

Now that support for variable length measurement list templates was
added, the filedata hash is not limited to 20 bytes.  This patch adds
Kconfig support for defining larger default filedata hash algorithms
and replacing the builtin default with one specified on the kernel
command line.

<uapi/linux/hash_info.h> contains a list of hash algorithms.  The
Kconfig default hash algorithm is a subset of this list, but any hash
algorithm included in the list can be specified at boot, using the
'ima_hash=' kernel command line option.

Changelog v2:
- update Kconfig

Changelog:
- support hashes that are configured
- use generic HASH_ALGO_ definitions
- add Kconfig support
- hash_setup must be called only once (Dmitry)
- removed trailing whitespaces (Roberto Sassu)

Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
2013-10-26 21:32:55 -04:00
Roberto Sassu
9b9d4ce592 ima: define kernel parameter 'ima_template=' to change configured default
This patch allows users to specify from the kernel command line the
template descriptor, among those defined, that will be used to generate
and display measurement entries. If an user specifies a wrong template,
IMA reverts to the template descriptor set in the kernel configuration.

Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2013-10-26 21:32:54 -04:00
Mimi Zohar
4286587dcc ima: add Kconfig default measurement list template
This patch adds a Kconfig option to select the default IMA
measurement list template.  The 'ima' template limited the
filedata hash to 20 bytes and the pathname to 255 charaters.
The 'ima-ng' measurement list template permits larger hash
digests and longer pathnames.

Changelog:
- keep 'select CRYPTO_HASH_INFO' in 'config IMA' section (Kconfig)
  (Roberto Sassu);
- removed trailing whitespaces (Roberto Sassu).
- Lindent fixes

Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
2013-10-26 21:32:54 -04:00
Roberto Sassu
add1c05dce ima: defer determining the appraisal hash algorithm for 'ima' template
The same hash algorithm should be used for calculating the file
data hash for the IMA measurement list, as for appraising the file
data integrity.  (The appraise hash algorithm is stored in the
'security.ima' extended attribute.)  The exception is when the
reference file data hash digest, stored in the extended attribute,
is larger than the one supported by the template.  In this case,
the file data hash needs to be calculated twice, once for the
measurement list and, again, for appraisal.

Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2013-10-26 21:32:53 -04:00
Mimi Zohar
5278aa52f3 ima: add audit log support for larger hashes
Different files might be signed based on different hash algorithms.
This patch prefixes the audit log measurement hash with the hash
algorithm.

Changelog:
- use generic HASH_ALGO defintions
- use ':' as delimiter between the hash algorithm and the digest
  (Roberto Sassu)
- always include the hash algorithm used when audit-logging a measurement

Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Peter Moody <pmoody@google.com>
2013-10-26 21:32:46 -04:00
Roberto Sassu
a71dc65d30 ima: switch to new template management mechanism
This patch performs the switch to the new template mechanism by modifying
the functions ima_alloc_init_template(), ima_measurements_show() and
ima_ascii_measurements_show(). The old function ima_template_show() was
removed as it is no longer needed. Also, if the template descriptor used
to generate a measurement entry is not 'ima', the whole length of field
data stored for an entry is provided before the data itself through the
binary_runtime_measurement interface.

Changelog:
- unnecessary to use strncmp() (Mimi Zohar)
- create new variable 'field' in ima_alloc_init_template() (Roberto Sassu)
- use GFP_NOFS flag in ima_alloc_init_template() (Roberto Sassu)
- new variable 'num_fields' in ima_store_template() (Roberto Sassu,
  proposed by Mimi Zohar)
- rename ima_calc_buffer_hash/template_hash() to ima_calc_field_array_hash(),
  something more generic (Mimi, requested by Dmitry)
- sparse error fix - Fengguang Wu
- fix lindent warnings
- always include the field length in the template data length
- include the template field length variable size in the template data length
- include both the template field data and field length in the template digest
  calculation. Simplifies verifying the template digest. (Mimi)

Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2013-10-25 17:17:06 -04:00
Roberto Sassu
4d7aeee73f ima: define new template ima-ng and template fields d-ng and n-ng
This patch adds support for the new template 'ima-ng', whose format
is defined as 'd-ng|n-ng'.  These new field definitions remove the
size limitations of the original 'ima' template.  Further, the 'd-ng'
field prefixes the inode digest with the hash algorithim, when
displaying the new larger digest sizes.

Change log:
- scripts/Lindent fixes  - Mimi
- "always true comparison" - reported by Fengguang Wu, resolved Dmitry
- initialize hash_algo variable to HASH_ALGO__LAST
- always prefix digest with hash algorithm - Mimi

Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2013-10-25 17:17:05 -04:00
Roberto Sassu
3ce1217d6c ima: define template fields library and new helpers
This patch defines a library containing two initial template fields,
inode digest (d) and file name (n), the 'ima' template descriptor,
whose format is 'd|n', and two helper functions,
ima_write_template_field_data() and ima_show_template_field_data().

Changelog:
- replace ima_eventname_init() parameter NULL checking with BUG_ON.
  (suggested by Mimi)
- include "new template fields for inode digest (d) and file name (n)"
  definitions to fix a compiler warning.  - Mimi
- unnecessary to prefix static function names with 'ima_'. remove
  prefix to resolve Lindent formatting changes. - Mimi
- abbreviated/removed inline comments - Mimi
- always send the template field length - Mimi

Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2013-10-25 17:17:05 -04:00
Roberto Sassu
adf53a778a ima: new templates management mechanism
The original 'ima' template is fixed length, containing the filedata hash
and pathname.  The filedata hash is limited to 20 bytes (md5/sha1).  The
pathname is a null terminated string, limited to 255 characters.  To
overcome these limitations and to add additional file metadata, it is
necessary to extend the current version of IMA by defining additional
templates.

The main reason to introduce this feature is that, each time a new
template is defined, the functions that generate and display the
measurement list would include the code for handling a new format and,
thus, would significantly grow over time.

This patch set solves this problem by separating the template management
from the remaining IMA code. The core of this solution is the definition
of two new data structures: a template descriptor, to determine which
information should be included in the measurement list, and a template
field, to generate and display data of a given type.

To define a new template field, developers define the field identifier
and implement two functions, init() and show(), respectively to generate
and display measurement entries.  Initially, this patch set defines the
following template fields (support for additional data types will be
added later):
 - 'd': the digest of the event (i.e. the digest of a measured file),
        calculated with the SHA1 or MD5 hash algorithm;
 - 'n': the name of the event (i.e. the file name), with size up to
        255 bytes;
 - 'd-ng': the digest of the event, calculated with an arbitrary hash
           algorithm (field format: [<hash algo>:]digest, where the digest
           prefix is shown only if the hash algorithm is not SHA1 or MD5);
 - 'n-ng': the name of the event, without size limitations.

Defining a new template descriptor requires specifying the template format,
a string of field identifiers separated by the '|' character.  This patch
set defines the following template descriptors:
 - "ima": its format is 'd|n';
 - "ima-ng" (default): its format is 'd-ng|n-ng'

Further details about the new template architecture can be found in
Documentation/security/IMA-templates.txt.

Changelog:
- don't defer calling ima_init_template() - Mimi
- don't define ima_lookup_template_desc() until used - Mimi
- squashed with documentation patch - Mimi

Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2013-10-25 17:17:04 -04:00
Roberto Sassu
7bc5f447ce ima: define new function ima_alloc_init_template() to API
Instead of allocating and initializing the template entry from multiple
places (eg. boot aggregate, violation, and regular measurements), this
patch defines a new function called ima_alloc_init_template().  The new
function allocates and initializes the measurement entry with the inode
digest and the filename.

In respect to the current behavior, it truncates the file name passed
in the 'filename' argument if the latter's size is greater than 255 bytes
and the passed file descriptor is NULL.

Changelog:
- initialize 'hash' variable for non TPM case - Mimi
- conform to expectation for 'iint' to be defined as a pointer. - Mimi
- add missing 'file' dependency for recalculating file hash. - Mimi

Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2013-10-25 17:17:04 -04:00
Roberto Sassu
9803d413f4 ima: pass the filename argument up to ima_add_template_entry()
Pass the filename argument to ima_add_template_entry() in order to
eliminate a dependency on template specific data (third argument of
integrity_audit_msg).

This change is required because, with the new template management
mechanism, the generation of a new measurement entry will be performed
by new specific functions (introduced in next patches) and the current IMA
code will not be aware anymore of how data is stored in the entry payload.

Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2013-10-25 17:17:03 -04:00
Roberto Sassu
7d802a227b ima: pass the file descriptor to ima_add_violation()
Pass the file descriptor instead of the inode to ima_add_violation(),
to make the latter consistent with ima_store_measurement() in
preparation for the new template architecture.

Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2013-10-25 17:17:02 -04:00
Dmitry Kasatkin
09ef54359c ima: ima_calc_boot_agregate must use SHA1
With multiple hash algorithms, ima_hash_tfm is no longer guaranteed to be sha1.
Need to force to use sha1.

Changelog:
- pass ima_digest_data to ima_calc_boot_aggregate() instead of char *
  (Roberto Sassu);
- create an ima_digest_data structure in ima_add_boot_aggregate()
  (Roberto Sassu);
- pass hash->algo to ima_alloc_tfm() (Roberto Sassu, reported by Dmitry).
- "move hash definition in ima_add_boot_aggregate()" commit hunk to here.
- sparse warning fix - Fengguang Wu

Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2013-10-25 17:17:02 -04:00
Dmitry Kasatkin
ea593993d3 ima: support arbitrary hash algorithms in ima_calc_buffer_hash
ima_calc_buffer_hash will be used with different hash algorithms.
This patch provides support for arbitrary hash algorithms in
ima_calc_buffer_hash.

Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2013-10-25 17:17:01 -04:00
Dmitry Kasatkin
723326b927 ima: provide dedicated hash algo allocation function
This patch provides dedicated hash algo allocation and
deallocation function which can be used by different clients.

Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2013-10-25 17:17:01 -04:00
Mimi Zohar
140d802240 ima: differentiate between template hash and file data hash sizes
The TPM v1.2 limits the template hash size to 20 bytes.  This
patch differentiates between the template hash size, as defined
in the ima_template_entry, and the file data hash size, as
defined in the ima_template_data.  Subsequent patches add support
for different file data hash algorithms.

Change log:
- hash digest definition in ima_store_template() should be TPM_DIGEST_SIZE

Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
2013-10-25 17:17:00 -04:00
Dmitry Kasatkin
a35c3fb649 ima: use dynamically allocated hash storage
For each inode in the IMA policy, an iint is allocated.  To support
larger hash digests, the iint digest size changed from 20 bytes to
the maximum supported hash digest size.  Instead of allocating the
maximum size, which most likely is not needed, this patch dynamically
allocates the needed hash storage.

Changelog:
- fix krealloc bug

Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2013-10-25 17:17:00 -04:00
Dmitry Kasatkin
b1aaab22e2 ima: pass full xattr with the signature
For possibility to use xattr type for new signature formats,
pass full xattr to the signature verification function.

Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2013-10-25 17:16:59 -04:00
Dmitry Kasatkin
d3634d0f42 ima: read and use signature hash algorithm
All files on the filesystem, currently, are hashed using the same hash
algorithm.  In preparation for files from different packages being
signed using different hash algorithms, this patch adds support for
reading the signature hash algorithm from the 'security.ima' extended
attribute and calculates the appropriate file data hash based on it.

Changelog:
- fix scripts Lindent and checkpatch msgs - Mimi
- fix md5 support for older version, which occupied 20 bytes in the
  xattr, not the expected 16 bytes.  Fix the comparison to compare
  only the first 16 bytes.

Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2013-10-25 17:16:59 -04:00
Dmitry Kasatkin
c7c8bb237f ima: provide support for arbitrary hash algorithms
In preparation of supporting more hash algorithms with larger hash sizes
needed for signature verification, this patch replaces the 20 byte sized
digest, with a more flexible structure.  The new structure includes the
hash algorithm, digest size, and digest.

Changelog:
- recalculate filedata hash for the measurement list, if the signature
  hash digest size is greater than 20 bytes.
- use generic HASH_ALGO_
- make ima_calc_file_hash static
- scripts lindent and checkpatch fixes

Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2013-10-25 17:16:58 -04:00
Mimi Zohar
08de59eb14 Revert "ima: policy for RAMFS"
This reverts commit 4c2c392763.

Everything in the initramfs should be measured and appraised,
but until the initramfs has extended attribute support, at
least measured.

Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Cc: Stable Kernel <stable@kernel.org>
2013-10-25 13:17:19 -04:00
Dmitry Kasatkin
089bc8e95a ima: fix script messages
Fix checkpatch, lindent, etc, warnings/errors

Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2013-10-25 13:17:19 -04:00
Serge Hallyn
73ba353471 device_cgroup: remove can_attach
It is really only wanting to duplicate a check which is already done by the
cgroup subsystem.

With this patch, user jdoe still cannot move pid 1 into a devices cgroup
he owns, but now he can move his own other tasks into devices cgroups.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Aristeu Rozanski <aris@redhat.com>
2013-10-24 06:56:56 -04:00
David S. Miller
c3fa32b976 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/usb/qmi_wwan.c
	include/net/dst.h

Trivial merge conflicts, both were overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-23 16:49:34 -04:00
James Morris
6f799c97f3 Merge branch 'master' of git://git.infradead.org/users/pcmoore/selinux into ra-next 2013-10-22 22:26:41 +11:00
Casey Schaufler
c0ab6e56dc Smack: Implement lock security mode
Linux file locking does not follow the same rules
as other mechanisms. Even though it is a write operation
a process can set a read lock on files which it has open
only for read access. Two programs with read access to
a file can use read locks to communicate.

This is not acceptable in a Mandatory Access Control
environment. Smack treats setting a read lock as the
write operation that it is. Unfortunately, many programs
assume that setting a read lock is a read operation.
These programs are unhappy in the Smack environment.

This patch introduces a new access mode (lock) to address
this problem. A process with lock access to a file can
set a read lock. A process with write access to a file can
set a read lock or a write lock. This prevents a situation
where processes are granted write access just so they can
set read locks.

Targeted for git://git.gitorious.org/smack-next/kernel.git

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2013-10-18 09:39:33 -07:00
John Johansen
ed2c7da3a4 apparmor: fix bad lock balance when introspecting policy
BugLink: http://bugs.launchpad.net/bugs/1235977

The profile introspection seq file has a locking bug when policy is viewed
from a virtual root (task in a policy namespace), introspection from the
real root is not affected.

The test for root
    while (parent) {
is correct for the real root, but incorrect for tasks in a policy namespace.
This allows the task to walk backup the policy tree past its virtual root
causing it to be unlocked before the virtual root should be in the p_stop
fn.

This results in the following lockdep back trace:
[   78.479744] [ BUG: bad unlock balance detected! ]
[   78.479792] 3.11.0-11-generic #17 Not tainted
[   78.479838] -------------------------------------
[   78.479885] grep/2223 is trying to release lock (&ns->lock) at:
[   78.479952] [<ffffffff817bf3be>] mutex_unlock+0xe/0x10
[   78.480002] but there are no more locks to release!
[   78.480037]
[   78.480037] other info that might help us debug this:
[   78.480037] 1 lock held by grep/2223:
[   78.480037]  #0:  (&p->lock){+.+.+.}, at: [<ffffffff812111bd>] seq_read+0x3d/0x3d0
[   78.480037]
[   78.480037] stack backtrace:
[   78.480037] CPU: 0 PID: 2223 Comm: grep Not tainted 3.11.0-11-generic #17
[   78.480037] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[   78.480037]  ffffffff817bf3be ffff880007763d60 ffffffff817b97ef ffff8800189d2190
[   78.480037]  ffff880007763d88 ffffffff810e1c6e ffff88001f044730 ffff8800189d2190
[   78.480037]  ffffffff817bf3be ffff880007763e00 ffffffff810e5bd6 0000000724fe56b7
[   78.480037] Call Trace:
[   78.480037]  [<ffffffff817bf3be>] ? mutex_unlock+0xe/0x10
[   78.480037]  [<ffffffff817b97ef>] dump_stack+0x54/0x74
[   78.480037]  [<ffffffff810e1c6e>] print_unlock_imbalance_bug+0xee/0x100
[   78.480037]  [<ffffffff817bf3be>] ? mutex_unlock+0xe/0x10
[   78.480037]  [<ffffffff810e5bd6>] lock_release_non_nested+0x226/0x300
[   78.480037]  [<ffffffff817bf2fe>] ? __mutex_unlock_slowpath+0xce/0x180
[   78.480037]  [<ffffffff817bf3be>] ? mutex_unlock+0xe/0x10
[   78.480037]  [<ffffffff810e5d5c>] lock_release+0xac/0x310
[   78.480037]  [<ffffffff817bf2b3>] __mutex_unlock_slowpath+0x83/0x180
[   78.480037]  [<ffffffff817bf3be>] mutex_unlock+0xe/0x10
[   78.480037]  [<ffffffff81376c91>] p_stop+0x51/0x90
[   78.480037]  [<ffffffff81211408>] seq_read+0x288/0x3d0
[   78.480037]  [<ffffffff811e9d9e>] vfs_read+0x9e/0x170
[   78.480037]  [<ffffffff811ea8cc>] SyS_read+0x4c/0xa0
[   78.480037]  [<ffffffff817ccc9d>] system_call_fastpath+0x1a/0x1f

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2013-10-16 11:54:01 +11:00
John Johansen
5cb3e91ebd apparmor: fix memleak of the profile hash
BugLink: http://bugs.launchpad.net/bugs/1235523

This fixes the following kmemleak trace:
unreferenced object 0xffff8801e8c35680 (size 32):
  comm "apparmor_parser", pid 691, jiffies 4294895667 (age 13230.876s)
  hex dump (first 32 bytes):
    e0 d3 4e b5 ac 6d f4 ed 3f cb ee 48 1c fd 40 cf  ..N..m..?..H..@.
    5b cc e9 93 00 00 00 00 00 00 00 00 00 00 00 00  [...............
  backtrace:
    [<ffffffff817a97ee>] kmemleak_alloc+0x4e/0xb0
    [<ffffffff811ca9f3>] __kmalloc+0x103/0x290
    [<ffffffff8138acbc>] aa_calc_profile_hash+0x6c/0x150
    [<ffffffff8138074d>] aa_unpack+0x39d/0xd50
    [<ffffffff8137eced>] aa_replace_profiles+0x3d/0xd80
    [<ffffffff81376937>] profile_replace+0x37/0x50
    [<ffffffff811e9f2d>] vfs_write+0xbd/0x1e0
    [<ffffffff811ea96c>] SyS_write+0x4c/0xa0
    [<ffffffff817ccb1d>] system_call_fastpath+0x1a/0x1f
    [<ffffffffffffffff>] 0xffffffffffffffff

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2013-10-16 11:53:59 +11:00
Patrick McHardy
795aa6ef6a netfilter: pass hook ops to hookfn
Pass the hook ops to the hookfn to allow for generic hook
functions. This change is required by nf_tables.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14 11:29:31 +02:00
Eric Dumazet
c2bb06db59 net: fix build errors if ipv6 is disabled
CONFIG_IPV6=n is still a valid choice ;)

It appears we can remove dead code.

Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-09 13:04:03 -04:00
Eric Dumazet
efe4208f47 ipv6: make lookups simpler and faster
TCP listener refactoring, part 4 :

To speed up inet lookups, we moved IPv4 addresses from inet to struct
sock_common

Now is time to do the same for IPv6, because it permits us to have fast
lookups for all kind of sockets, including upcoming SYN_RECV.

Getting IPv6 addresses in TCP lookups currently requires two extra cache
lines, plus a dereference (and memory stall).

inet6_sk(sk) does the dereference of inet_sk(__sk)->pinet6

This patch is way bigger than its IPv4 counter part, because for IPv4,
we could add aliases (inet_daddr, inet_rcv_saddr), while on IPv6,
it's not doable easily.

inet6_sk(sk)->daddr becomes sk->sk_v6_daddr
inet6_sk(sk)->rcv_saddr becomes sk->sk_v6_rcv_saddr

And timewait socket also have tw->tw_v6_daddr & tw->tw_v6_rcv_saddr
at the same offset.

We get rid of INET6_TW_MATCH() as INET6_MATCH() is now the generic
macro.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-09 00:01:25 -04:00
Linus Torvalds
ab35406264 selinux: remove 'flags' parameter from avc_audit()
Now avc_audit() has no more users with that parameter. Remove it.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-04 14:13:25 -07:00
Linus Torvalds
cb4fbe5703 selinux: avc_has_perm_flags has no more users
.. so get rid of it.  The only indirect users were all the
avc_has_perm() callers which just expanded to have a zero flags
argument.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-04 14:13:14 -07:00
Linus Torvalds
19e49834d2 selinux: remove 'flags' parameter from inode_has_perm
Every single user passes in '0'.  I think we had non-zero users back in
some stone age when selinux_inode_permission() was implemented in terms
of inode_has_perm(), but that complicated case got split up into a
totally separate code-path so that we could optimize the much simpler
special cases.

See commit 2e33405785 ("SELinux: delay initialization of audit data in
selinux_inode_permission") for example.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-04 12:54:11 -07:00
David S. Miller
4fbef95af4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/emulex/benet/be.h
	drivers/net/usb/qmi_wwan.c
	drivers/net/wireless/brcm80211/brcmfmac/dhd_bus.h
	include/net/netfilter/nf_conntrack_synproxy.h
	include/net/secure_seq.h

The conflicts are of two varieties:

1) Conflicts with Joe Perches's 'extern' removal from header file
   function declarations.  Usually it's an argument signature change
   or a function being added/removed.  The resolutions are trivial.

2) Some overlapping changes in qmi_wwan.c and be.h, one commit adds
   a new value, another changes an existing value.  That sort of
   thing.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-01 17:06:14 -04:00
Eric W. Biederman
0bbf87d852 net ipv4: Convert ipv4.ip_local_port_range to be per netns v3
- Move sysctl_local_ports from a global variable into struct netns_ipv4.
- Modify inet_get_local_port_range to take a struct net, and update all
  of the callers.
- Move the initialization of sysctl_local_ports into
   sysctl_net_ipv4.c:ipv4_sysctl_init_net from inet_connection_sock.c

v2:
- Ensure indentation used tabs
- Fixed ip.h so it applies cleanly to todays net-next

v3:
- Compile fixes of strange callers of inet_get_local_port_range.
  This patch now successfully passes an allmodconfig build.
  Removed manual inlining of inet_get_local_port_range in ipv4_local_port_range

Originally-by: Samya <samya@twitter.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30 21:59:38 -07:00
John Johansen
4cd4fc7703 apparmor: fix suspicious RCU usage warning in policy.c/policy.h
The recent 3.12 pull request for apparmor was missing a couple rcu _protected
access modifiers. Resulting in the follow suspicious RCU usage

 [   29.804534] [ INFO: suspicious RCU usage. ]
 [   29.804539] 3.11.0+ #5 Not tainted
 [   29.804541] -------------------------------
 [   29.804545] security/apparmor/include/policy.h:363 suspicious rcu_dereference_check() usage!
 [   29.804548]
 [   29.804548] other info that might help us debug this:
 [   29.804548]
 [   29.804553]
 [   29.804553] rcu_scheduler_active = 1, debug_locks = 1
 [   29.804558] 2 locks held by apparmor_parser/1268:
 [   29.804560]  #0:  (sb_writers#9){.+.+.+}, at: [<ffffffff81120a4c>] file_start_write+0x27/0x29
 [   29.804576]  #1:  (&ns->lock){+.+.+.}, at: [<ffffffff811f5d88>] aa_replace_profiles+0x166/0x57c
 [   29.804589]
 [   29.804589] stack backtrace:
 [   29.804595] CPU: 0 PID: 1268 Comm: apparmor_parser Not tainted 3.11.0+ #5
 [   29.804599] Hardware name: ASUSTeK Computer Inc.         UL50VT          /UL50VT    , BIOS 217     03/01/2010
 [   29.804602]  0000000000000000 ffff8800b95a1d90 ffffffff8144eb9b ffff8800b94db540
 [   29.804611]  ffff8800b95a1dc0 ffffffff81087439 ffff880138cc3a18 ffff880138cc3a18
 [   29.804619]  ffff8800b9464a90 ffff880138cc3a38 ffff8800b95a1df0 ffffffff811f5084
 [   29.804628] Call Trace:
 [   29.804636]  [<ffffffff8144eb9b>] dump_stack+0x4e/0x82
 [   29.804642]  [<ffffffff81087439>] lockdep_rcu_suspicious+0xfc/0x105
 [   29.804649]  [<ffffffff811f5084>] __aa_update_replacedby+0x53/0x7f
 [   29.804655]  [<ffffffff811f5408>] __replace_profile+0x11f/0x1ed
 [   29.804661]  [<ffffffff811f6032>] aa_replace_profiles+0x410/0x57c
 [   29.804668]  [<ffffffff811f16d4>] profile_replace+0x35/0x4c
 [   29.804674]  [<ffffffff81120fa3>] vfs_write+0xad/0x113
 [   29.804680]  [<ffffffff81121609>] SyS_write+0x44/0x7a
 [   29.804687]  [<ffffffff8145bfd2>] system_call_fastpath+0x16/0x1b
 [   29.804691]
 [   29.804694] ===============================
 [   29.804697] [ INFO: suspicious RCU usage. ]
 [   29.804700] 3.11.0+ #5 Not tainted
 [   29.804703] -------------------------------
 [   29.804706] security/apparmor/policy.c:566 suspicious rcu_dereference_check() usage!
 [   29.804709]
 [   29.804709] other info that might help us debug this:
 [   29.804709]
 [   29.804714]
 [   29.804714] rcu_scheduler_active = 1, debug_locks = 1
 [   29.804718] 2 locks held by apparmor_parser/1268:
 [   29.804721]  #0:  (sb_writers#9){.+.+.+}, at: [<ffffffff81120a4c>] file_start_write+0x27/0x29
 [   29.804733]  #1:  (&ns->lock){+.+.+.}, at: [<ffffffff811f5d88>] aa_replace_profiles+0x166/0x57c
 [   29.804744]
 [   29.804744] stack backtrace:
 [   29.804750] CPU: 0 PID: 1268 Comm: apparmor_parser Not tainted 3.11.0+ #5
 [   29.804753] Hardware name: ASUSTeK Computer Inc.         UL50VT          /UL50VT    , BIOS 217     03/01/2010
 [   29.804756]  0000000000000000 ffff8800b95a1d80 ffffffff8144eb9b ffff8800b94db540
 [   29.804764]  ffff8800b95a1db0 ffffffff81087439 ffff8800b95b02b0 0000000000000000
 [   29.804772]  ffff8800b9efba08 ffff880138cc3a38 ffff8800b95a1dd0 ffffffff811f4f94
 [   29.804779] Call Trace:
 [   29.804786]  [<ffffffff8144eb9b>] dump_stack+0x4e/0x82
 [   29.804791]  [<ffffffff81087439>] lockdep_rcu_suspicious+0xfc/0x105
 [   29.804798]  [<ffffffff811f4f94>] aa_free_replacedby_kref+0x4d/0x62
 [   29.804804]  [<ffffffff811f4f47>] ? aa_put_namespace+0x17/0x17
 [   29.804810]  [<ffffffff811f4f0b>] kref_put+0x36/0x40
 [   29.804816]  [<ffffffff811f5423>] __replace_profile+0x13a/0x1ed
 [   29.804822]  [<ffffffff811f6032>] aa_replace_profiles+0x410/0x57c
 [   29.804829]  [<ffffffff811f16d4>] profile_replace+0x35/0x4c
 [   29.804835]  [<ffffffff81120fa3>] vfs_write+0xad/0x113
 [   29.804840]  [<ffffffff81121609>] SyS_write+0x44/0x7a
 [   29.804847]  [<ffffffff8145bfd2>] system_call_fastpath+0x16/0x1b

Reported-by: miles.lane@gmail.com
CC: paulmck@linux.vnet.ibm.com
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2013-09-30 09:54:01 +10:00
Tyler Hicks
71ac7f6255 apparmor: Use shash crypto API interface for profile hashes
Use the shash interface, rather than the hash interface, when hashing
AppArmor profiles. The shash interface does not use scatterlists and it
is a better fit for what AppArmor needs.

This fixes a kernel paging BUG when aa_calc_profile_hash() is passed a
buffer from vmalloc(). The hash interface requires callers to handle
vmalloc() buffers differently than what AppArmor was doing. Due to
vmalloc() memory not being physically contiguous, each individual page
behind the buffer must be assigned to a scatterlist with sg_set_page()
and then the scatterlist passed to crypto_hash_update().

The shash interface does not have that limitation and allows vmalloc()
and kmalloc() buffers to be handled in the same manner.

BugLink: https://launchpad.net/bugs/1216294/
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=62261

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Acked-by: Seth Arnold <seth.arnold@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2013-09-30 09:53:59 +10:00
Paul Moore
42d64e1add selinux: correct locking in selinux_netlbl_socket_connect)
The SELinux/NetLabel glue code has a locking bug that affects systems
with NetLabel enabled, see the kernel error message below.  This patch
corrects this problem by converting the bottom half socket lock to a
more conventional, and correct for this call-path, lock_sock() call.

 ===============================
 [ INFO: suspicious RCU usage. ]
 3.11.0-rc3+ #19 Not tainted
 -------------------------------
 net/ipv4/cipso_ipv4.c:1928 suspicious rcu_dereference_protected() usage!

 other info that might help us debug this:

 rcu_scheduler_active = 1, debug_locks = 0
 2 locks held by ping/731:
  #0:  (slock-AF_INET/1){+.-...}, at: [...] selinux_netlbl_socket_connect
  #1:  (rcu_read_lock){.+.+..}, at: [<...>] netlbl_conn_setattr

 stack backtrace:
 CPU: 1 PID: 731 Comm: ping Not tainted 3.11.0-rc3+ #19
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  0000000000000001 ffff88006f659d28 ffffffff81726b6a ffff88003732c500
  ffff88006f659d58 ffffffff810e4457 ffff88006b845a00 0000000000000000
  000000000000000c ffff880075aa2f50 ffff88006f659d90 ffffffff8169bec7
 Call Trace:
  [<ffffffff81726b6a>] dump_stack+0x54/0x74
  [<ffffffff810e4457>] lockdep_rcu_suspicious+0xe7/0x120
  [<ffffffff8169bec7>] cipso_v4_sock_setattr+0x187/0x1a0
  [<ffffffff8170f317>] netlbl_conn_setattr+0x187/0x190
  [<ffffffff8170f195>] ? netlbl_conn_setattr+0x5/0x190
  [<ffffffff8131ac9e>] selinux_netlbl_socket_connect+0xae/0xc0
  [<ffffffff81303025>] selinux_socket_connect+0x135/0x170
  [<ffffffff8119d127>] ? might_fault+0x57/0xb0
  [<ffffffff812fb146>] security_socket_connect+0x16/0x20
  [<ffffffff815d3ad3>] SYSC_connect+0x73/0x130
  [<ffffffff81739a85>] ? sysret_check+0x22/0x5d
  [<ffffffff810e5e2d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
  [<ffffffff81373d4e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
  [<ffffffff815d52be>] SyS_connect+0xe/0x10
  [<ffffffff81739a59>] system_call_fastpath+0x16/0x1b

Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
2013-09-26 17:00:46 -04:00
Duan Jiong
7d1db4b242 selinux: Use kmemdup instead of kmalloc + memcpy
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2013-09-26 15:52:13 -04:00
Mimi Zohar
c124bde28b KEYS: initialize root uid and session keyrings early
In order to create the integrity keyrings (eg. _evm, _ima), root's
uid and session keyrings need to be initialized early.

Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2013-09-25 17:17:01 +01:00
David Howells
008643b86c KEYS: Add a 'trusted' flag and a 'trusted only' flag
Add KEY_FLAG_TRUSTED to indicate that a key either comes from a trusted source
or had a cryptographic signature chain that led back to a trusted key the
kernel already possessed.

Add KEY_FLAGS_TRUSTED_ONLY to indicate that a keyring will only accept links to
keys marked with KEY_FLAGS_TRUSTED.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
2013-09-25 17:17:01 +01:00
David Howells
f36f8c75ae KEYS: Add per-user_namespace registers for persistent per-UID kerberos caches
Add support for per-user_namespace registers of persistent per-UID kerberos
caches held within the kernel.

This allows the kerberos cache to be retained beyond the life of all a user's
processes so that the user's cron jobs can work.

The kerberos cache is envisioned as a keyring/key tree looking something like:

	struct user_namespace
	  \___ .krb_cache keyring		- The register
		\___ _krb.0 keyring		- Root's Kerberos cache
		\___ _krb.5000 keyring		- User 5000's Kerberos cache
		\___ _krb.5001 keyring		- User 5001's Kerberos cache
			\___ tkt785 big_key	- A ccache blob
			\___ tkt12345 big_key	- Another ccache blob

Or possibly:

	struct user_namespace
	  \___ .krb_cache keyring		- The register
		\___ _krb.0 keyring		- Root's Kerberos cache
		\___ _krb.5000 keyring		- User 5000's Kerberos cache
		\___ _krb.5001 keyring		- User 5001's Kerberos cache
			\___ tkt785 keyring	- A ccache
				\___ krbtgt/REDHAT.COM@REDHAT.COM big_key
				\___ http/REDHAT.COM@REDHAT.COM user
				\___ afs/REDHAT.COM@REDHAT.COM user
				\___ nfs/REDHAT.COM@REDHAT.COM user
				\___ krbtgt/KERNEL.ORG@KERNEL.ORG big_key
				\___ http/KERNEL.ORG@KERNEL.ORG big_key

What goes into a particular Kerberos cache is entirely up to userspace.  Kernel
support is limited to giving you the Kerberos cache keyring that you want.

The user asks for their Kerberos cache by:

	krb_cache = keyctl_get_krbcache(uid, dest_keyring);

The uid is -1 or the user's own UID for the user's own cache or the uid of some
other user's cache (requires CAP_SETUID).  This permits rpc.gssd or whatever to
mess with the cache.

The cache returned is a keyring named "_krb.<uid>" that the possessor can read,
search, clear, invalidate, unlink from and add links to.  Active LSMs get a
chance to rule on whether the caller is permitted to make a link.

Each uid's cache keyring is created when it first accessed and is given a
timeout that is extended each time this function is called so that the keyring
goes away after a while.  The timeout is configurable by sysctl but defaults to
three days.

Each user_namespace struct gets a lazily-created keyring that serves as the
register.  The cache keyrings are added to it.  This means that standard key
search and garbage collection facilities are available.

The user_namespace struct's register goes away when it does and anything left
in it is then automatically gc'd.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Simo Sorce <simo@redhat.com>
cc: Serge E. Hallyn <serge.hallyn@ubuntu.com>
cc: Eric W. Biederman <ebiederm@xmission.com>
2013-09-24 10:35:19 +01:00
David Howells
ab3c3587f8 KEYS: Implement a big key type that can save to tmpfs
Implement a big key type that can save its contents to tmpfs and thus
swapspace when memory is tight.  This is useful for Kerberos ticket caches.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Simo Sorce <simo@redhat.com>
2013-09-24 10:35:18 +01:00
David Howells
b2a4df200d KEYS: Expand the capacity of a keyring
Expand the capacity of a keyring to be able to hold a lot more keys by using
the previously added associative array implementation.  Currently the maximum
capacity is:

	(PAGE_SIZE - sizeof(header)) / sizeof(struct key *)

which, on a 64-bit system, is a little more 500.  However, since this is being
used for the NFS uid mapper, we need more than that.  The new implementation
gives us effectively unlimited capacity.

With some alterations, the keyutils testsuite runs successfully to completion
after this patch is applied.  The alterations are because (a) keyrings that
are simply added to no longer appear ordered and (b) some of the errors have
changed a bit.

Signed-off-by: David Howells <dhowells@redhat.com>
2013-09-24 10:35:18 +01:00
David Howells
e57e8669f2 KEYS: Drop the permissions argument from __keyring_search_one()
Drop the permissions argument from __keyring_search_one() as the only caller
passes 0 here - which causes all checks to be skipped.

Signed-off-by: David Howells <dhowells@redhat.com>
2013-09-24 10:35:17 +01:00
David Howells
ccc3e6d9c9 KEYS: Define a __key_get() wrapper to use rather than atomic_inc()
Define a __key_get() wrapper to use rather than atomic_inc() on the key usage
count as this makes it easier to hook in refcount error debugging.

Signed-off-by: David Howells <dhowells@redhat.com>
2013-09-24 10:35:16 +01:00
David Howells
d0a059cac6 KEYS: Search for auth-key by name rather than target key ID
Search for auth-key by name rather than by target key ID as, in a future
patch, we'll by searching directly by index key in preference to iteration
over all keys.

Signed-off-by: David Howells <dhowells@redhat.com>
2013-09-24 10:35:16 +01:00
David Howells
4bdf0bc300 KEYS: Introduce a search context structure
Search functions pass around a bunch of arguments, each of which gets copied
with each call.  Introduce a search context structure to hold these.

Whilst we're at it, create a search flag that indicates whether the search
should be directly to the description or whether it should iterate through all
keys looking for a non-description match.

This will be useful when keyrings use a generic data struct with generic
routines to manage their content as the search terms can just be passed
through to the iterator callback function.

Also, for future use, the data to be supplied to the match function is
separated from the description pointer in the search context.  This makes it
clear which is being supplied.

Signed-off-by: David Howells <dhowells@redhat.com>
2013-09-24 10:35:15 +01:00
David Howells
16feef4340 KEYS: Consolidate the concept of an 'index key' for key access
Consolidate the concept of an 'index key' for accessing keys.  The index key
is the search term needed to find a key directly - basically the key type and
the key description.  We can add to that the description length.

This will be useful when turning a keyring into an associative array rather
than just a pointer block.

Signed-off-by: David Howells <dhowells@redhat.com>
2013-09-24 10:35:15 +01:00
David Howells
7e55ca6dcd KEYS: key_is_dead() should take a const key pointer argument
key_is_dead() should take a const key pointer argument as it doesn't modify
what it points to.

Signed-off-by: David Howells <dhowells@redhat.com>
2013-09-24 10:35:14 +01:00
David Howells
a5b4bd2874 KEYS: Use bool in make_key_ref() and is_key_possessed()
Make make_key_ref() take a bool possession parameter and make
is_key_possessed() return a bool.

Signed-off-by: David Howells <dhowells@redhat.com>
2013-09-24 10:35:14 +01:00
David Howells
61ea0c0ba9 KEYS: Skip key state checks when checking for possession
Skip key state checks (invalidation, revocation and expiration) when checking
for possession.  Without this, keys that have been marked invalid, revoked
keys and expired keys are not given a possession attribute - which means the
possessor is not granted any possession permits and cannot do anything with
them unless they also have one a user, group or other permit.

This causes failures in the keyutils test suite's revocation and expiration
tests now that commit 96b5c8fea6 reduced the
initial permissions granted to a key.

The failures are due to accesses to revoked and expired keys being given
EACCES instead of EKEYREVOKED or EKEYEXPIRED.

Signed-off-by: David Howells <dhowells@redhat.com>
2013-09-24 10:35:13 +01:00
Eric Paris
a3c9e45d18 security: remove erroneous comment about capabilities.o link ordering
Back when we had half ass LSM stacking we had to link capabilities.o
after bigger LSMs so that on initialization the bigger LSM would
register first and the capabilities module would be the one stacked as
the 'seconday'.  Somewhere around 6f0f0fd496 (back in 2008) we
finally removed the last of the kinda module stacking code but this
comment in the makefile still lives today.

Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2013-09-24 11:26:28 +10:00
Paul Moore
98f700f317 Merge git://git.infradead.org/users/eparis/selinux
Conflicts:
	security/selinux/hooks.c

Pull Eric's existing SELinux tree as there are a number of patches in
there that are not yet upstream.  There was some minor fixup needed to
resolve a conflict in security/selinux/hooks.c:selinux_set_mnt_opts()
between the labeled NFS patches and Eric's security_fs_use()
simplification patch.
2013-09-18 13:52:20 -04:00
Linus Torvalds
c7c4591db6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace changes from Eric Biederman:
 "This is an assorted mishmash of small cleanups, enhancements and bug
  fixes.

  The major theme is user namespace mount restrictions.  nsown_capable
  is killed as it encourages not thinking about details that need to be
  considered.  A very hard to hit pid namespace exiting bug was finally
  tracked and fixed.  A couple of cleanups to the basic namespace
  infrastructure.

  Finally there is an enhancement that makes per user namespace
  capabilities usable as capabilities, and an enhancement that allows
  the per userns root to nice other processes in the user namespace"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  userns:  Kill nsown_capable it makes the wrong thing easy
  capabilities: allow nice if we are privileged
  pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD
  userns: Allow PR_CAPBSET_DROP in a user namespace.
  namespaces: Simplify copy_namespaces so it is clear what is going on.
  pidns: Fix hang in zap_pid_ns_processes by sending a potentially extra wakeup
  sysfs: Restrict mounting sysfs
  userns: Better restrictions on when proc and sysfs can be mounted
  vfs: Don't copy mount bind mounts of /proc/<pid>/ns/mnt between namespaces
  kernel/nsproxy.c: Improving a snippet of code.
  proc: Restrict mounting the proc filesystem
  vfs: Lock in place mounts from more privileged users
2013-09-07 14:35:32 -07:00
Linus Torvalds
11c7b03d42 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
 "Nothing major for this kernel, just maintenance updates"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (21 commits)
  apparmor: add the ability to report a sha1 hash of loaded policy
  apparmor: export set of capabilities supported by the apparmor module
  apparmor: add the profile introspection file to interface
  apparmor: add an optional profile attachment string for profiles
  apparmor: add interface files for profiles and namespaces
  apparmor: allow setting any profile into the unconfined state
  apparmor: make free_profile available outside of policy.c
  apparmor: rework namespace free path
  apparmor: update how unconfined is handled
  apparmor: change how profile replacement update is done
  apparmor: convert profile lists to RCU based locking
  apparmor: provide base for multiple profiles to be replaced at once
  apparmor: add a features/policy dir to interface
  apparmor: enable users to query whether apparmor is enabled
  apparmor: remove minimum size check for vmalloc()
  Smack: parse multiple rules per write to load2, up to PAGE_SIZE-1 bytes
  Smack: network label match fix
  security: smack: add a hash table to quicken smk_find_entry()
  security: smack: fix memleak in smk_write_rules_list()
  xattr: Constify ->name member of "struct xattr".
  ...
2013-09-07 14:34:07 -07:00
Linus Torvalds
cc998ff881 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking changes from David Miller:
 "Noteworthy changes this time around:

   1) Multicast rejoin support for team driver, from Jiri Pirko.

   2) Centralize and simplify TCP RTT measurement handling in order to
      reduce the impact of bad RTO seeding from SYN/ACKs.  Also, when
      both timestamps and local RTT measurements are available prefer
      the later because there are broken middleware devices which
      scramble the timestamp.

      From Yuchung Cheng.

   3) Add TCP_NOTSENT_LOWAT socket option to limit the amount of kernel
      memory consumed to queue up unsend user data.  From Eric Dumazet.

   4) Add a "physical port ID" abstraction for network devices, from
      Jiri Pirko.

   5) Add a "suppress" operation to influence fib_rules lookups, from
      Stefan Tomanek.

   6) Add a networking development FAQ, from Paul Gortmaker.

   7) Extend the information provided by tcp_probe and add ipv6 support,
      from Daniel Borkmann.

   8) Use RCU locking more extensively in openvswitch data paths, from
      Pravin B Shelar.

   9) Add SCTP support to openvswitch, from Joe Stringer.

  10) Add EF10 chip support to SFC driver, from Ben Hutchings.

  11) Add new SYNPROXY netfilter target, from Patrick McHardy.

  12) Compute a rate approximation for sending in TCP sockets, and use
      this to more intelligently coalesce TSO frames.  Furthermore, add
      a new packet scheduler which takes advantage of this estimate when
      available.  From Eric Dumazet.

  13) Allow AF_PACKET fanouts with random selection, from Daniel
      Borkmann.

  14) Add ipv6 support to vxlan driver, from Cong Wang"

Resolved conflicts as per discussion.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1218 commits)
  openvswitch: Fix alignment of struct sw_flow_key.
  netfilter: Fix build errors with xt_socket.c
  tcp: Add missing braces to do_tcp_setsockopt
  caif: Add missing braces to multiline if in cfctrl_linkup_request
  bnx2x: Add missing braces in bnx2x:bnx2x_link_initialize
  vxlan: Fix kernel panic on device delete.
  net: mvneta: implement ->ndo_do_ioctl() to support PHY ioctls
  net: mvneta: properly disable HW PHY polling and ensure adjust_link() works
  icplus: Use netif_running to determine device state
  ethernet/arc/arc_emac: Fix huge delays in large file copies
  tuntap: orphan frags before trying to set tx timestamp
  tuntap: purge socket error queue on detach
  qlcnic: use standard NAPI weights
  ipv6:introduce function to find route for redirect
  bnx2x: VF RSS support - VF side
  bnx2x: VF RSS support - PF side
  vxlan: Notify drivers for listening UDP port changes
  net: usbnet: update addr_assign_type if appropriate
  driver/net: enic: update enic maintainers and driver
  driver/net: enic: Exposing symbols for Cisco's low latency driver
  ...
2013-09-05 14:54:29 -07:00
Linus Torvalds
3398d252a4 Minor fixes mainly, including a potential use-after-free on remove found by
CONFIG_DEBUG_KOBJECT_RELEASE which may be theoretical.
 
 Cheers,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJSJsL7AAoJENkgDmzRrbjx/N0P+wU81L55uYdJnDVPsW/fzhyJ
 BEZtmhFWtqxHeuFMQpwm5N3/ORwMht4czF4Hf957ePtI2PelHu+kapnFFQ+/KHZs
 T5sjH0mAYf9+CPa8wj4OKVzvd8lH4udbV+E7INVfciDySRX5HKXynJ6pZHvfeu7y
 /2q2PH6kGzuGoTEkDwOOwJ5yyZqs+RW9ZkSMStgCOUE8GmoDXEsH1KwIYE/9buCh
 XTngMo7AhimQaQ9QKypJLjlcnI9X/9ljXqFRKqSFOeMA1Ba+h+7eUqd4FJI6jDJu
 tecMOxX9PezPK6Wdg8V7AFBSzOhDPqoKQBOcaqeLd1wVICi8oQirVzwQNlsoiVNu
 JC+8rDqaeuG3dazROhaAnez7nhHfTjnMsYLVMRUmYtqXetd0qWYSmlmcJRKSJwi4
 okl/Lv5BroQdB9bB8+sc7l34nE3HXZGV3tJcNXf91NNEbDt97xFZ2YYbdtsQcOqj
 igUHcjsZq1gLsdIhnkHjTGLkLPxMTdCi8mtUc9+uzXHvSPJqEUMcn+fEA7RdliUw
 /WvpUX2tj2Al44LfBy6L0D6AVyS2/zIQ9PzH6FHsgVDqrNHomkF20w3btnM3yyPA
 hakV6vPr+kOpfSJYlTSU7yhEJh+LGkfPXeaX4X3tqubKhsZjq8rS7vrfbcqmgwvT
 DbzDKRx5R3URiiFSbb2v
 =15lp
 -----END PGP SIGNATURE-----

Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull module updates from Rusty Russell:
 "Minor fixes mainly, including a potential use-after-free on remove
  found by CONFIG_DEBUG_KOBJECT_RELEASE which may be theoretical"

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  module: Fix mod->mkobj.kobj potentially freed too early
  kernel/params.c: use scnprintf() instead of sprintf()
  kernel/module.c: use scnprintf() instead of sprintf()
  module/lsm: Have apparmor module parameters work with no args
  module: Add NOARG flag for ops with param_set_bool_enable_only() set function
  module: Add flag to allow mod params to have no arguments
  modules: add support for soft module dependencies
  scripts/mod/modpost.c: permit '.cranges' secton for sh64 architecture.
  module: fix sprintf format specifier in param_get_byte()
2013-09-04 17:34:29 -07:00
Linus Torvalds
32dad03d16 Merge branch 'for-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
 "A lot of activities on the cgroup front.  Most changes aren't visible
  to userland at all at this point and are laying foundation for the
  planned unified hierarchy.

   - The biggest change is decoupling the lifetime management of css
     (cgroup_subsys_state) from that of cgroup's.  Because controllers
     (cpu, memory, block and so on) will need to be dynamically enabled
     and disabled, css which is the association point between a cgroup
     and a controller may come and go dynamically across the lifetime of
     a cgroup.  Till now, css's were created when the associated cgroup
     was created and stayed till the cgroup got destroyed.

     Assumptions around this tight coupling permeated through cgroup
     core and controllers.  These assumptions are gradually removed,
     which consists bulk of patches, and css destruction path is
     completely decoupled from cgroup destruction path.  Note that
     decoupling of creation path is relatively easy on top of these
     changes and the patchset is pending for the next window.

   - cgroup has its own event mechanism cgroup.event_control, which is
     only used by memcg.  It is overly complex trying to achieve high
     flexibility whose benefits seem dubious at best.  Going forward,
     new events will simply generate file modified event and the
     existing mechanism is being made specific to memcg.  This pull
     request contains prepatory patches for such change.

   - Various fixes and cleanups"

Fixed up conflict in kernel/cgroup.c as per Tejun.

* 'for-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (69 commits)
  cgroup: fix cgroup_css() invocation in css_from_id()
  cgroup: make cgroup_write_event_control() use css_from_dir() instead of __d_cgrp()
  cgroup: make cgroup_event hold onto cgroup_subsys_state instead of cgroup
  cgroup: implement CFTYPE_NO_PREFIX
  cgroup: make cgroup_css() take cgroup_subsys * instead and allow NULL subsys
  cgroup: rename cgroup_css_from_dir() to css_from_dir() and update its syntax
  cgroup: fix cgroup_write_event_control()
  cgroup: fix subsystem file accesses on the root cgroup
  cgroup: change cgroup_from_id() to css_from_id()
  cgroup: use css_get() in cgroup_create() to check CSS_ROOT
  cpuset: remove an unncessary forward declaration
  cgroup: RCU protect each cgroup_subsys_state release
  cgroup: move subsys file removal to kill_css()
  cgroup: factor out kill_css()
  cgroup: decouple cgroup_subsys_state destruction from cgroup destruction
  cgroup: replace cgroup->css_kill_cnt with ->nr_css
  cgroup: bounce cgroup_subsys_state ref kill confirmation to a work item
  cgroup: move cgroup->subsys[] assignment to online_css()
  cgroup: reorganize css init / exit paths
  cgroup: add __rcu modifier to cgroup->subsys[]
  ...
2013-09-03 18:25:03 -07:00
Serge Hallyn
f54fb863c6 capabilities: allow nice if we are privileged
We allow task A to change B's nice level if it has a supserset of
B's privileges, or of it has CAP_SYS_NICE.  Also allow it if A has
CAP_SYS_NICE with respect to B - meaning it is root in the same
namespace, or it created B's namespace.

Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2013-08-30 23:44:09 -07:00
Eric W. Biederman
160da84dbb userns: Allow PR_CAPBSET_DROP in a user namespace.
As the capabilites and capability bounding set are per user namespace
properties it is safe to allow changing them with just CAP_SETPCAP
permission in the user namespace.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Tested-by: Richard Weinberger <richard@nod.at>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-08-30 17:30:39 -07:00
Eric Paris
0b4bdb3573 Revert "SELinux: do not handle seclabel as a special flag"
This reverts commit 308ab70c46.

It breaks my FC6 test box.  /dev/pts is not mounted.  dmesg says

SELinux: mount invalid.  Same superblock, different security settings
for (dev devpts, type devpts)

Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-08-28 14:45:21 -04:00
Anand Avati
102aefdda4 selinux: consider filesystem subtype in policies
Not considering sub filesystem has the following limitation. Support
for SELinux in FUSE is dependent on the particular userspace
filesystem, which is identified by the subtype. For e.g, GlusterFS,
a FUSE based filesystem supports SELinux (by mounting and processing
FUSE requests in different threads, avoiding the mount time
deadlock), whereas other FUSE based filesystems (identified by a
different subtype) have the mount time deadlock.

By considering the subtype of the filesytem in the SELinux policies,
allows us to specify a filesystem subtype, in the following way:

fs_use_xattr fuse.glusterfs gen_context(system_u:object_r:fs_t,s0);

This way not all FUSE filesystems are put in the same bucket and
subjected to the limitations of the other subtypes.

Signed-off-by: Anand Avati <avati@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-08-28 14:44:52 -04:00
James Morris
7320336146 Merge branch 'smack-for-3.12' of git://git.gitorious.org/smack-next/kernel into ra-next 2013-08-23 02:50:12 +10:00
Steven Rostedt
5265fc6219 module/lsm: Have apparmor module parameters work with no args
The apparmor module parameters for param_ops_aabool and
param_ops_aalockpolicy are both based off of the param_ops_bool,
and can handle a NULL value passed in as val. Have it enable the
new KERNEL_PARAM_FL_NOARGS flag to allow the parameters to be set
without having to state "=y" or "=1".

Cc: John Johansen <john.johansen@canonical.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-08-20 15:37:44 +09:30
David S. Miller
2ff1cf12c9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-08-16 15:37:26 -07:00
John Johansen
f8eb8a1324 apparmor: add the ability to report a sha1 hash of loaded policy
Provide userspace the ability to introspect a sha1 hash value for each
profile currently loaded.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Seth Arnold <seth.arnold@canonical.com>
2013-08-14 11:42:08 -07:00
John Johansen
84f1f78742 apparmor: export set of capabilities supported by the apparmor module
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Seth Arnold <seth.arnold@canonical.com>
2013-08-14 11:42:07 -07:00
John Johansen
29b3822f1e apparmor: add the profile introspection file to interface
Add the dynamic namespace relative profiles file to the interace, to allow
introspection of loaded profiles and their modes.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <kees@ubuntu.com>
2013-08-14 11:42:07 -07:00
John Johansen
556d0be74b apparmor: add an optional profile attachment string for profiles
Add the ability to take in and report a human readable profile attachment
string for profiles so that attachment specifications can be easily
inspected.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Seth Arnold <seth.arnold@canonical.com>
2013-08-14 11:42:07 -07:00
John Johansen
0d259f043f apparmor: add interface files for profiles and namespaces
Add basic interface files to access namespace and profile information.
The interface files are created when a profile is loaded and removed
when the profile or namespace is removed.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2013-08-14 11:42:07 -07:00
John Johansen
038165070a apparmor: allow setting any profile into the unconfined state
Allow emulating the default profile behavior from boot, by allowing
loading of a profile in the unconfined state into a new NS.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Seth Arnold <seth.arnold@canonical.com>
2013-08-14 11:42:07 -07:00
John Johansen
8651e1d657 apparmor: make free_profile available outside of policy.c
Signed-off-by: John Johansen <john.johansen@canonical.com>
2013-08-14 11:42:06 -07:00
John Johansen
742058b0f3 apparmor: rework namespace free path
namespaces now completely use the unconfined profile to track the
refcount and rcu freeing cycle. So rework the code to simplify (track
everything through the profile path right up to the end), and move the
rcu_head from policy base to profile as the namespace no longer needs
it.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Seth Arnold <seth.arnold@canonical.com>
2013-08-14 11:42:06 -07:00
John Johansen
fa2ac468db apparmor: update how unconfined is handled
ns->unconfined is being used read side without locking, nor rcu but is
being updated when a namespace is removed. This works for the root ns
which is never removed but has a race window and can cause failures when
children namespaces are removed.

Also ns and ns->unconfined have a circular refcounting dependency that
is problematic and must be broken. Currently this is done incorrectly
when the namespace is destroyed.

Fix this by forward referencing unconfined via the replacedby infrastructure
instead of directly updating the ns->unconfined pointer.

Remove the circular refcount dependency by making the ns and its unconfined
profile share the same refcount.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Seth Arnold <seth.arnold@canonical.com>
2013-08-14 11:42:06 -07:00
John Johansen
77b071b340 apparmor: change how profile replacement update is done
remove the use of replaced by chaining and move to profile invalidation
and lookup to handle task replacement.

Replacement chaining can result in large chains of profiles being pinned
in memory when one profile in the chain is use. With implicit labeling
this will be even more of a problem, so move to a direct lookup method.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2013-08-14 11:42:06 -07:00
John Johansen
01e2b670aa apparmor: convert profile lists to RCU based locking
Signed-off-by: John Johansen <john.johansen@canonical.com>
2013-08-14 11:42:06 -07:00
John Johansen
dd51c84857 apparmor: provide base for multiple profiles to be replaced at once
previously profiles had to be loaded one at a time, which could result
in cases where a replacement of a set would partially succeed, and then fail
resulting in inconsistent policy.

Allow multiple profiles to replaced "atomically" so that the replacement
either succeeds or fails for the entire set of profiles.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2013-08-14 11:42:06 -07:00
John Johansen
9d910a3bc0 apparmor: add a features/policy dir to interface
Add a policy directory to features to contain features that can affect
policy compilation but do not affect mediation. Eg of such features would
be types of dfa compression supported, etc.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <kees@ubuntu.com>
2013-08-14 11:42:05 -07:00
John Johansen
c611616cd3 apparmor: enable users to query whether apparmor is enabled
Signed-off-by: John Johansen <john.johansen@canonical.com>
2013-08-14 11:42:05 -07:00
Tetsuo Handa
dfe4ac28be apparmor: remove minimum size check for vmalloc()
This is a follow-up to commit b5b3ee6c "apparmor: no need to delay vfree()".

Since vmalloc() will do "size = PAGE_ALIGN(size);",
we don't need to check for "size >= sizeof(struct work_struct)".

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: John Johansen <john.johansen@canonical.com>
2013-08-14 11:42:05 -07:00
Rafal Krypa
10289b0f73 Smack: parse multiple rules per write to load2, up to PAGE_SIZE-1 bytes
Smack interface for loading rules has always parsed only single rule from
data written to it. This requires user program to call one write() per
each rule it wants to load.
This change makes it possible to write multiple rules, separated by new
line character. Smack will load at most PAGE_SIZE-1 characters and properly
return number of processed bytes. In case when user buffer is larger, it
will be additionally truncated. All characters after last \n will not get
parsed to avoid partial rule near input buffer boundary.

Signed-off-by: Rafal Krypa <r.krypa@samsung.com>
2013-08-12 11:51:40 -07:00
Tejun Heo
bd8815a6d8 cgroup: make css_for_each_descendant() and friends include the origin css in the iteration
Previously, all css descendant iterators didn't include the origin
(root of subtree) css in the iteration.  The reasons were maintaining
consistency with css_for_each_child() and that at the time of
introduction more use cases needed skipping the origin anyway;
however, given that css_is_descendant() considers self to be a
descendant, omitting the origin css has become more confusing and
looking at the accumulated use cases rather clearly indicates that
including origin would result in simpler code overall.

While this is a change which can easily lead to subtle bugs, cgroup
API including the iterators has recently gone through major
restructuring and no out-of-tree changes will be applicable without
adjustments making this a relatively acceptable opportunity for this
type of change.

The conversions are mostly straight-forward.  If the iteration block
had explicit origin handling before or after, it's moved inside the
iteration.  If not, if (pos == origin) continue; is added.  Some
conversions add extra reference get/put around origin handling by
consolidating origin handling and the rest.  While the extra ref
operations aren't strictly necessary, this shouldn't cause any
noticeable difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
2013-08-08 20:11:27 -04:00
Tejun Heo
492eb21b98 cgroup: make hierarchy iterators deal with cgroup_subsys_state instead of cgroup
cgroup is currently in the process of transitioning to using css
(cgroup_subsys_state) as the primary handle instead of cgroup in
subsystem API.  For hierarchy iterators, this is beneficial because

* In most cases, css is the only thing subsystems care about anyway.

* On the planned unified hierarchy, iterations for different
  subsystems will need to skip over different subtrees of the
  hierarchy depending on which subsystems are enabled on each cgroup.
  Passing around css makes it unnecessary to explicitly specify the
  subsystem in question as css is intersection between cgroup and
  subsystem

* For the planned unified hierarchy, css's would need to be created
  and destroyed dynamically independent from cgroup hierarchy.  Having
  cgroup core manage css iteration makes enforcing deref rules a lot
  easier.

Most subsystem conversions are straight-forward.  Noteworthy changes
are

* blkio: cgroup_to_blkcg() is no longer used.  Removed.

* freezer: cgroup_freezer() is no longer used.  Removed.

* devices: cgroup_to_devcgroup() is no longer used.  Removed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Jens Axboe <axboe@kernel.dk>
2013-08-08 20:11:25 -04:00
Tejun Heo
182446d087 cgroup: pass around cgroup_subsys_state instead of cgroup in file methods
cgroup is currently in the process of transitioning to using struct
cgroup_subsys_state * as the primary handle instead of struct cgroup.
Please see the previous commit which converts the subsystem methods
for rationale.

This patch converts all cftype file operations to take @css instead of
@cgroup.  cftypes for the cgroup core files don't have their subsytem
pointer set.  These will automatically use the dummy_css added by the
previous patch and can be converted the same way.

Most subsystem conversions are straight forwards but there are some
interesting ones.

* freezer: update_if_frozen() is also converted to take @css instead
  of @cgroup for consistency.  This will make the code look simpler
  too once iterators are converted to use css.

* memory/vmpressure: mem_cgroup_from_css() needs to be exported to
  vmpressure while mem_cgroup_from_cont() can be made static.
  Updated accordingly.

* cpu: cgroup_tg() doesn't have any user left.  Removed.

* cpuacct: cgroup_ca() doesn't have any user left.  Removed.

* hugetlb: hugetlb_cgroup_form_cgroup() doesn't have any user left.
  Removed.

* net_cls: cgrp_cls_state() doesn't have any user left.  Removed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Steven Rostedt <rostedt@goodmis.org>
2013-08-08 20:11:24 -04:00
Tejun Heo
eb95419b02 cgroup: pass around cgroup_subsys_state instead of cgroup in subsystem methods
cgroup is currently in the process of transitioning to using struct
cgroup_subsys_state * as the primary handle instead of struct cgroup *
in subsystem implementations for the following reasons.

* With unified hierarchy, subsystems will be dynamically bound and
  unbound from cgroups and thus css's (cgroup_subsys_state) may be
  created and destroyed dynamically over the lifetime of a cgroup,
  which is different from the current state where all css's are
  allocated and destroyed together with the associated cgroup.  This
  in turn means that cgroup_css() should be synchronized and may
  return NULL, making it more cumbersome to use.

* Differing levels of per-subsystem granularity in the unified
  hierarchy means that the task and descendant iterators should behave
  differently depending on the specific subsystem the iteration is
  being performed for.

* In majority of the cases, subsystems only care about its part in the
  cgroup hierarchy - ie. the hierarchy of css's.  Subsystem methods
  often obtain the matching css pointer from the cgroup and don't
  bother with the cgroup pointer itself.  Passing around css fits
  much better.

This patch converts all cgroup_subsys methods to take @css instead of
@cgroup.  The conversions are mostly straight-forward.  A few
noteworthy changes are

* ->css_alloc() now takes css of the parent cgroup rather than the
  pointer to the new cgroup as the css for the new cgroup doesn't
  exist yet.  Knowing the parent css is enough for all the existing
  subsystems.

* In kernel/cgroup.c::offline_css(), unnecessary open coded css
  dereference is replaced with local variable access.

This patch shouldn't cause any behavior differences.

v2: Unnecessary explicit cgrp->subsys[] deref in css_online() replaced
    with local variable @css as suggested by Li Zefan.

    Rebased on top of new for-3.12 which includes for-3.11-fixes so
    that ->css_free() invocation added by da0a12caff ("cgroup: fix a
    leak when percpu_ref_init() fails") is converted too.  Suggested
    by Li Zefan.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Steven Rostedt <rostedt@goodmis.org>
2013-08-08 20:11:23 -04:00
Tejun Heo
6387698699 cgroup: add css_parent()
Currently, controllers have to explicitly follow the cgroup hierarchy
to find the parent of a given css.  cgroup is moving towards using
cgroup_subsys_state as the main controller interface construct, so
let's provide a way to climb the hierarchy using just csses.

This patch implements css_parent() which, given a css, returns its
parent.  The function is guarnateed to valid non-NULL parent css as
long as the target css is not at the top of the hierarchy.

freezer, cpuset, cpu, cpuacct, hugetlb, memory, net_cls and devices
are converted to use css_parent() instead of accessing cgroup->parent
directly.

* __parent_ca() is dropped from cpuacct and its usage is replaced with
  parent_ca().  The only difference between the two was NULL test on
  cgroup->parent which is now embedded in css_parent() making the
  distinction moot.  Note that eventually a css->parent field will be
  added to css and the NULL check in css_parent() will go away.

This patch shouldn't cause any behavior differences.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2013-08-08 20:11:23 -04:00
Tejun Heo
a7c6d554aa cgroup: add/update accessors which obtain subsys specific data from css
css (cgroup_subsys_state) is usually embedded in a subsys specific
data structure.  Subsystems either use container_of() directly to cast
from css to such data structure or has an accessor function wrapping
such cast.  As cgroup as whole is moving towards using css as the main
interface handle, add and update such accessors to ease dealing with
css's.

All accessors explicitly handle NULL input and return NULL in those
cases.  While this looks like an extra branch in the code, as all
controllers specific data structures have css as the first field, the
casting doesn't involve any offsetting and the compiler can trivially
optimize out the branch.

* blkio, freezer, cpuset, cpu, cpuacct and net_cls didn't have such
  accessor.  Added.

* memory, hugetlb and devices already had one but didn't explicitly
  handle NULL input.  Updated.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2013-08-08 20:11:23 -04:00
Tejun Heo
8af01f56a0 cgroup: s/cgroup_subsys_state/cgroup_css/ s/task_subsys_state/task_css/
The names of the two struct cgroup_subsys_state accessors -
cgroup_subsys_state() and task_subsys_state() - are somewhat awkward.
The former clashes with the type name and the latter doesn't even
indicate it's somehow related to cgroup.

We're about to revamp large portion of cgroup API, so, let's rename
them so that they're less awkward.  Most per-controller usages of the
accessors are localized in accessor wrappers and given the amount of
scheduled changes, this isn't gonna add any noticeable headache.

Rename cgroup_subsys_state() to cgroup_css() and task_subsys_state()
to task_css().  This patch is pure rename.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2013-08-08 20:11:22 -04:00
Casey Schaufler
6ea062475a Smack: IPv6 casting error fix for 3.11
The original implementation of the Smack IPv6 port based
local controls works most of the time using a sockaddr as
a temporary variable, but not always as it overflows in
some circumstances. The correct data is a sockaddr_in6.
A struct sockaddr isn't as large as a struct sockaddr_in6.
There would need to be casting one way or the other. This
patch gets it the right way.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2013-08-06 20:53:54 +10:00
Casey Schaufler
677264e8fb Smack: network label match fix
The Smack code that matches incoming CIPSO tags with Smack labels
reaches through the NetLabel interfaces and compares the network
data with the CIPSO header associated with a Smack label. This was
done in a ill advised attempt to optimize performance. It works
so long as the categories fit in a single capset, but this isn't
always the case.

This patch changes the Smack code to use the appropriate NetLabel
interfaces to compare the incoming CIPSO header with the CIPSO
header associated with a label. It will always match the CIPSO
headers correctly.

Targeted for git://git.gitorious.org/smack-next/kernel.git

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2013-08-01 20:04:02 -07:00
Tomasz Stanislawski
4d7cf4a1f4 security: smack: add a hash table to quicken smk_find_entry()
Accepted for the smack-next tree after changing the number of
slots from 128 to 16.

This patch adds a hash table to quicken searching of a smack label by its name.

Basically, the patch improves performance of SMACK initialization.  Parsing of
rules involves translation from a string to a smack_known (aka label) entity
which is done in smk_find_entry().

The current implementation of the function iterates over a global list of
smack_known resulting in O(N) complexity for smk_find_entry().  The total
complexity of SMACK initialization becomes O(rules * labels).  Therefore it
scales quadratically with a complexity of a system.

Applying the patch reduced the complexity of smk_find_entry() to O(1) as long
as number of label is in hundreds. If the number of labels is increased please
update SMACK_HASH_SLOTS constant defined in security/smack/smack.h. Introducing
the configuration of this constant with Kconfig or cmdline might be a good
idea.

The size of the hash table was adjusted experimentally.  The rule set used by
TIZEN contains circa 17K rules for 500 labels.  The table above contains
results of SMACK initialization using 'time smackctl apply' bash command.
The 'Ref' is a kernel without this patch applied. The consecutive values
refers to value of SMACK_HASH_SLOTS.  Every measurement was repeated three
times to reduce noise.

     |  Ref  |   1   |   2   |   4   |   8   |   16  |   32  |   64  |  128  |  256  |  512
--------------------------------------------------------------------------------------------
Run1 | 1.156 | 1.096 | 0.883 | 0.764 | 0.692 | 0.667 | 0.649 | 0.633 | 0.634 | 0.629 | 0.620
Run2 | 1.156 | 1.111 | 0.885 | 0.764 | 0.694 | 0.661 | 0.649 | 0.651 | 0.634 | 0.638 | 0.623
Run3 | 1.160 | 1.107 | 0.886 | 0.764 | 0.694 | 0.671 | 0.661 | 0.638 | 0.631 | 0.624 | 0.638
AVG  | 1.157 | 1.105 | 0.885 | 0.764 | 0.693 | 0.666 | 0.653 | 0.641 | 0.633 | 0.630 | 0.627

Surprisingly, a single hlist is slightly faster than a double-linked list.
The speed-up saturates near 64 slots.  Therefore I chose value 128 to provide
some margin if more labels were used.
It looks that IO becomes a new bottleneck.

Signed-off-by: Tomasz Stanislawski <t.stanislaws@samsung.com>
2013-08-01 16:55:20 -07:00
Tomasz Stanislawski
470043ba99 security: smack: fix memleak in smk_write_rules_list()
The smack_parsed_rule structure is allocated.  If a rule is successfully
installed then the last reference to the object is lost.  This patch fixes this
leak. Moreover smack_parsed_rule is allocated on stack because it no longer
needed ofter smk_write_rules_list() is finished.

Signed-off-by: Tomasz Stanislawski <t.stanislaws@samsung.com>
2013-08-01 12:57:24 -07:00
fan.du
ca4c3fc24e net: split rt_genid for ipv4 and ipv6
Current net name space has only one genid for both IPv4 and IPv6, it has below
drawbacks:

- Add/delete an IPv4 address will invalidate all IPv6 routing table entries.
- Insert/remove XFRM policy will also invalidate both IPv4/IPv6 routing table
  entries even when the policy is only applied for one address family.

Thus, this patch attempt to split one genid for two to cater for IPv4 and IPv6
separately in a fine granularity.

Signed-off-by: Fan Du <fan.du@windriver.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-31 14:56:36 -07:00
Chris PeBenito
2be4d74f2f Add SELinux policy capability for always checking packet and peer classes.
Currently the packet class in SELinux is not checked if there are no
SECMARK rules in the security or mangle netfilter tables.  Some systems
prefer that packets are always checked, for example, to protect the system
should the netfilter rules fail to load or if the nefilter rules
were maliciously flushed.

Add the always_check_network policy capability which, when enabled, treats
SECMARK as enabled, even if there are no netfilter SECMARK rules and
treats peer labeling as enabled, even if there is no Netlabel or
labeled IPSEC configuration.

Includes definition of "redhat1" SELinux policy capability, which
exists in the SELinux userpace library, to keep ordering correct.

The SELinux userpace portion of this was merged last year, but this kernel
change fell on the floor.

Signed-off-by: Chris PeBenito <cpebenito@tresys.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-25 13:03:38 -04:00
Paul Moore
b04eea8864 selinux: fix problems in netnode when BUG() is compiled out
When the BUG() macro is disabled at compile time it can cause some
problems in the SELinux netnode code: invalid return codes and
uninitialized variables.  This patch fixes this by making sure we take
some corrective action after the BUG() macro.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-25 13:03:27 -04:00
Eric Paris
b43e725d8d SELinux: use a helper function to determine seclabel
Use a helper to determine if a superblock should have the seclabel flag
rather than doing it in the function.  I'm going to use this in the
security server as well.

Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-25 13:03:24 -04:00
Eric Paris
a64c54cf08 SELinux: pass a superblock to security_fs_use
Rather than passing pointers to memory locations, strings, and other
stuff just give up on the separation and give security_fs_use the
superblock.  It just makes the code easier to read (even if not easier to
reuse on some other OS)

Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-25 13:03:21 -04:00
Eric Paris
308ab70c46 SELinux: do not handle seclabel as a special flag
Instead of having special code around the 'non-mount' seclabel mount option
just handle it like the mount options.

Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-25 13:03:12 -04:00
Eric Paris
f936c6e502 SELinux: change sbsec->behavior to short
We only have 6 options, so char is good enough, but use a short as that
packs nicely.  This shrinks the superblock_security_struct just a little
bit.

Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-25 13:03:09 -04:00
Eric Paris
cfca0303da SELinux: renumber the superblock options
Just to make it clear that we have mount time options and flags,
separate them.  Since I decided to move the non-mount options above
above 0x10, we need a short instead of a char.  (x86 padding says
this takes up no additional space as we have a 3byte whole in the
structure)

Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-25 13:03:06 -04:00
Eric Paris
eadcabc697 SELinux: do all flags twiddling in one place
Currently we set the initialize and seclabel flag in one place.  Do some
unrelated printk then we unset the seclabel flag.  Eww.  Instead do the flag
twiddling in one place in the code not seperated by unrelated printk.  Also
don't set and unset the seclabel flag.  Only set it if we need to.

Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-25 13:03:03 -04:00
Eric Paris
12f348b9dc SELinux: rename SE_SBLABELSUPP to SBLABEL_MNT
Just a flag rename as we prepare to make it not so special.

Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-25 13:03:01 -04:00
Eric Paris
af8e50cc7d SELinux: use define for number of bits in the mnt flags mask
We had this random hard coded value of '8' in the code (I put it there)
for the number of bits to check for mount options.  This is stupid.  Instead
use the #define we already have which tells us the number of mount
options.

Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-25 13:02:58 -04:00
Eric Paris
d355987f47 SELinux: make it harder to get the number of mnt opts wrong
Instead of just hard coding a value, use the enum to out benefit.

Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-25 13:02:53 -04:00
Eric Paris
40d3d0b85f SELinux: remove crazy contortions around proc
We check if the fsname is proc and if so set the proc superblock security
struct flag.  We then check if the flag is set and use the string 'proc'
for the fsname instead of just using the fsname.  What's the point?  It's
always proc...  Get rid of the useless conditional.

Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-25 13:02:50 -04:00
Eric Paris
b138004ea0 SELinux: fix selinuxfs policy file on big endian systems
The /sys/fs/selinux/policy file is not valid on big endian systems like
ppc64 or s390.  Let's see why:

static int hashtab_cnt(void *key, void *data, void *ptr)
{
	int *cnt = ptr;
	*cnt = *cnt + 1;

	return 0;
}

static int range_write(struct policydb *p, void *fp)
{
	size_t nel;
[...]
	/* count the number of entries in the hashtab */
	nel = 0;
	rc = hashtab_map(p->range_tr, hashtab_cnt, &nel);
	if (rc)
		return rc;
	buf[0] = cpu_to_le32(nel);
	rc = put_entry(buf, sizeof(u32), 1, fp);

So size_t is 64 bits.  But then we pass a pointer to it as we do to
hashtab_cnt.  hashtab_cnt thinks it is a 32 bit int and only deals with
the first 4 bytes.  On x86_64 which is little endian, those first 4
bytes and the least significant, so this works out fine.  On ppc64/s390
those first 4 bytes of memory are the high order bits.  So at the end of
the call to hashtab_map nel has a HUGE number.  But the least
significant 32 bits are all 0's.

We then pass that 64 bit number to cpu_to_le32() which happily truncates
it to a 32 bit number and does endian swapping.  But the low 32 bits are
all 0's.  So no matter how many entries are in the hashtab, big endian
systems always say there are 0 entries because I screwed up the
counting.

The fix is easy.  Use a 32 bit int, as the hashtab_cnt expects, for nel.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
2013-07-25 13:02:44 -04:00
Stephen Smalley
5c73fceb8c SELinux: Enable setting security contexts on rootfs inodes.
rootfs (ramfs) can support setting of security contexts
by userspace due to the vfs fallback behavior of calling
the security module to set the in-core inode state
for security.* attributes when the filesystem does not
provide an xattr handler.  No xattr handler required
as the inodes are pinned in memory and have no backing
store.

This is useful in allowing early userspace to label individual
files within a rootfs while still providing a policy-defined
default via genfs.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-25 13:02:37 -04:00
Waiman Long
a767f680e3 SELinux: Increase ebitmap_node size for 64-bit configuration
Currently, the ebitmap_node structure has a fixed size of 32 bytes. On
a 32-bit system, the overhead is 8 bytes, leaving 24 bytes for being
used as bitmaps. The overhead ratio is 1/4.

On a 64-bit system, the overhead is 16 bytes. Therefore, only 16 bytes
are left for bitmap purpose and the overhead ratio is 1/2. With a
3.8.2 kernel, a boot-up operation will cause the ebitmap_get_bit()
function to be called about 9 million times. The average number of
ebitmap_node traversal is about 3.7.

This patch increases the size of the ebitmap_node structure to 64
bytes for 64-bit system to keep the overhead ratio at 1/4. This may
also improve performance a little bit by making node to node traversal
less frequent (< 2) as more bits are available in each node.

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Acked-by:  Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-25 13:02:31 -04:00
Waiman Long
fee7114298 SELinux: Reduce overhead of mls_level_isvalid() function call
While running the high_systime workload of the AIM7 benchmark on
a 2-socket 12-core Westmere x86-64 machine running 3.10-rc4 kernel
(with HT on), it was found that a pretty sizable amount of time was
spent in the SELinux code. Below was the perf trace of the "perf
record -a -s" of a test run at 1500 users:

  5.04%            ls  [kernel.kallsyms]     [k] ebitmap_get_bit
  1.96%            ls  [kernel.kallsyms]     [k] mls_level_isvalid
  1.95%            ls  [kernel.kallsyms]     [k] find_next_bit

The ebitmap_get_bit() was the hottest function in the perf-report
output.  Both the ebitmap_get_bit() and find_next_bit() functions
were, in fact, called by mls_level_isvalid(). As a result, the
mls_level_isvalid() call consumed 8.95% of the total CPU time of
all the 24 virtual CPUs which is quite a lot. The majority of the
mls_level_isvalid() function invocations come from the socket creation
system call.

Looking at the mls_level_isvalid() function, it is checking to see
if all the bits set in one of the ebitmap structure are also set in
another one as well as the highest set bit is no bigger than the one
specified by the given policydb data structure. It is doing it in
a bit-by-bit manner. So if the ebitmap structure has many bits set,
the iteration loop will be done many times.

The current code can be rewritten to use a similar algorithm as the
ebitmap_contains() function with an additional check for the
highest set bit. The ebitmap_contains() function was extended to
cover an optional additional check for the highest set bit, and the
mls_level_isvalid() function was modified to call ebitmap_contains().

With that change, the perf trace showed that the used CPU time drop
down to just 0.08% (ebitmap_contains + mls_level_isvalid) of the
total which is about 100X less than before.

  0.07%            ls  [kernel.kallsyms]     [k] ebitmap_contains
  0.05%            ls  [kernel.kallsyms]     [k] ebitmap_get_bit
  0.01%            ls  [kernel.kallsyms]     [k] mls_level_isvalid
  0.01%            ls  [kernel.kallsyms]     [k] find_next_bit

The remaining ebitmap_get_bit() and find_next_bit() functions calls
are made by other kernel routines as the new mls_level_isvalid()
function will not call them anymore.

This patch also improves the high_systime AIM7 benchmark result,
though the improvement is not as impressive as is suggested by the
reduction in CPU time spent in the ebitmap functions. The table below
shows the performance change on the 2-socket x86-64 system (with HT
on) mentioned above.

+--------------+---------------+----------------+-----------------+
|   Workload   | mean % change | mean % change  | mean % change   |
|              | 10-100 users  | 200-1000 users | 1100-2000 users |
+--------------+---------------+----------------+-----------------+
| high_systime |     +0.1%     |     +0.9%      |     +2.6%       |
+--------------+---------------+----------------+-----------------+

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Acked-by:  Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-25 13:02:18 -04:00
Paul Moore
bed4d7efb3 selinux: remove the BUG_ON() from selinux_skb_xfrm_sid()
Remove the BUG_ON() from selinux_skb_xfrm_sid() and propogate the
error code up to the caller.  Also check the return values in the
only caller function, selinux_skb_peerlbl_sid().

Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-25 13:02:13 -04:00