mirror of
https://github.com/FEX-Emu/linux.git
synced 2025-01-15 14:10:43 +00:00
6255c46fa0
cgroup-legacy may be too loaded. Rename the docs so that they're postfixed with v1 and v2. * s/cgroup-legacy/cgroup-v1/ * s/cgroup.txt/cgroup-v2.txt/ Signed-off-by: Tejun Heo <tj@kernel.org>
117 lines
4.3 KiB
Plaintext
Device Whitelist Controller

1. Description:

Implement a cgroup to track and enforce open and mknod restrictions
on device files. A device cgroup associates a device access
whitelist with each cgroup. A whitelist entry has 4 fields: type,
major, minor, and access. 'type' is a (all), c (char), or b (block).
'all' means it applies to all types and all major and minor numbers.
Major and minor are either an integer or * for all. Access is a
composition of r (read), w (write), and m (mknod).

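To illustrate the entry format described above, here is a minimal shell
sketch that checks whether a string looks like a whitelist entry. The
function name valid_dev_entry is hypothetical and not part of the kernel
interface, and the kernel's own parser may accept or reject input that
this check does not (for example, a bare 'a', or repeated access
letters, which this pattern lets through).

```shell
#!/bin/sh
# Hypothetical helper: return 0 if "$1" has the shape
# "<type> <major>:<minor> <access>", and 1 otherwise.
#   type   : a, b or c
#   major  : an integer or *
#   minor  : an integer or *
#   access : one to three of the letters r, w, m
valid_dev_entry() {
    printf '%s\n' "$1" |
        grep -Eq '^[abc] ([0-9]+|\*):([0-9]+|\*) [rwm]{1,3}$'
}

valid_dev_entry 'c 1:3 mr'  && echo "ok: c 1:3 mr"
valid_dev_entry 'b 8:* rwm' && echo "ok: b 8:* rwm"
valid_dev_entry 'x 1:3 r'   || echo "rejected: x 1:3 r"
```

The check is purely syntactic: it says nothing about whether the parent
cgroup would permit the entry.
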
The root device cgroup starts with rwm to 'all'. A child device
cgroup gets a copy of the parent. Administrators can then remove
devices from the whitelist or add new entries. A child cgroup can
never receive a device access which is denied by its parent.

2. User Interface

An entry is added using devices.allow, and removed using
devices.deny. For instance

        echo 'c 1:3 mr' > /sys/fs/cgroup/1/devices.allow

allows cgroup 1 to read and mknod the device usually known as
/dev/null. Doing

        echo a > /sys/fs/cgroup/1/devices.deny

will remove the default 'a *:* rwm' entry. Doing

        echo a > /sys/fs/cgroup/1/devices.allow

will add the 'a *:* rwm' entry to the whitelist.

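The commands above can be wrapped in a small helper. This is only a
sketch: the function dev_rule and the variables CGROUP_DIR and DRY_RUN
are hypothetical names introduced for this example, not kernel
interfaces. By default it only prints the command it would run; writing
the real files requires CAP_SYS_ADMIN and a mounted device cgroup
hierarchy.

```shell
#!/bin/sh
# Assumed defaults for this sketch; adjust to the local cgroup mount.
CGROUP_DIR=${CGROUP_DIR:-/sys/fs/cgroup/1}
DRY_RUN=${DRY_RUN:-1}

# dev_rule <allow|deny> <entry>
# Writes <entry> to devices.allow or devices.deny under $CGROUP_DIR,
# or prints the equivalent command when DRY_RUN=1.
dev_rule() {
    case "$1" in
        allow|deny) ;;
        *) echo "usage: dev_rule allow|deny '<entry>'" >&2; return 2 ;;
    esac
    if [ "$DRY_RUN" = 1 ]; then
        echo "echo '$2' > $CGROUP_DIR/devices.$1"
    else
        echo "$2" > "$CGROUP_DIR/devices.$1"
    fi
}

dev_rule deny a             # drop the default 'a *:* rwm' entry
dev_rule allow 'c 1:3 mr'   # then allow read and mknod on /dev/null
```

Denying everything first and then allowing individual devices leaves the
cgroup with exactly the entries that were added back.
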
3. Security

Any task can move itself between cgroups. This clearly won't
suffice, but we can decide the best way to adequately restrict
movement as people get some experience with this. We may just want
to require CAP_SYS_ADMIN, which at least is a separate bit from
CAP_MKNOD. We may want to just refuse moving to a cgroup which
isn't a descendant of the current one. Or we may want to use
CAP_MAC_ADMIN, since we really are trying to lock down root.

CAP_SYS_ADMIN is needed to modify the whitelist or move another
task to a new cgroup. (Again, we'll probably want to change that.)

A cgroup may not be granted more permissions than the cgroup's
parent has.

4. Hierarchy

Device cgroups maintain hierarchy by making sure a cgroup never has more
access permissions than its parent. Every time an entry is written to
a cgroup's devices.deny file, all its children will have that entry removed
from their whitelist and all the locally set whitelist entries will be
re-evaluated. If one of the locally set whitelist entries would provide
more access than the cgroup's parent, it'll be removed from the whitelist.

Example:
      A
     / \
        B

    group        behavior        exceptions
    A            allow           "b 8:* rwm", "c 116:1 rw"
    B            deny            "c 1:3 rwm", "c 116:2 rwm", "b 3:* rwm"

If a device is denied in group A:
        # echo "c 116:* r" > A/devices.deny
it'll propagate down and after revalidating B's entries, the whitelist entry
"c 116:2 rwm" will be removed:

    group        whitelist entries                        denied devices
    A            all                                      "b 8:* rwm", "c 116:* rw"
    B            "c 1:3 rwm", "b 3:* rwm"                 all the rest

If the parent's exceptions change and local exceptions are no longer
allowed, they'll be deleted.

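The access part of this revalidation can be pictured as a subset test:
a local entry survives only if each of its access letters is still
available from the parent. The sketch below shows that comparison alone.
The function name access_subset is hypothetical, and the sketch ignores
device type and major/minor matching, which the kernel also takes into
account.

```shell
#!/bin/sh
# Hypothetical helper: succeed if every access letter in "$1"
# (the child's entry) also appears in "$2" (what the parent allows).
access_subset() {
    rest=$1
    while [ -n "$rest" ]; do
        letter=${rest%"${rest#?}"}   # first character of $rest
        rest=${rest#?}               # drop that character
        case "$2" in
            *"$letter"*) ;;          # parent has this letter too
            *) return 1 ;;           # child asks for more than parent
        esac
    done
    return 0
}

access_subset rw rwm && echo "rw fits within rwm"
access_subset rwm wm || echo "rwm exceeds wm, entry would be dropped"
```
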
Notice that new whitelist entries will not be propagated:
      A
     / \
        B

    group        whitelist entries                        denied devices
    A            "c 1:3 rwm", "c 1:5 r"                   all the rest
    B            "c 1:3 rwm", "c 1:5 r"                   all the rest

when adding "c *:3 rwm":
        # echo "c *:3 rwm" >A/devices.allow

the result:
    group        whitelist entries                        denied devices
    A            "c *:3 rwm", "c 1:5 r"                   all the rest
    B            "c 1:3 rwm", "c 1:5 r"                   all the rest

but now it'll be possible to add new entries to B:
        # echo "c 2:3 rwm" >B/devices.allow
        # echo "c 50:3 r" >B/devices.allow
or even
        # echo "c *:3 rwm" >B/devices.allow

Allowing or denying all by writing 'a' to devices.allow or devices.deny will
not be possible once the device cgroup has children.

4.1 Hierarchy (internal implementation)

The device cgroup is implemented internally using a behavior (ALLOW, DENY)
and a list of exceptions. The internal state is controlled using the same
user interface to preserve compatibility with the previous whitelist-only
implementation. Removal or addition of exceptions that will reduce the access
to devices will be propagated down the hierarchy.
For every propagated exception, the effective rules will be re-evaluated based
on the current parent's access rules.

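The behavior-plus-exceptions model can be sketched in shell as follows.
This is a simplified illustration, not the kernel's code: the function
names exception_covers and allowed are hypothetical, only a single
access letter is checked per call, and propagation down the hierarchy is
not modelled. With behavior "allow" the exceptions are the denied
devices; with behavior "deny" they are the whitelist.

```shell
#!/bin/sh
# exception_covers <exception> <type> <major> <minor> <access-letter>
# Succeeds if the exception "<type> <major>:<minor> <access>" covers
# the given device and contains the given access letter.
exception_covers() {
    e_type=${1%% *}           # e.g. "c"
    e_rest=${1#* }            # e.g. "116:* rw"
    e_dev=${e_rest%% *}       # e.g. "116:*"
    e_acc=${e_rest#* }        # e.g. "rw"
    e_major=${e_dev%%:*}
    e_minor=${e_dev#*:}
    [ "$e_type" = "$2" ] || return 1
    [ "$e_major" = '*' ] || [ "$e_major" = "$3" ] || return 1
    [ "$e_minor" = '*' ] || [ "$e_minor" = "$4" ] || return 1
    case "$e_acc" in
        *"$5"*) return 0 ;;
    esac
    return 1
}

# allowed <behavior> <type> <major> <minor> <access-letter> [exception...]
allowed() {
    behavior=$1 dtype=$2 major=$3 minor=$4 acc=$5
    shift 5
    hit=no
    for ex in "$@"; do
        if exception_covers "$ex" "$dtype" "$major" "$minor" "$acc"; then
            hit=yes
            break
        fi
    done
    if [ "$behavior" = allow ]; then
        [ "$hit" = no ]       # default allow: an exception denies
    else
        [ "$hit" = yes ]      # default deny: an exception permits
    fi
}

# Group B from the example above: behavior deny plus three exceptions.
allowed deny c 1 3 r "c 1:3 rwm" "c 116:2 rwm" "b 3:* rwm" &&
    echo "B may read c 1:3"
allowed deny c 1 5 r "c 1:3 rwm" "c 116:2 rwm" "b 3:* rwm" ||
    echo "B may not read c 1:5"
```

Keeping one behavior flag and a list of exceptions lets the same two
files, devices.allow and devices.deny, describe both a mostly-open and a
mostly-closed cgroup.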