cgroup: add clone_children control file

The ns_cgroup is a control group interacting with the namespaces.  When a
new namespace is created, a corresponding cgroup is automatically created
too.  The cgroup name is the pid of the process who did 'unshare' or the
child of 'clone'.

This cgroup is tied with the namespace because it prevents a process to
escape the control group and use the post_clone callback, so the child
cgroup inherits the values of the parent cgroup.

Unfortunately, the more we use this cgroup and the more we are facing
problems with it:

(1) when a process unshares, the cgroup name may conflict with a
    previous cgroup with the same pid, so unshare or clone return -EEXIST

(2) the cgroup creation is out of control because there may have an
    application creating several namespaces where the system will
    automatically create several cgroups in his back and let them on the
    cgroupfs (eg.  a vrf based on the network namespace).

(3) the mix of (1) and (2) force an administrator to regularly check
    and clean these cgroups.

This patchset removes the ns_cgroup by adding a new flag to the cgroup and
the cgroupfs mount option.  It enables the copy of the parent cgroup when
a child cgroup is created.  We can then safely remove the ns_cgroup as
this flag brings a compatibility.  We have now to manually create and add
the task to a cgroup, which is consistent with the cgroup framework.

This patch:

Sent as an answer to a previous thread around the ns_cgroup.

https://lists.linux-foundation.org/pipermail/containers/2009-June/018627.html

It adds a control file 'clone_children' for a cgroup.  This control file
is a boolean specifying if the child cgroup should be a clone of the
parent cgroup or not.  The default value is 'false'.

This flag makes the child cgroup to call the post_clone callback of all
the subsystem, if it is available.

At present, the cpuset is the only one which had implemented the
post_clone callback.

The option can be set at mount time by specifying the 'clone_children'
mount option.

Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Paul Menage <menage@google.com>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Jamal Hadi Salim <hadi@cyberus.ca>
Cc: Matt Helsley <matthltc@us.ibm.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This commit is contained in:
Daniel Lezcano 2010-10-27 15:33:35 -07:00 committed by Linus Torvalds
parent 2d3cbf8bc8
commit 97978e6d1f
3 changed files with 55 additions and 2 deletions

View File

@ -18,7 +18,8 @@ CONTENTS:
1.2 Why are cgroups needed ? 1.2 Why are cgroups needed ?
1.3 How are cgroups implemented ? 1.3 How are cgroups implemented ?
1.4 What does notify_on_release do ? 1.4 What does notify_on_release do ?
1.5 How do I use cgroups ? 1.5 What does clone_children do ?
1.6 How do I use cgroups ?
2. Usage Examples and Syntax 2. Usage Examples and Syntax
2.1 Basic Usage 2.1 Basic Usage
2.2 Attaching processes 2.2 Attaching processes
@ -293,7 +294,16 @@ notify_on_release in the root cgroup at system boot is disabled
value of their parents notify_on_release setting. The default value of value of their parents notify_on_release setting. The default value of
a cgroup hierarchy's release_agent path is empty. a cgroup hierarchy's release_agent path is empty.
1.5 How do I use cgroups ? 1.5 What does clone_children do ?
---------------------------------
If the clone_children flag is enabled (1) in a cgroup, then all
cgroups created beneath will call the post_clone callbacks for each
subsystem of the newly created cgroup. Usually when this callback is
implemented for a subsystem, it copies the values of the parent
subsystem, this is the case for the cpuset.
1.6 How do I use cgroups ?
-------------------------- --------------------------
To start a new job that is to be contained within a cgroup, using To start a new job that is to be contained within a cgroup, using

View File

@ -154,6 +154,10 @@ enum {
* A thread in rmdir() is wating for this cgroup. * A thread in rmdir() is wating for this cgroup.
*/ */
CGRP_WAIT_ON_RMDIR, CGRP_WAIT_ON_RMDIR,
/*
* Clone cgroup values when creating a new child cgroup
*/
CGRP_CLONE_CHILDREN,
}; };
/* which pidlist file are we talking about? */ /* which pidlist file are we talking about? */

View File

@ -243,6 +243,11 @@ static int notify_on_release(const struct cgroup *cgrp)
return test_bit(CGRP_NOTIFY_ON_RELEASE, &cgrp->flags); return test_bit(CGRP_NOTIFY_ON_RELEASE, &cgrp->flags);
} }
static int clone_children(const struct cgroup *cgrp)
{
return test_bit(CGRP_CLONE_CHILDREN, &cgrp->flags);
}
/* /*
* for_each_subsys() allows you to iterate on each subsystem attached to * for_each_subsys() allows you to iterate on each subsystem attached to
* an active hierarchy * an active hierarchy
@ -1040,6 +1045,8 @@ static int cgroup_show_options(struct seq_file *seq, struct vfsmount *vfs)
seq_puts(seq, ",noprefix"); seq_puts(seq, ",noprefix");
if (strlen(root->release_agent_path)) if (strlen(root->release_agent_path))
seq_printf(seq, ",release_agent=%s", root->release_agent_path); seq_printf(seq, ",release_agent=%s", root->release_agent_path);
if (clone_children(&root->top_cgroup))
seq_puts(seq, ",clone_children");
if (strlen(root->name)) if (strlen(root->name))
seq_printf(seq, ",name=%s", root->name); seq_printf(seq, ",name=%s", root->name);
mutex_unlock(&cgroup_mutex); mutex_unlock(&cgroup_mutex);
@ -1050,6 +1057,7 @@ struct cgroup_sb_opts {
unsigned long subsys_bits; unsigned long subsys_bits;
unsigned long flags; unsigned long flags;
char *release_agent; char *release_agent;
bool clone_children;
char *name; char *name;
/* User explicitly requested empty subsystem */ /* User explicitly requested empty subsystem */
bool none; bool none;
@ -1097,6 +1105,8 @@ static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts)
opts->none = true; opts->none = true;
} else if (!strcmp(token, "noprefix")) { } else if (!strcmp(token, "noprefix")) {
set_bit(ROOT_NOPREFIX, &opts->flags); set_bit(ROOT_NOPREFIX, &opts->flags);
} else if (!strcmp(token, "clone_children")) {
opts->clone_children = true;
} else if (!strncmp(token, "release_agent=", 14)) { } else if (!strncmp(token, "release_agent=", 14)) {
/* Specifying two release agents is forbidden */ /* Specifying two release agents is forbidden */
if (opts->release_agent) if (opts->release_agent)
@ -1355,6 +1365,8 @@ static struct cgroupfs_root *cgroup_root_from_opts(struct cgroup_sb_opts *opts)
strcpy(root->release_agent_path, opts->release_agent); strcpy(root->release_agent_path, opts->release_agent);
if (opts->name) if (opts->name)
strcpy(root->name, opts->name); strcpy(root->name, opts->name);
if (opts->clone_children)
set_bit(CGRP_CLONE_CHILDREN, &root->top_cgroup.flags);
return root; return root;
} }
@ -3173,6 +3185,23 @@ fail:
return ret; return ret;
} }
static u64 cgroup_clone_children_read(struct cgroup *cgrp,
struct cftype *cft)
{
return clone_children(cgrp);
}
static int cgroup_clone_children_write(struct cgroup *cgrp,
struct cftype *cft,
u64 val)
{
if (val)
set_bit(CGRP_CLONE_CHILDREN, &cgrp->flags);
else
clear_bit(CGRP_CLONE_CHILDREN, &cgrp->flags);
return 0;
}
/* /*
* for the common functions, 'private' gives the type of file * for the common functions, 'private' gives the type of file
*/ */
@ -3203,6 +3232,11 @@ static struct cftype files[] = {
.write_string = cgroup_write_event_control, .write_string = cgroup_write_event_control,
.mode = S_IWUGO, .mode = S_IWUGO,
}, },
{
.name = "cgroup.clone_children",
.read_u64 = cgroup_clone_children_read,
.write_u64 = cgroup_clone_children_write,
},
}; };
static struct cftype cft_release_agent = { static struct cftype cft_release_agent = {
@ -3332,6 +3366,9 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
if (notify_on_release(parent)) if (notify_on_release(parent))
set_bit(CGRP_NOTIFY_ON_RELEASE, &cgrp->flags); set_bit(CGRP_NOTIFY_ON_RELEASE, &cgrp->flags);
if (clone_children(parent))
set_bit(CGRP_CLONE_CHILDREN, &cgrp->flags);
for_each_subsys(root, ss) { for_each_subsys(root, ss) {
struct cgroup_subsys_state *css = ss->create(ss, cgrp); struct cgroup_subsys_state *css = ss->create(ss, cgrp);
@ -3346,6 +3383,8 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
goto err_destroy; goto err_destroy;
} }
/* At error, ->destroy() callback has to free assigned ID. */ /* At error, ->destroy() callback has to free assigned ID. */
if (clone_children(parent) && ss->post_clone)
ss->post_clone(ss, cgrp);
} }
cgroup_lock_hierarchy(root); cgroup_lock_hierarchy(root);