mirror of
https://github.com/mozilla/gecko-dev.git
synced 2024-10-16 23:05:42 +00:00
Bug 1391476 - Add UID and GID to cache parameters; r=dustin
The UID and GID that a task executes under is dynamic. As a result, caches need to be aware of the UID and GID that owns files otherwise subsequent tasks could run into permission denied errors. This is why `run-task --chown-recursive` exists. By recursively changing ownership of persisted files, we ensure the current task is able to read and write all existing files. When you take a step back, you realize that chowning of cached files is an expensive workaround. Yes, this results in cache hits. But the cost is you potentially have to perform hundreds of thousands of I/O system calls to mass chown. The ideal situation is that UID/GID is consistent across tasks on any given cache and potentially expensive permissions setting can be avoided. So, that's what this commit does. We add the task's UID and GID to run-task's requirements. When we first see a cache, we record a UID and GID with it and chown the empty cache directory to that UID and GID. Subsequent tasks using this cache *must* use the same UID and GID or else run-task will fail. Since run-task now guarantees that all cache consumers use the same UID and GID, we can avoid a potentially expensive recursive chown. But there is an exception. In untrusted environments (namely Try), we recursively chown existing caches if there is a uid/gid mismatch. We do this because Try is a sandbox and any random task could experiment with a non-standard uid/gid. That populated cache would "poison" the cache for the next caller. Or vice-versa. It would be annoying if caches were randomly poisoned due to Try pushes that didn't realize there was a UID/GID mismatch. We could outlaw "bad" UID and GIDs. But that makes the barrier to testing things on Try harder. So, we go with the flow and recursively chown caches in this scenario. This change will shine light on all tasks using inconsistent UID and GID values on the same cache. Bustage is anticipated. Unfortunately, we can't easily know what will break. So it will be one of those things where we will have to fix problems as they arise. Fortunately, because caches are now tied to the content of run-task, we only need to back out this change and tasks should revert to caches without UID and GID pinning requirements and everything will work again. MozReview-Commit-ID: 2ka4rOnnXIp --HG-- extra : rebase_source : ccb2b0a9230694f989775b26d5276fd3ac928af3 extra : source : 083d2e1cc8fe44b04e44f74bda3dd8bc75ba826c
This commit is contained in:
parent
4e063535e6
commit
5fb99cdfc3
@ -37,6 +37,22 @@ FALLBACK_FINGERPRINT = {
|
||||
":bf:d7:b8:40:84:01:48:9c:26:ce:d9"}
|
||||
|
||||
|
||||
CACHE_UID_GID_MISMATCH = '''
|
||||
There is a UID/GID mismatch on the cache. This likely means:
|
||||
|
||||
a) different tasks are running as a different user/group
|
||||
b) different Docker images have different UID/GID for the same user/group
|
||||
|
||||
Our cache policy is that the UID/GID for ALL tasks must be consistent
|
||||
for the lifetime of the cache. This eliminates permissions problems due
|
||||
to file/directory user/group ownership.
|
||||
|
||||
To make this error go away, ensure that all Docker images are use
|
||||
a consistent UID/GID and that all tasks using this cache are running as
|
||||
the same user/group.
|
||||
'''
|
||||
|
||||
|
||||
NON_EMPTY_VOLUME = '''
|
||||
error: volume %s is not empty
|
||||
|
||||
@ -281,6 +297,12 @@ def main(args):
|
||||
else:
|
||||
caches = []
|
||||
|
||||
if 'TASKCLUSTER_UNTRUSTED_CACHES' in os.environ:
|
||||
untrusted_caches = True
|
||||
del os.environ['TASKCLUSTER_UNTRUSTED_CACHES']
|
||||
else:
|
||||
untrusted_caches = False
|
||||
|
||||
our_requirements = {
|
||||
# Include a version string that we can bump whenever to trigger
|
||||
# fresh caches. The actual value is not relevant and doesn't need
|
||||
@ -288,6 +310,10 @@ def main(args):
|
||||
# hash into cache names, any change to this file/version is sufficient
|
||||
# to force the use of a new cache.
|
||||
b'version=1',
|
||||
# Include the UID and GID the task will run as to ensure that tasks
|
||||
# with different UID and GID don't share the same cache.
|
||||
b'uid=%d' % uid,
|
||||
b'gid=%d' % gid,
|
||||
}
|
||||
|
||||
for cache in caches:
|
||||
@ -297,12 +323,13 @@ def main(args):
|
||||
|
||||
requires_path = os.path.join(cache, '.cacherequires')
|
||||
|
||||
# The cache is empty. No validation necessary. Just set up our
|
||||
# requirements file.
|
||||
# The cache is empty. Configure it.
|
||||
if not os.listdir(cache):
|
||||
print_line(b'cache', b'cache %s is empty; writing requirements: '
|
||||
b'%s\n' % (cache, b' '.join(sorted(our_requirements))))
|
||||
|
||||
# We write a requirements file so future invocations know what the
|
||||
# requirements are.
|
||||
with open(requires_path, 'wb') as fh:
|
||||
fh.write(b'\n'.join(sorted(our_requirements)))
|
||||
|
||||
@ -321,16 +348,40 @@ def main(args):
|
||||
cache, b' '.join(sorted(wanted_requirements))))
|
||||
|
||||
missing = wanted_requirements - our_requirements
|
||||
if missing:
|
||||
|
||||
# Allow requirements mismatch for uid/gid if and only if caches
|
||||
# are untrusted. This allows cache behavior on Try to be
|
||||
# reasonable. Otherwise, random tasks could "poison" cache
|
||||
# usability by introducing uid/gid mismatches. For untrusted
|
||||
# environments like Try, this is a perfectly reasonable thing to
|
||||
# allow.
|
||||
if missing and untrusted_caches and running_as_root and \
|
||||
all(s.startswith(('uid=', 'gid=')) for s in missing):
|
||||
print_line(b'cache',
|
||||
b'cache %s uid/gid mismatch; this is acceptable '
|
||||
b'because caches for this task are untrusted; '
|
||||
b'changing ownership to facilitate cache use\n' %
|
||||
cache)
|
||||
chown_recursive(cache, user.pw_name, group.gr_name, uid, gid)
|
||||
|
||||
# And write out the updated reality.
|
||||
with open(requires_path, 'wb') as fh:
|
||||
fh.write(b'\n'.join(sorted(our_requirements)))
|
||||
|
||||
elif missing:
|
||||
print('requirements for populated cache %s differ from '
|
||||
'this task' % cache)
|
||||
print('cache requirements: %s' % ' '.join(sorted(
|
||||
wanted_requirements)))
|
||||
print('our requirements: %s' % ' '.join(sorted(
|
||||
our_requirements)))
|
||||
if any(s.startswith(('uid=', 'gid=')) for s in missing):
|
||||
print(CACHE_UID_GID_MISMATCH)
|
||||
return 1
|
||||
|
||||
chown_recursive(cache, user.pw_name, group.gr_name, uid, gid)
|
||||
# We don't need to adjust permissions here because the cache is
|
||||
# associated with a uid/gid and the first task should have set
|
||||
# a proper owner/group.
|
||||
|
||||
# The cache has content and no requirements file. This shouldn't
|
||||
# happen because run-task should be the first thing that touches a
|
||||
|
@ -740,6 +740,9 @@ def build_docker_worker_payload(config, task, task_def):
|
||||
payload['env']['TASKCLUSTER_VOLUMES'] = ';'.join(
|
||||
sorted(worker['volumes']))
|
||||
|
||||
if payload.get('cache') and skip_untrusted:
|
||||
payload['env']['TASKCLUSTER_UNTRUSTED_CACHES'] = '1'
|
||||
|
||||
if features:
|
||||
payload['features'] = features
|
||||
if capabilities:
|
||||
|
Loading…
Reference in New Issue
Block a user