Commit Graph

11 Commits

Author SHA1 Message Date
Nick Thomas
0b738d5db5 Bug 1471767 - taskcluster documentation fixes, r=dustin
Assorted fixes from trawling the sphinx logs - malformed formatting, broken references, leftovers from renaming action-task to action-callback and removing
yaml-templates, docstring fixes to make sphinx happier, and typos.

MozReview-Commit-ID: 6jUOljdLoE2

--HG--
extra : rebase_source : f2a9dbcde5180f760a80f47691a70eba76e58bad
2018-06-27 21:48:10 +12:00
Mike Hommey
f88329d02f Bug 1409260 - Remove tc-vcs caches. r=dustin,gps
--HG--
extra : rebase_source : bc8a0e8807c1dd6d2a662c7c1fc8ad33af88efe9
2017-10-17 15:12:18 +09:00
Gregory Szorc
22bae15639 Bug 1390700 - Support sparse checkouts in run-task; r=dustin
`run-task` is taught a --sparse-profile argument to be passed down
to `hg robustcheckout` for the main source checkout. It does what
you expect: performs a sparse checkout using the named profile.

The Taskgraph YAML for run-task is taught a "sparse-profile"
property to define the sparse profile. When defined, --sparse-profile
will be passed down to `run-task` and the cache name will be updated
to reflect the use of sparse checkout.

Our cache checking transform is updated to audit for the use of
--sparse-profile without the corresponding "-sparse" cache name
variation.

The reason we need a distinct cache name for sparse is because
clients that aren't sparse aware will be unable to read checkouts
that are sparse. By forcing sparse and non-sparse into different
cache pools, we avoid compatibility issues.

In the ideal world, we probably support sparse profiles on all the
VCS checkouts that `run-task` supports (e.g. --tools-checkout).
Perfect is the enemy of done. All of this is defined in-tree and
it is easy enough to change atomically.

MozReview-Commit-ID: 79k7Vul0hHO

--HG--
extra : rebase_source : babe9b42e2796c2341bffc6ecfe829f4daff9e0f
2017-08-23 18:54:14 -07:00
Dustin J. Mitchell
98a3631d7b Bug 1391776: cleanup of taskgraph docs; r=ahal
* eliminate heading for test kinds, of which there is now only one
* make the caches document have a single heading in the TOC
* break out mach commands into a separate document, add ./mach taskgraph morphed
* remove docs for YAML templates support (the .yml file wasn't actually
  used -- I expect it was a merge leftover); these are still used for actions.yml,
  but once that is gone the code should be removed, too.
* break try out into its own document, edit to distinguish "how to run try"
  from "how to generate config"

MozReview-Commit-ID: 76ZopWA9TPL

--HG--
extra : rebase_source : 6946d866f9df6eec591b9a05ddedc6467dd69e4b
2017-08-23 15:22:10 -04:00
Gregory Szorc
02d1cf283b Bug 1391789 - Improve cache coherence via run-task integration; r=dustin
Today, cache names are mostly static and are brittle as a result.

In theory, when a backwards incompatible change is performed on
something that touches a cache, the cache name needs to be changed
to ensure tasks running the old code don't see cached data from the
new task. (Alternatively, all code is forward compatible, but that is
hard to implement in practice.)

For many things, the process works as planned. However, not everyone
knows that cache names need changed. And, it isn't always obvious
that some things require fresh caches. When mistakes are made, tasks
break intermittently due to cache wonkiness.

One area where we get into trouble is with UID and GID mismatch.
Task A will use a Docker image where our standard "worker" user/group
is UID/GID 1000:1000. Then Task B will use UID/GID 500:500. (This is
common when mixing Debian and RedHel based distros.) If they use the
same cache, then Task B needs to chown/chmod all files in the cache
or there could be a permissions problem. This is exactly why
run-task recursively chowns certain paths before dropping root
privileges.

Permissions setting in run-task solves permissions problems. But
it doesn't solve content incompatibility problems. For that, you
need to change cache names, not use caches, or blow away content
when incompatibilities are detected.

This commit starts the process of adding a little bit more coherence
to our caching story.

There are two main features in this commit:

1) Cache names tied to run-task content
2) Cache validation in run-task

Taskgraph now detects when a task is using caches with run-task. When
caches and run-task are both being used, the cache name is adjusted to
contain a hash of run-task's content. When run-task changes, the cache
name changes. So, changing run-task ensures that all caches from that point
forward are "clean." This frees run-task and any functionality related
to run-task (such as maintaining version control checkouts) from
having to maintain backwards or forwards compatibility with any other
version of run-task. This does mean that any changes to run-task
effectively wipe out caches. But changes to run-task tend to be
seldom, so this should be acceptable.

The second part of this change is code in run-task to record per-cache
properties and validate whether a populated cache is appropriate for
use. To enable this, taskgraph passes a list of cache paths via an
environment variable. For each cache path, run-task looks for a
well-defined file containing a list of "requirements." Right now,
that list is simply a version string. But other features will be
worked into it. If the cache is empty, we simply write out a new
requirements file and are done. If the file exists, we compare
requirements and fail fast if there is a mismatch. If the cache
has content but not this special file, then we abort (because this
should never happen).

The "requirements" validation isn't very useful now because the only
entry comes from run-task's source code and modifying run-task will
change the hash and cause a new cache to be used. The implementation
at this point is more demonstrating the concept than doing anything
terribly useful with it.

MozReview-Commit-ID: HtpXIc7OD1k

--HG--
extra : rebase_source : 2424696b1fde59f20152617a6ebb2afe14b94678
2017-08-18 14:07:03 -07:00
Gregory Szorc
f114aa8c35 Bug 1391789 - Make tooltool cache level dependent; r=dustin
Caches shared across levels scare me, even if readers are purported to
perform content verification. We shouldn't take any risks with released
Firefox builds being contaminated by e.g. Try tasks.

Also, the old cache name interferes with my desire to make cache
names dynamic. This requires dynamic scopes. We already have
have level-{{level}}-* scopes for caches. So having all caches
prefixed with this makes things flexible.

MozReview-Commit-ID: LsrcxIYoEh1

--HG--
extra : rebase_source : dfe97f92a726059200ed79afe215ef2cf1fd7bf1
2017-08-18 16:15:44 -07:00
Tom Prince
6aa707b866 Bug 1380833 - Document level-*-checkouts-v1's hg-store and the HG_STORE_PATH environment variable; r=dustin
MozReview-Commit-ID: 9yRuKONHTvP

--HG--
extra : rebase_source : dd8f28f9d96748bf74b4a7dd1bfdbe5cbaa4c950
2017-07-13 16:01:41 -06:00
Gregory Szorc
b547239226 Bug 1312475 - Add a version parameter to checkouts cache; r=dustin
ff5a4bab0813 (bug 1311791) and 332a08725ed0 (bug 1292071) changed
behavior of the VCS caches. First, the store cache / path was merged
into the checkouts cache. Then the path of the Mercurial shared store
was moved within the cache to always be rooted at the cache root.

Caches are shared across tasks. Tasks can execute on any revision
configured to use a cache. So, when interacting with caches, it is
important to consider how every revision configured to use that
cache will interact with it.

Take this scenario for example.

A worker executes a task where the hg shared store is rooted at
/home/worker/checkouts/src/hg-shared. Then the worker executes a
task where the hg shared store is rooted at
/home/worker/checkouts/hg-store. `hg robustcheckout` will see the
checkout from the first task. But then it sees that the store
it is pointing to is at an unexpected location
(checkouts/src/hg-shared instead of checkouts/hg-store). `hg
robustcheckout` aggressively normalizes state to ensure
consistency. So when it sees this mismatch, it blows away the
checkout and creates one from checkouts/hg-store to replace it.
That's a lot of overhead. And this cycle can repeat itself if
the right combination of revisions run on the worker!

A solution to this problem is to create a clean break from caches
when cache semantics change. In TaskCluster, that means using a
different cache.

This commit introduces a "version" component to the checkouts
cache name. By doing so, we create a clean break from all previous
caches, ensuring all revisions this point forward won't encounter
an hg shared store at an unexpected location. This also paves the
road for easily making additional clean breaks in the future.

MozReview-Commit-ID: JT8yuULKpch

--HG--
extra : rebase_source : 2e5d321e4314ff7543423d3c7837b770b7c8a3a3
2016-10-24 09:58:01 -07:00
Gregory Szorc
f67f676a52 Bug 1292071 - Put Mercurial store on same cache as checkout; r=dustin
We were seeing issues with the Mercurial working directory not being
pristine. While I can't reproduce this, I have a hunch it is due to
mixing and matching stores and checkouts in TaskCluster. For example,
if a worker supports running concurrent tasks and 2 tasks arrive at
the same time, the caches for the store and checkout may look like:

  (store0, checkout0)
  (store1, checkout1)

However, the next task may get:

  (store1, checkout0)

This may confuse Mercurial.

This commit eliminates the "hg-shared" cache and places the shared
stores as a sibling directory of the checkout.

MozReview-Commit-ID: 8SzyS6wWf9C

--HG--
extra : rebase_source : 6205f3fe944a6ad2cb833a5a7c0ae5ba136eaea6
2016-10-18 09:46:55 -07:00
Dustin J. Mitchell
123cde8ef1 Bug 1302776: document caches; r=gps
MozReview-Commit-ID: Dy9aXGo5O4u

--HG--
extra : rebase_source : 265f7548bb83db2cc5d8bc5e7bdfbfd81342c7e5
2016-10-20 14:44:26 +00:00
Gregory Szorc
96fa6fe838 Bug 1289643 - Change path for checkouts from "workspace" to "checkouts"; r=dustin
Currently, TaskCluster tasks tend to use the "workspace" directory as
a cache that manages the source checkout *and* additional state.

Historically at Mozilla, we've lumped "source checkout" and "workspace"
(sometimes known as an "objdir") into the same directory. This is
not ideal. Ideally, there is an immutable, read-only source checkout
and all files produced from that source live in a separate directory.

In this commit, the "workspace" directory for the "lint" image has been
renamed to "checkouts" and all tasks using the image have been updated
accordingly. By having "checkout" in the name, we clearly identify this
cache as being relevant to source checkouts, which IMO can serve a
different role from "workspaces." This distinction is important, as the
next commit will prevent the "checkouts" cache from getting optimized
out in certain tasks.

To hammer this point home, documentation on common caches has been
introduced.

MozReview-Commit-ID: BSEc4dM5YCt

--HG--
extra : rebase_source : 5a62939e066d3723736b41e14007112d92346684
2016-07-29 10:44:19 -07:00