Bug 1646794 - Add basic infrastructure for collecting data on the performance of individual build actions r=mhentges,froydnj

This, hopefully, begins to address an ongoing global problem where we have few, if any, insights into the performance of individual build tasks (compilations, calls into Python scripts, etc.). At most we have aggregated statistics about how long tiers last, combined with `sccache` aggregates across the entire build (which don't cover non-compilation tasks). This has a few implications:

1. It's impossible to identify bottlenecks, except by going out of your way to notice and reproduce them. For example, no one, to my knowledge, was aware that `make_dafsa.py` was a bottleneck until someone happened to notice and report it in bug 1629337. We could have systems that automatically detect this sort of thing, or at least make it easier to do so than by Ctrl-C'ing in the middle of the build several times to try to reproduce the problem.

2. It's impossible to detect regressions, unless the regression is so pronounced and severe that it has an immediate impact on the overall build time and triggers build time alerts.

3. It's impossible to confirm that you have *fixed* a regression, except by doing ad-hoc timing measurements on individual `make` targets. This is error-prone and annoying.

Here we propose a low-friction system wherein individual build tasks log their own build perf info. For now, that's a write to `stdout` consisting of the string `BUILDTASK ` followed by a simple JSON object with a start time, an end time, the `argv` of the task, and an additional `"context"` key (I anticipate this could be used to annotate the task with relevant per-task data for later aggregation, for example: was this an `sccache` cache hit or not? For now, it's empty everywhere). The build controller then collects this data, validates it, and writes out the entire list of build tasks as a JSON file after the build has completed, similarly to what we already do with `build_resources.json`. We already parse some `make` output to do things like tracking when we switch tiers, so this isn't a huge architectural shift or anything.
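For concreteness, a `BUILDTASK` line on `stdout` might look roughly like the following (the script name, arguments, and timestamps here are made up for illustration; the real `argv` is simply whatever the task was invoked with, and `"context"` is currently always `null`):

```
BUILDTASK {"argv": ["/usr/bin/python3", "some_action.py", "output.h"], "start": 1598480000.12, "end": 1598480001.34, "context": null}
```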

In my opinion this "should" happen at the build system, or `make`, level, but `make` doesn't expose anything resembling this information to my knowledge, so this has to be implemented outside of `make`. One could implement something like this at the `sccache` level but that doesn't touch anything but C/C++/Rust compilation tasks; an ideal solution would support other generic build tasks. We could also fork `make` to add this feature ourselves, but for several reasons I don't think that's tractable. :)

Of course, this approach has downsides:

1. We depend on parsing the `stdout` of `make`, and parallel processes can unfortunately sometimes trample on each other's output, occasionally leading to data loss for individual build tasks. To my knowledge this is an inherent limitation of the model, and I don't know that it can be fixed generally. In my testing, very little data tends to be lost in practice.

2. Dumping arbitrary data to `stdout` isn't always possible or desirable. If you're not careful about it, this can also result in noisier-than-necessary output from tasks, especially when those tasks are not invoked by a parent process that knows how to handle the special `BUILDTASK` lines.

3. This data is raw enough that aggregating it is not completely trivial.

4. This functionality has to be added explicitly for any new kind of build task whose performance we'd like to track; it doesn't come "for free", since it can't be implemented at the build-system level.

5. The data isn't terribly small because of the `argv`s (at this point it's nowhere near large enough to be a concern IMO, but maybe that will change in the future?)

One can imagine a couple of other architectures that could avoid the first two problems, namely: 1) we could use a "real" database that would not dump info to `stdout` and wouldn't lose data, like `sqlite3`; or 2) we could set up another server, similar to `sccache`, that collects this data from subprocesses and aggregates it, making sure not to lose any along the way. Both of these have enough overhead, in terms of engineering effort or actual impact on latency, that I don't know that they make sense to even attempt implementing. The remaining downsides continue to be real issues, however.

After this lands there are a few ways forward. We can start uploading these files as build artifacts in CI to allow us to reason about the performance impact of changes landing on `central`. We can easily add this functionality to the `sccache` client to start tracking those builds as well. We already have a very simple visualization of build-tier timing in `mach resource-usage`; we could join that data against the `BUILDTASK` data to produce a very clear visualization of build bottlenecks, e.g., "why is the `export` tier taking so long?", etc.
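As a rough sketch of the kind of aggregation this enables (not part of this patch; the file name matches the `build_tasks.json` the build controller writes in this change, but the grouping heuristic below is just an assumption for illustration), a small script could total up wall-clock time per build script:

```python
import json
import os.path
from collections import defaultdict


def summarize(build_tasks_path):
    """Print total wall-clock time per script, largest first."""
    with open(build_tasks_path, encoding='utf-8') as fh:
        tasks = json.load(fh)

    totals = defaultdict(float)
    for task in tasks:
        argv = task['argv']
        # These tasks are mostly Python scripts, so argv[0] is typically the
        # interpreter; prefer argv[1] as the grouping key when present.
        key = os.path.basename(argv[1] if len(argv) > 1 else argv[0])
        totals[key] += task['end'] - task['start']

    for key, seconds in sorted(totals.items(), key=lambda kv: -kv[1]):
        print('%8.2fs  %s' % (seconds, key))
```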

Differential Revision: https://phabricator.services.mozilla.com/D80284
Ricky Stewart 2020-08-26 22:47:58 +00:00
parent 76574f22bf
commit 1989d12e47
12 changed files with 70 additions and 10 deletions

View File

@@ -95,7 +95,7 @@ class ProcessExecutionMixin(LoggingMixin):
         if line_handler:
             line_handler(line)
 
-        if not log_name:
+        if line.startswith('BUILDTASK') or not log_name:
             return
 
         self.log(log_level, log_name, {'line': line.rstrip()}, '{line}')

View File

@@ -17,6 +17,7 @@ from mozbuild.util import (
     ensureParentDir,
     lock_file,
 )
+from mozbuild.action.util import log_build_task
 
 
 def addEntriesToListFile(listFile, entries):
@@ -51,4 +52,4 @@ def main(args):
 
 
 if __name__ == '__main__':
-    sys.exit(main(sys.argv[1:]))
+    sys.exit(log_build_task(main, sys.argv[1:]))

View File

@@ -13,6 +13,7 @@ import sys
 from distutils.version import StrictVersion as Version
 
 import buildconfig
+from mozbuild.action.util import log_build_task
 from mozbuild.util import memoize
 from mozpack.executables import (
     get_type,
@@ -317,4 +318,4 @@ def main(args):
 
 
 if __name__ == '__main__':
-    sys.exit(main(sys.argv[1:]))
+    sys.exit(log_build_task(main, sys.argv[1:]))

View File

@@ -19,6 +19,7 @@ from mozbuild.pythonutil import iter_modules_in_path
 from mozbuild.makeutil import Makefile
 from mozbuild.util import FileAvoidWrite
 import buildconfig
+from mozbuild.action.util import log_build_task
 
 
 def main(argv):
@@ -121,4 +122,4 @@ def main(argv):
 
 
 if __name__ == '__main__':
-    sys.exit(main(sys.argv[1:]))
+    sys.exit(log_build_task(main, sys.argv[1:]))

View File

@@ -7,6 +7,7 @@ from __future__ import absolute_import, print_function
 import sys
 
 import mozbuild.jar
+from mozbuild.action.util import log_build_task
 
 
 def main(args):
@@ -14,4 +15,4 @@ def main(args):
 
 
 if __name__ == '__main__':
-    sys.exit(main(sys.argv[1:]))
+    sys.exit(log_build_task(main, sys.argv[1:]))

View File

@@ -7,6 +7,7 @@ from __future__ import absolute_import, print_function
 import sys
 
 from mozbuild.preprocessor import Preprocessor
+from mozbuild.action.util import log_build_task
 
 
 def generate(output, *args):
@@ -22,4 +23,4 @@ def main(args):
 
 
 if __name__ == "__main__":
-    main(sys.argv[1:])
+    log_build_task(main, sys.argv[1:])

View File

@@ -21,6 +21,7 @@ from mozpack.manifests import (
     InstallManifest,
 )
 from mozbuild.util import DefinesAction
+from mozbuild.action.util import log_build_task
 
 
 COMPLETE = 'Elapsed: {elapsed:.2f}s; From {dest}: Kept {existing} existing; ' \
@@ -111,4 +112,4 @@ def main(argv):
 
 
 if __name__ == '__main__':
-    main(sys.argv[1:])
+    log_build_task(main, sys.argv[1:])

View File

@@ -0,0 +1,26 @@
+# This Source Code Form is subject to the terms of the Mozilla Public
+# License, v. 2.0. If a copy of the MPL was not distributed with this
+# file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+from __future__ import absolute_import, print_function, unicode_literals
+
+import json
+import sys
+import time
+
+
+def log_build_task(f, *args, **kwargs):
+    """Run the given function, representing an entire build task, and log the
+    BUILDTASK metadata row to stdout.
+    """
+    start = time.time()
+    try:
+        return f(*args, **kwargs)
+    finally:
+        end = time.time()
+        print('BUILDTASK %s' % json.dumps({
+            "argv": sys.argv,
+            "start": start,
+            "end": end,
+            "context": None
+        }))

View File

@@ -7,6 +7,7 @@ from __future__ import absolute_import, print_function
 import sys
 
 from mozwebidlcodegen import BuildSystemWebIDL
+from mozbuild.action.util import log_build_task
 
 
 def main(argv):
@@ -16,4 +17,4 @@ def main(argv):
 
 
 if __name__ == '__main__':
-    sys.exit(main(sys.argv[1:]))
+    sys.exit(log_build_task(main, sys.argv[1:]))

View File

@@ -25,6 +25,7 @@ from mozbuild.makeutil import Makefile
 from mozpack import path as mozpath
 from mozbuild.pythonutil import iter_modules_in_path
 from mozbuild.util import FileAvoidWrite
+from mozbuild.action.util import log_build_task
 
 
 def process(input_dirs, inc_paths, bindings_conf, header_dir,
@@ -124,4 +125,4 @@ def main(argv):
 
 
 if __name__ == '__main__':
-    main(sys.argv[1:])
+    log_build_task(main, sys.argv[1:])

View File

@@ -11,6 +11,7 @@ from mozpack.files import FileFinder
 from mozpack.copier import Jarrer
 from mozpack.errors import errors
 from mozpack.path import match
+from mozbuild.action.util import log_build_task
 
 import argparse
 import mozpack.path as mozpath
@@ -43,4 +44,4 @@ def main(args):
 
 
 if __name__ == '__main__':
-    main(sys.argv[1:])
+    log_build_task(main, sys.argv[1:])

View File

@@ -241,6 +241,7 @@ class BuildMonitor(MozbuildObject):
         self._warnings_collector = WarningsCollector(on_warning,
                                                      objdir=self.topobjdir)
+        self._build_tasks = []
 
         self.build_objects = []
         self.build_dirs = set()
@@ -303,6 +304,16 @@ class BuildMonitor(MozbuildObject):
                 raise Exception('Unknown build status: %s' % action)
 
             return BuildOutputResult(None, update_needed, message)
+        elif line.startswith('BUILDTASK'):
+            _, data = line.split(maxsplit=1)
+            # Check that we can parse the JSON. Skip this line if we can't;
+            # we'll be missing data, but that's not a huge deal.
+            try:
+                json.loads(data)
+                self._build_tasks.append(data)
+            except json.decoder.JSONDecodeError:
+                pass
+            return BuildOutputResult(None, False, None)
 
         warning = None
@@ -330,6 +341,20 @@ class BuildMonitor(MozbuildObject):
         self.warnings_database.prune()
         self.warnings_database.save_to_file(self._warnings_path)
 
+        if 'MOZ_AUTOMATION' not in os.environ:
+            build_tasks_path = self._get_state_filename('build_tasks.json')
+            with io.open(build_tasks_path, 'w', encoding='utf-8',
+                         newline='\n') as fh:
+                fh.write('[')
+                first = True
+                for task in self._build_tasks:
+                    # We've already verified all of these are valid JSON, so we
+                    # can write the data out to the file directly.
+                    fh.write('%s\n %s' % (',' if not first else '', task))
+                    first = False
+                fh.write('\n]\n')
+
         # Record usage.
         if not record_usage:
             return