gecko-dev/taskcluster/docker/image_builder/download-and-compress
Gregory Szorc 3eb3ce1bf0 Bug 1350447 - Use python-zstandard for Docker image compression; r=dustin
The goal of this change is to switch to python-zstandard for Docker
image compression so we can employ multi-threaded compression. This will
cut down the wall time it takes to compress images, decreasing end-to-end
times.
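
As a rough sketch (not part of this commit), the multi-threaded API in
python-zstandard that this relies on looks like the following, using the
same "zstd" module the script imports; the sample data is made up:

    import zstd

    # threads=-1 asks python-zstandard to use one compression thread per
    # logical CPU; level trades CPU time for compression ratio.
    cctx = zstd.ZstdCompressor(level=10, threads=-1)
    compressed = cctx.compress(b'example data' * 100000)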

In order to use python-zstandard, I needed to write a Python script
for doing the compression. Since I was writing a Python script, I
figured I'd move Docker image downloading to that script as well.
This way, the raw Docker image never hits disk: it is streamed straight
from Docker into a zstandard compressor and that output is written to
disk. For large images, this will eliminate a few gigabytes of disk
writes.

The one extra complication I don't care for is that you need a special
Python package to teach the "requests" package how to download from
UNIX domain sockets.
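
For illustration only (the real URL is supplied by the caller), the
requests_unixsocket package maps a percent-encoded socket path into an
"http+unix://" URL, roughly like this; the Docker endpoint shown is
hypothetical:

    import requests
    import requests_unixsocket

    # Teach requests about the http+unix:// scheme.
    requests_unixsocket.monkeypatch()

    # The socket path is percent-encoded into the URL's host component.
    url = 'http+unix://%2Fvar%2Frun%2Fdocker.sock/containers/json'
    r = requests.get(url)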

MozReview-Commit-ID: EufaRzR6A4Y

--HG--
extra : rebase_source : 2143bfee729bdc075c3a87a1e607eff2f0c164d2
2017-03-28 16:19:24 -07:00

#!/usr/bin/python2.7
# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at http://mozilla.org/MPL/2.0/.

import os
import sys
import time

import requests
import requests_unixsocket
import zstd

# Allow requests to fetch from UNIX domain sockets.
requests_unixsocket.monkeypatch()


def download_and_compress(url, path, level):
    r = requests.get(url, stream=True)

    if r.status_code != 200:
        raise Exception('non-200 response: %d' % r.status_code)

    in_size = 0
    out_size = 0
    last_progress = time.time()

    # Use all available CPU cores for multi-threaded compression.
    cctx = zstd.ZstdCompressor(threads=-1, level=level, write_checksum=True)
    cobj = cctx.compressobj()

    with open(path, 'wb') as fh:
        for raw in r.iter_content(zstd.COMPRESSION_RECOMMENDED_INPUT_SIZE):
            # Print output periodically, for humans.
            now = time.time()
            if now - last_progress > 5.0:
                print('%d -> %d' % (in_size, out_size))
                last_progress = now

            in_size += len(raw)
            chunk = cobj.compress(raw)
            if not chunk:
                continue

            out_size += len(chunk)
            fh.write(chunk)

        chunk = cobj.flush()
        out_size += len(chunk)
        fh.write(chunk)

    return in_size, out_size


if __name__ == '__main__':
    url, temp_path, final_path = sys.argv[1:]

    # Default zstd level is 3. We default to 10 because multi-threaded
    # compression allows us to burn lots of CPU for significant image
    # size reductions without a major wall time penalty.
    level = int(os.environ.get('DOCKER_IMAGE_ZSTD_LEVEL', '10'))
    print('using zstandard compression level %d' % level)

    count = 0
    while count < 10:
        count += 1
        try:
            t_start = time.time()
            raw_size, compress_size = download_and_compress(url, temp_path,
                                                            level)
            elapsed = time.time() - t_start
            # Move to final path at end so partial image isn't uploaded as
            # an artifact.
            os.rename(temp_path, final_path)
            speed = int(raw_size / elapsed) / 1000000
            print('compression ratio: %.2f (%d -> %d) @ %d MB/s' % (
                float(compress_size) / float(raw_size),
                raw_size, compress_size, speed))
            sys.exit(0)
        except Exception as e:
            print('exception: %s' % e)
            time.sleep(5)

    print('reached maximum retry attempts; giving up')
    sys.exit(1)
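
Because the compressor is created with write_checksum=True, each frame
carries a content checksum that standard zstd decoders verify. A hedged
consumer-side sketch with python-zstandard (file names hypothetical):

    import zstd

    # Stream-decompress the artifact; the frame checksum written by the
    # compressor is verified as the data is decoded.
    dctx = zstd.ZstdDecompressor()
    with open('image.tar.zst', 'rb') as ifh, open('image.tar', 'wb') as ofh:
        dctx.copy_stream(ifh, ofh)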