gecko-dev/js/xpconnect/loader/ScriptPreloader.cpp

1247 lines
40 KiB
C++

Bug 1359653: Part 5 - Pre-load scripts needed during startup in a background thread. r=shu,erahm

One of the things that I've noticed in profiling startup overhead is that, even with the startup cache, we spend about 130ms just loading and decoding scripts from the startup cache on my machine. I think we should be able to do better than that by doing some of that work in the background for scripts that we know we'll need during startup.

With this change, we seem to consistently save about 3-5% on non-e10s startup overhead on talos. But there's a lot of room for tuning, and I think we can get some considerable improvement with a few ongoing tweaks.

Some notes about the approach:

- Setting up the off-thread compile is fairly expensive, since we need to create a global object, and a lot of its built-in prototype objects, for each compile. So in order for there to be a performance improvement for OMT compiles, the script has to be pretty large. Right now, the tipping point seems to be about 20K. There's currently no easy way to improve the per-compile setup overhead, but we should be able to combine the off-thread compiles for multiple smaller scripts into a single operation without any additional per-script overhead.

- The time we spend setting up scripts for OMT compile is almost entirely CPU-bound. That means that we have a chunk of about 20-50ms where we can safely schedule thread-safe IO work during early startup, so if we schedule some of our current synchronous IO operations on background threads during the script cache setup, we basically get them for free, and can probably increase the number of scripts we compile in the background.

- I went with an uncompressed mmap of the raw XDR data for a storage format. That currently occupies about 5MB of disk space. Gzipped, it's ~1.2MB, so compressing it might save some startup disk IO, but keeping it uncompressed simplifies a lot of the OMT and even main thread decoding process, but, more importantly:

- We currently don't use the startup cache in content processes, for a variety of reasons. However, with this approach, I think we can safely store the cached script data from a content process before we load any untrusted code into it, and then share mmapped startup cache data between all content processes. That should speed up content process startup *a lot*, and very likely save memory, too.

And:

- If we're especially concerned about saving per-process memory, and we keep the cache data mapped for the lifetime of the JS runtime, I think that with some effort we can probably share the static string data from scripts between content processes, without any copying. Right now, it looks like for the main process, there's about 1.5MB of string-ish data in the XDR dumps. It's probably less for content processes, but if we could save .5MB per process this way, it might make it easier to increase the number of content processes we allow.

MozReview-Commit-ID: CVJahyNktKB

--HG--
extra : source : 1c7df945505930d2d86a076ee20807104324c8cc
extra : histedit_source : 75e193839edf727874f01b2a9f6852f6c1f087fb%2C3ce966d7dcf2bd0454a7d673d0467097456bd782
2017-05-06 19:24:22 +00:00
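
The size trade-off described in the commit message (a per-compile setup cost that only pays off at roughly 20K, and the idea of combining smaller scripts into one off-thread operation) can be illustrated with a small standalone sketch. This is not the code in ScriptPreloader.cpp: `CachedScript`, `DecodePlan`, `PlanDecodes`, and `kOffThreadMinSize` are hypothetical names invented for illustration, and the threshold is only the rough figure quoted in the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical threshold, taken from the "about 20K" estimate in the
// commit message. The real tipping point would need to be measured.
constexpr std::size_t kOffThreadMinSize = 20 * 1024;

// Hypothetical stand-in for a cached script entry.
struct CachedScript {
    std::string url;
    std::size_t xdrSize;  // size of the encoded (XDR) data, in bytes
};

// Result of planning: each batch is one off-thread operation; leftovers
// that never reach the threshold are decoded on the main thread.
struct DecodePlan {
    std::vector<std::vector<CachedScript>> offThreadBatches;
    std::vector<CachedScript> mainThread;
};

DecodePlan PlanDecodes(const std::vector<CachedScript>& scripts,
                       std::size_t minBatchSize = kOffThreadMinSize)
{
    assert(minBatchSize > 0);

    DecodePlan plan;
    std::vector<CachedScript> pending;
    std::size_t pendingSize = 0;

    for (const auto& script : scripts) {
        // Large scripts justify the per-compile setup cost on their own.
        if (script.xdrSize >= minBatchSize) {
            plan.offThreadBatches.push_back(std::vector<CachedScript>{script});
            continue;
        }

        // Small scripts are accumulated until the combined size is large
        // enough to amortize the setup cost of one off-thread operation.
        pending.push_back(script);
        pendingSize += script.xdrSize;
        if (pendingSize >= minBatchSize) {
            plan.offThreadBatches.push_back(std::move(pending));
            pending.clear();
            pendingSize = 0;
        }
    }

    // Whatever is left is too small to be worth sending off-thread.
    plan.mainThread = std::move(pending);
    return plan;
}
```

With scripts of 30K, 8K, 15K, and 4K, this sketch sends the 30K script off-thread alone, combines the 8K and 15K scripts into a second batch, and leaves the 4K script for the main thread.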
/* -*- Mode: C++; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 4 -*- */
/* vim: set ts=8 sts=4 et sw=4 tw=99: */
/* This Source Code Form is subject to the terms of the Mozilla Public
* License, v. 2.0. If a copy of the MPL was not distributed with this
* file, You can obtain one at http://mozilla.org/MPL/2.0/. */
#include "ScriptPreloader-inl.h"
#include "mozilla/ScriptPreloader.h"
#include "mozilla/loader/ScriptCacheActors.h"
#include "mozilla/URLPreloader.h"
#include "mozilla/ArrayUtils.h"
#include "mozilla/ClearOnShutdown.h"
#include "mozilla/FileUtils.h"
#include "mozilla/Logging.h"
#include "mozilla/ScopeExit.h"
#include "mozilla/Services.h"
#include "mozilla/Telemetry.h"
#include "mozilla/Unused.h"
#include "mozilla/dom/ContentChild.h"
#include "mozilla/dom/ContentParent.h"
#include "MainThreadUtils.h"
#include "nsDebug.h"
#include "nsDirectoryServiceUtils.h"
#include "nsIFile.h"
#include "nsIObserverService.h"
#include "nsJSUtils.h"
#include "nsMemoryReporterManager.h"
#include "nsNetUtil.h"
#include "nsProxyRelease.h"
#include "nsThreadUtils.h"
#include "nsXULAppAPI.h"
#include "xpcpublic.h"
#define STARTUP_COMPLETE_TOPIC "browser-delayed-startup-finished"
#define DOC_ELEM_INSERTED_TOPIC "document-element-inserted"
#define CONTENT_DOCUMENT_LOADED_TOPIC "content-document-loaded"
#define CACHE_WRITE_TOPIC "browser-idle-startup-tasks-finished"
#define CLEANUP_TOPIC "xpcom-shutdown"
#define SHUTDOWN_TOPIC "quit-application-granted"
#define CACHE_INVALIDATE_TOPIC "startupcache-invalidate"
// The maximum time we'll wait for a child process to finish starting up before
// we send its script data back to the parent.
constexpr uint32_t CHILD_STARTUP_TIMEOUT_MS = 8000;
namespace mozilla {
namespace {
static LazyLogModule gLog("ScriptPreloader");
#define LOG(level, ...) MOZ_LOG(gLog, LogLevel::level, (__VA_ARGS__))
}
using mozilla::dom::AutoJSAPI;
using mozilla::dom::ContentChild;
using mozilla::dom::ContentParent;
using namespace mozilla::loader;
ProcessType ScriptPreloader::sProcessType;
// This type corresponds to the js::vm::XDRAlignment type, which is used as a
// size reference for the alignment of XDR buffers.
using XDRAlign = uint16_t;
static const uint8_t sAlignPadding[sizeof(XDRAlign)] = { 0, 0 };
static inline size_t
ComputeByteAlignment(size_t bytes, size_t align)
{
    return (align - (bytes % align)) % align;
}
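
A minimal standalone sketch of the padding arithmetic used by ComputeByteAlignment above. The names `PaddingFor` and `AlignedOffset` are hypothetical and exist only for this illustration; the formula itself is copied from the function above. The outer `% align` is what makes an already-aligned size need zero padding rather than a full `align` bytes.

```cpp
#include <cassert>
#include <cstddef>

// Same formula as ComputeByteAlignment, repeated here so the sketch
// compiles on its own. Returns the number of padding bytes needed to
// bring `bytes` up to the next multiple of `align`.
inline size_t PaddingFor(size_t bytes, size_t align) {
    assert(align > 0);  // a zero alignment would divide by zero
    return (align - (bytes % align)) % align;
}

// Hypothetical helper: the offset rounded up to the next multiple of
// `align`, i.e. where an aligned buffer would start.
inline size_t AlignedOffset(size_t offset, size_t align) {
    return offset + PaddingFor(offset, align);
}
```

With a two-byte alignment unit such as `XDRAlign`, a buffer ending at an odd offset gets one byte of padding and a buffer ending at an even offset gets none.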
nsresult
ScriptPreloader::CollectReports(nsIHandleReportCallback* aHandleReport,
                                nsISupports* aData, bool aAnonymize)
{
    MOZ_COLLECT_REPORT(
        "explicit/script-preloader/heap/saved-scripts", KIND_HEAP, UNITS_BYTES,
        SizeOfHashEntries<ScriptStatus::Saved>(mScripts, MallocSizeOf),
        "Memory used to hold the scripts which have been executed in this "
        "session, and will be written to the startup script cache file.");

    MOZ_COLLECT_REPORT(
        "explicit/script-preloader/heap/restored-scripts", KIND_HEAP, UNITS_BYTES,
        SizeOfHashEntries<ScriptStatus::Restored>(mScripts, MallocSizeOf),
        "Memory used to hold the scripts which have been restored from the "
        "startup script cache file, but have not been executed in this session.");

    MOZ_COLLECT_REPORT(
        "explicit/script-preloader/heap/other", KIND_HEAP, UNITS_BYTES,
        ShallowHeapSizeOfIncludingThis(MallocSizeOf),
        "Memory used by the script cache service itself.");

    // Since the cache file is mapped into memory, we want to report it as
    // explicit memory somewhere. But since the child cache is shared between
    // all processes, we don't want to report it as explicit memory for all of
    // them. So we report it as explicit only in the parent process, and
    // non-explicit everywhere else.
    if (XRE_IsParentProcess()) {
        MOZ_COLLECT_REPORT(
            "explicit/script-preloader/non-heap/memmapped-cache", KIND_NONHEAP, UNITS_BYTES,
            mCacheData.nonHeapSizeOfExcludingThis(),
            "The memory-mapped startup script cache file.");
    } else {
        MOZ_COLLECT_REPORT(
            "script-preloader-memmapped-cache", KIND_NONHEAP, UNITS_BYTES,
            mCacheData.nonHeapSizeOfExcludingThis(),
            "The memory-mapped startup script cache file.");
    }
    return NS_OK;
}
ScriptPreloader&
ScriptPreloader::GetSingleton()
{
    static RefPtr<ScriptPreloader> singleton;

    if (!singleton) {
        if (XRE_IsParentProcess()) {
            singleton = new ScriptPreloader();
            singleton->mChildCache = &GetChildSingleton();
            Unused << singleton->InitCache();
        } else {
            singleton = &GetChildSingleton();
        }

        ClearOnShutdown(&singleton);
    }

    return *singleton;
}
// The child singleton is available in all processes, including the parent, and
// is used for scripts which are expected to be loaded into child processes
// (such as process and frame scripts), or scripts that have already been loaded
// into a child. The child caches are managed as follows:
//
// - Every startup, we open the cache file from the last session, move it to a
// new location, and begin pre-loading the scripts that are stored in it. There
// is a separate cache file for parent and content processes, but the parent
// process opens both the parent and content cache files.
//
// - Once startup is complete, we write a new cache file for the next session,
// containing only the scripts that were used during early startup, so we don't
// waste time pre-loading scripts that may not be needed.
//
// - For content processes, opening and writing the cache file is handled in the
// parent process. The first content process of each type sends back the data
// for scripts that were loaded in early startup, and the parent merges them and
// writes them to a cache file.
//
// - Currently, content processes only benefit from the cache data written
// during the *previous* session. Ideally, new content processes should probably
// use the cache data written during this session if there was no previous cache
// file, but I'd rather do that as a follow-up.
ScriptPreloader&
ScriptPreloader::GetChildSingleton()
{
    static RefPtr<ScriptPreloader> singleton;

    if (!singleton) {
        singleton = new ScriptPreloader();
        if (XRE_IsParentProcess()) {
            Unused << singleton->InitCache(NS_LITERAL_STRING("scriptCache-child"));
        }

        ClearOnShutdown(&singleton);
    }

    return *singleton;
}
void
ScriptPreloader::InitContentChild(ContentParent& parent)
{
    auto& cache = GetChildSingleton();

    // We want startup script data from the first process of a given type.
    // That process sends back its script data before it executes any
    // untrusted code, and then we never accept further script data for that
    // type of process for the rest of the session.
    //
    // The script data from each process type is merged with the data from the
    // parent process's frame and process scripts, and shared between all
    // content process types in the next session.
    //
    // Note that if the first process of a given type crashes or shuts down
    // before sending us its script data, we silently ignore it, and data for
    // that process type is not included in the next session's cache. This
    // should be a sufficiently rare occurrence that it's not worth trying to
    // handle specially.
    auto processType = GetChildProcessType(parent.GetRemoteType());
    bool wantScriptData = !cache.mInitializedProcesses.contains(processType);
    cache.mInitializedProcesses += processType;

    auto fd = cache.mCacheData.cloneFileDescriptor();
    // Don't send original cache data to new processes if the cache has been
    // invalidated.
    if (fd.IsValid() && !cache.mCacheInvalidated) {
        Unused << parent.SendPScriptCacheConstructor(fd, wantScriptData);
    } else {
        Unused << parent.SendPScriptCacheConstructor(NS_ERROR_FILE_NOT_FOUND, wantScriptData);
    }
}
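
The "first process of a given type" bookkeeping in InitContentChild above can be sketched in isolation as follows. This is only an illustration under stated assumptions: `FirstOfTypeTracker` and `MarkAndCheckFirst` are hypothetical names, the enum is a local stand-in for the real `ProcessType`, and a plain bitmask stands in for whatever set type `mInitializedProcesses` actually is.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-in for the process types named in GetChildProcessType.
enum class ProcessType : uint8_t { Web, Extension, Privileged };

// Hypothetical tracker: remembers which process types have been seen.
class FirstOfTypeTracker {
public:
    // Returns true only the first time a given type is passed in, which
    // mirrors how wantScriptData is computed before the type is recorded.
    bool MarkAndCheckFirst(ProcessType type) {
        auto index = static_cast<uint32_t>(type);
        assert(index < 32);  // must fit in the 32-bit mask
        uint32_t bit = uint32_t(1) << index;
        bool first = (mSeen & bit) == 0;
        mSeen |= bit;
        return first;
    }

private:
    uint32_t mSeen = 0;  // one bit per process type
};
```

Checking membership before recording the type is what guarantees that exactly one process per type is asked for script data in a session, even if that process later crashes without replying.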
ProcessType
ScriptPreloader::GetChildProcessType(const nsAString& remoteType)
{
if (remoteType.EqualsLiteral(EXTENSION_REMOTE_TYPE)) {
return ProcessType::Extension;
}
if (remoteType.EqualsLiteral(PRIVILEGED_REMOTE_TYPE)) {
return ProcessType::Privileged;
}
return ProcessType::Web;
}
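The classification above is an exact, case-sensitive string match with `ProcessType::Web` as the fallback. A minimal standalone sketch of the same shape follows, using `std::string_view` in place of `nsAString`. The constant names and their string values are assumptions made for illustration and are not taken from this file; `ClassifyRemoteType` is a hypothetical name.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical stand-ins for EXTENSION_REMOTE_TYPE / PRIVILEGED_REMOTE_TYPE.
// The literal values below are assumptions, not copied from the real headers.
constexpr std::string_view kExtensionRemoteType = "extension";
constexpr std::string_view kPrivilegedRemoteType = "privileged";

// Illustrative enum; the real ProcessType also has a Parent value, which
// this child-only classification never returns.
enum class ProcessType { Parent, Web, Extension, Privileged };

// Exact-match classification. Anything unrecognized, including the empty
// string, is treated as an ordinary web content process.
constexpr ProcessType ClassifyRemoteType(std::string_view remoteType) {
  if (remoteType == kExtensionRemoteType) {
    return ProcessType::Extension;
  }
  if (remoteType == kPrivilegedRemoteType) {
    return ProcessType::Privileged;
  }
  return ProcessType::Web;
}
```

Because the comparison is exact, a differently cased or prefixed remote type falls through to the `Web` branch in this sketch.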
namespace {
static void
TraceOp(JSTracer* trc, void* data)
{
auto preloader = static_cast<ScriptPreloader*>(data);
preloader->Trace(trc);
}
} // anonymous namespace
void
ScriptPreloader::Trace(JSTracer* trc)
{
for (auto& script : IterHash(mScripts)) {
JS::TraceEdge(trc, &script->mScript, "ScriptPreloader::CachedScript.mScript");
}
}
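`TraceOp` is a C-style trampoline: the engine stores a plain function pointer plus an opaque `void*`, and the callback casts that pointer back to the object and forwards to its member function. The sketch below shows the pattern in isolation. `Tracer`, `RootRegistry`, and `Preloader` are hypothetical stand-ins, not the SpiderMonkey types, and the registry only loosely models an extra-roots list.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for JSTracer; it only counts visited edges.
struct Tracer {
  int edgesVisited = 0;
};

using TraceCallback = void (*)(Tracer*, void*);

// Hypothetical registry of (callback, data) pairs, keyed on both values so
// that removal must pass the same pair that was registered.
class RootRegistry {
 public:
  void Add(TraceCallback cb, void* data) { mEntries.emplace_back(cb, data); }

  void Remove(TraceCallback cb, void* data) {
    for (auto it = mEntries.begin(); it != mEntries.end(); ++it) {
      if (it->first == cb && it->second == data) {
        mEntries.erase(it);
        return;
      }
    }
  }

  // Invoke every registered callback with its associated data pointer.
  void TraceAll(Tracer* trc) const {
    for (const auto& entry : mEntries) {
      entry.first(trc, entry.second);
    }
  }

 private:
  std::vector<std::pair<TraceCallback, void*>> mEntries;
};

// Hypothetical owner of traceable things.
class Preloader {
 public:
  explicit Preloader(int scriptCount) : mScriptCount(scriptCount) {}

  // Report one edge per held script.
  void Trace(Tracer* trc) { trc->edgesVisited += mScriptCount; }

 private:
  int mScriptCount;
};

// Trampoline: recover the object from the opaque pointer and forward.
static void TraceOp(Tracer* trc, void* data) {
  static_cast<Preloader*>(data)->Trace(trc);
}
```

The same (callback, data) pair has to be passed on removal, which mirrors how the constructor and `Cleanup()` in this file both pass `TraceOp` and `this`.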
ScriptPreloader::ScriptPreloader()
: mMonitor("[ScriptPreloader.mMonitor]")
, mSaveMonitor("[ScriptPreloader.mSaveMonitor]")
{
// We do not set the process type for child processes here because the
// remoteType in ContentChild is not ready yet.
if (XRE_IsParentProcess()) {
sProcessType = ProcessType::Parent;
}
nsCOMPtr<nsIObserverService> obs = services::GetObserverService();
MOZ_RELEASE_ASSERT(obs);
if (XRE_IsParentProcess()) {
// In the parent process, we want to freeze the script cache as soon
// as idle tasks for the first browser window have completed.
obs->AddObserver(this, STARTUP_COMPLETE_TOPIC, false);
obs->AddObserver(this, CACHE_WRITE_TOPIC, false);
}
obs->AddObserver(this, SHUTDOWN_TOPIC, false);
obs->AddObserver(this, CLEANUP_TOPIC, false);
obs->AddObserver(this, CACHE_INVALIDATE_TOPIC, false);
AutoSafeJSAPI jsapi;
JS_AddExtraGCRootsTracer(jsapi.cx(), TraceOp, this);
}
void
ScriptPreloader::ForceWriteCacheFile()
{
if (mSaveThread) {
MonitorAutoLock mal(mSaveMonitor);
// Make sure we've prepared scripts, so we don't risk deadlocking while
// dispatching the prepare task during shutdown.
PrepareCacheWrite();
// Unblock the save thread, so it can start saving before we get to
// XPCOM shutdown.
mal.Notify();
}
}
void
ScriptPreloader::Cleanup()
{
if (mSaveThread) {
MonitorAutoLock mal(mSaveMonitor);
// Make sure the save thread is not blocked dispatching a sync task to
// the main thread, or we will deadlock.
MOZ_RELEASE_ASSERT(!mBlockedOnSyncDispatch);
while (!mSaveComplete && mSaveThread) {
mal.Wait();
}
}
// Wait for any pending parses to finish before clearing the mScripts
// hashtable, since the parse tasks depend on memory allocated by those
// scripts.
{
MonitorAutoLock mal(mMonitor);
FinishPendingParses(mal);
mScripts.Clear();
}
AutoSafeJSAPI jsapi;
JS_RemoveExtraGCRootsTracer(jsapi.cx(), TraceOp, this);
UnregisterWeakMemoryReporter(this);
}
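`Cleanup()` blocks on `mSaveMonitor` until the save thread reports completion, re-checking the flag in a loop each time it wakes. A rough standard-library analogue of that wait-in-a-loop pattern is sketched below. `SaveCoordinator` and its members are hypothetical, `std::mutex` plus `std::condition_variable` stand in for `mozilla::Monitor`, and the sketch leaves out the additional `mSaveThread` check that the real loop performs.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical coordinator between a saving thread and the thread that
// must wait for the save before tearing state down.
class SaveCoordinator {
 public:
  // Called on the saving thread once the write has finished.
  void MarkSaveComplete() {
    {
      std::lock_guard<std::mutex> lock(mMutex);
      mSaveComplete = true;
    }
    mCond.notify_all();
  }

  // Called on the owning thread during cleanup. The predicate is re-checked
  // in a loop because a wakeup alone does not prove the save has finished.
  void WaitForSave() {
    std::unique_lock<std::mutex> lock(mMutex);
    while (!mSaveComplete) {
      mCond.wait(lock);
    }
  }

  bool SaveComplete() {
    std::lock_guard<std::mutex> lock(mMutex);
    return mSaveComplete;
  }

 private:
  std::mutex mMutex;
  std::condition_variable mCond;
  bool mSaveComplete = false;
};
```

A caller would start the saving thread, call `WaitForSave()` on the owning thread, and then join the saving thread. If the waiting thread could itself be the target of a synchronous dispatch from the saver, the wait would deadlock, which is the situation the `mBlockedOnSyncDispatch` assertion in `Cleanup()` guards against.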
void
ScriptPreloader::InvalidateCache()
{
mMonitor.AssertNotCurrentThreadOwns();
MonitorAutoLock mal(mMonitor);
mCacheInvalidated = true;
// Wait for pending off-thread parses to finish, since they depend on the
// memory allocated by our CachedScripts, and can't be canceled
// asynchronously.
FinishPendingParses(mal);
// Pending scripts should have been cleared by the above, and new parses
// should not have been queued.
MOZ_ASSERT(mParsingScripts.empty());
MOZ_ASSERT(mParsingSources.empty());
MOZ_ASSERT(mPendingScripts.isEmpty());
for (auto& script : IterHash(mScripts)) {
script.Remove();
}
// If we've already finished saving the cache at this point, start a new
// delayed save operation. This will write out an empty cache file in place
// of any cache file we've already written out this session, which will
// prevent us from falling back to the current session's cache file on the
// next startup.
if (mSaveComplete && mChildCache) {
mSaveComplete = false;
// Make sure scripts are prepared to avoid deadlock when invalidating
// the cache during shutdown.
PrepareCacheWriteInternal();
Unused << NS_NewNamedThread("SaveScripts",
getter_AddRefs(mSaveThread), this);
}
}
nsresult
ScriptPreloader::Observe(nsISupports* subject, const char* topic, const char16_t* data)
{
    nsCOMPtr<nsIObserverService> obs = services::GetObserverService();
    if (!strcmp(topic, STARTUP_COMPLETE_TOPIC)) {
        obs->RemoveObserver(this, STARTUP_COMPLETE_TOPIC);
        MOZ_ASSERT(XRE_IsParentProcess());
        mStartupFinished = true;
    } else if (!strcmp(topic, CACHE_WRITE_TOPIC)) {
        obs->RemoveObserver(this, CACHE_WRITE_TOPIC);
        MOZ_ASSERT(mStartupFinished);
        MOZ_ASSERT(XRE_IsParentProcess());
        if (mChildCache) {
            Unused << NS_NewNamedThread("SaveScripts",
                                        getter_AddRefs(mSaveThread), this);
        }
    } else if (mContentStartupFinishedTopic.Equals(topic)) {
        // If this is an uninitialized about:blank viewer or a chrome: document
        // (which should always be an XBL binding document), ignore it. We don't
        // have to worry about it loading malicious content.
        if (nsCOMPtr<nsIDocument> doc = do_QueryInterface(subject)) {
            nsCOMPtr<nsIURI> uri = doc->GetDocumentURI();
            bool schemeIs;
            if ((NS_IsAboutBlank(uri) &&
                 doc->GetReadyStateEnum() == doc->READYSTATE_UNINITIALIZED) ||
                (NS_SUCCEEDED(uri->SchemeIs("chrome", &schemeIs)) && schemeIs)) {
                return NS_OK;
            }
        }
        FinishContentStartup();
    } else if (!strcmp(topic, "timer-callback")) {
        FinishContentStartup();
    } else if (!strcmp(topic, SHUTDOWN_TOPIC)) {
        ForceWriteCacheFile();
    } else if (!strcmp(topic, CLEANUP_TOPIC)) {
        Cleanup();
    } else if (!strcmp(topic, CACHE_INVALIDATE_TOPIC)) {
        InvalidateCache();
    }
    return NS_OK;
}

void
ScriptPreloader::FinishContentStartup()
{
    MOZ_ASSERT(XRE_IsContentProcess());
#ifdef DEBUG
    if (mContentStartupFinishedTopic.Equals(CONTENT_DOCUMENT_LOADED_TOPIC)) {
        MOZ_ASSERT(sProcessType == ProcessType::Privileged);
    } else {
        MOZ_ASSERT(sProcessType != ProcessType::Privileged);
    }
#endif /* DEBUG */
    nsCOMPtr<nsIObserverService> obs = services::GetObserverService();
    obs->RemoveObserver(this, mContentStartupFinishedTopic.get());
    mSaveTimer = nullptr;
    mStartupFinished = true;
    if (mChildActor) {
        mChildActor->SendScriptsAndFinalize(mScripts);
    }
#ifdef XP_WIN
    // Record the amount of USS at startup. This is Windows-only for now;
    // we could turn it on for Linux relatively cheaply. On macOS it can have
    // a perf impact.
    mozilla::Telemetry::Accumulate(
        mozilla::Telemetry::MEMORY_UNIQUE_CONTENT_STARTUP,
        nsMemoryReporterManager::ResidentUnique() / 1024);
#endif
}
bool
ScriptPreloader::WillWriteScripts()
{
    return Active() && (XRE_IsParentProcess() || mChildActor);
}
Result<nsCOMPtr<nsIFile>, nsresult>
ScriptPreloader::GetCacheFile(const nsAString& suffix)
{
    NS_ENSURE_TRUE(mProfD, Err(NS_ERROR_NOT_INITIALIZED));
    nsCOMPtr<nsIFile> cacheFile;
    MOZ_TRY(mProfD->Clone(getter_AddRefs(cacheFile)));
    MOZ_TRY(cacheFile->AppendNative(NS_LITERAL_CSTRING("startupCache")));
    Unused << cacheFile->Create(nsIFile::DIRECTORY_TYPE, 0777);
    MOZ_TRY(cacheFile->Append(mBaseName + suffix));
    return std::move(cacheFile);
}

static const uint8_t MAGIC[] = "mozXDRcachev002";
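// Locates and maps the cache file. A "<base>.bin" file, if present, is first
// renamed to "<base>-current.bin"; otherwise an existing "<base>-current.bin"
// is used. Fails with NS_ERROR_FILE_NOT_FOUND if neither file exists.
Result<Ok, nsresult>
ScriptPreloader::OpenCache()
{
    MOZ_TRY(NS_GetSpecialDirectory("ProfLDS", getter_AddRefs(mProfD)));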
    nsCOMPtr<nsIFile> cacheFile;
    MOZ_TRY_VAR(cacheFile, GetCacheFile(NS_LITERAL_STRING(".bin")));
    bool exists;
    MOZ_TRY(cacheFile->Exists(&exists));
    if (exists) {
        MOZ_TRY(cacheFile->MoveTo(nullptr, mBaseName + NS_LITERAL_STRING("-current.bin")));
    } else {
        MOZ_TRY(cacheFile->SetLeafName(mBaseName + NS_LITERAL_STRING("-current.bin")));
        MOZ_TRY(cacheFile->Exists(&exists));
        if (!exists) {
            return Err(NS_ERROR_FILE_NOT_FOUND);
        }
    }

    MOZ_TRY(mCacheData.init(cacheFile));

    return Ok();
}

// Opens the script cache file for this session, and initializes the script
// cache based on its contents. See WriteCache for details of the cache file.
Result<Ok, nsresult>
ScriptPreloader::InitCache(const nsAString& basePath)
{
    mCacheInitialized = true;
    mBaseName = basePath;
RegisterWeakMemoryReporter(this);
if (!XRE_IsParentProcess()) {
return Ok();
}
// Grab the compilation scope before initializing the URLPreloader, since
// it's not safe to run component loader code during its critical section.
AutoSafeJSAPI jsapi;
JS::RootedObject scope(jsapi.cx(), xpc::CompilationScope());
// Note: Code on the main thread *must not access Omnijar in any way* until
// this AutoBeginReading guard is destroyed.
URLPreloader::AutoBeginReading abr;
MOZ_TRY(OpenCache());
return InitCacheInternal(scope);
}
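// Illustrative sketch, not part of ScriptPreloader: the content-process
// InitCache() overload that follows arms a scope-exit guard so its timer
// setup runs on every return path, and InitCacheInternal() uses the same
// pattern but cancels the guard with release() once parsing succeeds. The
// names ScopeGuard, MakeGuard and LoadEntries below are hypothetical
// stand-ins chosen for this example; they are assumptions, not the real
// mozilla::MakeScopeExit implementation.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for a scope-exit helper: runs the stored callable
// when destroyed, unless release() was called first.
template <typename F>
class ScopeGuard {
 public:
  explicit ScopeGuard(F fn) : mFn(std::move(fn)), mArmed(true) {}
  ScopeGuard(ScopeGuard&& other)
      : mFn(std::move(other.mFn)), mArmed(other.mArmed) {
    other.mArmed = false;  // Only the moved-to guard may fire.
  }
  ScopeGuard(const ScopeGuard&) = delete;
  ScopeGuard& operator=(const ScopeGuard&) = delete;
  ~ScopeGuard() {
    if (mArmed) {
      mFn();
    }
  }

  // Cancel the cleanup; called once the guarded work has succeeded.
  void release() { mArmed = false; }

 private:
  F mFn;
  bool mArmed;
};

template <typename F>
ScopeGuard<F> MakeGuard(F fn) {
  return ScopeGuard<F>(std::move(fn));
}

// Copies `entries` into `table`. A negative entry stands in for corrupt
// cache data: the early return lets the guard clear the partial table.
bool LoadEntries(const std::vector<int>& entries, std::vector<int>& table) {
  auto cleanup = MakeGuard([&] { table.clear(); });

  for (int entry : entries) {
    if (entry < 0) {
      return false;  // Guard fires here and empties `table`.
    }
    table.push_back(entry);
  }

  cleanup.release();  // Success: keep what was loaded.
  return true;
}
```

// With this sketch, LoadEntries({1, 2, 3}, table) leaves three entries in
// `table`, while LoadEntries({1, -1, 3}, table) returns false and leaves
// `table` empty.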
Result<Ok, nsresult>
ScriptPreloader::InitCache(const Maybe<ipc::FileDescriptor>& cacheFile, ScriptCacheChild* cacheChild)
{
MOZ_ASSERT(XRE_IsContentProcess());
mCacheInitialized = true;
mChildActor = cacheChild;
sProcessType = GetChildProcessType(dom::ContentChild::GetSingleton()->GetRemoteType());
nsCOMPtr<nsIObserverService> obs = services::GetObserverService();
MOZ_RELEASE_ASSERT(obs);
if (sProcessType == ProcessType::Privileged) {
// Since we control all of the documents loaded in the privileged
// content process, we can increase the window of active time for the
// ScriptPreloader to include the scripts that are loaded until the
// first document finishes loading.
mContentStartupFinishedTopic.AssignLiteral(CONTENT_DOCUMENT_LOADED_TOPIC);
} else {
// In the child process, we need to freeze the script cache before any
// untrusted code has been executed. The insertion of the first DOM
// document element may sometimes be earlier than is ideal, but at
// least it should always be safe.
mContentStartupFinishedTopic.AssignLiteral(DOC_ELEM_INSERTED_TOPIC);
}
obs->AddObserver(this, mContentStartupFinishedTopic.get(), false);
RegisterWeakMemoryReporter(this);
auto cleanup = MakeScopeExit([&] {
// If the parent is expecting cache data from us, make sure we send it
// before it writes out its cache file. For normal processes, this isn't
// a concern, since they begin loading documents quite early. For the
// preloaded process, we may end up waiting a long time (or, indeed,
// never loading a document), so we need an additional timeout.
if (cacheChild) {
NS_NewTimerWithObserver(getter_AddRefs(mSaveTimer),
this, CHILD_STARTUP_TIMEOUT_MS,
nsITimer::TYPE_ONE_SHOT);
}
});
if (cacheFile.isNothing()) {
return Ok();
}
MOZ_TRY(mCacheData.init(cacheFile.ref()));
return InitCacheInternal();
}
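// Illustrative sketch, not part of ScriptPreloader: InitCacheInternal(),
// below, checks a magic prefix, reads a little-endian 32-bit header size,
// bounds-checks the header against the end of the mapping, and then skips
// padding so the script data starts on an aligned offset. The standalone
// version here mirrors those steps on a plain byte buffer. kMagic, Layout,
// ReadLE32, PaddingFor and ParseLayout are hypothetical names and values
// assumed for the example; the real MAGIC constant, XDRAlign type and
// ComputeByteAlignment helper are defined elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical magic value; sizeof() includes the trailing NUL (11 bytes).
static const char kMagic[] = "demo-cache";

struct Layout {
  size_t headerStart;  // Offset of the first header byte.
  size_t headerSize;   // Header length read from the file.
  size_t dataStart;    // Aligned offset where script data begins.
};

// Reads a 32-bit little-endian integer regardless of host byte order.
static uint32_t ReadLE32(const uint8_t* p) {
  return uint32_t(p[0]) | (uint32_t(p[1]) << 8) | (uint32_t(p[2]) << 16) |
         (uint32_t(p[3]) << 24);
}

// Number of padding bytes needed to round `len` up to a multiple of `align`.
static size_t PaddingFor(size_t len, size_t align) {
  return (align - len % align) % align;
}

// Returns false if the buffer is too small, has the wrong magic, or claims
// a header that extends past the end of the buffer.
bool ParseLayout(const uint8_t* buf, size_t size, size_t align, Layout* out) {
  if (size < sizeof(kMagic) + sizeof(uint32_t)) {
    return false;
  }
  if (std::memcmp(kMagic, buf, sizeof(kMagic)) != 0) {
    return false;
  }
  size_t pos = sizeof(kMagic);

  uint32_t headerSize = ReadLE32(buf + pos);
  pos += sizeof(uint32_t);

  // Compare against the remaining bytes rather than computing
  // pos + headerSize, so a huge headerSize cannot overflow the check.
  if (headerSize > size - pos) {
    return false;
  }
  out->headerStart = pos;
  out->headerSize = headerSize;
  pos += headerSize;

  // Skip padding so the data section starts on an aligned offset.
  pos += PaddingFor(pos, align);
  if (pos > size) {
    return false;
  }
  out->dataStart = pos;
  return true;
}
```

// For a 32-byte buffer holding the 11-byte magic, a header size of 5 and an
// alignment of 8, this sketch reports the header at offset 15 and the data
// section at offset 24 (20 bytes consumed plus 4 bytes of padding).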
Result<Ok, nsresult>
ScriptPreloader::InitCacheInternal(JS::HandleObject scope)
{
auto size = mCacheData.size();
uint32_t headerSize;
if (size < sizeof(MAGIC) + sizeof(headerSize)) {
return Err(NS_ERROR_UNEXPECTED);
}
auto data = mCacheData.get<uint8_t>();
uint8_t* start = data.get();
MOZ_ASSERT(reinterpret_cast<uintptr_t>(start) % sizeof(XDRAlign) == 0);
auto end = data + size;
if (memcmp(MAGIC, data.get(), sizeof(MAGIC))) {
return Err(NS_ERROR_UNEXPECTED);
}
data += sizeof(MAGIC);
headerSize = LittleEndian::readUint32(data.get());
data += sizeof(headerSize);
if (data + headerSize > end) {
return Err(NS_ERROR_UNEXPECTED);
}
{
auto cleanup = MakeScopeExit([&] () {
mScripts.Clear();
});
LinkedList<CachedScript> scripts;
Range<uint8_t> header(data, data + headerSize);
data += headerSize;
InputBuffer buf(header);
size_t len = data.get() - start;
size_t alignLen = ComputeByteAlignment(len, sizeof(XDRAlign));
data += alignLen;
size_t offset = 0;
while (!buf.finished()) {
auto script = MakeUnique<CachedScript>(*this, buf);
MOZ_RELEASE_ASSERT(script);
auto scriptData = data + script->mOffset;
if (scriptData + script->mSize > end) {
return Err(NS_ERROR_UNEXPECTED);
}
// Make sure offsets match what we'd expect based on script ordering and
// size, as a basic sanity check.
if (script->mOffset != offset) {
return Err(NS_ERROR_UNEXPECTED);
}
offset += script->mSize;
MOZ_ASSERT(reinterpret_cast<uintptr_t>(scriptData.get()) % sizeof(XDRAlign) == 0);
script->mXDRRange.emplace(scriptData, scriptData + script->mSize);
// Don't pre-decode the script unless it was used in this process type during the
// previous session.
if (script->mOriginalProcessTypes.contains(CurrentProcessType())) {
scripts.insertBack(script.get());
} else {
script->mReadyToExecute = true;
}
mScripts.Put(script->mCachePath, script.get());
Unused << script.release();
}
if (buf.error()) {
return Err(NS_ERROR_UNEXPECTED);
}
mPendingScripts = std::move(scripts);
cleanup.release();
}
DecodeNextBatch(OFF_THREAD_FIRST_CHUNK_SIZE, scope);
return Ok();
}
void
ScriptPreloader::PrepareCacheWriteInternal()
{
    MOZ_ASSERT(NS_IsMainThread());
    mMonitor.AssertCurrentThreadOwns();

    auto cleanup = MakeScopeExit([&] () {
        if (mChildCache) {
            mChildCache->PrepareCacheWrite();
        }
    });
    if (mDataPrepared) {
        return;
    }

    AutoSafeJSAPI jsapi;
    bool found = false;
    for (auto& script : IterHash(mScripts, Match<ScriptStatus::Saved>())) {
        // Don't write any scripts that are also in the child cache. They'll be
        // loaded from the child cache in that case, so there's no need to write
        // them twice.
        CachedScript* childScript = mChildCache
            ? mChildCache->mScripts.Get(script->mCachePath)
            : nullptr;
        if (childScript && !childScript->mProcessTypes.isEmpty()) {
            childScript->UpdateLoadTime(script->mLoadTime);
            childScript->mProcessTypes += script->mProcessTypes;
            script.Remove();
            continue;
        }

        if (!(script->mProcessTypes == script->mOriginalProcessTypes)) {
            // Note: EnumSet doesn't support operator!=, hence the weird form above.
            found = true;
        }

        if (!script->mSize && !script->XDREncode(jsapi.cx())) {
            script.Remove();
        }
    }

    if (!found) {
        mSaveComplete = true;
        return;
    }
    mDataPrepared = true;
}

void
ScriptPreloader::PrepareCacheWrite()
{
    MonitorAutoLock mal(mMonitor);

    PrepareCacheWriteInternal();
}
// Writes out a script cache file for the scripts accessed during early
// startup in this session. The cache file is a little-endian binary file with
// the following format:
//
// - A uint32 containing the size of the header block.
//
// - A header entry for each file stored in the cache containing:
//   - The URL that the script was originally read from.
//   - Its cache key.
//   - The offset of its XDR data within the XDR data block.
//   - The size of its XDR data in the XDR data block.
//   - A bit field describing which process types the script is used in.
//
// - A block of XDR data for the encoded scripts, with each script's data at
//   an offset from the start of the block, as specified above.
Result<Ok, nsresult>
ScriptPreloader::WriteCache()
{
    MOZ_ASSERT(!NS_IsMainThread());

    if (!mDataPrepared && !mSaveComplete) {
        MOZ_ASSERT(!mBlockedOnSyncDispatch);
        mBlockedOnSyncDispatch = true;
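// mSaveMonitor is released while this thread blocks on the synchronous
// main-thread dispatch of PrepareCacheWrite.
MonitorAutoUnlock mau(mSaveMonitor);
NS_DispatchToMainThread(
NewRunnableMethod("ScriptPreloader::PrepareCacheWrite",
this,
&ScriptPreloader::PrepareCacheWrite),
NS_DISPATCH_SYNC);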
}
mBlockedOnSyncDispatch = false;
if (mSaveComplete) {
// If we don't have anything we need to save, we're done.
return Ok();
}
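// Write to a temporary file with the "-new.bin" suffix first; it is renamed
// to the final ".bin" name only after every write below has succeeded.
nsCOMPtr<nsIFile> cacheFile;
MOZ_TRY_VAR(cacheFile, GetCacheFile(NS_LITERAL_STRING("-new.bin")));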
bool exists;
MOZ_TRY(cacheFile->Exists(&exists));
if (exists) {
MOZ_TRY(cacheFile->Remove(false));
}
{
AutoFDClose fd;
MOZ_TRY(cacheFile->OpenNSPRFileDesc(PR_WRONLY | PR_CREATE_FILE, 0644, &fd.rwget()));
// We also need to hold mMonitor while we're touching scripts in
// mScripts, or they may be freed before we're done with them.
mMonitor.AssertNotCurrentThreadOwns();
MonitorAutoLock mal(mMonitor);
nsTArray<CachedScript*> scripts;
for (auto& script : IterHash(mScripts, Match<ScriptStatus::Saved>())) {
scripts.AppendElement(script);
}
// Sort scripts by load time, with async loaded scripts before sync scripts.
// Since async scripts are always loaded immediately at startup, it helps to
// have them stored contiguously.
scripts.Sort(CachedScript::Comparator());
OutputBuffer buf;
size_t offset = 0;
for (auto script : scripts) {
MOZ_ASSERT(offset % sizeof(XDRAlign) == 0);
script->mOffset = offset;
script->Code(buf);
offset += script->mSize;
}
// The header length is stored as a 4-byte little-endian integer.
uint8_t headerSize[4];
LittleEndian::writeUint32(headerSize, buf.cursor());
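// File layout: MAGIC, the 4-byte header size, the header itself, padding
// up to XDRAlign alignment, then the data for each script in sorted order.
// len tracks the number of bytes written so far.
size_t len = 0;
MOZ_TRY(Write(fd, MAGIC, sizeof(MAGIC)));
len += sizeof(MAGIC);
MOZ_TRY(Write(fd, headerSize, sizeof(headerSize)));
len += sizeof(headerSize);
MOZ_TRY(Write(fd, buf.Get(), buf.cursor()));
len += buf.cursor();
// Pad the header so that the script data begins on an XDRAlign boundary.
size_t alignLen = ComputeByteAlignment(len, sizeof(XDRAlign));
if (alignLen) {
MOZ_TRY(Write(fd, sAlignPadding, alignLen));
len += alignLen;
}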
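for (auto script : scripts) {
MOZ_ASSERT(script->mSize % sizeof(XDRAlign) == 0);
MOZ_TRY(Write(fd, script->Range().begin().get(), script->mSize));
len += script->mSize;
// Once the script has been decoded (mScript is set), its raw data has
// been written out and is no longer needed in memory.
if (script->mScript) {
script->FreeData();
}
}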
    }

    MOZ_TRY(cacheFile->MoveTo(nullptr, mBaseName + NS_LITERAL_STRING(".bin")));
    return Ok();
}

// Runs in the mSaveThread thread, and writes out the cache file for the next
// session after a reasonable delay.
nsresult
ScriptPreloader::Run()
{
    MonitorAutoLock mal(mSaveMonitor);

    // Ideally wait about 10 seconds before saving, to avoid unnecessary IO
    // during early startup. But only if the cache hasn't been invalidated,
    // since that can trigger a new write during shutdown, and we don't want to
    // cause shutdown hangs.
    if (!mCacheInvalidated) {
        mal.Wait(TimeDuration::FromSeconds(10));
    }
    auto result = URLPreloader::GetSingleton().WriteCache();
    Unused << NS_WARN_IF(result.isErr());

    result = WriteCache();
    Unused << NS_WARN_IF(result.isErr());

    result = mChildCache->WriteCache();
    Unused << NS_WARN_IF(result.isErr());
    mSaveComplete = true;
    NS_ReleaseOnMainThreadSystemGroup("ScriptPreloader::mSaveThread",
                                      mSaveThread.forget());
    mal.NotifyAll();
    return NS_OK;
}

void
ScriptPreloader::NoteScript(const nsCString& url, const nsCString& cachePath,
                            JS::HandleScript jsscript, bool isRunOnce)
{
    if (!Active()) {
        if (isRunOnce) {
            if (auto script = mScripts.Get(cachePath)) {
                script->mIsRunOnce = true;
                script->MaybeDropScript();
            }
        }
        return;
    }
    // Don't bother trying to cache any URLs with cache-busting query
    // parameters.
    if (cachePath.FindChar('?') >= 0) {
        return;
    }

    // Don't bother caching files that belong to the mochitest harness.
    NS_NAMED_LITERAL_CSTRING(mochikitPrefix, "chrome://mochikit/");
    if (StringHead(url, mochikitPrefix.Length()) == mochikitPrefix) {
        return;
    }

    auto script = mScripts.LookupOrAdd(cachePath, *this, url, cachePath, jsscript);
    if (isRunOnce) {
        script->mIsRunOnce = true;
    }

    if (!script->MaybeDropScript() && !script->mScript) {
        MOZ_ASSERT(jsscript);
        script->mScript = jsscript;
        script->mReadyToExecute = true;
    }

    script->UpdateLoadTime(TimeStamp::Now());
    script->mProcessTypes += CurrentProcessType();
}

void
ScriptPreloader::NoteScript(const nsCString& url, const nsCString& cachePath,
                            ProcessType processType, nsTArray<uint8_t>&& xdrData,
                            TimeStamp loadTime)
{
    // After data has been prepared, there's no point in noting further scripts,
    // since the cache either has already been written, or is about to be
    // written. Any time prior to the data being prepared, we can safely mutate
    // mScripts without locking. After that point, the save thread is free to
    // access it, and we can't alter it without locking.
    if (mDataPrepared) {
        return;
    }

    auto script = mScripts.LookupOrAdd(cachePath, *this, url, cachePath, nullptr);

    if (!script->HasRange()) {
        MOZ_ASSERT(!script->HasArray());

        script->mSize = xdrData.Length();
        script->mXDRData.construct<nsTArray<uint8_t>>(std::forward<nsTArray<uint8_t>>(xdrData));

        auto& data = script->Array();
        script->mXDRRange.emplace(data.Elements(), data.Length());
    }

    if (!script->mSize && !script->mScript) {
        // If the content process is sending us a script entry for a script
        // which was in the cache at startup, it expects us to already have this
        // script data, so it doesn't send it.
        //
        // However, the cache may have been invalidated at this point (usually
        // due to the add-on manager installing or uninstalling a legacy
        // extension during very early startup), which means we may no longer
        // have an entry for this script. Since that means we have no data to
        // write to the new cache, and no JSScript to generate it from, we need
        // to discard this entry.
        mScripts.Remove(cachePath);
        return;
    }

    script->UpdateLoadTime(loadTime);
    script->mProcessTypes += processType;
}

JSScript*
ScriptPreloader::GetCachedScript(JSContext* cx, const nsCString& path)
{
    // If a script is used by both the parent and the child, it's stored only
    // in the child cache.
    if (mChildCache) {
        auto script = mChildCache->GetCachedScript(cx, path);
        if (script) {
            return script;
        }
    }
    auto script = mScripts.Get(path);
    if (script) {
        return WaitForCachedScript(cx, script);
    }

    return nullptr;
}

JSScript*
ScriptPreloader::WaitForCachedScript(JSContext* cx, CachedScript* script)
{
    // Check for finished operations before locking so that we can move on to
    // decoding the next batch as soon as possible after the pending batch is
    // ready. If we wait until we hit an unfinished script, we wind up having at
    // most one batch of buffered scripts, and occasionally under-running that
    // buffer.
    MaybeFinishOffThreadDecode();
    if (!script->mReadyToExecute) {
        LOG(Info, "Must wait for async script load: %s\n", script->mURL.get());
        auto start = TimeStamp::Now();

        mMonitor.AssertNotCurrentThreadOwns();
        MonitorAutoLock mal(mMonitor);

        // Check for finished operations again *after* locking, or we may race
        // against mToken being set between our last check and the time we
        // entered the mutex.
        MaybeFinishOffThreadDecode();
        if (!script->mReadyToExecute && script->mSize < MAX_MAINTHREAD_DECODE_SIZE) {
            LOG(Info, "Script is small enough to recompile on main thread\n");
            script->mReadyToExecute = true;
        } else {
            while (!script->mReadyToExecute) {
                mal.Wait();

                MonitorAutoUnlock mau(mMonitor);
                MaybeFinishOffThreadDecode();
            }
        }

        LOG(Debug, "Waited %fms\n", (TimeStamp::Now() - start).ToMilliseconds());
    }

    return script->GetJSScript(cx);
}

/* static */ void
ScriptPreloader::OffThreadDecodeCallback(JS::OffThreadToken* token, void* context)
{
    auto cache = static_cast<ScriptPreloader*>(context);
    cache->mMonitor.AssertNotCurrentThreadOwns();
    MonitorAutoLock mal(cache->mMonitor);
    // First notify any tasks that are already waiting on scripts, since they'll
    // be blocking the main thread and preventing any runnables from executing.
    cache->mToken = token;
    mal.NotifyAll();

    // If nothing processed the token, and we don't already have a pending
    // runnable, then dispatch a new one to finish the processing on the main
    // thread as soon as possible.
    if (cache->mToken && !cache->mFinishDecodeRunnablePending) {
        cache->mFinishDecodeRunnablePending = true;
        NS_DispatchToMainThread(
          NewRunnableMethod("ScriptPreloader::DoFinishOffThreadDecode",
                            cache,
                            &ScriptPreloader::DoFinishOffThreadDecode));
    }
}

void
ScriptPreloader::FinishPendingParses(MonitorAutoLock& aMal)
{
    mMonitor.AssertCurrentThreadOwns();

    mPendingScripts.clear();

    MaybeFinishOffThreadDecode();

    // Loop until all pending decode operations finish.
    while (!mParsingScripts.empty()) {
        aMal.Wait();
        MaybeFinishOffThreadDecode();
    }
}

void
ScriptPreloader::DoFinishOffThreadDecode()
{
    mFinishDecodeRunnablePending = false;
    MaybeFinishOffThreadDecode();
}

void
ScriptPreloader::MaybeFinishOffThreadDecode()
{
    if (!mToken) {
        return;
    }
    auto cleanup = MakeScopeExit([&] () {
        mToken = nullptr;
        mParsingSources.clear();
        mParsingScripts.clear();
        DecodeNextBatch(OFF_THREAD_CHUNK_SIZE);
    });

    AutoSafeJSAPI jsapi;
    JSContext* cx = jsapi.cx();

    JSAutoRealm ar(cx, xpc::CompilationScope());
    JS::Rooted<JS::ScriptVector> jsScripts(cx, JS::ScriptVector(cx));

    // If this fails, we still need to mark the scripts as finished. Any that
    // weren't successfully compiled in this operation (which should never
    // happen under ordinary circumstances) will be re-decoded on the main
    // thread, and raise the appropriate errors when they're executed.
    //
    // The exception from the off-thread decode operation will be reported when
    // we pop the AutoJSAPI off the stack.
    Unused << JS::FinishMultiOffThreadScriptsDecoder(cx, mToken, &jsScripts);

    unsigned i = 0;
    for (auto script : mParsingScripts) {
        LOG(Debug, "Finished off-thread decode of %s\n", script->mURL.get());
        if (i < jsScripts.length()) {
            script->mScript = jsScripts[i++];
        }
        script->mReadyToExecute = true;
    }
}

void
ScriptPreloader::DecodeNextBatch(size_t chunkSize, JS::HandleObject scope)
{
    MOZ_ASSERT(mParsingSources.length() == 0);
    MOZ_ASSERT(mParsingScripts.length() == 0);

    auto cleanup = MakeScopeExit([&] () {
        mParsingScripts.clearAndFree();
        mParsingSources.clearAndFree();
    });

    auto start = TimeStamp::Now();
    LOG(Debug, "Off-thread decoding scripts...\n");

    size_t size = 0;
    for (CachedScript* next = mPendingScripts.getFirst(); next;) {
        auto script = next;
        next = script->getNext();

        // Skip any scripts that we decoded on the main thread rather than
        // waiting for an off-thread operation to complete.
        if (script->mReadyToExecute) {
            script->remove();
            continue;
        }

        // If we have enough data for one chunk and this script would put us
        // over our chunk size limit, we're done.
        if (size > SMALL_SCRIPT_CHUNK_THRESHOLD &&
            size + script->mSize > chunkSize) {
            break;
        }

        if (!mParsingScripts.append(script) ||
            !mParsingSources.emplaceBack(script->Range(), script->mURL.get(), 0)) {
            break;
        }

        LOG(Debug, "Beginning off-thread decode of script %s (%u bytes)\n",
            script->mURL.get(), script->mSize);

        script->remove();
        size += script->mSize;
    }

    if (size == 0 && mPendingScripts.isEmpty()) {
        return;
    }

    AutoSafeJSAPI jsapi;
    JSContext* cx = jsapi.cx();
    JSAutoRealm ar(cx, scope ? scope : xpc::CompilationScope());
    JS::CompileOptions options(cx);
    options.setNoScriptRval(true)
           .setSourceIsLazy(true);
    if (!JS::CanCompileOffThread(cx, options, size) ||
        !JS::DecodeMultiOffThreadScripts(cx, options, mParsingSources,
                                         OffThreadDecodeCallback,
                                         static_cast<void*>(this))) {
        // If we fail here, we don't move on to process the next batch, so make
        // sure we don't have any other scripts left to process.
        MOZ_ASSERT(mPendingScripts.isEmpty());
        for (auto script : mPendingScripts) {
            script->mReadyToExecute = true;
        }
        LOG(Info, "Can't decode %lu bytes of scripts off-thread", (unsigned long)size);
        for (auto script : mParsingScripts) {
            script->mReadyToExecute = true;
        }
        return;
    }
    cleanup.release();
    LOG(Debug, "Initialized decoding of %u scripts (%u bytes) in %fms\n",
        (unsigned)mParsingSources.length(), (unsigned)size, (TimeStamp::Now() - start).ToMilliseconds());
}
ScriptPreloader::CachedScript::CachedScript(ScriptPreloader& cache, InputBuffer& buf)
    : mCache(cache)
{
    Code(buf);
    // Move the decoded mProcessTypes value into mOriginalProcessTypes and
    // clear mProcessTypes, since we want to start this session with an empty
    // set of process types the script has been loaded into, and compare
    // against last session's values later.
    mOriginalProcessTypes = mProcessTypes;
    mProcessTypes = {};
}
bool
ScriptPreloader::CachedScript::XDREncode(JSContext* cx)
{
    auto cleanup = MakeScopeExit([&] () {
        MaybeDropScript();
    });
    JSAutoRealm ar(cx, mScript);
    JS::RootedScript jsscript(cx, mScript);
    mXDRData.construct<JS::TranscodeBuffer>();
    JS::TranscodeResult code = JS::EncodeScript(cx, Buffer(), jsscript);
    if (code == JS::TranscodeResult_Ok) {
        mXDRRange.emplace(Buffer().begin(), Buffer().length());
        mSize = Range().length();
        return true;
    }
    mXDRData.destroy();
    JS_ClearPendingException(cx);
    return false;
}
JSScript*
ScriptPreloader::CachedScript::GetJSScript(JSContext* cx)
{
    MOZ_ASSERT(mReadyToExecute);
    if (mScript) {
        return mScript;
    }
    if (!HasRange()) {
        // We've already executed the script, and thrown it away. But it wasn't
        // in the cache at startup, so we don't have any data to decode. Give
        // up.
        return nullptr;
    }
    // If we have no script at this point, the script was too small to decode
    // off-thread, or it was needed before the off-thread compilation was
    // finished, and is small enough to decode on the main thread rather than
    // wait for the off-thread decoding to finish. In either case, we decode
    // it synchronously the first time it's needed.
    auto start = TimeStamp::Now();
    LOG(Info, "Decoding script %s on main thread...\n", mURL.get());
    JS::RootedScript script(cx);
    if (JS::DecodeScript(cx, Range(), &script)) {
        mScript = script;
    if (mCache.mSaveComplete) {
        FreeData();
    }
}
LOG(Debug, "Finished decoding in %fms", (TimeStamp::Now() - start).ToMilliseconds());
return mScript;
}
NS_IMPL_ISUPPORTS(ScriptPreloader, nsIObserver, nsIRunnable, nsIMemoryReporter)
#undef LOG
} // namespace mozilla