Prior to this patch, the startupcache created its own mWriteThread off which it
wrote to disk. It's initialized by MaybeSpawnWriteThread, which got called
at shutdown, to do the shutdown write if there was any reason to do so, and
from a timer that is re-initialized after every addition to the startup cache,
to run 60s after the last change to the cache.
It then joined that write thread on the main thread (in other words, blocks
on that off-main-thread write completing from the main thread) when:
- xpcom-shutdown fired
- the startupcache itself gets destroyed
- someone calls any of:
* HasEntry
* GetBuffer
* PutBuffer
* InvalidateCache
This patch removes the separate write thread, and instead dispatches a task to
the background task queue, indicating it can block. The task is started in
the same circumstances where we previously used to write (timer from the last
PutBuffer call, and shutdown if necessary).
To ensure it cannot be trying to use the data it writes out (mTable) from
the other thread while that data changes on the main thread, we use a mutex.
The task locks the mutex before starting, and unlocks when finished.
Enumerating the cases that we used to block on joining the thread:
In terms of application shutdown, we expect the background task queue to
either finish the write task, or fail to run it if it hasn't started it yet.
In the FastStartup case, we check if a write was necessary; if so, we
attempt to gain the lock without waiting. If we're successful, the write has
not yet started, and we instead run the write on the main thread. Otherwise,
we retry gaining the lock, blocking this time, thus guaranteeing the
off-the-main-thread write completes.
The task keeps a reference to the startupcache object, so it cannot be
destroyed while the task is pending.
Because the write does not modify `mTable`, and neither does `HasEntry`,
we do not need to do anything there.
In the `GetBuffer` case, we do not modify the table unless we have to read
the entry off disk (memmapped into `mCacheData`). This can only happen if
`mCacheData.initialized()` returns true, and we specifically call
`mCacheData.reset()` before firing off the write task to avoid this.
`mCacheData` is only re-initialized if someone calls `LoadArchive()`,
which can only happen from `Init()` (which is guaranteed not to run
again because this is a singleton), or `InvalidateCache()`, where we lock
the mutex (see below). So this is safe - but we assert on the lock to try
and avoid people breaking this chain of assumptions in the future.
When `PutBuffer` is called, we try to lock the mutex - but if locking fails
(ie the background thread is writing), we simply fail to store the entry
in the startupcache. In practice, this should be rare - it'd happen if
new calls to PutBuffer happen while writing during shutdown (when really,
we don't care) or when it's been 60 seconds since the last PutBuffer so
we started writing the startupcache.
When InvalidateCache is called, we lock the mutex - we shouldn't try to
write while invalidating, or invalidate while writing. This may be slow,
but in practice nothing should call `InvalidateCache` except developer
restarts or the `-purgecaches` commandline flag, so it shouldn't
matter a great deal.
Differential Revision: https://phabricator.services.mozilla.com/D70413
--HG--
extra : moz-landing-system : lando
Please double check that I am using this correctly. I believe we are
seeing the crash in the linked bug because we are not handling hardware
faults when reading from the memory mapped file. This patch just wraps
all accesses in the MMAP_FAULT_HANDLER_ macros.
Depends on D53042
Differential Revision: https://phabricator.services.mozilla.com/D53043
--HG--
rename : modules/libjar/MmapFaultHandler.cpp => mozglue/misc/MmapFaultHandler.cpp
rename : modules/libjar/MmapFaultHandler.h => mozglue/misc/MmapFaultHandler.h
extra : moz-landing-system : lando
This is not, AFAICT, causing crashes - it is just an oversight noticed while
investigating the crash in the bug.
Differential Revision: https://phabricator.services.mozilla.com/D53042
--HG--
extra : moz-landing-system : lando
Adds a new startup cache info service to expose some of the internal
state of the startup cache.
Differential Revision: https://phabricator.services.mozilla.com/D69298
--HG--
extra : moz-landing-system : lando
Adds a new startup cache info service to expose some of the internal
state of the startup cache.
Differential Revision: https://phabricator.services.mozilla.com/D69298
--HG--
extra : moz-landing-system : lando
Please double check that I am using this correctly. I believe we are
seeing the crash in the linked bug because we are not handling hardware
faults when reading from the memory mapped file. This patch just wraps
all accesses in the MMAP_FAULT_HANDLER_ macros.
Depends on D53042
Differential Revision: https://phabricator.services.mozilla.com/D53043
--HG--
rename : modules/libjar/MmapFaultHandler.cpp => mozglue/misc/MmapFaultHandler.cpp
rename : modules/libjar/MmapFaultHandler.h => mozglue/misc/MmapFaultHandler.h
extra : moz-landing-system : lando
This is not, AFAICT, causing crashes - it is just an oversight noticed while
investigating the crash in the bug.
Differential Revision: https://phabricator.services.mozilla.com/D53042
--HG--
extra : moz-landing-system : lando
The exact circumstances of how this is showing up in the wild aren't
clear - there seem to be a couple of ways we can get here. However it
all revolves around early shutdowns (i.e., from the select profile popup)
- before the StartupCache is ever initialized. In any case, the solution
shouldn't change based on the exact circumstances - if we don't have a
StartupCache, there's no need to write one. Also, don't bother lazy
initializing it if it doesn't exist yet.
Differential Revision: https://phabricator.services.mozilla.com/D63208
--HG--
extra : moz-landing-system : lando
Reordering the mWrittenOnce check should be sufficient to eliminate
the data race; however, I made mWrittenOnce an atomic just to reduce
the fragility of this since it is intended to be written from and
read to on multiple threads.
Differential Revision: https://phabricator.services.mozilla.com/D62949
--HG--
extra : moz-landing-system : lando
This was just left in, and does not need to be here. We want to be
spawning the background thread here which we will wait on from
xpcom-shutdown.
Differential Revision: https://phabricator.services.mozilla.com/D62848
--HG--
extra : moz-landing-system : lando
This was just left in, and does not need to be here. We want to be
spawning the background thread here which we will wait on from
xpcom-shutdown.
Differential Revision: https://phabricator.services.mozilla.com/D62848
--HG--
extra : moz-landing-system : lando
The initial thought for getting the StartupCache out of the shutdown
path was to simply not write it during shutdown, as it should write
60 seconds after startup, and the theory was that if the user shut
down within the first 60 seconds of use, they were likely updating or
something and it shouldn't matter. However, considering how many of
our users only ever open one tab, I think it's rather likely that
users are starting up firefox to go to a web site, then closing it
when done with that website, and then maybe opening up a new instance
in order to go to a different website. Accordingly it still makes
sense to continue writing it. However, we may as well leverage a
background thread for this and get it kicked off earlier during
shutdown, so we don't sit there blocking in the destructor late
during shutdown.
Differential Revision: https://phabricator.services.mozilla.com/D62294
--HG--
extra : source : 7b7b147b6955cee07e0c115993446bfbd59cf7e2
extra : histedit_source : 6990122d6b1ac4939b0e4b0a5e452183fb981e19
The initial thought for getting the StartupCache out of the shutdown
path was to simply not write it during shutdown, as it should write
60 seconds after startup, and the theory was that if the user shut
down within the first 60 seconds of use, they were likely updating or
something and it shouldn't matter. However, considering how many of
our users only ever open one tab, I think it's rather likely that
users are starting up firefox to go to a web site, then closing it
when done with that website, and then maybe opening up a new instance
in order to go to a different website. Accordingly it still makes
sense to continue writing it. However, we may as well leverage a
background thread for this and get it kicked off earlier during
shutdown, so we don't sit there blocking in the destructor late
during shutdown.
Differential Revision: https://phabricator.services.mozilla.com/D62294
--HG--
extra : moz-landing-system : lando
The initial thought for getting the StartupCache out of the shutdown
path was to simply not write it during shutdown, as it should write
60 seconds after startup, and the theory was that if the user shut
down within the first 60 seconds of use, they were likely updating or
something and it shouldn't matter. However, considering how many of
our users only ever open one tab, I think it's rather likely that
users are starting up firefox to go to a web site, then closing it
when done with that website, and then maybe opening up a new instance
in order to go to a different website. Accordingly it still makes
sense to continue writing it. However, we may as well leverage a
background thread for this and get it kicked off earlier during
shutdown, so we don't sit there blocking in the destructor late
during shutdown.
Differential Revision: https://phabricator.services.mozilla.com/D62294
--HG--
extra : moz-landing-system : lando
We can't guarantee uniqueness of the keys here because the file might
be corrupt. So we should check if the key exists and if it does, bail
out.
Differential Revision: https://phabricator.services.mozilla.com/D58387
--HG--
extra : moz-landing-system : lando
The inclusions were removed with the following very crude script and the
resulting breakage was fixed up by hand. The manual fixups did either
revert the changes done by the script, replace a generic header with a more
specific one or replace a header with a forward declaration.
find . -name "*.idl" | grep -v web-platform | grep -v third_party | while read path; do
interfaces=$(grep "^\(class\|interface\).*:.*" "$path" | cut -d' ' -f2)
if [ -n "$interfaces" ]; then
if [[ "$interfaces" == *$'\n'* ]]; then
regexp="\("
for i in $interfaces; do regexp="$regexp$i\|"; done
regexp="${regexp%%\\\|}\)"
else
regexp="$interfaces"
fi
interface=$(basename "$path")
rg -l "#include.*${interface%%.idl}.h" . | while read path2; do
hits=$(grep -v "#include.*${interface%%.idl}.h" "$path2" | grep -c "$regexp" )
if [ $hits -eq 0 ]; then
echo "Removing ${interface} from ${path2}"
grep -v "#include.*${interface%%.idl}.h" "$path2" > "$path2".tmp
mv -f "$path2".tmp "$path2"
fi
done
fi
done
Differential Revision: https://phabricator.services.mozilla.com/D55444
--HG--
extra : moz-landing-system : lando
There is no behavior change here - these asserts were previously being hit
inside the operator+ of RangedPtr, but this makes them explicit.
Differential Revision: https://phabricator.services.mozilla.com/D49985
--HG--
extra : moz-landing-system : lando
In bug 1550108 we added a thread to prefetch the contents of the
startup cache file, and as part of that we reduced its stack size
and the stack size of the thread which writes the file. However
it seems we set the write thread's stack size too low.
Differential Revision: https://phabricator.services.mozilla.com/D48570
--HG--
extra : moz-landing-system : lando
Does what it says on the tin. Once we have a central scheduling
system this should likely just consume that.
Differential Revision: https://phabricator.services.mozilla.com/D35454
--HG--
extra : moz-landing-system : lando
The first run loads more things into the StartupCache than are
used on the second and subsequent runs. This just ensures that if
the StartupCache diverges too far from its actual use that we will
rebuild it.
Differential Revision: https://phabricator.services.mozilla.com/D34654
--HG--
extra : moz-landing-system : lando
The signatures were updated in the previous patch to hand us the raw,
uncopied buffers. This just adjusts the callsites to match.
Differential Revision: https://phabricator.services.mozilla.com/D34653
--HG--
extra : moz-landing-system : lando
I am not aware of anything that depends on StartupCache being a
zip file, and since I want to use lz4 compression because inflate
is showing up quite a lot in profiles, it's simplest to just use
a custom format. This loosely mimicks the ScriptPreloader code,
with a few diversions:
- Obviously the contents of the cache are compressed. I used lz4
for this as I hit the same file size as deflate at a compression
level of 1, which is what the StartupCache was using previously,
while decompressing an order of magnitude faster. Seemed like
the most conservative change to make. I think it's worth
investigating what the impact of slower algs with higher ratios
would be, but for right now I settled on this. We'd probably
want to look at zstd next.
- I use streaming compression for this via lz4frame. This is not
strictly necessary, but has the benefit of not requiring as
much memory for large buffers, as well as giving us a built-in
checksum, rather than relying on the much slower CRC that we
were doing with the zip-based approach.
- I coded the serialization of the headers inline, since I had to
jump back to add the offset and compressed size, which would
make the nice Code(...) method for the ScriptPreloader stuff
rather more complex. Open to cleaner solutions, but moving it
out just felt like extra hoops for the reader to jump through
to understand without the benefit of being more concise.
Differential Revision: https://phabricator.services.mozilla.com/D34652
--HG--
extra : moz-landing-system : lando
This will not behave exactly the same if we had previously written bad
data for the entry that would fail to decompress. I imagine this is rare
enough, and the consequences are not severe enough, that this should be
fine.
Differential Revision: https://phabricator.services.mozilla.com/D30643
--HG--
extra : moz-landing-system : lando
I thought I had already written out the patch to remove these, but
apparently not. Per discussion in the startup cache telemetry bug,
there should be no reason for doing this.
Differential Revision: https://phabricator.services.mozilla.com/D31491
--HG--
extra : moz-landing-system : lando
Does what it says on the tin. Once we have a central scheduling
system this should likely just consume that.
Differential Revision: https://phabricator.services.mozilla.com/D35454
--HG--
extra : moz-landing-system : lando
The first run loads more things into the StartupCache than are
used on the second and subsequent runs. This just ensures that if
the StartupCache diverges too far from its actual use that we will
rebuild it.
Differential Revision: https://phabricator.services.mozilla.com/D34654
--HG--
extra : moz-landing-system : lando
The signatures were updated in the previous patch to hand us the raw,
uncopied buffers. This just adjusts the callsites to match.
Differential Revision: https://phabricator.services.mozilla.com/D34653
--HG--
extra : moz-landing-system : lando
I am not aware of anything that depends on StartupCache being a
zip file, and since I want to use lz4 compression because inflate
is showing up quite a lot in profiles, it's simplest to just use
a custom format. This loosely mimicks the ScriptPreloader code,
with a few diversions:
- Obviously the contents of the cache are compressed. I used lz4
for this as I hit the same file size as deflate at a compression
level of 1, which is what the StartupCache was using previously,
while decompressing an order of magnitude faster. Seemed like
the most conservative change to make. I think it's worth
investigating what the impact of slower algs with higher ratios
would be, but for right now I settled on this. We'd probably
want to look at zstd next.
- I use streaming compression for this via lz4frame. This is not
strictly necessary, but has the benefit of not requiring as
much memory for large buffers, as well as giving us a built-in
checksum, rather than relying on the much slower CRC that we
were doing with the zip-based approach.
- I coded the serialization of the headers inline, since I had to
jump back to add the offset and compressed size, which would
make the nice Code(...) method for the ScriptPreloader stuff
rather more complex. Open to cleaner solutions, but moving it
out just felt like extra hoops for the reader to jump through
to understand without the benefit of being more concise.
Differential Revision: https://phabricator.services.mozilla.com/D34652
--HG--
extra : moz-landing-system : lando
This will not behave exactly the same if we had previously written bad
data for the entry that would fail to decompress. I imagine this is rare
enough, and the consequences are not severe enough, that this should be
fine.
Differential Revision: https://phabricator.services.mozilla.com/D30643
--HG--
extra : moz-landing-system : lando