The overall goal of this patch is to make the StartupCache accessible anywhere.
There's two main pieces to that equation:
1. Allowing it to be accessed off main thread, which means modifying the
mutex usage to ensure that all data accessed from non-main threads is
protected.
2. Allowing it to be accessed out of the chrome process, which means passing
a handle to a shared cache buffer down to child processes.
Number 1 is somewhat fiddly, but it's all generally straightforward work. I'll
hope that the comments and the code are sufficient to explain what's going on
there.
Number 2 has some decisions to be made:
- The first decision was to pass a handle to a frozen chunk of memory down to
all child processes, rather than passing a handle to an actual file. There's
two reasons for this: 1) since we want to compress the underlying file on
disk, giving that file to child processes would mean they have to decompress
it themselves, eating CPU time. 2) since they would have to decompress it
themselves, they would have to allocate the memory for the decompressed
buffers, meaning they cannot all simply share one big decompressed buffer.
- The drawback of this decision is that we have to load and decompress the
buffer up front, before we spawn any child processes. We attempt to
mitigate this by keeping track of all the entries that child processes
access, and only including those in the frozen decompressed shared buffer.
- We base our implementation of this approach off of the shared preferences
implementation. Hopefully I got all of the pieces to fit together
correctly. They seem to work in local testing and on try, but I think
they require a set of experienced eyes looking carefully at them.
- Another decision was whether to send the handles to the buffers over IPC or
via command line. We went with the command line approach, because the startup
cache would need to be accessed very early on in order to ensure we do not
read from any omnijars, and we could not make that work via IPC.
- Unfortunately this means adding another hard-coded FD, similar to
kPrefMapFileDescriptor. It seems like at the very least we need to rope all
of these together into one place, but I think that should be filed as a
follow-up?
Lastly, because this patch is a bit of a monster to review - first, thank you
for looking at it, and second, the reason we're invested in this is because we
saw a >10% improvement in cold startup times on reference hardware, with a p
value less than 0.01. It's still not abundantly clear how reference hardware
numbers translate to numbers on release, and they certainly don't translate
well to Nightly numbers, but it's enough to convince me that it's worth some
effort.
Depends on D78584
Differential Revision: https://phabricator.services.mozilla.com/D77635
We need to be able to init StartupCache before the Omnijar in order to cache
all of the Omnijar contents we access. This patch implements that.
Depends on D77632
Differential Revision: https://phabricator.services.mozilla.com/D77633
We would like to be able to defer opening the omnijar files until after startup
if the StartupCache has already been populated. Opening the omnijar files takes
a nontrivial time, at least on Windows, and almost everything in the omnijar
should be fairly compressible, and thus makes sense to live in the StartupCache.
See the last patch in this series for a little more discussion on numbers, but
tl;dr: we saw a 12% improvement in time to about:home being finished on reference
hardware with these changes together with the changes from the descendant patches.
Differential Revision: https://phabricator.services.mozilla.com/D77632
The overall goal of this patch is to make the StartupCache accessible anywhere.
There's two main pieces to that equation:
1. Allowing it to be accessed off main thread, which means modifying the
mutex usage to ensure that all data accessed from non-main threads is
protected.
2. Allowing it to be accessed out of the chrome process, which means passing
a handle to a shared cache buffer down to child processes.
Number 1 is somewhat fiddly, but it's all generally straightforward work. I'll
hope that the comments and the code are sufficient to explain what's going on
there.
Number 2 has some decisions to be made:
- The first decision was to pass a handle to a frozen chunk of memory down to
all child processes, rather than passing a handle to an actual file. There's
two reasons for this: 1) since we want to compress the underlying file on
disk, giving that file to child processes would mean they have to decompress
it themselves, eating CPU time. 2) since they would have to decompress it
themselves, they would have to allocate the memory for the decompressed
buffers, meaning they cannot all simply share one big decompressed buffer.
- The drawback of this decision is that we have to load and decompress the
buffer up front, before we spawn any child processes. We attempt to
mitigate this by keeping track of all the entries that child processes
access, and only including those in the frozen decompressed shared buffer.
- We base our implementation of this approach off of the shared preferences
implementation. Hopefully I got all of the pieces to fit together
correctly. They seem to work in local testing and on try, but I think
they require a set of experienced eyes looking carefully at them.
- Another decision was whether to send the handles to the buffers over IPC or
via command line. We went with the command line approach, because the startup
cache would need to be accessed very early on in order to ensure we do not
read from any omnijars, and we could not make that work via IPC.
- Unfortunately this means adding another hard-coded FD, similar to
kPrefMapFileDescriptor. It seems like at the very least we need to rope all
of these together into one place, but I think that should be filed as a
follow-up?
Lastly, because this patch is a bit of a monster to review - first, thank you
for looking at it, and second, the reason we're invested in this is because we
saw a >10% improvement in cold startup times on reference hardware, with a p
value less than 0.01. It's still not abundantly clear how reference hardware
numbers translate to numbers on release, and they certainly don't translate
well to Nightly numbers, but it's enough to convince me that it's worth some
effort.
Depends on D78584
Differential Revision: https://phabricator.services.mozilla.com/D77635
We need to be able to init StartupCache before the Omnijar in order to cache
all of the Omnijar contents we access. This patch implements that.
Depends on D77632
Differential Revision: https://phabricator.services.mozilla.com/D77633
We would like to be able to defer opening the omnijar files until after startup
if the StartupCache has already been populated. Opening the omnijar files takes
a nontrivial time, at least on Windows, and almost everything in the omnijar
should be fairly compressible, and thus makes sense to live in the StartupCache.
See the last patch in this series for a little more discussion on numbers, but
tl;dr: we saw a 12% improvement in time to about:home being finished on reference
hardware with these changes together with the changes from the descendant patches.
Differential Revision: https://phabricator.services.mozilla.com/D77632
The overall goal of this patch is to make the StartupCache accessible anywhere.
There's two main pieces to that equation:
1. Allowing it to be accessed off main thread, which means modifying the
mutex usage to ensure that all data accessed from non-main threads is
protected.
2. Allowing it to be accessed out of the chrome process, which means passing
a handle to a shared cache buffer down to child processes.
Number 1 is somewhat fiddly, but it's all generally straightforward work. I'll
hope that the comments and the code are sufficient to explain what's going on
there.
Number 2 has some decisions to be made:
- The first decision was to pass a handle to a frozen chunk of memory down to
all child processes, rather than passing a handle to an actual file. There's
two reasons for this: 1) since we want to compress the underlying file on
disk, giving that file to child processes would mean they have to decompress
it themselves, eating CPU time. 2) since they would have to decompress it
themselves, they would have to allocate the memory for the decompressed
buffers, meaning they cannot all simply share one big decompressed buffer.
- The drawback of this decision is that we have to load and decompress the
buffer up front, before we spawn any child processes. We attempt to
mitigate this by keeping track of all the entries that child processes
access, and only including those in the frozen decompressed shared buffer.
- We base our implementation of this approach off of the shared preferences
implementation. Hopefully I got all of the pieces to fit together
correctly. They seem to work in local testing and on try, but I think
they require a set of experienced eyes looking carefully at them.
- Another decision was whether to send the handles to the buffers over IPC or
via command line. We went with the command line approach, because the startup
cache would need to be accessed very early on in order to ensure we do not
read from any omnijars, and we could not make that work via IPC.
- Unfortunately this means adding another hard-coded FD, similar to
kPrefMapFileDescriptor. It seems like at the very least we need to rope all
of these together into one place, but I think that should be filed as a
follow-up?
Lastly, because this patch is a bit of a monster to review - first, thank you
for looking at it, and second, the reason we're invested in this is because we
saw a >10% improvement in cold startup times on reference hardware, with a p
value less than 0.01. It's still not abundantly clear how reference hardware
numbers translate to numbers on release, and they certainly don't translate
well to Nightly numbers, but it's enough to convince me that it's worth some
effort.
Depends on D78584
Differential Revision: https://phabricator.services.mozilla.com/D77635
We need to be able to init StartupCache before the Omnijar in order to cache
all of the Omnijar contents we access. This patch implements that.
Depends on D77632
Differential Revision: https://phabricator.services.mozilla.com/D77633
We would like to be able to defer opening the omnijar files until after startup
if the StartupCache has already been populated. Opening the omnijar files takes
a nontrivial time, at least on Windows, and almost everything in the omnijar
should be fairly compressible, and thus makes sense to live in the StartupCache.
See the last patch in this series for a little more discussion on numbers, but
tl;dr: we saw a 12% improvement in time to about:home being finished on reference
hardware with these changes together with the changes from the descendant patches.
Differential Revision: https://phabricator.services.mozilla.com/D77632
The overall goal of this patch is to make the StartupCache accessible anywhere.
There's two main pieces to that equation:
1. Allowing it to be accessed off main thread, which means modifying the
mutex usage to ensure that all data accessed from non-main threads is
protected.
2. Allowing it to be accessed out of the chrome process, which means passing
a handle to a shared cache buffer down to child processes.
Number 1 is somewhat fiddly, but it's all generally straightforward work. I'll
hope that the comments and the code are sufficient to explain what's going on
there.
Number 2 has some decisions to be made:
- The first decision was to pass a handle to a frozen chunk of memory down to
all child processes, rather than passing a handle to an actual file. There's
two reasons for this: 1) since we want to compress the underlying file on
disk, giving that file to child processes would mean they have to decompress
it themselves, eating CPU time. 2) since they would have to decompress it
themselves, they would have to allocate the memory for the decompressed
buffers, meaning they cannot all simply share one big decompressed buffer.
- The drawback of this decision is that we have to load and decompress the
buffer up front, before we spawn any child processes. We attempt to
mitigate this by keeping track of all the entries that child processes
access, and only including those in the frozen decompressed shared buffer.
- We base our implementation of this approach off of the shared preferences
implementation. Hopefully I got all of the pieces to fit together
correctly. They seem to work in local testing and on try, but I think
they require a set of experienced eyes looking carefully at them.
- Another decision was whether to send the handles to the buffers over IPC or
via command line. We went with the command line approach, because the startup
cache would need to be accessed very early on in order to ensure we do not
read from any omnijars, and we could not make that work via IPC.
- Unfortunately this means adding another hard-coded FD, similar to
kPrefMapFileDescriptor. It seems like at the very least we need to rope all
of these together into one place, but I think that should be filed as a
follow-up?
Lastly, because this patch is a bit of a monster to review - first, thank you
for looking at it, and second, the reason we're invested in this is because we
saw a >10% improvement in cold startup times on reference hardware, with a p
value less than 0.01. It's still not abundantly clear how reference hardware
numbers translate to numbers on release, and they certainly don't translate
well to Nightly numbers, but it's enough to convince me that it's worth some
effort.
Depends on D78584
Differential Revision: https://phabricator.services.mozilla.com/D77635
We need to be able to init StartupCache before the Omnijar in order to cache
all of the Omnijar contents we access. This patch implements that.
Depends on D77632
Differential Revision: https://phabricator.services.mozilla.com/D77633
We would like to be able to defer opening the omnijar files until after startup
if the StartupCache has already been populated. Opening the omnijar files takes
a nontrivial time, at least on Windows, and almost everything in the omnijar
should be fairly compressible, and thus makes sense to live in the StartupCache.
See the last patch in this series for a little more discussion on numbers, but
tl;dr: we saw a 12% improvement in time to about:home being finished on reference
hardware with these changes together with the changes from the descendant patches.
Differential Revision: https://phabricator.services.mozilla.com/D77632
The overall goal of this patch is to make the StartupCache accessible anywhere.
There's two main pieces to that equation:
1. Allowing it to be accessed off main thread, which means modifying the
mutex usage to ensure that all data accessed from non-main threads is
protected.
2. Allowing it to be accessed out of the chrome process, which means passing
a handle to a shared cache buffer down to child processes.
Number 1 is somewhat fiddly, but it's all generally straightforward work. I'll
hope that the comments and the code are sufficient to explain what's going on
there.
Number 2 has some decisions to be made:
- The first decision was to pass a handle to a frozen chunk of memory down to
all child processes, rather than passing a handle to an actual file. There's
two reasons for this: 1) since we want to compress the underlying file on
disk, giving that file to child processes would mean they have to decompress
it themselves, eating CPU time. 2) since they would have to decompress it
themselves, they would have to allocate the memory for the decompressed
buffers, meaning they cannot all simply share one big decompressed buffer.
- The drawback of this decision is that we have to load and decompress the
buffer up front, before we spawn any child processes. We attempt to
mitigate this by keeping track of all the entries that child processes
access, and only including those in the frozen decompressed shared buffer.
- We base our implementation of this approach off of the shared preferences
implementation. Hopefully I got all of the pieces to fit together
correctly. They seem to work in local testing and on try, but I think
they require a set of experienced eyes looking carefully at them.
- Another decision was whether to send the handles to the buffers over IPC or
via command line. We went with the command line approach, because the startup
cache would need to be accessed very early on in order to ensure we do not
read from any omnijars, and we could not make that work via IPC.
- Unfortunately this means adding another hard-coded FD, similar to
kPrefMapFileDescriptor. It seems like at the very least we need to rope all
of these together into one place, but I think that should be filed as a
follow-up?
Lastly, because this patch is a bit of a monster to review - first, thank you
for looking at it, and second, the reason we're invested in this is because we
saw a >10% improvement in cold startup times on reference hardware, with a p
value less than 0.01. It's still not abundantly clear how reference hardware
numbers translate to numbers on release, and they certainly don't translate
well to Nightly numbers, but it's enough to convince me that it's worth some
effort.
Depends on D78584
Differential Revision: https://phabricator.services.mozilla.com/D77635
We need to be able to init StartupCache before the Omnijar in order to cache
all of the Omnijar contents we access. This patch implements that.
Depends on D77632
Differential Revision: https://phabricator.services.mozilla.com/D77633
We would like to be able to defer opening the omnijar files until after startup
if the StartupCache has already been populated. Opening the omnijar files takes
a nontrivial time, at least on Windows, and almost everything in the omnijar
should be fairly compressible, and thus makes sense to live in the StartupCache.
See the last patch in this series for a little more discussion on numbers, but
tl;dr: we saw a 12% improvement in time to about:home being finished on reference
hardware with these changes together with the changes from the descendant patches.
Differential Revision: https://phabricator.services.mozilla.com/D77632
Prior to this patch, the startupcache created its own mWriteThread off which it
wrote to disk. It's initialized by MaybeSpawnWriteThread, which got called
at shutdown, to do the shutdown write if there was any reason to do so, and
from a timer that is re-initialized after every addition to the startup cache,
to run 60s after the last change to the cache.
It then joined that write thread on the main thread (in other words, blocks
on that off-main-thread write completing from the main thread) when:
- xpcom-shutdown fired
- the startupcache itself gets destroyed
- someone calls any of:
* HasEntry
* GetBuffer
* PutBuffer
* InvalidateCache
This patch removes the separate write thread, and instead dispatches a task to
the background task queue, indicating it can block. The task is started in
the same circumstances where we previously used to write (timer from the last
PutBuffer call, and shutdown if necessary).
To ensure it cannot be trying to use the data it writes out (mTable) from
the other thread while that data changes on the main thread, we use a mutex.
The task locks the mutex before starting, and unlocks when finished.
Enumerating the cases that we used to block on joining the thread:
In terms of application shutdown, we expect the background task queue to
either finish the write task, or fail to run it if it hasn't started it yet.
In the FastStartup case, we check if a write was necessary; if so, we
attempt to gain the lock without waiting. If we're successful, the write has
not yet started, and we instead run the write on the main thread. Otherwise,
we retry gaining the lock, blocking this time, thus guaranteeing the
off-the-main-thread write completes.
The task keeps a reference to the startupcache object, so it cannot be
destroyed while the task is pending.
Because the write does not modify `mTable`, and neither does `HasEntry`,
we do not need to do anything there.
In the `GetBuffer` case, we do not modify the table unless we have to read
the entry off disk (memmapped into `mCacheData`). This can only happen if
`mCacheData.initialized()` returns true, and we specifically call
`mCacheData.reset()` before firing off the write task to avoid this.
`mCacheData` is only re-initialized if someone calls `LoadArchive()`,
which can only happen from `Init()` (which is guaranteed not to run
again because this is a singleton), or `InvalidateCache()`, where we lock
the mutex (see below). So this is safe - but we assert on the lock to try
and avoid people breaking this chain of assumptions in the future.
When `PutBuffer` is called, we try to lock the mutex - but if locking fails
(ie the background thread is writing), we simply fail to store the entry
in the startupcache. In practice, this should be rare - it'd happen if
new calls to PutBuffer happen while writing during shutdown (when really,
we don't care) or when it's been 60 seconds since the last PutBuffer so
we started writing the startupcache.
When InvalidateCache is called, we lock the mutex - we shouldn't try to
write while invalidating, or invalidate while writing. This may be slow,
but in practice nothing should call `InvalidateCache` except developer
restarts or the `-purgecaches` commandline flag, so it shouldn't
matter a great deal.
Differential Revision: https://phabricator.services.mozilla.com/D70413
--HG--
extra : moz-landing-system : lando
Please double check that I am using this correctly. I believe we are
seeing the crash in the linked bug because we are not handling hardware
faults when reading from the memory mapped file. This patch just wraps
all accesses in the MMAP_FAULT_HANDLER_ macros.
Depends on D53042
Differential Revision: https://phabricator.services.mozilla.com/D53043
--HG--
rename : modules/libjar/MmapFaultHandler.cpp => mozglue/misc/MmapFaultHandler.cpp
rename : modules/libjar/MmapFaultHandler.h => mozglue/misc/MmapFaultHandler.h
extra : moz-landing-system : lando
This is not, AFAICT, causing crashes - it is just an oversight noticed while
investigating the crash in the bug.
Differential Revision: https://phabricator.services.mozilla.com/D53042
--HG--
extra : moz-landing-system : lando
Adds a new startup cache info service to expose some of the internal
state of the startup cache.
Differential Revision: https://phabricator.services.mozilla.com/D69298
--HG--
extra : moz-landing-system : lando
Adds a new startup cache info service to expose some of the internal
state of the startup cache.
Differential Revision: https://phabricator.services.mozilla.com/D69298
--HG--
extra : moz-landing-system : lando
Please double check that I am using this correctly. I believe we are
seeing the crash in the linked bug because we are not handling hardware
faults when reading from the memory mapped file. This patch just wraps
all accesses in the MMAP_FAULT_HANDLER_ macros.
Depends on D53042
Differential Revision: https://phabricator.services.mozilla.com/D53043
--HG--
rename : modules/libjar/MmapFaultHandler.cpp => mozglue/misc/MmapFaultHandler.cpp
rename : modules/libjar/MmapFaultHandler.h => mozglue/misc/MmapFaultHandler.h
extra : moz-landing-system : lando
This is not, AFAICT, causing crashes - it is just an oversight noticed while
investigating the crash in the bug.
Differential Revision: https://phabricator.services.mozilla.com/D53042
--HG--
extra : moz-landing-system : lando
The exact circumstances of how this is showing up in the wild aren't
clear - there seem to be a couple of ways we can get here. However it
all revolves around early shutdowns (i.e., from the select profile popup)
- before the StartupCache is ever initialized. In any case, the solution
shouldn't change based on the exact circumstances - if we don't have a
StartupCache, there's no need to write one. Also, don't bother lazy
initializing it if it doesn't exist yet.
Differential Revision: https://phabricator.services.mozilla.com/D63208
--HG--
extra : moz-landing-system : lando
Reordering the mWrittenOnce check should be sufficient to eliminate
the data race; however, I made mWrittenOnce an atomic just to reduce
the fragility of this since it is intended to be written from and
read to on multiple threads.
Differential Revision: https://phabricator.services.mozilla.com/D62949
--HG--
extra : moz-landing-system : lando
This was just left in, and does not need to be here. We want to be
spawning the background thread here which we will wait on from
xpcom-shutdown.
Differential Revision: https://phabricator.services.mozilla.com/D62848
--HG--
extra : moz-landing-system : lando
This was just left in, and does not need to be here. We want to be
spawning the background thread here which we will wait on from
xpcom-shutdown.
Differential Revision: https://phabricator.services.mozilla.com/D62848
--HG--
extra : moz-landing-system : lando
The initial thought for getting the StartupCache out of the shutdown
path was to simply not write it during shutdown, as it should write
60 seconds after startup, and the theory was that if the user shut
down within the first 60 seconds of use, they were likely updating or
something and it shouldn't matter. However, considering how many of
our users only ever open one tab, I think it's rather likely that
users are starting up firefox to go to a web site, then closing it
when done with that website, and then maybe opening up a new instance
in order to go to a different website. Accordingly it still makes
sense to continue writing it. However, we may as well leverage a
background thread for this and get it kicked off earlier during
shutdown, so we don't sit there blocking in the destructor late
during shutdown.
Differential Revision: https://phabricator.services.mozilla.com/D62294
--HG--
extra : source : 7b7b147b6955cee07e0c115993446bfbd59cf7e2
extra : histedit_source : 6990122d6b1ac4939b0e4b0a5e452183fb981e19