Jemalloc 4 purges dirty pages regularly during free() when the ratio of dirty
pages compared to active pages is higher than 1 << lg_dirty_mult. We set
lg_dirty_mult in jemalloc_config to limit RSS usage, but it also has an impact
on performance.
So instead of enforcing a high ratio to force more pages being purged, we keep
jemalloc's default ratio of 8, and force a regular purge of all dirty pages,
after cycle collection.
Keeping jemalloc's default ratio avoids cycle-collection-triggered purge to
have to go through really all dirty pages when there are a lot, in which case
the normal jemalloc purge during free() will already have kicked in. It also
takes care of everything that doesn't run the cycle collector still having
a level of purge, like plugins in the plugin-container.
At the same time, since jemalloc_purge_freed_pages does nothing with jemalloc 4,
repurpose the MEMORY_FREE_PURGED_PAGES_MS telemetry probe to track the time
spent in this cycle-collector-triggered purge.
The breakpad dependency in ThreadStackHelper is preventing us from
upgrading our in-tree copy to a newer version (bug 1069556). This patch
gets rid of that dependency. This makes native stack frames not work
for BHR, but because of the ftp.m.o decommissioning, native
symbolication was already broken and naive stack frames already don't
work, so we don't really lose anything from this patch.
Eventually we want to make ThreadStackHelper use other means of
unwinding, such as LUL for Linux
I added | #if 0 | around the code to fill the thread context, but left
the code in because I think we'll evenually want to reuse some of that
code.
This makes the order of |aDelay| and |aType| match those of the InitWith*()
functions.
I've made this change because the inconsistency tripped me up during the
development of part 4.
--HG--
extra : rebase_source : 7d49f3f643e76955ea3de57e0954deb22cda3ddf
There are many sub-classes of nsExpirationTracker. In order to distinguish them
nicely in the logging of timer firings, it's necessary to manually name each
one. (This wouldn't be necessary if there was a way to stringify template
parameters, but there isn't.)
--HG--
extra : rebase_source : 89b99e9dbb2a806bd21145d04a5e023794643b61
This is similar to the solution adopted for bug 1190985, a race in
netwerk's DebugMutexAutoLock. A relaxed atomic tells tools like TSan
that we're OK with this variable being touched from multiple threads.
That it's only set from within a locked mutex should ensure whatever
memory barriers we need are executed so all threads have a consistent
view of what value it contains.
Getting rid of another |volatile| usage in the codebase is just a bonus.
nsEventQueue's HasPending event is defined to simply:
return GetEvent(false, nullptr);
So we can substitute HasPendingEvent for this particular GetEvent call
to make the code clearer.
Its only purpose is to disable PGO. Where that was not already explicitly done,
or irrelevant (because the directory only contains python), I disabled it in
moz.build.
We're already holding a reference to the Runner prior to dispatching it
to the thread pool; we can pass that reference in rather than requiring
the thread pool to take a new reference to it.
There's no reason to wake up all the threads in a thread pool when one
item gets placed in the queue. Waking up one will serve the same
purpose and is significantly more efficient for thread pools with large
numbers of threads.
There's no reason nsThreadPool needs to use a reentrant monitor for
locking its event queue. Having it use a non-reentrant one should be
slightly more efficient, both in the general operation of the monitor,
and that we're not performing redundant locking in methods like
nsThreadPool::Run. This change also eliminates the only usage of
nsEventQueue::GetReentrantMonitor.
Clients of nsEventQueue don't always need fully reentrant monitors.
Let's account for that by having a base class templated on the monitor
type. This change also opens up the possibility of having the monitor
for the event queue not owned by the event queue itself, but by the
client class, which makes a lot more sense than the current design.
The comment here suggests that we might AddRef/Release, but we really do
no such thing. Let's deal with the transfer of ownership directly,
rather than going through nsCOMPtr. This change makes the code slightly
smaller, and it also makes later refactorings to pull the lock out of
this function easier to do, since we don't have to consider how to hold
the lock within the lifetime of the nsCOMPtr temporary.
The patch removes 455 occurrences of FAIL_ON_WARNINGS from moz.build files, and
adds 78 instances of ALLOW_COMPILER_WARNINGS. About half of those 78 are in
code we control and which should be removable with a little effort.
--HG--
extra : rebase_source : 82e3387abfbd5f1471e953961d301d3d97ed2973
This gives zero when jemalloc is enabled and non-zero when jemalloc is disabled
(e.g. I got 83 MiB at start-up, which sounds plausible).
--HG--
extra : rebase_source : f39a15472d48643f57a77f1df03e4a29543d0867
This is motivated by three separate but related problems:
1. Our concept of recursion depth is broken for things that run from AfterProcessNextEvent observers (e.g. Promises). We decrement the recursionDepth counter before firing observers, so a Promise callback running at the lowest event loop depth has a recursion depth of 0 (whereas a regular nsIRunnable would be 1). This is a problem because it's impossible to distinguish a Promise running after a sync XHR's onreadystatechange handler from a top-level event (since the former runs with depth 2 - 1 = 1, and the latter runs with just 1).
2. The nsIThreadObserver mechanism that is used by a lot of code to run "after" the current event is a poor fit for anything that runs script. First, the order the observers fire in is the order they were added, not anything fixed by spec. Additionally, running script can cause the event loop to spin, which is a big source of pain here (bholley has some nasty bug caused by this).
3. We run Promises from different points in the code for workers and main thread. The latter runs from XPConnect's nsIThreadObserver callbacks, while the former runs from a hardcoded call to run Promises in the worker event loop. What workers do is particularly problematic because it means we can't get the right recursion depth no matter what we do to nsThread.
The solve this, this patch does the following:
1. Consolidate some handling of microtasks and all handling of stable state from appshell and WorkerPrivate into CycleCollectedJSRuntime.
2. Make the recursionDepth counter only available to CycleCollectedJSRuntime (and its consumers) and remove it from the nsIThreadInternal and nsIThreadObserver APIs.
3. Adjust the recursionDepth counter so that microtasks run with the recursionDepth of the task they are associated with.
4. Introduce the concept of metastable state to replace appshell's RunBeforeNextEvent. Metastable state is reached after every microtask or task is completed. This provides the semantics that bent and I want for IndexedDB, where transactions autocommit at the end of a microtask and do not "spill" from one microtask into a subsequent microtask. This differs from appshell's RunBeforeNextEvent in two ways:
a) It fires between microtasks, which was the motivation for starting this.
b) It no longer ensures that we're at the same event loop depth in the native event queue. bent decided we don't care about this.
5. Reorder stable state to happen after microtasks such as Promises, per HTML. Right now we call the regular thread observers, including appshell, before the main thread observer (XPConnect), so stable state tasks happen before microtasks.
After this change, we have ShallowSizeOf{In,Ex}cludingThis(), which don't do
anything to measure children. (They can be combined with iteration to measure
children.)
--HG--
extra : rebase_source : f98420176f50990bbc5a25e35788328154cfeb00
After this change, we have ShallowSizeOf{In,Ex}cludingThis(), which don't do
anything to measure children. (They can be combined with iteration to measure
children.)
And we still have the existing single-arg SizeOf{In,Ex}cluding() functions,
which work if the entry type itself defines SizeOfExcludingThis().
--HG--
extra : rebase_source : f93de9b789c21b1b148bed9de795f663f77c9dd9
This can cause leaks that are invisible to our XPCOM leak detection system.
To avoid this, classes should not addref or release in their Traverse methods.
--HG--
extra : rebase_source : acd0b070c63cbb4111c165d6b131b8e3b822773a
After this change, we have PLDHashTable::ShallowSizeOf{In,Ex}cludingThis(),
which don't do anything to measure children. (They can be combined with
iteration to measure children.)
This patch also removes the PL_DHashTableSizeOf{In,Ex}cludingThis() functions.
They're not necessary because the methods can be used instead.
Finally, the patch deliberately converts some SizeOfExcludingThis() calls to
SizeOfIncludingThis(). These are all done on heap pointers so this change is
valid.
--HG--
extra : rebase_source : b1d51096a8e7dcac29d7efd92e28938836ff5481
This makes it clearer that, unlike how SizeOf*() functions usually work, this
doesn't measure any children hanging off the array.
And do likewise for nsTObserverArray.
--HG--
extra : rebase_source : 6a8c8d8ffb53ad51b5773afea77126cdd767f149