On 64-bit Windows (x86_64, aarch64), stack walking relies on
RtlLookupFunctionEntry to navigate from one frame to the next. This
function acquires up to two ntdll internal locks when it is called.
The profiler and the background hang monitor both need to walk the
stacks of suspended threads. This can lead to deadlock situations,
which so far we have avoided with stack walk suppressions. We guard some
critical paths to mark them as suppressing stack walk, and we forbid
stack walking when any thread is currently on such path.
While stack walk suppression has helped remove most deadlock situations,
some can remain because it is hard to detect and manually annotate all
the paths that could lead to a deadlock situation. Another drawback is
that stack walk suppression disables stack walking for much larger
portions of code than required. For example, we disable stack walking
for LdrLoadDll, so we cannot collect stacks while we are loading a DLL.
Yet, the lock that could lead to a deadlock situation is only held
during a very small portion of the whole time spent in LdrLoadDll.
This patch addresses these two issues by implementing a finer-grained
strategy to avoid deadlock situations. We acquire the pointers to the
internel ntdll locks through a single-stepped execution of
RtlLookupFunctionEntry. This allows us to try to acquire the locks
non-blockingly so that we can guarantee safe stack walking with no
deadlock.
If we fail to collect pointers to the locks, we fall back to using stack
walk suppressions like before. This way we get the best of both worlds:
if we are confident that the situation is under control, we will use the
new strategy and get better profiler accuracy and no deadlock; in case
of doubt, we can still use the profiler thanks to stack walk
suppressions.
Differential Revision: https://phabricator.services.mozilla.com/D223498
On 64-bit Windows (x86_64, aarch64), stack walking relies on
RtlLookupFunctionEntry to navigate from one frame to the next. This
function acquires up to two ntdll internal locks when it is called.
The profiler and the background hang monitor both need to walk the
stacks of suspended threads. This can lead to deadlock situations,
which so far we have avoided with stack walk suppressions. We guard some
critical paths to mark them as suppressing stack walk, and we forbid
stack walking when any thread is currently on such path.
While stack walk suppression has helped remove most deadlock situations,
some can remain because it is hard to detect and manually annotate all
the paths that could lead to a deadlock situation. Another drawback is
that stack walk suppression disables stack walking for much larger
portions of code than required. For example, we disable stack walking
for LdrLoadDll, so we cannot collect stacks while we are loading a DLL.
Yet, the lock that could lead to a deadlock situation is only held
during a very small portion of the whole time spent in LdrLoadDll.
This patch addresses these two issues by implementing a finer-grained
strategy to avoid deadlock situations. We acquire the pointers to the
internel ntdll locks through a single-stepped execution of
RtlLookupFunctionEntry. This allows us to try to acquire the locks
non-blockingly so that we can guarantee safe stack walking with no
deadlock.
If we fail to collect pointers to the locks, we fall back to using stack
walk suppressions like before. This way we get the best of both worlds:
if we are confident that the situation is under control, we will use the
new strategy and get better profiler accuracy and no deadlock; in case
of doubt, we can still use the profiler thanks to stack walk
suppressions.
Differential Revision: https://phabricator.services.mozilla.com/D223498
This maps better to the code we have in nsWindow, and fixes a couple
bugs which caused maximized skeleton UI-consuming windows to be
mispositioned with the following patches.
Differential Revision: https://phabricator.services.mozilla.com/D225100
This maps better to the code we have in nsWindow, and fixes a couple
bugs which caused maximized skeleton UI-consuming windows to be
mispositioned with the following patches.
Differential Revision: https://phabricator.services.mozilla.com/D225100
This changes how file descriptors are passed to Android child processes to
support the new ChildProcessArgs model. Instead of explicit lists of FDs to pass
around, a single array of FDs is passed from the parent process into the child
service and then passed into GeckoArgs.
Differential Revision: https://phabricator.services.mozilla.com/D221378
This changes how file descriptors are passed to Android child processes to
support the new ChildProcessArgs model. Instead of explicit lists of FDs to pass
around, a single array of FDs is passed from the parent process into the child
service and then passed into GeckoArgs.
Differential Revision: https://phabricator.services.mozilla.com/D221378
The lz4 symbols were never meant to be exposed, the API in Compression.h
being the official way to use it. This is we had LZ4LIB_VISIBILITY set to
nothing. Unfortunately, that wasn't enough, because there is another
similar define for lz4frame: LZ4FLIB_VISIBILITY.
So we had been exporting those lz4frame symbols from firefox-bin
forever, without noticing, but it didn't cause problems until the
symbols were moved to libxul. With them moved to libxul, we end up
with the situation where we might actually end up using the symbols
from the system liblz4, which is pulled indirectly through other
dependencies (through libsystemd, which comes through libdbus).
This is all fine-ish on a "normal" opt build, but with LTO, things
end up such that some calls go through our copy of lz4frame and others
through the system one, and the discrepancy causes a crash.
The symbols file for non-gtest libxul, that hides all symbols but a few,
was saving the non-gtest case, fortunately.
Differential Revision: https://phabricator.services.mozilla.com/D222574
This patch finally removes the shared libraries code that was inside
tools/profiler and uses the one in base profiler everywhere.
Differential Revision: https://phabricator.services.mozilla.com/D220887
We already had this early return on the shared-libraries code that was on the
tools/profiler directory. Adding this check here as well to keep the both files
in sync. It will help us deduplicating the shared libraries code later.
Differential Revision: https://phabricator.services.mozilla.com/D220908
This method was used only internally, and having this method declaration causes
issues. That's this method was already like this in tools/profiler version of
this but it wasn't implemented here.
Differential Revision: https://phabricator.services.mozilla.com/D220886
While testing the de-duplication on Linux and macOS, I noticed that the
symbolication was completely broken due to having incorrect library names in
libs array. They were always like: "<obj-dir>/dist/bin/libxul.so" instead of
"libxul.so". The library names are found from their paths, and apparently we
were using an incorrect path separator for Linux and macOS, "\" instead of "/".
Differential Revision: https://phabricator.services.mozilla.com/D220885
These methods were already inside the shared libraries that was in
tools/profiler. Adding these methods here to make the implementations closer
and make the deduplication easier in the following commits.
Differential Revision: https://phabricator.services.mozilla.com/D220882
As per mozglue/static/README:
> mozglue/static contains parts of the mozglue library that can/should be
> statically linked to e.g. js/Gecko.
The compression part of MFBT is a good candidate for this.
Differential Revision: https://phabricator.services.mozilla.com/D221565
This patch adjusts the various places where we initialize content
processes to call SetGeckoProcessType as early as possible, and be more
consistent. After this change we should only ever set GeckoProcessType
and GeckoChildID once per-process (with the exception of the fork server
process).
In addition to this validation, some more checks around the fork server
were added, such as to prevent forking another forkserver, or forking a
non-content process.
As part of this change, there was some refactoring/cleanup done, such as
removing plugin-container.cpp and content_process_main, as compared to
the other duplicated code between the two call-sites, the duplication
was relatively small, and inlining it helped make things more readable.
Differential Revision: https://phabricator.services.mozilla.com/D218471
This requires quite a bit of piping to get the ChildID passed everywhere where
we currently pass the pid in IPC. This is done by adding a new struct type
(EndpointProcInfo), which is passed around instead of OtherPid in these places,
and contains the full pid.
In most cases, it was a fairly painless change to move over, however in some
cases, more complex changes were required, as the pid was being stored
previously in something like an Atomic<...>, and needed to be switched to using
a mutex-protected value.
In the future, it may be possible to remove OtherPid from IPDL actors once
everything is migrated to ChildID, but we're still a long way off from that, so
for now we unfortunately need to pass both around.
Differential Revision: https://phabricator.services.mozilla.com/D217118
The new GeckoChildID type introduced in this patch is inspired by the existing
ContentParentID type used by ContentParent, but is currently distinct. It is
supported by all process types at the GeckoChildProcessHost level and can be
read for the current process from anywhere.
As this type is similar in many ways to the process type, and should be
available as early as possible within child processes, this was added alongside
the GeckoProcessType value within mozglue to make that easier to do.
The type was chosen to be an int32_t to make it feel similar to a PID, which we
currently use for process identity comparisons across the codebase. The
intention is for GeckoChildID to be preferred for these within-gecko checks, as
these IDs will not be re-used and can be known earlier during child process
creation.
Differential Revision: https://phabricator.services.mozilla.com/D217117
The previous patchset failed to properly convert the RECT to be cleared
from screen- to window-coordinates. (This wasn't noticed because the two
only differ when nontrivial chrome such as the titlebar is present,
which by default it's not.)
Differential Revision: https://phabricator.services.mozilla.com/D214334
When creating the skeleton UI window, cloak and clear it to a theme-
appropriate color before first showing it, just as is done in nsWindow.
Differential Revision: https://phabricator.services.mozilla.com/D212764