On 64-bit Windows (x86_64, aarch64), stack walking relies on
RtlLookupFunctionEntry to navigate from one frame to the next. This
function acquires up to two ntdll internal locks when it is called.
The profiler and the background hang monitor both need to walk the
stacks of suspended threads. This can lead to deadlock situations,
which so far we have avoided with stack walk suppressions. We guard some
critical paths to mark them as suppressing stack walk, and we forbid
stack walking when any thread is currently on such path.
While stack walk suppression has helped remove most deadlock situations,
some can remain because it is hard to detect and manually annotate all
the paths that could lead to a deadlock situation. Another drawback is
that stack walk suppression disables stack walking for much larger
portions of code than required. For example, we disable stack walking
for LdrLoadDll, so we cannot collect stacks while we are loading a DLL.
Yet, the lock that could lead to a deadlock situation is only held
during a very small portion of the whole time spent in LdrLoadDll.
This patch addresses these two issues by implementing a finer-grained
strategy to avoid deadlock situations. We acquire the pointers to the
internel ntdll locks through a single-stepped execution of
RtlLookupFunctionEntry. This allows us to try to acquire the locks
non-blockingly so that we can guarantee safe stack walking with no
deadlock.
If we fail to collect pointers to the locks, we fall back to using stack
walk suppressions like before. This way we get the best of both worlds:
if we are confident that the situation is under control, we will use the
new strategy and get better profiler accuracy and no deadlock; in case
of doubt, we can still use the profiler thanks to stack walk
suppressions.
Differential Revision: https://phabricator.services.mozilla.com/D223498
On 64-bit Windows (x86_64, aarch64), stack walking relies on
RtlLookupFunctionEntry to navigate from one frame to the next. This
function acquires up to two ntdll internal locks when it is called.
The profiler and the background hang monitor both need to walk the
stacks of suspended threads. This can lead to deadlock situations,
which so far we have avoided with stack walk suppressions. We guard some
critical paths to mark them as suppressing stack walk, and we forbid
stack walking when any thread is currently on such path.
While stack walk suppression has helped remove most deadlock situations,
some can remain because it is hard to detect and manually annotate all
the paths that could lead to a deadlock situation. Another drawback is
that stack walk suppression disables stack walking for much larger
portions of code than required. For example, we disable stack walking
for LdrLoadDll, so we cannot collect stacks while we are loading a DLL.
Yet, the lock that could lead to a deadlock situation is only held
during a very small portion of the whole time spent in LdrLoadDll.
This patch addresses these two issues by implementing a finer-grained
strategy to avoid deadlock situations. We acquire the pointers to the
internel ntdll locks through a single-stepped execution of
RtlLookupFunctionEntry. This allows us to try to acquire the locks
non-blockingly so that we can guarantee safe stack walking with no
deadlock.
If we fail to collect pointers to the locks, we fall back to using stack
walk suppressions like before. This way we get the best of both worlds:
if we are confident that the situation is under control, we will use the
new strategy and get better profiler accuracy and no deadlock; in case
of doubt, we can still use the profiler thanks to stack walk
suppressions.
Differential Revision: https://phabricator.services.mozilla.com/D223498
This maps better to the code we have in nsWindow, and fixes a couple
bugs which caused maximized skeleton UI-consuming windows to be
mispositioned with the following patches.
Differential Revision: https://phabricator.services.mozilla.com/D225100
This maps better to the code we have in nsWindow, and fixes a couple
bugs which caused maximized skeleton UI-consuming windows to be
mispositioned with the following patches.
Differential Revision: https://phabricator.services.mozilla.com/D225100
This patch adjusts the various places where we initialize content
processes to call SetGeckoProcessType as early as possible, and be more
consistent. After this change we should only ever set GeckoProcessType
and GeckoChildID once per-process (with the exception of the fork server
process).
In addition to this validation, some more checks around the fork server
were added, such as to prevent forking another forkserver, or forking a
non-content process.
As part of this change, there was some refactoring/cleanup done, such as
removing plugin-container.cpp and content_process_main, as compared to
the other duplicated code between the two call-sites, the duplication
was relatively small, and inlining it helped make things more readable.
Differential Revision: https://phabricator.services.mozilla.com/D218471
This requires quite a bit of piping to get the ChildID passed everywhere where
we currently pass the pid in IPC. This is done by adding a new struct type
(EndpointProcInfo), which is passed around instead of OtherPid in these places,
and contains the full pid.
In most cases, it was a fairly painless change to move over, however in some
cases, more complex changes were required, as the pid was being stored
previously in something like an Atomic<...>, and needed to be switched to using
a mutex-protected value.
In the future, it may be possible to remove OtherPid from IPDL actors once
everything is migrated to ChildID, but we're still a long way off from that, so
for now we unfortunately need to pass both around.
Differential Revision: https://phabricator.services.mozilla.com/D217118
The new GeckoChildID type introduced in this patch is inspired by the existing
ContentParentID type used by ContentParent, but is currently distinct. It is
supported by all process types at the GeckoChildProcessHost level and can be
read for the current process from anywhere.
As this type is similar in many ways to the process type, and should be
available as early as possible within child processes, this was added alongside
the GeckoProcessType value within mozglue to make that easier to do.
The type was chosen to be an int32_t to make it feel similar to a PID, which we
currently use for process identity comparisons across the codebase. The
intention is for GeckoChildID to be preferred for these within-gecko checks, as
these IDs will not be re-used and can be known earlier during child process
creation.
Differential Revision: https://phabricator.services.mozilla.com/D217117
The previous patchset failed to properly convert the RECT to be cleared
from screen- to window-coordinates. (This wasn't noticed because the two
only differ when nontrivial chrome such as the titlebar is present,
which by default it's not.)
Differential Revision: https://phabricator.services.mozilla.com/D214334
When creating the skeleton UI window, cloak and clear it to a theme-
appropriate color before first showing it, just as is done in nsWindow.
Differential Revision: https://phabricator.services.mozilla.com/D212764
This patch introduces an aInstructionFilter argument to
CollectModuleSingleStepData to allow filtering which kind of
instructions should be recorded. We implement the All default filter and
the CallRet filter.
Differential Revision: https://phabricator.services.mozilla.com/D211196
This patch makes the single-step data collection code that we
implemented for bug 1571516 reusable, while preserving its behavior.
No functional changes, except for a slight reordering of the
error-result enum.
We define a generic CollectSingleStepData function that embeds the
magic for starting to trigger single step exceptions and for acting upon
them.
We define a more specialized CollectModuleSingleStepData function which
can be reused if the purpose of single step data collection is to
monitor what paths are taken within a specific module. It stores the
collected data in stack, so that it can be accessed from crash reports.
This code is considered unstable and thus only available in Nightly and
early Beta and only used on paths that are known to crash already.
Differential Revision: https://phabricator.services.mozilla.com/D211195
The -ldl flag was previously set globally, it's now set for the libs
that use it.
Also rationalize the difference between HAVE_DLOPEN and HAVE_DLFCN_H.
Differential Revision: https://phabricator.services.mozilla.com/D203594
The -ldl flag was previously set globally, it's now set for the libs
that use it.
Also rationalize the difference between HAVE_DLOPEN and HAVE_DLFCN_H.
Differential Revision: https://phabricator.services.mozilla.com/D203594
The -ldl flag was previously set globally, it's now set for the libs
that use it.
Also rationalize the difference between HAVE_DLOPEN and HAVE_DLFCN_H.
Differential Revision: https://phabricator.services.mozilla.com/D203594
This patch makes several fundamental changes to the logic we use to inform
the main process whenever the WER runtime exception module intercepts a child
process crash:
* We no longer read the process type or any other data from the child process;
the process type is passed as the runtime exception module's context
* We no longer read the address of the memory area used to communicate with the
main process from the child process arguments. Instead we allocate memory
directly into the main process and store the required information there
* We don't read anything from the main process either, the pointer to the
function used to notify the main process is now found by looking out its
dedicated section in the parent process' xul.dll mapping
* We no longer read the OOM crash annotation from a child process, this
functionality will be restored by making the module use the mozannotation
crates to fetch all the annotations
Differential Revision: https://phabricator.services.mozilla.com/D201589
On Linux and Android, both jitdump and the marker file will keep using
`CLOCK_MONOTONIC` nanoseconds, as before.
On macOS, both jitdump and the marker file will now be using
`TimeStamp::RawMachAbsoluteTimeNanoseconds()` , i.e. "nanoseconds since
mach_absolute_time origin".
This value has the advantage that it is also relatively easy to obtain
in other browser engines, because their internal timestamp value is stored
in milliseconds or nanoseconds rather than in `mach_absolute_time` ticks.
In the past, on macOS, Firefox was using `CLOCK_MONOTONIC` nanoseconds for
jitdump and `TimeStamp::RawMachAbsoluteTimeValue()` for the marker file.
This inconsistency is now fixed.
I will update samply to change how it treats jitdump timestamps on macOS.
There are no other consumers of jitdump files on macOS that I know of.
On Windows, we will keep using raw QPC values for the marker file - this
matches what's in the ETW events. Jitdump on Windows is mostly unused but
I'm updating it to match.
Furthermore, this fixes the order in mozglue/misc/moz.build to make sure
we always use the TimeStamp_darwin implementation on Darwin (and not just
due to a broken configure check, see bug 1681445), and it fixes the #ifdef
in TimeStamp.h to match the Darwin check.
Differential Revision: https://phabricator.services.mozilla.com/D199592