Modern NDK headers all have <link.h>, so we should always include it,
and still provide shims for anything we need that's not defined in
<link.h>.
MozReview-Commit-ID: GNBDIe73RFm
--HG--
extra : rebase_source : 1246dce8a7ad201cf4f01de8e4912217636f1fc8
Modern NDK headers all have <link.h>, so we should always include it,
and still provide shims for anything we need that's not defined in
<link.h>.
MozReview-Commit-ID: GNBDIe73RFm
--HG--
extra : rebase_source : f1f2eddc73dc1accb3a1eb8445600b015e9e7d52
Fixes the LUL unwind test cases (viz, gtest LulIntegration.unwind_consistency)
when built with Clang 5.
* Increases the test stack size, LUL_UNIT_TEST_STACK_SIZE, from 16KB to
32KB, since 16KB is gives inadequate margin for the test cases used, and
is actually too small when with building with ASan enabled.
* In the generated test functions, uses write() calls that do nothing to
ensure that Clang cannot optimise away the space[] array that is used to
give different frame sizes to the different test functions. Without
these, Clang 5 optimises out this array and that causes all the unwind
tests to fail.
--HG--
extra : rebase_source : 9d91ea9b08e6771facf7a788163d67f1871f5948
Adds minimal support for reading DWARF CFI pertaining to version 4 of the
standard. Dwarf 4 CFI appears to have become the default used by Clang
version 5. There are two changes:
* Accepts cie->version == 4.
* For version 4 CIEs, skips over the two new fields address_size and
segment_size, but ensures that segment_size is zero. Adds comments in
ReadFDEFields about what to do if we ever find a case where segment_size
is nonzero.
This is in no way full or complete Dwarf 4 support, but it is enough to get
LUL working again with Clang 5 compiled code.
--HG--
extra : rebase_source : f4e21ae5b8d0f219a360d14cc242b2aa812056a0
This removes LUL's ability to recover frames by the heuristic mechanism of
stack scanning. Stack scanning is a last-ditch way to try to recover the
unwind when all other methods (metadata-based, frame-pointer chasing) have
failed, by scanning back up the stack and looking for the first word that
could plausibly be a return address. It often mis-identifies return addresses
because it has no way to distinguish live ones from dead ones that have not
been overwritten, and very often causes the unwind to fail as a result.
In any case LUL's stack scanning ability has actually been switched off (by
the parameters passed to LUL::Unwind) for some considerable time now, so this
change should make no observable difference to behaviour. Specific changes:
In LUL::Unwind():
* Removes formal parameters |scannedFramesAcquired| and |scannedFramesAllowed|
* Removes code that does stack scanning
* Simplifies control flow in the main unwind loop, so that loop now
has the easier-to-follow structure
while (true) {
// preliminary stuff
if (CFI data available for current PC) {
do CFI step;
continue;
}
if (FP chasing possible for current PC) {
do FP step;
continue;
}
// give up
break;
}
* Moves two #ifdefs upwards to enclose the comments pertaining to them, as
well as the code. This makes the top level structure easier to follow. The
corresponding #endifs are likewise commented with the condition.
From class LULStats, removes |mScanned|.
Removes PriMap::MaybeIsReturnPoint() entirely. This is a heuristic helper
only used by stack scanning.
In all, 395 lines of code are removed, according to hg diff --stat.
--HG--
extra : rebase_source : 5ffa73c64923149a58df3228cf940cb539f8f707
This removes the need for PROFILER_LIKELY_MEMORY_CONSTRAINED.
The patch also removes PROFILE_JAVA, USE_FAULTY_LIB, CONFIG_CASE_1,
CONFIG_CASE_2 and replaces all their uses with GP_OS_linux or GP_OS_android.
Finally, the patch removes a bogus |defined(GP_OS_darwin)| condition in
platform-linux-lul.cpp.
--HG--
extra : rebase_source : 77d1c625d65ddf551ab8cd4b962ae48c1a54466c
On x86_64-Linux, LUL currently can only unwind frames for which CFI unwind data
is available. This causes a noticeable number of junk samples in the profiler,
characterised by failures at transition points between JIT and native frame
sequences. This patch allows LUL to try recovering the previous frame using
frame pointer chasing in the case where CFI isn't present. This allows LUL to
unwind through or jump over interleaved JIT frames, because, respectively:
* The baseline JIT produces frame-pointerised code.
* If the profiler is enabled, IonMonkey doesn't produce frame-pointerised code,
but also doesn't change the frame pointer register value. It can use the
frame pointer if profiling is disabled, but that's irrelevant here.
The patch also adds counts of FP-recovered frames to LUL's statistics printing,
to make it possible to assess how often this feature is used.
--HG--
extra : rebase_source : eadc54393788693b0e3f8d5129d48aaaad143a0b
clang's -Wcomma warning warns about suspicious use of the comma operator such as between two statements.
tools/profiler/lul/LulDwarf.cpp:604:15: warning: possible misuse of comma operator here [-Wcomma]
MozReview-Commit-ID: 6ZP79hgtrAD
--HG--
extra : rebase_source : 77028600c713aa3235c3729a5db7be0290df57e4
extra : source : e4536bbeb28050b38979a05b379f13eb4a12beee
For reasons related to the architecture of the Gecko Profiler in previous years,
which are no longer relevant, LUL will only unwind through the first 32KB of
stack. This is mostly harmless, since most stacks are smaller than 4KB, per
measurements today, but occasionally they go above 32KB, causing unwinding to
stop prematurely.
This patch changes the max size to 160KB, and documents the rationale for
copying the stack and unwinding, rather than unwinding in place. 160KB is big
enough for all stacks observed in several minutes of profiling all threads at
1KHz.
--HG--
extra : rebase_source : a1d5526aff50345be8b965c2b6b01c66b40fd0d8
For reasons which are unclear, but possibly due to lack of any known use cases
when the code was written, LUL on i686/x86_64-linux only accepts CFA (canonical
frame address) expressions of the form SP+offset or FP+offset. However, on
Fedora 25 x86_64 and Ubuntu 16.10 x86_64, at least one address range per object
uses a Dwarf expression for the CFA, for example:
00000018 000000000024 0000001c FDE cie=00000000 pc=0000000031e0..0000000031f0
DW_CFA_def_cfa_offset: 16
DW_CFA_advance_loc: 6 to 00000000000031e6
DW_CFA_def_cfa_offset: 24
DW_CFA_advance_loc: 10 to 00000000000031f0
DW_CFA_def_cfa_expression(
DW_OP_breg7 (rsp): 8; DW_OP_breg16 (rip): 0; DW_OP_lit15; DW_OP_and;
DW_OP_lit11; DW_OP_ge; DW_OP_lit3; DW_OP_shl; DW_OP_plus)
producing the following complaint from LUL:
can't summarise: SVMA=0x31f0: rule for DW_REG_CFA: invalid |how|, expr=LExpr(PFXEXPR,0,0)
Given that LUL is capable of handling such a CFA expression, it seems artificial
to stop it doing so. This patch changes Summariser::Rule() so as to allow such
expressions.
LUL doesn't read CFI from the main executable on x86_64-linux, and possibly
other Linux variants, because SharedLibraryInfo::GetInfoForSelf() doesn't
produce a name for the main executable object, even though it does notice the
mapping.
This causes noticeable unwind breakage because the main executable on Linux
contains various wrapper functions pertaining to memory allocation and locking,
such as
moz_xmalloc, moz_xcalloc, moz_xrealloc
mozilla::detail::MutexImpl::lock, mozilla::detail::MutexImpl::unlock
and is generally observable on x86_64-Linux as unwinding failures out of
functions with addresses around 0x40xxxx, since that's the traditional load
address for the main executable.
This patch modifies the Linux implementation of GetInfoForSelf() so as to
harvest the main executable's name from /proc/self/maps. This is then added
into the information acquired from dl_iterate_phdr. As a result
GetInfoForSelf() does correctly report the executable name, so LUL reads Dwarf
unwind info from it, and the abovementioned unwinding failures disappear.
--HG--
extra : rebase_source : 267c6d7c3967a4d29f8ff0b4a91d339a6625085d
The profiler will use level 3 (Info) and 4 (Debug) logging, though this patch
only uses level 3. LUL will use level 5 (Verbose) debugging.
The patch also tweaks parts of the the usage message, including adding
MOZ_PROFILER_{STARTUP,SHUTDOWN} to it.
--HG--
extra : rebase_source : f43a023912fbce993ed367cdd26b8f25f25381de
This patch properly synchronizes all the global state in platform*.cpp, which
gets us a long way towards implementing bug 1330184.
- Most of the global state goes in a new class, ProfilerState, with a single
instance, gPS. All accesses to gPS are protected by gPSMutex. All functions
that access ProfilerState require a token proving that gPS is locked; this
makes things much clearer.
gRegisteredThreadsMutex is removed because it is subsumed by gPSMutex.
- gVerbosity, however, does not go in ProfilerState. It stays separate, and
gains its own mutex, gVerbosityMutex.
Also, the tracking of the current profiler state is streamlined. Previously it
was tracked via:
- stack_key_initialized, gInitCount, gSampler, gIsProfiling, gIsActive, and
gIsPaused.
Now it is tracked via:
- gPS, gPS->sActivity, and gPS->mIsPaused.
This means that the Sampler class is no longer necessary, and the patch removes
it.
Other changes of note made by the patch are as follows.
- It removes ThreadInfo::{mMutex,GetMutex}. This mutex was only used in two
places, and both these are now protected by gPSMutex.
- It tweaks the LOG calls. All the main functions (init(), shutdown(), start(),
stop()) now do consistent BEGIN/END logging, and a couple of other low-value
incidental LOG calls have been removed.
- It adds a lot of release assertions requiring that gPS be initialized (e.g.
profiler_init() has been called but profiler_shutdown() has not).
- It uses alphabetical order for everything involving profiler feature names.
- It removes Platform{Start,Stop}() and SamplerThread::{Start,Stop}Sampler().
These are no longer necessary now that SamplerThread::sInstance has been
replaced with ProfilerState::mSamplerThread which allows more direct access
to the current SamplerThread instance.
- It removes PseudoStack::mPrivacyMode. This was derived from the "privacy"
feature, and we now use gPS->mFeaturePrivacy directly, which is simpler.
It also replaces profiler_in_privacy_mode() with
profiler_is_active_and_not_in_privacy_mode(), which avoids an unnecessary
lock/unlock of gPSMutex on a moderately hot path.
Finally, the new code does more locking than the old one. A number of operation
The following operations now lock a mutex when they previously didn't; the
following are ones that are significant, according to some ad hoc profiling.
- profiler_tracing()
- profiler_is_active()
- profiler_is_active_and_not_in_privacy_mode()
- profiler_add_marker()
- profiler_feature_active()
- SamplerThread::Run() [when the profiler is paused]
All up this roughly doubles the amount of mutex locking done by the profiler.
It's probably possible to avoid this increase by allowing careful unlocked
access to three of the fields in ProfilerState (mActivityGeneration,
mFeaturePrivacy, mStartTime), but this should only be done as a follow-up if
the extra locking is found to be a problem.
--HG--
extra : rebase_source : c2e41231f131b3e9ccd23ddf43626b54ccc77b7b
They duplicate the equivalent SPS_* macros. (The SPS_* macros have already
crept into use in some places within LUL.)
--HG--
extra : rebase_source : 65ed6e6e147189814511b0ca38342fec118478b9
It's fairly straightforward, and measures the important parts of:
- Sampler, PseudoStack, ProfileBuffer, ThreadInfo.
- LUL, PriMap, SecMap
Coverage isn't perfect, but it gets the major things I found via DMD on Linux.
Example output in about:memory:
├──151.21 MB (49.73%) -- profiler
│ ├──141.49 MB (46.53%) ── lul
│ └────9.72 MB (03.20%) ── sampler
--HG--
extra : rebase_source : 67d2ada42aead43f68f5100a08204a1d1f1cfceb