Suspending threads on some platforms may actually add CPU times to the target threads, artificially making it look like that thread did some work since the last CPU sampling.
This is especially visible on Linux, where suspending is done by sending a signal to the target thread, where the signal handler does quite a bit of work. On Windows there seems to be CPU work happening, possibly when the OS is checking whether a thread needs to be woken up.
To help with this, this patch effectively removes the CPU running times that happen during sampling:
* Thread sampling actions before: record-CPU stack ... record-CPU stack ...
The difference between two "record-CPU" is the amount of work done, which includes "stack" sampling.
* After: record-CPU stack reset-CPU ... record-CPU stack reset-CPU ...
Now thanks to the "reset-CPU" happening after "stack", the next "record-CPU" will only include the work done outside of the stack sampling.
This "reset-CPU" action is done in the new platform-specific function `DiscardSuspendedThreadRunningTimes`.
Note that there is a small window (just before and after the actual stack sampling) where some real CPU work may be discarded as well, but it should be negligible compared to all the work that happens between sampling loops.
Differential Revision: https://phabricator.services.mozilla.com/D130444
This is useful for the following parts, as UniqueFileHandle is a cross-platform
type which can also be used to support transferring HANDLEs between processes.
This change requires fairly sweeping changes to existing callsites, which
previously did not require owning access to the handle types when transferring.
For the most part these changes were straightforward, but manual.
Differential Revision: https://phabricator.services.mozilla.com/D126564
New features cpuallthreads, stacksallthreads, and markersallthreads now allow the user to selectively profile non-selected threads for more information.
The gtest GeckoProfiler.FeatureCombinations is modified to test combinations of up to 3 of a set of features, to lower its cost, which allows adding the new features.
Differential Revision: https://phabricator.services.mozilla.com/D130011
Now that most calls to `profiler_thread_is_being_profiled` have been updated, the `ThreadProfilingFeatures` can be made compulsory, to force callers to think about what they really want to know about the current profiling state.
Differential Revision: https://phabricator.services.mozilla.com/D130010
`profiler_thread_is_being_profiled` is used a lot for markers, so it makes sense to have a specialized version, which is a bit shorter, and lives in ProfilerMarkers.h.
Differential Revision: https://phabricator.services.mozilla.com/D130009
This replaces the simple boolean ThreadRegistrationData::mIsBeingProfiled and its directly-dependent functions.
profiler_thread_is_being_profiled now takes an extra (currently optional) argument, to check if any of the given ThreadProfilingFeatures is currently live.
This is used to control:
- Periodic sampling of CPU utilization.
- Periodic sampling of stacks.
- Markers.
This patch doesn't change the observed behavior yet (i.e., instead of IsBeingProfiled being true or false, all thread profiling features are either all or nothing), but will be used for finer-grained control in later patches.
Differential Revision: https://phabricator.services.mozilla.com/D130008
If the IOInterposer cannot retrieve a filename, the "filename" property is not even written in the marker "data" payload. So its presence should be checked before trying to use JS string functions on it.
Differential Revision: https://phabricator.services.mozilla.com/D130607
New features cpuallthreads, stacksallthreads, and markersallthreads now allow the user to selectively profile non-selected threads for more information.
The gtest GeckoProfiler.FeatureCombinations is modified to test combinations of up to 3 of a set of features, to lower its cost, which allows adding the new features.
Differential Revision: https://phabricator.services.mozilla.com/D130011
Now that most calls to `profiler_thread_is_being_profiled` have been updated, the `ThreadProfilingFeatures` can be made compulsory, to force callers to think about what they really want to know about the current profiling state.
Differential Revision: https://phabricator.services.mozilla.com/D130010
`profiler_thread_is_being_profiled` is used a lot for markers, so it makes sense to have a specialized version, which is a bit shorter, and lives in ProfilerMarkers.h.
Differential Revision: https://phabricator.services.mozilla.com/D130009
This replaces the simple boolean ThreadRegistrationData::mIsBeingProfiled and its directly-dependent functions.
profiler_thread_is_being_profiled now takes an extra (currently optional) argument, to check if any of the given ThreadProfilingFeatures is currently live.
This is used to control:
- Periodic sampling of CPU utilization.
- Periodic sampling of stacks.
- Markers.
This patch doesn't change the observed behavior yet (i.e., instead of IsBeingProfiled being true or false, all thread profiling features are either all or nothing), but will be used for finer-grained control in later patches.
Differential Revision: https://phabricator.services.mozilla.com/D130008
This is useful for the following parts, as UniqueFileHandle is a cross-platform
type which can also be used to support transferring HANDLEs between processes.
This change requires fairly sweeping changes to existing callsites, which
previously did not require owning access to the handle types when transferring.
For the most part these changes were straightforward, but manual.
Differential Revision: https://phabricator.services.mozilla.com/D126564
Removing the unlinking shouldn't be a problem, only possible side effect would be removing the partially downloaded file
Differential Revision: https://phabricator.services.mozilla.com/D130132
`GetRunningTimesWithTightTimestamp` was running the native get-running-times functions in a loop until it could run it within a "fast enough" time (based on an initial quick benchmark); This was to associate a timestamp with CPU measurements as close as possible, to get a more reliable CPU utilization graph at the end.
But this loop condition meant that if the system was under high stress, the loop could potentially re-run many times, wasting way more resources than the resulting "tightness" warrants.
So instead of repeating until "fast enough", we repeat at most once, and take the best of both measurements.
In the worst case, CPU measurements and timestamps may be a bit far apart, but it just means that a spike of activity may be reported in the next time slice instead, but the cumulative amount of work will be correct overall.
Differential Revision: https://phabricator.services.mozilla.com/D129773
Changed wget output log granularity from 📣 to :giga: (only used in case the python library didn't cover all usecases)
Added python methods to async cache/download all required files for update verify
Modified Dockerfile requirements to include aiohttp python lib
Differential Revision: https://phabricator.services.mozilla.com/D129429
The user shouldn't set MOZ_PROFILER_SHUTDOWN to an empty string, it wouldn't work anyway.
So now there is an extra check for that, to avoid even attempting to write a profile when there is no actual filename.
Thanks to this, the parent process can now just re-set MOZ_PROFILER_SHUTDOWN to "" in its children, so that they won't try to output their own profile into the same file. (This used to mostly work, because the parent process was the last to write its profile; but anecdotal data shows this may not alwaybe be true.)
As a bonus optimization, this means that child processes don't waste time needlessly saving their profile to disk.
Differential Revision: https://phabricator.services.mozilla.com/D129237
Finally it all comes together!
Factored-out functions from previous patches are now used by the new functions, which work with streaming contexts, in order to prepare stacks are other things first, then read the profile buffer just once to extract all samples and markers, and finally output this data in the expected JSON format.
One trick, is that when working on "legacy" entries (mostly related to samples), we need to handle "modern" marker entries as we encounter them; this happens in EntryGetter, and also when reading compact stacks and same-samples.
Differential Revision: https://phabricator.services.mozilla.com/D128444
This refactoring is a bit more complex (to prepare for the next patch).
The body of StreamSamplesToJSON was moved to a function that takes a lambda, which will provide streaming parameters relevant to the thread id of each read sample.
For now, it's only used by StreamSamplesToJSON, so in this case the lambda only checks if the entry thread id is the expected one, and it provides the JSON writer and other things needed to write these samples.
One extra complexity is that the `ProfileSample sample` that was outside the loop is now removed -- because in later patches this code will handle samples from different threads, so they cannot all share this one sample object. Instead the information that must be carried over is in the streaming parameters (mPreviousStackState and mPreviousStack), so that they correspond to one thread each. But the overall logic is the same, apart from where this out-of-the-loop information lives.
And another difference is that old samples (before `aSinceTime`) are fully processed, and then discarded just before being output, because we need their information to be in the thread's UniqueStacks, in case it's used by a later same-as-before sample.
Differential Revision: https://phabricator.services.mozilla.com/D128443
ThreadStreamingContext contains all the information needed to output samples and markers relevant to one thread.
ProcessStreamingContext contains a list of ThreadStreamingContext's for all profiled threads. This way, when reading the profile buffer, it will be possible to send each sample&marker to the relevant JSON writer.
Differential Revision: https://phabricator.services.mozilla.com/D128442
StreamTraceLoggerJSON doesn't rely on ProfiledThreadData, so it can just be a file-static function.
The code hasn't changed at all, the function needed to be moved up the file, before its first use.
Differential Revision: https://phabricator.services.mozilla.com/D128440
The body of StreamSamplesAndMarkers has been factored out into a templated function, with some lambdas to cover the buffer-reading and streaming of samples and markers.
In a later patch, DoStreamSamplesAndMarkers will be used to also stream already-processed samples and markers.
Differential Revision: https://phabricator.services.mozilla.com/D128439