While fixing non-unified-build errors in dom/canvas, I started hitting a
static_assert complaining that we were calling IsPowerOfTwo with a signed type. It
turns out we have at least three copies of IsPowerOfTwo() in the tree.
Let's drop the non-mfbt ones.
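For illustration, here is a minimal sketch of the kind of check that rejects
signed arguments at compile time. IsPowerOfTwoSketch and IsPowerOfTwoSigned are
hypothetical names, and the body is an assumption about how such a helper could
be written, not a copy of the mfbt code.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for a shared helper; not the actual mfbt source.
template <typename T>
constexpr bool IsPowerOfTwoSketch(T x) {
  // Instantiating this with a signed type fails to compile.
  static_assert(std::is_unsigned<T>::value,
                "IsPowerOfTwoSketch requires an unsigned type");
  // A power of two has exactly one bit set, so clearing the lowest set
  // bit must leave zero. Zero itself is not a power of two.
  return x != 0 && (x & (x - 1)) == 0;
}

// Callers holding a signed value have to range-check and convert explicitly.
inline bool IsPowerOfTwoSigned(int32_t v) {
  return v > 0 && IsPowerOfTwoSketch(static_cast<uint32_t>(v));
}

// Small usage example.
inline void ExampleUsage() {
  assert(IsPowerOfTwoSketch(256u));
  assert(!IsPowerOfTwoSigned(-4));
}
```

Calling IsPowerOfTwoSketch(4) with a plain int literal would trip the
static_assert, which is the class of error described above.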
MozReview-Commit-ID: 1fwQw0CrgiE
Allocate an additional slot in localInfo_ and use it to save the incoming TLS
pointer. When setting up arguments for a function call, get the TLS pointer from
that local slot. Also preserve the TLS pointer register by reloading it before
returning.
This makes the Baseline ABI compatible with the Ion ABI.
- Add a virtual isCallPreserved() method to LNode which allows a call
instruction to indicate that it preserves the values of some registers. Use
this hook in BacktrackingAllocator when processing a call instruction.
- Add a preservesTlsReg() property to MAsmJSCall and use this to implement the
LAsmJSCall::isCallPreserved() method.
- Mark intra-module WebAssembly calls as preserving the TLS pointer register.
This change allows the backtracking register allocator to leave the TLS pointer
register alone in small functions that don't need it for something else. There
are probably more improvements to be done if we need to split the live range of
the TLS pointer register. For example, BacktrackingAllocator::splitAcrossCalls()
will still split that live range at all calls.
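The hook described above can be sketched roughly as follows. Every type here
(LNodeSketch, LAsmJSCallSketch, RegisterId, kTlsRegSketch) is a simplified,
hypothetical stand-in for the real LIR classes, and ClobberedAtCall is only an
assumption about how an allocator might consult the hook.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical register numbering, for illustration only.
using RegisterId = uint32_t;
constexpr RegisterId kTlsRegSketch = 7;

struct LNodeSketch {
  virtual ~LNodeSketch() = default;
  virtual bool isCall() const { return false; }
  // By default an instruction preserves no register across a call.
  virtual bool isCallPreserved(RegisterId) const { return false; }
};

struct LAsmJSCallSketch : LNodeSketch {
  explicit LAsmJSCallSketch(bool preservesTlsReg)
      : preservesTlsReg_(preservesTlsReg) {}
  bool isCall() const override { return true; }
  // Only the TLS register is reported as preserved, and only when the
  // call was marked that way (e.g. an intra-module call).
  bool isCallPreserved(RegisterId reg) const override {
    return preservesTlsReg_ && reg == kTlsRegSketch;
  }

 private:
  bool preservesTlsReg_;
};

// Collect the registers an allocator would treat as clobbered at `ins`.
inline std::vector<RegisterId> ClobberedAtCall(const LNodeSketch& ins,
                                               RegisterId numRegs) {
  std::vector<RegisterId> clobbered;
  if (!ins.isCall()) {
    return clobbered;
  }
  for (RegisterId r = 0; r < numRegs; ++r) {
    if (!ins.isCallPreserved(r)) {
      clobbered.push_back(r);
    }
  }
  return clobbered;
}

// Small usage example: an intra-module call spares exactly one register.
inline void ExampleUsage() {
  LAsmJSCallSketch intraModule(true);
  assert(ClobberedAtCall(intraModule, 8).size() == 7);
}
```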
WebAssembly functions take a TLS pointer argument and now ensure that the
WasmTlsReg register has the same value when they return.
This is not yet exploited by the register allocator, which still thinks that all
registers are clobbered by function calls.
Get the stack limit from TlsData::stackLimit instead of
SymbolicAddress::StackLimit. Since the TLS pointer register is available at
every function prologue, the over-recursion check costs the same as using the
statically linked address.
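In C++ terms, the prologue check amounts to something like the sketch below.
TlsDataSketch is a hypothetical, reduced struct holding only the stackLimit
field mentioned above, and the comparison assumes a downward-growing stack.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, reduced version of TlsData: only the field used here.
struct TlsDataSketch {
  uintptr_t stackLimit;
};

// Assumes the stack grows downward: the function is over-recursed when
// the stack pointer has reached or passed the limit. The limit is read
// through the TLS pointer rather than from a fixed global address.
inline bool IsOverRecursed(const TlsDataSketch* tls, uintptr_t stackPointer) {
  return stackPointer <= tls->stackLimit;
}

// Small usage example.
inline void ExampleUsage() {
  TlsDataSketch tls{0x1000};
  assert(!IsOverRecursed(&tls, 0x2000));
}
```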
WebAssembly functions now expect to be passed a hidden argument in WasmTlsReg
which is a pointer to a TlsData struct.
Temporarily allocate a TlsData instance in the wasm::Instance itself. When wasm
supports multithreading, we will need to allocate a TlsData instance per thread
per module instance.
This patch generates code to pass the TLS pointer to WebAssembly functions,
preserving it through intra-module calls. The pointer is not used for anything
yet, and the TLS pointer register is not currently preserved across function
calls.
Including the PID makes it impossible to aggregate crash reports on
crash-stats.
This also reduces buffer size to ensure that having two buffers does
not increase total stack size, though that is unlikely to matter. 1000
characters is likely excessive in any event.
mozilla::SignalTrampoline is designed to work around a bug in older ARM
kernels; it constructs a trampoline function with a NOP slide and then
calls a specified function. This feat is accomplished using inline
assembly and naked functions, a GCC extension that lets you write the
entire body of a function using GCC inline assembly.
Unfortunately, the particular implementation that it uses requires the
specified function's address to be loaded into a register. GCC permits
this and we use input arguments to the assembly statement to ensure that
GCC knows it shouldn't clobber the incoming argument registers when
trying to load the function's address.
clang, however, complains about the use of input parameters in naked
functions. So we need to find something that will work on both GCC and
clang.
The trick is to realize that a) we're tail-calling the specified
function and b) we don't have to worry about calling a fully-general
function. We just have to worry about calling a function inside libxul,
and we can therefore "assume" that the offset between the branch and the
called function fits into the immediate field of a Thumb (or ARM) branch
instruction. (This assumption is not strictly true; the branch range is
+/-16MB or so and libxul is actually quite a bit bigger than that. But
it works in practice, and the linker will insert branch stubs if
necessary to make things work out OK.)
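As a rough illustration of the range question, the sketch below checks whether
a target is within reach of a direct branch. The 16MB constant is taken from
the estimate above rather than from an architecture manual, the helper name is
hypothetical, and the check ignores the PC bias that the real encodings apply.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: reach of a direct branch, per the "+/-16MB or so" estimate.
constexpr int64_t kAssumedBranchRange = 16 * 1024 * 1024;

// Returns true if `targetAddr` is within the assumed reach of a direct
// branch located at `branchAddr`. Out-of-range targets are the case
// where the linker would have to insert a branch stub.
inline bool FitsInDirectBranch(uintptr_t branchAddr, uintptr_t targetAddr) {
  const int64_t offset =
      static_cast<int64_t>(targetAddr) - static_cast<int64_t>(branchAddr);
  return offset >= -kAssumedBranchRange && offset < kAssumedBranchRange;
}

// Small usage example.
inline void ExampleUsage() {
  assert(FitsInDirectBranch(0x10000000, 0x10000100));
}
```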
The upshot is that we can use a "b" instruction instead of a "bx"
instruction, and this makes clang much happier. As a small bonus, the
stub gets ever-so-much-more efficient, which is probably the
least-significant micro-optimization ever.