which, at this point, is only the AC_SUBST, so we don't even need to
give it to old-configure anymore.
Depends on D17207
Differential Revision: https://phabricator.services.mozilla.com/D17208
--HG--
extra : moz-landing-system : lando
The only use in configure itself is for an MSVC version check that is now
always true (we don't accept versions < 19.15 anymore).
The only uses in the build system are in code that could just use
CC_TYPE instead.
Differential Revision: https://phabricator.services.mozilla.com/D17207
--HG--
extra : moz-landing-system : lando
- Replace oversizeSize with smallAllocsSize. This will track sizes of
non-transferred small allocation chunks. It excludes unused chunks,
oversize chunks and chunks transferred from another LifoAlloc. This new
counter is used to determine chunk size growth heuristics. This aims
to reduce memory spikes due to transferFrom allocation patterns that
we see in the wild.
- Also fix a pre-existing typo in LifoAlloc::reset
When using LifoAlloc::transferFrom to merge allocators (such as for
types), prepend the chunks to avoid wasted space at end of current last
chunk. This helps use cases such as TypeInference to behave more
predictably when merging small allocators.
(Original implementation by :nbp)
With jitted primitives for racy atomic access in place, we can
consolidate most C++ realizations of the atomic primitives into two
headers, one for gcc/Clang and one for MSVC, that will be used as
default fallbacks on non-tier-1 platforms.
Non-tier-1 platforms can still implement their own atomics layer, as
does MIPS already; we leave the MIPS code alone here.
--HG--
rename : js/src/jit/none/AtomicOperations-feeling-lucky.h => js/src/jit/shared/AtomicOperations-feeling-lucky-gcc.h
rename : js/src/jit/x86-shared/AtomicOperations-x86-shared-msvc.h => js/src/jit/shared/AtomicOperations-feeling-lucky-msvc.h
extra : rebase_source : e6d98623d0ae2992929e44a04e61abf0ff2669f0
extra : histedit_source : 286a6b4d8250342ebe6206c7af3fd80d49f1851c
SpiderMonkey (and eventually DOM) will sometimes access shared memory
from multiple threads without synchronization; this is a natural
consequence of the JS memory model + JS/DOM specs. We have always had
a hardware-specific abstraction layer for these accesses, to isolate
code from the details of how unsynchronized / racy access is handled.
This layer has been written in C++ and has several problems:
- In C++, racy access is undefined behavior, and the abstraction layer
is therefore inherently unsafe, especially in the presence of
inlining, PGO, and clever compilers. (And TSAN will start
complaining, too.)
- Some of the compiler intrinsics that are used in the C++ abstraction
layer are not actually the right primitives -- they assume C++, i.e.
non-racy, semantics, and may not implement the correct barriers in
all cases.
- There are few guarantees that the synchronization implemented by the
C++ primitives is actually compatible with the synchronization used
by jitted code.
- While x86 and ARM have 8-byte synchronized access (CMPXCHG8B and
LDREXD/STREXD), some C++ compilers do not support their use well or
at all, leading to occasional hardship for porting teams.
This patch solves all these problems by jit-generating the racy access
abstraction layer in the form of C++-compatible functions that: do not
trigger UB in the C++ code; do not depend on possibly-incorrect
intrinsics but instead always emit the proper barriers; are guaranteed
to be JIT-compatible; and support x86 properly.
Mostly this code is straightforward: each access function is a short,
nearly prologue- and epilogue-less, sequence of instructions that
performs a normal load or store or appropriately synchronized
operation (CMPXCHG or similar). Safe-for-races memcpy and memmove are
trickier but are handled by combining some C++ code with several
jit-generated functions that perform unrolled copies for various block
sizes and alignments.
The performance story is not completely satisfactory:
On the one hand, we don't regress anything: when copying
unshared-to-unshared we do not use the new primitives but instead the
C++ compiler's optimized memcpy and standard memory loads and stores.
On the other hand, performance with shared memory is lower than
performance with unshared memory. TypedArray.prototype.set() is a
good test case. When the source and target arrays have the same type,
the engine uses a memcpy; shared memory copying is 3x slower than
unshared memory for 100,000 8K copies (Uint8). However, when the
source and target arrays are slightly different types (Uint8 vs Int8)
the engine uses individual loads and stores, which for shared memory
turns into two calls per byte being moved; in this case, shared memory
is 127x slower than unshared memory. (All numbers on x64 Linux.)
Can we live with the very significant slowdown in the latter case? It
depends on the applications we envision for shared memory. Primarily,
shared memory will be used as wasm heap memory, in which case most
applications that need to move data will use all Uint8Array arrays and
the slowdown is OK. But it is clearly a type of performance cliff.
We can reduce the overhead by jit-generating more code, specifically
code to perform the load, convert, and store in common cases. More
interestingly, and more simply, we can probably use memcpy in all cases by
copying first (fairly fast) and then running a local fixup. A bug
should be filed for this but IMO we're OK with the current solution.
(Memcpy can also be further sped up in platform-specific ways by
generating cleverer code that uses REP MOVS or SIMD or similar.)
--HG--
extra : rebase_source : 0b7e6512d87e7b0ce98147df4be9f8293998fa44
extra : histedit_source : 6474c01aa1eb37290073dabdf3e9f37190ff1a5d%2C24c956ef5b5869bc4a0c0a99c8595bc58a9f6fe5
Because old-configure is only refreshed when, essentially,
old-configure.in changes, hardcoded (absolute) paths don't necessarily
match the build environment of the current build.
So instead, use an environment variable that we pass from python
configure when invoking old-configure.
Also do dummy changes to old-configure.in so that old-configure is
refreshed at least once to get the environment-based value.
Differential Revision: https://phabricator.services.mozilla.com/D17077
--HG--
extra : moz-landing-system : lando
The change to test_bug440572.html is due to a behavior change. Specifically,
before this change, any IDL-declared property, even one not exposed
cross-origin, would prevent named frames with that name being visible
cross-origin. The new behavior is that cross-origin-exposed IDL properties
prevent corresponding frame names from being visible, but ones not exposed
cross-origin don't. This matches the spec and other browsers.
Same thing for the changes to test_bug860494.xul.
The wpt test changes are just adding test coverage for the thing the other
tests caught.
Differential Revision: https://phabricator.services.mozilla.com/D15428
--HG--
extra : moz-landing-system : lando
It was added more than 4 years ago but never got enabled due to benchmark
regressions. Websites can now use WebAssembly for hot code (and the plan is to
use Cranelift instead of Ion there) so it's unlikely we will need this soon.
Differential Revision: https://phabricator.services.mozilla.com/D16950
--HG--
extra : moz-landing-system : lando