brendan%mozilla.org 43a911aeb6 Fixes for bug 80981 (``Need extended jump bytecode to avoid "script too large"
errors, etc.''):

We now ReportStatementTooLarge only if
- a jump offset overflows 32 bits, signed;
- there are 2**32 or more span dependencies in a script;
- a backpatch chain link is more than (2**30 - 1) bytecodes long;
- a source note's distance from the last note, or from the script's main
  entry point, is > 0x7fffff bytes.

Narrative of the patch, by file:

- js.c
  The js_SrcNoteName array of const char * is now a js_SrcNoteSpec array of
  "specifiers", structs that include a const char *name member.  Also, due to
  span-dependent jumps at the ends of basic blocks where the decompiler knows
  the basic block length, but not the jump format, we need an offset operand
  for SRC_COND, SRC_IF_ELSE, and SRC_WHILE (to tell the distance from the
  branch bytecode after the condition expression to the span-dependent jump).

- jsarena.[ch]
  JS arenas are used mainly for last-in-first-out allocation with _en masse_
  release to the malloc pool (or, optionally, to a private freelist).  But
  the code generator needs to allocate and grow (by doubling, to avoid O(n^2)
  growth) allocations that hold bytecode, source notes, and span-dependency
  records.  This exception to LIFO allocation works by claiming an entire
  arena from the pool and realloc'ing it, as soon as the allocation size
  reaches the pool's default arena size.  Call such an allocation a "large
  single allocation".

  This patch adds a new arena API, JS_ArenaFreeAllocation, which can be used
  to free a large single allocation.  If called with an allocation that's not
  a large single allocation, it will nevertheless attempt to retract the arena
  containing that allocation, if the allocation is last within its arena.
  Thus JS_ArenaFreeAllocation adds a non-LIFO "free" special case to match the
  non-LIFO "grow" special case already implemented under JS_ARENA_GROW for
  large single allocations.

  Even with this extension, the code generator still benefits from arenas
  over purely manual malloc/realloc/free, by virtue of _en masse_ free
  (JS_ARENA_RELEASE after code generation has completed, successfully or not).

  When the arena containing a large single allocation is reallocated, the
  previous arena's next member must be updated.  To avoid searching for the
  previous arena, the oversized arena stores a back-pointer to that next
  member in a (JSArena **) footer at its end.  The footer is not counted as
  allocable space within the arena.
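
  The footer idea can be sketched as follows.  This is a minimal illustration,
  not the jsarena.c code: the SketchArena type and the sketch_* helpers are
  hypothetical names, the arena is assumed to sit at the tail of its list, and
  plain malloc/realloc stand in for the pool.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical arena header: payload bytes follow, then a footer. */
typedef struct SketchArena SketchArena;
struct SketchArena {
    SketchArena *next;
    size_t       size;   /* allocable payload bytes, footer excluded */
};

static unsigned char *sketch_payload(SketchArena *a)
{
    return (unsigned char *)(a + 1);
}

/* The footer holds the address of whichever "next" member points at us. */
static void sketch_set_footer(SketchArena *a, SketchArena **link)
{
    memcpy(sketch_payload(a) + a->size, &link, sizeof link);
}

static SketchArena **sketch_get_footer(SketchArena *a)
{
    SketchArena **link;
    memcpy(&link, sketch_payload(a) + a->size, sizeof link);
    return link;
}

/* Append an oversized arena at *link (assumed to be the list tail). */
static SketchArena *sketch_new_oversized(SketchArena **link, size_t size)
{
    SketchArena *a;

    assert(*link == NULL);
    a = malloc(sizeof *a + size + sizeof(SketchArena **));
    if (!a)
        return NULL;
    a->next = NULL;
    a->size = size;
    sketch_set_footer(a, link);
    *link = a;
    return a;
}

/* Grow in place or move; fix the predecessor's link without a list search. */
static SketchArena *sketch_grow_oversized(SketchArena *a, size_t new_size)
{
    SketchArena **link = sketch_get_footer(a);  /* read before realloc */
    SketchArena *b;

    assert(a->next == NULL);
    b = realloc(a, sizeof *b + new_size + sizeof(SketchArena **));
    if (!b)
        return NULL;
    b->size = new_size;
    sketch_set_footer(b, link);  /* footer moves to the new end */
    *link = b;                   /* predecessor now points at the moved arena */
    return b;
}
```

  The footer is read before realloc because the old block may be freed by the
  call; after the move, one store through the saved pointer repairs the list.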

- jscntxt.c
  I've observed for many scripts that the bytes of source notes and bytecode
  are of comparable lengths, but only now am I fixing the default arena size
  for cx->notePool to match the size for cx->codePool (1024 instead of 256).

- jsemit.c
  Span-dependent instructions in JS bytecode consist of the jump (JOF_JUMP)
  and switch (JOF_LOOKUPSWITCH, JOF_TABLESWITCH) format opcodes, subdivided
  into unconditional (gotos and gosubs), and conditional jumps or branches
  (which pop a value, test it, and jump depending on its value).  Most jumps
  have just one immediate operand, a signed offset from the jump opcode's pc
  to the target bytecode.  The lookup and table switch opcodes may contain
  many jump offsets.
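
  As an illustration of a pc-relative jump operand, here is a minimal sketch.
  It assumes a one-byte opcode followed by a 16-bit signed offset stored high
  byte first; that layout and the sketch_* names are assumptions made for the
  example, not definitions taken from jsopcode.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: pc[0] is the opcode, pc[1..2] the signed 16-bit offset. */
static void sketch_set_jump_offset(uint8_t *pc, int16_t off)
{
    uint16_t u = (uint16_t)off;

    assert(pc != NULL);
    pc[1] = (uint8_t)(u >> 8);
    pc[2] = (uint8_t)(u & 0xff);
}

static int16_t sketch_get_jump_offset(const uint8_t *pc)
{
    assert(pc != NULL);
    return (int16_t)(uint16_t)((pc[1] << 8) | pc[2]);
}

/* The offset is relative to the jump opcode's own pc, not to the next op. */
static const uint8_t *sketch_jump_target(const uint8_t *pc)
{
    return pc + sketch_get_jump_offset(pc);
}
```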

  This patch adds "X" counterparts to the opcodes/formats (X is suffixed, btw,
  to prefer JSOP_ORX and thereby to avoid colliding on the JSOP_XOR name for
  the extended form of the JSOP_OR branch opcode).  The unextended or short
  formats have 16-bit signed immediate offset operands, the extended or long
  formats have 32-bit signed immediates.  The span-dependency problem consists
  of selecting as few long instructions as possible, or about as few -- since
  jumps can span other jumps, extending one jump may cause another to need to
  be extended.
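
  The cascading effect can be shown with a simple fixed-point relaxation.
  This is a sketch under stated assumptions, not the algorithm in jsemit.c:
  every jump starts short, a long jump is assumed to be 2 bytes bigger than a
  short one, and SketchJump and the sketch_* functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct SketchJump {
    long pc;       /* offset of the jump opcode in the all-short layout */
    long target;   /* offset of the target in the all-short layout */
    int  is_long;  /* set once the jump needs the extended format */
} SketchJump;

enum { SKETCH_GROWTH = 2 };  /* 32-bit operand minus 16-bit operand, bytes */

/* Total growth from long jumps that lie strictly before offset `off`. */
static long sketch_shift(const SketchJump *j, size_t n, long off)
{
    long shift = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (j[i].is_long && j[i].pc < off)
            shift += SKETCH_GROWTH;
    }
    return shift;
}

/* Extend jumps until every remaining short jump fits; return count extended. */
static size_t sketch_select_long_jumps(SketchJump *j, size_t n)
{
    size_t nlong = 0, i;
    int changed;

    assert(j != NULL || n == 0);
    do {
        changed = 0;
        for (i = 0; i < n; i++) {
            long from, to, span;

            if (j[i].is_long)
                continue;
            from = j[i].pc + sketch_shift(j, n, j[i].pc);
            to = j[i].target + sketch_shift(j, n, j[i].target);
            span = to - from;
            if (span < INT16_MIN || span > INT16_MAX) {
                j[i].is_long = 1;   /* never undone, so the loop terminates */
                nlong++;
                changed = 1;
            }
        }
    } while (changed);
    return nlong;
}
```

  With a jump at 0 targeting 32767 and a jump at 10 targeting 40000, the
  second must be long, which pushes the first jump's span to 32769 and forces
  it long as well.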

  Most JS scripts are short, so need no extended jumps.  We optimize for this
  case by generating short jumps until we know a long jump is needed.  After
  that point, we keep generating short jumps, but each jump's 16-bit immediate
  offset operand is actually an unsigned index into cg->spanDeps, an array of
  JSSpanDep structs.  Each struct tells the top offset in the script of the
  opcode, the "before" offset of the jump (which will be the same as top for
  simplex jumps, but which will index further into the bytecode array for a
  non-initial jump offset in a lookup or table switch), the "after" offset
  adjusted during span-dependent instruction selection (initially the same
  value as the "before" offset), and the jump target (more below).

  Since we generate cg->spanDeps lazily, from within js_SetJumpOffset, we must
  ensure that all bytecode generated so far can be inspected to discover where
  the jump offset immediate operands lie within CG_CODE(cg).  But the bonus is
  that we generate span-dependency records sorted by their offsets, so we can
  binary-search when trying to find a JSSpanDep for a given bytecode offset,
  or the nearest JSSpanDep at or above a given pc.

  To avoid limiting scripts to 64K jumps, if the cg->spanDeps index overflows
  65534, we store SPANDEP_INDEX_HUGE in the jump's immediate operand.  This
  tells us that we need to binary-search for the cg->spanDeps entry by the
  jump opcode's bytecode offset (sd->before).
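
  A minimal sketch of that lookup, assuming the records are sorted by their
  "before" offset.  SketchSpanDep, SKETCH_INDEX_HUGE (taken here to be 0xffff)
  and the sketch_* functions are illustrative stand-ins, not the actual
  JSSpanDep layout or the real SPANDEP_INDEX_HUGE definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct SketchSpanDep {
    ptrdiff_t top;     /* offset of the opcode */
    ptrdiff_t before;  /* offset of the jump before instruction selection */
    ptrdiff_t after;   /* offset adjusted during instruction selection */
} SketchSpanDep;

#define SKETCH_INDEX_HUGE 0xffffu   /* assumed sentinel: index did not fit */

/* Index of the first record with before >= offset, or n if there is none. */
static size_t sketch_lower_bound(const SketchSpanDep *sd, size_t n,
                                 ptrdiff_t offset)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (sd[mid].before < offset)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Use the operand as a direct index, or fall back to a search when huge. */
static const SketchSpanDep *sketch_get_span_dep(const SketchSpanDep *sd,
                                                size_t n, uint16_t operand,
                                                ptrdiff_t jump_offset)
{
    size_t i;

    if (operand != SKETCH_INDEX_HUGE) {
        assert(operand < n);
        return &sd[operand];
    }
    i = sketch_lower_bound(sd, n, jump_offset);
    return (i < n && sd[i].before == jump_offset) ? &sd[i] : NULL;
}
```

  The same lower-bound search also answers the "nearest JSSpanDep at or above
  a given pc" query.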

  Jump targets need to be maintained in a data structure that lets us look
  up an already-known target by its address (jumps may have a common target),
  and that also lets us update the addresses (script-relative, a.k.a. absolute
  offsets) of targets that come after a jump target (for when a jump below
  that target needs to be extended).  We use an AVL tree, implemented using
  recursion, but with some tricky optimizations to its height-balancing code
  (see http://www.enteract.com/~bradapp/ftp/src/libs/C++/AvlTrees.html).

  A final wrinkle: backpatch chains are linked by jump-to-jump offsets with
  positive sign, even though they link "backward" (i.e., toward lower bytecode
  address).  We don't want to waste space and search time in the AVL tree for
  such temporary backpatch deltas, so we use a single-bit wildcard scheme to
  tag true JSJumpTarget pointers and encode untagged, signed (positive) deltas
  in JSSpanDep.target pointers, depending on whether the JSSpanDep has a known
  target, or is still awaiting backpatching.
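
  One way to picture the tagging, as a sketch only: the low bit of a
  pointer-sized word marks a real target pointer, and an untagged word holds
  a delta shifted left by one.  SketchTarget, SketchSlot and the sketch_*
  helpers are hypothetical, and the sketch assumes target structs are at
  least 2-byte aligned so that bit 0 of their address is free.

```c
#include <assert.h>
#include <stdint.h>

typedef struct SketchTarget {
    long offset;               /* script-relative offset of the target */
} SketchTarget;

typedef uintptr_t SketchSlot;  /* either a tagged pointer or a shifted delta */

#define SKETCH_TAG_BIT ((uintptr_t)1)

static SketchSlot sketch_from_target(SketchTarget *t)
{
    assert(((uintptr_t)t & SKETCH_TAG_BIT) == 0);  /* alignment frees bit 0 */
    return (uintptr_t)t | SKETCH_TAG_BIT;
}

static SketchSlot sketch_from_delta(uintptr_t delta)
{
    assert(delta <= (UINTPTR_MAX >> 1));  /* one bit is lost to the tag */
    return delta << 1;
}

static int sketch_has_target(SketchSlot s)
{
    return (s & SKETCH_TAG_BIT) != 0;
}

static SketchTarget *sketch_to_target(SketchSlot s)
{
    assert(sketch_has_target(s));
    return (SketchTarget *)(s & ~SKETCH_TAG_BIT);
}

static uintptr_t sketch_to_delta(SketchSlot s)
{
    assert(!sketch_has_target(s));
    return s >> 1;
}
```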

  Note that backpatch chains would present a problem for BuildSpanDepTable,
  which inspects bytecode to build cg->spanDeps on demand, when the first
  short jump offset overflows.  To solve this temporary problem, we emit a
  proxy bytecode (JSOP_BACKPATCH; JSOP_BACKPATCH_PUSH for jumps that push a
  result on the interpreter's stack, namely JSOP_GOSUB; or JSOP_BACKPATCH_POP
  for branch ops) whose nuses/ndefs counts help keep the stack balanced, but
  whose opcode format distinguishes its backpatch delta immediate operand from
  a normal jump offset.
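
  A backpatch chain itself can be sketched like this.  The model is
  deliberately simplified and is an assumption of the example: bytecode is an
  array of long indexed by pc, each pending jump's slot holds a positive
  delta back to the previous pending jump, and a delta of 0 ends the chain.
  The sketch_* names are hypothetical.

```c
#include <assert.h>

/* Record a pending jump at pc; *last is the previous link, or -1 if none. */
static void sketch_emit_pending(long *code, long pc, long *last)
{
    assert(*last < pc);
    code[pc] = (*last < 0) ? 0 : pc - *last;  /* positive, points backward */
    *last = pc;
}

/* Walk the chain from its newest link, replacing deltas with real offsets. */
static int sketch_backpatch(long *code, long last, long target)
{
    int patched = 0;
    long pc = last;

    while (pc >= 0) {
        long delta = code[pc];

        code[pc] = target - pc;   /* now a normal pc-relative jump offset */
        patched++;
        if (delta == 0)
            break;                /* reached the oldest link */
        pc -= delta;
    }
    return patched;
}
```

  Until sketch_backpatch runs, the slots hold chain deltas rather than jump
  offsets, which is why a distinct opcode format is needed to tell the two
  apart when inspecting bytecode.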

  The cg->spanDeps array and JSJumpTarget structs are allocated from the
  cx->tempPool arena-pool.  This created a LIFO vs. non-LIFO conflict: there
  were two places under the TOK_SWITCH case in js_EmitTree that used tempPool
  to allocate and release a chunk of memory, during whose lifetime JSSpanDep
  and/or JSJumpTarget structs might also be allocated from tempPool -- the
  ensuing release would prove disastrous.  These bitmap and table temporaries
  are now allocated from the malloc heap.
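
  The hazard is easy to reproduce with a toy mark/release pool.  This sketch
  is not the JSArenaPool API; SketchPool and the sketch_* functions are
  hypothetical and exist only to show why a release under a live allocation
  is unsafe.

```c
#include <assert.h>
#include <stddef.h>

typedef struct SketchPool {
    unsigned char buf[256];
    size_t        used;
} SketchPool;

static void *sketch_alloc(SketchPool *p, size_t n)
{
    void *r;

    if (n > sizeof p->buf - p->used)
        return NULL;
    r = p->buf + p->used;
    p->used += n;
    return r;
}

static size_t sketch_mark(const SketchPool *p)
{
    return p->used;
}

/* LIFO release: everything allocated since the mark is given back. */
static void sketch_release(SketchPool *p, size_t mark)
{
    assert(mark <= p->used);
    p->used = mark;
}
```

  If a temporary is allocated after a mark, a longer-lived record is
  allocated after the temporary, and the pool is then released to the mark,
  the next allocation reuses the bytes that the longer-lived record still
  occupies.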

- jsinterp.c
  Straightforward cloning and JUMP => JUMPX mutating of the jump and switch
  format bytecode cases.

- jsobj.c
  Silence warnings about %p used without (void *) casts.

- jsopcode.c
  Massive and scary decompiler whackage to cope with extended jumps, using
  source note offsets to help find jumps whose format (short or long) can't
  be discovered from properties of prior instructions in the script.

  One cute hack here: long || and && expressions are broken up to wrap before
  the 80th column, with the operator at the end of each non-terminal line.
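
  The wrapping rule can be sketched as a greedy line filler.  This is an
  illustration only and does not reproduce the jsopcode.c logic: the
  sketch_wrap_logical name, its parameters, and the use of a caller-supplied
  width are assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Join operands with `op`, starting a new line whenever the next operand
 * (plus its trailing operator) would pass `width` columns.  The operator
 * stays at the end of each non-final line.  Returns 0, or -1 if `out` is
 * too small.
 */
static int sketch_wrap_logical(char *out, size_t cap,
                               const char *const *operands, size_t n,
                               const char *op, size_t width)
{
    size_t len = 0, col = 0, i;

    assert(out != NULL && operands != NULL && op != NULL);
    if (cap == 0)
        return -1;
    out[0] = '\0';
    for (i = 0; i < n; i++) {
        int last = (i + 1 == n);
        size_t piece = strlen(operands[i]) + (last ? 0 : 1 + strlen(op));
        const char *sep = "";
        int w;

        if (col > 0) {
            if (col + 1 + piece > width) {
                sep = "\n";     /* wrap: operator already ends this line */
                col = 0;
            } else {
                sep = " ";
                col += 1;
            }
        }
        w = snprintf(out + len, cap - len, "%s%s%s%s", sep, operands[i],
                     last ? "" : " ", last ? "" : op);
        if (w < 0 || (size_t)w >= cap - len)
            return -1;
        len += (size_t)w;
        col += piece;
    }
    return 0;
}
```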

- jsopcode.h, jsopcode.tbl
  The new extended jump opcodes, formats, and fundamental parameterization
  macros.  Also, more comments.

- jsparse.c
  Random and probably only aesthetic fix to avoid wrongly decorating a
  foo[i]++ or --foo[i] parse tree node with JSOP_SETCALL.  Only foo(i)++ or
  --foo(i), or the other postfix or prefix forms of these operators, should
  have such an opcode decoration on its parse tree.

- jsscript.h
  Random macro naming sanity: use trailing _ rather than leading _ for macro
  local variables in order to avoid invading the standard C global namespace.
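
  A small sketch of the naming convention.  The macro and function below are
  hypothetical examples, not macros from jsscript.h.  C reserves identifiers
  that begin with an underscore followed by an uppercase letter or a second
  underscore, and all underscore-led identifiers at file scope, so a trailing
  underscore is the safer spelling for a macro's local.

```c
#include <assert.h>

/* tmp_ ends with an underscore, so it stays out of the reserved namespace. */
#define SKETCH_SWAP_INT(a, b)                                                 \
    do {                                                                      \
        int tmp_ = (a);                                                       \
        (a) = (b);                                                            \
        (b) = tmp_;                                                           \
    } while (0)

/* Order two ints so that *lo <= *hi afterwards. */
static void sketch_sort2(int *lo, int *hi)
{
    if (*lo > *hi)
        SKETCH_SWAP_INT(*lo, *hi);
    assert(*lo <= *hi);
}
```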
2001-10-17 03:16:48 +00:00