The existing code for ScanRoots looks at all traversal roots in the graph,
and floods white or black from them. This can take up a large chunk of the
Scan/Unlink slice of ICC, maybe because graph traversal has poor locality.
Outside of a leak, the cycle collector graph is usually only large when
there is a lot of garbage (95% or more of the graph), so we want to
speed up white marking.
To do this, I add a new pass that scans every node and directly sets the
color of any node that should be white, without flooding. This is very
fast. Then a second pass floods black from any remaining grey nodes.
On the page close CC for a real page, I measured a 10x improvement in
ScanRoots() time with this algorithm, from 3ms to 0.3ms.