- Use uint32 for all quantities, and double for all differences, so that counters can overflow (wrapping modulo 2^32) without breaking deltas such as (busy == allocs - frees). This doesn't help sorting, however: if one sort key has wrapped just past 0 while the other is still a very large unsigned number, the unsigned comparison orders them the wrong way.
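  As a hedged illustration of the wraparound point (a standalone sketch using C99 uint32_t in place of NSPR's uint32; the numbers are made up):

      #include <inttypes.h>
      #include <stdint.h>
      #include <stdio.h>

      int main(void)
      {
          /* Pretend the alloc counter has wrapped past 2^32 while frees has not. */
          uint32_t allocs = 5u;           /* really 2^32 + 5 */
          uint32_t frees  = 4294967000u;  /* no wrap */

          /* Unsigned subtraction is modulo 2^32, so the delta is still right. */
          uint32_t busy = allocs - frees; /* 301, the true outstanding count */
          printf("busy = %" PRIu32 "\n", busy);

          /* Sorting on the raw keys is still misleading, though: here
             allocs < frees even though allocs "really" exceeds frees. */
          return 0;
      }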
- Separate graph link (half an edge, structurally speaking -- no per-edge stats) from graph edge, so that an edge is two links and some stats. This avoids bloat and copying in connect_nodes (which is soon to become generic and move to tmreader.[ch]).
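  A minimal sketch of the link/edge split (member names here are assumptions, not the actual declarations):

      typedef struct tmgraphlink {
          struct tmgraphlink  *next;   /* next link fanning into/out of a node */
          struct tmgraphnode  *node;   /* the node at this link's far end */
      } tmgraphlink;

      typedef struct tmgraphedge {
          tmgraphlink  links[2];       /* the two half-edges */
          tmallcounts  allocs;         /* per-edge stats (see the next item) */
          tmallcounts  frees;          /*  ... stored once per edge */
      } tmgraphedge;

  connect_nodes can then thread cheap links through the node lists and keep the stats in one place per edge.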
- Factor data structures better: we now have {allocs,frees} x {bytes,calls} x {direct,total}, with the second pair's members grouped in struct tmallcounts and the third pair's in tmcounts. So, for example, the total number of calls to allocators is allocs.calls.total, and bytes freed directly by a graphnode (library, component, or method) is frees.bytes.direct.
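  Concretely, that factoring looks something like this (the direct/total comments are an assumption about their meaning):

      typedef struct tmcounts {
          uint32  direct;   /* from this node's own code */
          uint32  total;    /* direct plus amounts attributed through the call graph */
      } tmcounts;

      typedef struct tmallcounts {
          tmcounts  bytes;
          tmcounts  calls;
      } tmallcounts;

      /* Each graphnode then carries: */
      tmallcounts  allocs;  /* e.g. allocs.calls.total  */
      tmallcounts  frees;   /* e.g. frees.bytes.direct  */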
- Teach tmreader_eventloop about 'F' (TM_EVENT_FREE) events: it now updates the direct free byte and call counts for a method, its component, and its library when it reads the event. Of course, bloatblame ignores this info, because it is concerned only with bloat (total memory allocated).
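  A hedged sketch of what the free-event bookkeeping amounts to (helper name and node type are illustrative, not the actual tmreader.c code):

      /* Bump one graphnode's direct free counters for a free of size bytes. */
      static void tally_free(tmgraphnode *node, uint32 size)
      {
          node->frees.bytes.direct += size;
          node->frees.calls.direct += 1;
      }

      /* On an 'F' event, apply it to the owning method and its ancestors:
       *     tally_free(method, size);
       *     tally_free(component, size);
       *     tally_free(library, size);
       */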
- Right-align numbers in the first (trace-malloc stats) table.