From ec9b928e704863af41878e8e84d1d3fc46ae33fe Mon Sep 17 00:00:00 2001
From: Chris Lattner
@@ -782,6 +793,22 @@ vector is also useful when interfacing with code that expects vectors :).
std::deque is, in some senses, a generalized version of std::vector. Like std::vector, it provides constant-time random access and other similar properties, but it also provides efficient insertion and removal at the front of the sequence. It does not guarantee contiguity of elements within memory.
+
+In exchange for this extra flexibility, std::deque has significantly higher constant-factor costs than std::vector. If possible, use std::vector or something cheaper.
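A minimal sketch of the front-insertion that distinguishes std::deque from std::vector. `buildBothEnds` is a hypothetical helper; it uses only the standard push_back, push_front and operator[] members.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical helper: builds a sequence by adding elements at both ends.
// std::deque supports push_front in amortized constant time, whereas
// std::vector would need an O(n) insert at begin() for each element.
std::deque<int> buildBothEnds(const std::vector<int> &front,
                              const std::vector<int> &back) {
  std::deque<int> d;
  for (int v : back)
    d.push_back(v);   // append at the tail
  for (int v : front)
    d.push_front(v);  // prepend at the head
  return d;           // still supports constant-time d[i]
}
```

For example, `buildBothEnds({1, 2}, {3, 4})` yields the sequence 2, 1, 3, 4, and `d[3]` is a constant-time access even though the deque's storage is not contiguous.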
-Other STL containers are available, such as std::deque (which has similar characteristics to std::vector, but has higher constant factors and provides efficient push_front/pop_front methods) and std::string.
+Other STL containers are available, such as std::string.
There are also various STL adapter classes such as std::queue, std::priority_queue, std::stack, etc. These provide simplified access to an
@@ -845,16 +870,188 @@ underlying container but don't affect the cost of the container itself.
Set-like containers are useful when you need to canonicalize multiple values into a single representation. There are several different choices for how to do this, providing various trade-offs.
+
+If you intend to insert a lot of elements and then do a lot of queries, a great approach is to use a vector (or other sequential container) and then use std::sort+std::unique to remove duplicates. This approach works really well if your usage pattern has these two distinct phases (insert then query), and, coupled with a good choice of sequential container, it provides several nice properties: the result data is contiguous in memory (good for cache locality), requires few allocations, is easy to address (iterators into the final vector are just indices or pointers), and can be efficiently queried with a standard binary search.
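The two-phase approach can be sketched with standard algorithms alone. `canonicalize` and `contains` are hypothetical helper names. Note that std::unique only collapses *adjacent* duplicates, which is why the sort must come first, and that it leaves a tail of unspecified elements that has to be erased.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper marking the phase boundary: sort, drop adjacent
// duplicates with std::unique, then erase the leftover tail it returns.
template <typename T>
void canonicalize(std::vector<T> &values) {
  std::sort(values.begin(), values.end());
  values.erase(std::unique(values.begin(), values.end()), values.end());
}

// Hypothetical query helper: binary search over the sorted, unique data.
template <typename T>
bool contains(const std::vector<T> &sortedUnique, const T &value) {
  return std::binary_search(sortedUnique.begin(), sortedUnique.end(), value);
}
```

If an element's position is needed rather than a yes/no answer, `std::lower_bound(v.begin(), v.end(), x) - v.begin()` gives its index in the final vector.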
+
+If you have a set-like data structure that is usually small and whose elements are reasonably small, a SmallSet<Type, N> is a good choice. This set has space for N elements in place (thus, if the set is dynamically smaller than N, no malloc traffic is required) and accesses them with a simple linear search. When the set grows beyond 'N' elements, it allocates a more expensive representation that guarantees efficient access (for most types, it falls back to std::set, but for pointers it uses something far better, see SmallPtrSet).
+
+The magic of this class is that it handles small sets extremely efficiently, but gracefully handles extremely large sets without loss of efficiency. The drawback is that the interface is quite small: it supports insertion, queries and erasing, but does not support iteration.
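To make the small/large switch concrete, here is a simplified sketch of the idea. `TinySet` is a hypothetical class, not LLVM's SmallSet: it assumes T is default-constructible and copyable (LLVM's class does not need the former), keeps up to N elements in an inline array searched linearly, and migrates everything into a std::set on overflow. Like the interface described above, it offers insertion, queries and erasure but no iteration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical illustration of the SmallSet idea (not LLVM's implementation).
// Assumes T is default-constructible (for std::array) and copyable.
template <typename T, std::size_t N>
class TinySet {
  std::array<T, N> small_{};   // inline storage: no heap allocation
  std::size_t smallSize_ = 0;  // number of live elements in small_
  std::set<T> big_;            // used only after the inline array overflows

  bool isSmall() const { return big_.empty(); }

public:
  bool usesInlineStorage() const { return isSmall(); }

  // Returns true if v was newly inserted.
  bool insert(const T &v) {
    if (!isSmall())
      return big_.insert(v).second;
    auto end = small_.begin() + smallSize_;
    if (std::find(small_.begin(), end, v) != end)
      return false;                  // already present (linear search)
    if (smallSize_ < N) {
      small_[smallSize_++] = v;      // still fits inline
      return true;
    }
    big_.insert(small_.begin(), end); // overflow: migrate inline elements
    smallSize_ = 0;
    big_.insert(v);
    return true;
  }

  std::size_t count(const T &v) const {
    if (!isSmall())
      return big_.count(v);
    auto end = small_.begin() + smallSize_;
    return std::find(small_.begin(), end, v) != end ? 1 : 0;
  }

  // Returns true if v was present and has been removed.
  bool erase(const T &v) {
    if (!isSmall())
      return big_.erase(v) != 0;
    auto end = small_.begin() + smallSize_;
    auto it = std::find(small_.begin(), end, v);
    if (it == end)
      return false;
    *it = small_[smallSize_ - 1];    // move the last element into the hole
    --smallSize_;
    return true;
  }
};
```

With `TinySet<int, 2>`, the first two distinct insertions stay inline; the third moves all three elements into the std::set, after which every operation is a logarithmic tree operation.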
+
+SmallPtrSet has all the advantages of SmallSet (and a SmallSet of pointers is transparently implemented with a SmallPtrSet), but also supports iterators. If more than 'N' insertions are performed, a single quadratically probed hash table is allocated and grows as needed, providing extremely efficient access (constant-time insertion/deletion/queries with low constant factors) while being very stingy with malloc traffic.
+
+Note that, unlike std::set, the iterators of SmallPtrSet are invalidated whenever an insertion occurs. Also, the iterators do not visit the values in sorted order.
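The "large" representation described above can be sketched as an open-addressing table of pointers with quadratic probing. `PtrHashSet` is hypothetical and is not LLVM's code: the hash function, the 3/4 load-factor limit and the initial size of 16 are assumptions of this sketch, the inline small mode is omitted, and erase is left out so that no tombstone markers are needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: open-addressing pointer set with quadratic
// (triangular-number) probing over a power-of-two table. Not LLVM's code.
class PtrHashSet {
  std::vector<const void *> buckets_;  // nullptr marks an empty bucket
  std::size_t size_ = 0;

  // Simple pointer hash (an assumption): shifts away the low bits, which
  // are mostly zero because of alignment, and mixes two shifted copies.
  static std::size_t hashPtr(const void *p) {
    auto v = reinterpret_cast<std::uintptr_t>(p);
    return static_cast<std::size_t>((v >> 4) ^ (v >> 9));
  }

  // Returns the bucket holding p, or the empty bucket where p would go.
  // Probe offsets 1, 2, 3, ... accumulate to triangular numbers, which
  // visit every bucket of a power-of-two table, so the loop terminates
  // as long as at least one bucket is empty.
  std::size_t findSlot(const void *p) const {
    std::size_t mask = buckets_.size() - 1;
    std::size_t idx = hashPtr(p) & mask;
    for (std::size_t step = 1; buckets_[idx] != nullptr && buckets_[idx] != p;
         ++step)
      idx = (idx + step) & mask;
    return idx;
  }

  // Doubles the table and reinserts every element: one allocation per growth.
  void grow() {
    std::vector<const void *> old;
    old.swap(buckets_);
    buckets_.assign(old.size() * 2, nullptr);
    for (const void *p : old)
      if (p)
        buckets_[findSlot(p)] = p;
  }

public:
  PtrHashSet() : buckets_(16, nullptr) {}

  // Returns true if p was newly inserted. nullptr cannot be stored.
  bool insert(const void *p) {
    assert(p && "nullptr is reserved as the empty-bucket marker");
    if ((size_ + 1) * 4 > buckets_.size() * 3)  // keep load factor <= 3/4
      grow();
    std::size_t idx = findSlot(p);
    if (buckets_[idx] == p)
      return false;
    buckets_[idx] = p;
    ++size_;
    return true;
  }

  bool contains(const void *p) const {
    return p && buckets_[findSlot(p)] == p;
  }

  std::size_t size() const { return size_; }
  std::size_t capacity() const { return buckets_.size(); }
};
```

Because `grow()` rehashes into a brand-new bucket array, any iterator into the old buckets would dangle after an insertion, which is the invalidation behavior noted above; iteration order follows hash values, not sorted order.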
+
-SmallPtrSet
-SmallSet
-sorted vector
-FoldingSet
-hash_set
-UniqueVector
-SetVector
+FoldingSet is an aggregate class that is really good at uniquing expensive-to-create or polymorphic objects. It is a combination of a chained hash table with intrusive links (uniqued objects are required to inherit from FoldingSetNode) that uses SmallVector as part of its ID process.
+
+Consider a case where you want to implement a "getorcreate_foo" method for a complex object (for example, a node in the code generator). The client has a description of *what* it wants to generate (it knows the opcode and all the operands), but we don't want to 'new' a node, then try inserting it into a set only to find out it already exists (at which point we would have to delete it and return the node that already exists).
+To support this style of client, FoldingSet performs a query with a FoldingSetNodeID (which wraps SmallVector) that can be used to describe the element that we want to query for. The query either returns the element matching the ID or it returns an opaque ID that indicates where insertion should take place.
+
+Because FoldingSet uses intrusive links, it can support polymorphic objects in the set (for example, you can have SDNode instances mixed with LoadSDNodes). Because the elements are individually allocated, pointers to the elements are stable: inserting or removing elements does not invalidate any pointers to other elements.
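The "get or create" pattern can be illustrated with a standard-library analogue. This is not FoldingSet's API: `Node`, `LoadNode` and `NodeUniquer` are hypothetical names, the lookup structure is an ordered std::map rather than an intrusive chained hash table, and the map owns the nodes. What carries over is the shape of the query: the key is a cheap description (opcode plus operands), so no candidate node is allocated before the lookup, and `lower_bound` returns either the match or the position where the new entry belongs, much like the opaque insertion ID described above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical node hierarchy standing in for code-generator nodes.
struct Node {
  unsigned opcode;
  std::vector<unsigned> operands;
  Node(unsigned op, std::vector<unsigned> ops)
      : opcode(op), operands(std::move(ops)) {}
  virtual ~Node() = default;
};
struct LoadNode : Node {
  using Node::Node;  // a subclass can be uniqued alongside its base
};

// Hypothetical uniquer: standard-library analogue of the FoldingSet pattern.
class NodeUniquer {
  using ID = std::vector<unsigned>;  // plays the role of the node ID
  std::map<ID, std::unique_ptr<Node>> nodes_;

  // The ID must contain everything that distinguishes two nodes.
  static ID makeID(unsigned opcode, const std::vector<unsigned> &operands) {
    ID id{opcode};
    id.insert(id.end(), operands.begin(), operands.end());
    return id;
  }

public:
  template <typename NodeT = Node>
  Node *getOrCreate(unsigned opcode, const std::vector<unsigned> &operands) {
    ID id = makeID(opcode, operands);
    auto pos = nodes_.lower_bound(id);  // match, or where it would go
    if (pos != nodes_.end() && pos->first == id)
      return pos->second.get();         // already exists: nothing allocated
    auto node = std::make_unique<NodeT>(opcode, operands);
    // Reuse the position from the query instead of searching again.
    return nodes_.emplace_hint(pos, std::move(id), std::move(node))
        ->second.get();
  }

  std::size_t size() const { return nodes_.size(); }
};
```

Since each node is a separate heap allocation owned through a `unique_ptr`, returned pointers stay valid as other nodes are added, mirroring the pointer stability described for FoldingSet.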
+
+std::set is a reasonable all-around set class, which is good at many things but great at nothing. std::set allocates memory for every single element inserted (thus it is very malloc intensive) and typically stores three pointers with every element (thus adding a large amount of per-element space overhead). It offers guaranteed log(n) performance, which is not particularly fast.
+
+The advantages of std::set are that its iterators are stable (inserting or deleting an element does not affect iterators or pointers to other elements) and that iteration over the set is guaranteed to be in sorted order. If the elements in the set are large, then the relative overhead of the pointers and malloc traffic is not a big deal, but if the elements of the set are small, std::set is almost never a good choice.
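Both std::set guarantees can be exercised directly. `sortedUnique` and `iteratorSurvivesChurn` are hypothetical helper names; everything else is standard std::set behavior.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: std::set iteration is always in sorted order and
// duplicates are dropped on insertion.
std::vector<std::string> sortedUnique(const std::vector<std::string> &words) {
  std::set<std::string> s(words.begin(), words.end());
  return std::vector<std::string>(s.begin(), s.end());
}

// Hypothetical helper: an iterator (and address) taken before many unrelated
// insertions and an erasure still refers to the same element afterwards.
bool iteratorSurvivesChurn() {
  std::set<int> s{10, 20, 30};
  auto it = s.find(20);     // iterator to an element we keep
  const int *addr = &*it;   // and that element's address
  for (int i = 0; i < 1000; ++i)
    s.insert(100 + i);      // each insertion allocates a new tree node
  s.erase(10);              // erase a different element
  return *it == 20 && &*s.find(20) == addr;
}
```

The per-element tree node that makes this stability possible is exactly the allocation and pointer overhead criticized above.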
+
+LLVM's SetVector<Type> is actually a combination of a set along with a Sequential Container. The important property that this provides is efficient insertion with uniquing (duplicate elements are ignored) with iteration support. It implements this by inserting elements into both a set-like container and the sequential container, using the set-like container for uniquing and the sequential container for iteration.
+
+The difference between SetVector and other sets is that the order of iteration is guaranteed to match the order of insertion into the SetVector. This property is really important for things like sets of pointers. Because pointer values are non-deterministic (e.g. vary across runs of the program on different machines), iterating over the pointers in a std::set or other set will not be in a well-defined order.
+
+The drawback of SetVector is that it requires twice as much space as a normal set and has the sum of constant factors from the set-like container and the sequential container that it uses. Use it *only* if you need to iterate over the elements in a deterministic order. SetVector is also expensive to delete elements from (linear time).
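The set-plus-vector structure can be sketched in a few lines. `InsertionOrderedSet` is a hypothetical, simplified class, not LLVM's SetVector, and the choice of std::unordered_set and std::vector as the two underlying containers is an assumption of this sketch. It shows each cost described above: every element is stored twice, and removal is linear because the vector must be searched.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical SetVector-style container (not LLVM's class): the
// unordered_set rejects duplicates, the vector records insertion order.
template <typename T>
class InsertionOrderedSet {
  std::unordered_set<T> set_;  // used for uniquing
  std::vector<T> vector_;      // used for deterministic iteration

public:
  // Returns true if v was not already present.
  bool insert(const T &v) {
    if (!set_.insert(v).second)
      return false;            // duplicate: ignored
    vector_.push_back(v);
    return true;
  }

  bool contains(const T &v) const { return set_.count(v) != 0; }

  // Linear time: the element must also be located and erased in the vector.
  bool remove(const T &v) {
    if (set_.erase(v) == 0)
      return false;
    vector_.erase(std::find(vector_.begin(), vector_.end(), v));
    return true;
  }

  // Iteration follows insertion order, independent of hash values.
  typename std::vector<T>::const_iterator begin() const {
    return vector_.begin();
  }
  typename std::vector<T>::const_iterator end() const { return vector_.end(); }
  std::size_t size() const { return vector_.size(); }
};
```

Instantiated with a pointer type, the iteration order depends only on the order of `insert` calls, not on the pointer values, which is the determinism property that motivates SetVector.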
+
+The STL provides several other options, such as std::multiset and the various "hash_set"-like containers (whether from C++ TR1 or from the SGI library).
+
+std::multiset is useful if you're not interested in elimination of duplicates, but has all the drawbacks of std::set. A sorted vector or some other approach is almost always better.
+
+The various hash_set implementations (exposed portably by "llvm/ADT/hash_set") are standard chained hash tables. This algorithm is malloc intensive like std::set (performing an allocation for each element inserted, thus having really high constant factors) but (usually) provides O(1) insertion/deletion of elements. This can be useful if your elements are large (thus making the constant-factor cost relatively low). Element iteration does not visit elements in a useful order.
+