Fangrui Song 8bf934f954 [llvm-exegesis] Clustering: don't enqueue a point multiple times
Summary:
SetVector uses both a DenseSet and a vector, which wastes time and memory. The points are represented as natural numbers, so we can replace the DenseSet part by indexing into a vector<char> instead.

Don't cargo-cult the pseudocode on the Wikipedia DBSCAN page. This is a standard BFS-style algorithm (similar loops have been used several times in other LLVM components): every point is processed at most once, so the queue holds at most NumPoints elements. We represent it with a vector and allocate it outside of the loop to avoid allocation in the loop body.

We check `Processed[P]` to avoid enqueueing a point more than once, which also nicely saves us a `ClusterIdForPoint_[Q].isUndef()` check.
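
A minimal sketch of the queue discipline described above, not the actual llvm-exegesis code. It keeps only the BFS skeleton: the DBSCAN density threshold and noise handling are left out, so it simply labels connected groups of points. `clusterPoints`, `Neighbors`, `ToProcess`, and `kUndef` are hypothetical names chosen for illustration; only `Processed` and `NumPoints` come from the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical marker for "no cluster assigned yet".
constexpr int kUndef = -1;

// Assigns a cluster id to every point. Neighbors(P) is a hypothetical
// callback returning the indices of the points considered close to P.
std::vector<int> clusterPoints(
    std::size_t NumPoints,
    const std::function<std::vector<std::size_t>(std::size_t)> &Neighbors) {
  std::vector<int> ClusterId(NumPoints, kUndef);
  // Plays the role of the DenseSet half of SetVector: one byte per point.
  std::vector<char> Processed(NumPoints, 0);
  // The queue. Each point is enqueued at most once, so NumPoints slots
  // suffice and the storage is allocated a single time, outside the loop.
  std::vector<std::size_t> ToProcess;
  ToProcess.reserve(NumPoints);

  int NextId = 0;
  for (std::size_t P = 0; P < NumPoints; ++P) {
    if (Processed[P])
      continue;
    Processed[P] = 1;
    ToProcess.clear(); // Keeps the capacity, so no reallocation happens.
    ToProcess.push_back(P);
    const int Id = NextId++;

    // Lambda that abstracts away the neighbor-adding step.
    auto AddNeighbors = [&](std::size_t Q) {
      for (std::size_t N : Neighbors(Q)) {
        assert(N < NumPoints && "neighbor index out of range");
        if (Processed[N])
          continue; // Already enqueued once; never enqueue it again.
        Processed[N] = 1;
        ToProcess.push_back(N);
      }
    };

    // BFS: walk the vector with an index instead of popping from the front.
    for (std::size_t Head = 0; Head < ToProcess.size(); ++Head) {
      const std::size_t Q = ToProcess[Head];
      ClusterId[Q] = Id;
      AddNeighbors(Q);
    }
  }
  return ClusterId;
}
```

Because a point is marked in `Processed` at the moment it is enqueued, the sketch needs no separate check of whether the point already has a cluster id.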

Many people hate one-shot abstractions, but some favor them, so as a compromise we use a lambda to abstract away the neighbor-adding step.

Delete the comment `assert(Neighbors.capacity() == (Points_.size() - 1));` as it is wrong.

llvm-svn: 350035
2018-12-23 20:48:52 +00:00