Make .gnu.hash section smaller.

Our on-disk hash table was unnecessarily large. Hash collisions are not
expensive in the .gnu.hash table, because the table stores each symbol's
hash value alongside it. For each colliding symbol, the dynamic linker
just compares two integers, which is cheap; it compares symbol names
only when the hash values match.
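To make the collision-cost argument concrete, here is a minimal, hypothetical model of a .gnu.hash lookup. It assumes the standard GNU hash function (h = h * 33 + c, seeded with 5381), leaves out the Bloom filter and symbol versioning that real tables and dynamic linkers use, and uses made-up names (GnuHashModel, gnuHash). It is a sketch of the technique, not lld's or glibc's code. Walking a bucket's chain costs one integer comparison per entry; a string comparison happens only when the stored hash matches.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// GNU-style hash: h = h * 33 + c, seeded with 5381.
uint32_t gnuHash(const std::string &Name) {
  uint32_t H = 5381;
  for (unsigned char C : Name)
    H = (H << 5) + H + C;
  return H;
}

// Hypothetical, simplified model of a .gnu.hash table: no Bloom filter, and
// names are stored directly instead of as string-table offsets.
struct GnuHashModel {
  static constexpr uint32_t SymOffset = 1; // index 0 is the null symbol
  std::vector<std::string> Names;          // symbol names by dynsym index
  std::vector<uint32_t> Buckets;           // first index per bucket, 0 = empty
  std::vector<uint32_t> Chain;             // hash per symbol; low bit = last

  GnuHashModel(std::vector<std::string> Syms, size_t NBuckets)
      : Names{""}, Buckets(NBuckets, 0) {
    assert(NBuckets > 0 && "need at least one bucket");
    auto BucketOf = [&](const std::string &S) { return gnuHash(S) % NBuckets; };
    // Each bucket's symbols must be contiguous, so group them by bucket.
    std::stable_sort(Syms.begin(), Syms.end(),
                     [&](const std::string &L, const std::string &R) {
                       return BucketOf(L) < BucketOf(R);
                     });
    for (size_t I = 0; I < Syms.size(); ++I) {
      uint32_t H = gnuHash(Syms[I]);
      uint32_t Idx = SymOffset + static_cast<uint32_t>(I);
      size_t B = BucketOf(Syms[I]);
      if (Buckets[B] == 0)
        Buckets[B] = Idx;
      bool Last = I + 1 == Syms.size() || BucketOf(Syms[I + 1]) != B;
      Chain.push_back(Last ? (H | 1u) : (H & ~1u));
      Names.push_back(Syms[I]);
    }
  }

  // Finds Name's dynsym index; StrCmps counts the string comparisons made.
  std::optional<uint32_t> lookup(const std::string &Name, int &StrCmps) const {
    uint32_t H = gnuHash(Name);
    uint32_t Idx = Buckets[H % Buckets.size()];
    if (Idx == 0)
      return std::nullopt;
    for (;; ++Idx) {
      uint32_t Stored = Chain[Idx - SymOffset];
      // A colliding entry is usually rejected by this integer comparison
      // (the low bit is ignored); names are compared only on a hash match.
      if ((Stored | 1u) == (H | 1u)) {
        ++StrCmps;
        if (Names[Idx] == Name)
          return Idx;
      }
      if (Stored & 1u) // last entry of this bucket's chain
        return std::nullopt;
    }
  }
};
```

With eight symbols and a load factor of 4, the model has two buckets, so each chain holds several entries, yet a successful lookup typically needs only one name comparison.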

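This patch increases the load factor by about 8. Here's a comparison of
readelf --histogram output for libclangSema.so linked by the new lld,
the old lld, GNU ld (bfd) and gold.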

  $ readelf --histogram libclangSema.so.6.0.0svn-new-lld
  Histogram for `.gnu.hash' bucket list length (total of 582 buckets):
   Length  Number     % of total  Coverage
        0  11         (  1.9%)
        1  35         (  6.0%)      1.5%
        2  93         ( 16.0%)      9.5%
        3  108        ( 18.6%)     23.4%
        4  121        ( 20.8%)     44.1%
        5  86         ( 14.8%)     62.6%
        6  63         ( 10.8%)     78.8%
        7  38         (  6.5%)     90.2%
        8  18         (  3.1%)     96.4%
        9  6          (  1.0%)     98.7%
       10  3          (  0.5%)    100.0%

  $ readelf --histogram libclangSema.so.6.0.0svn-old-lld
  Histogram for `.gnu.hash' bucket list length (total of 4093 buckets):
   Length  Number     % of total  Coverage
        0  1498       ( 36.6%)
        1  1545       ( 37.7%)     37.7%
        2  712        ( 17.4%)     72.5%
        3  251        (  6.1%)     90.9%
        4  66         (  1.6%)     97.3%
        5  16         (  0.4%)     99.3%
        6  5          (  0.1%)    100.0%

  $ readelf --histogram libclangSema.so.6.0.0svn-bfd
  Histogram for `.gnu.hash' bucket list length (total of 1004 buckets):
   Length  Number     % of total  Coverage
        0  92         (  9.2%)
        1  227        ( 22.6%)      9.8%
        2  266        ( 26.5%)     32.6%
        3  222        ( 22.1%)     61.2%
        4  115        ( 11.5%)     81.0%
        5  55         (  5.5%)     92.8%
        6  21         (  2.1%)     98.2%
        7  6          (  0.6%)    100.0%

  $ readelf --histogram libclangSema.so.6.0.0svn-gold
  Histogram for `.gnu.hash' bucket list length (total of 2053 buckets):
   Length  Number     % of total  Coverage
        0  671        ( 32.7%)
        1  709        ( 34.5%)     30.4%
        2  470        ( 22.9%)     70.7%
        3  141        (  6.9%)     88.9%
        4  54         (  2.6%)     98.2%
        5  5          (  0.2%)     99.2%
        6  3          (  0.1%)    100.0%
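The histograms can be summarized with a little arithmetic: the number of hashed symbols is the sum of Length * Number, and the load factor is that sum divided by the bucket count. The sketch below does this for the new-lld rows. The helper names (summarize, coverage) are made up, and the Coverage formula is our reading of readelf's output (it reproduces the column above), not something taken from readelf's source.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One `readelf --histogram` row per entry: {chain length, number of buckets}.
using Histogram = std::vector<std::pair<size_t, size_t>>;

struct TableSummary {
  size_t Buckets = 0;    // sum of the "Number" column
  size_t Symbols = 0;    // sum of Length * Number
  double LoadFactor = 0; // average symbols per bucket
};

TableSummary summarize(const Histogram &H) {
  TableSummary S;
  for (const auto &[Len, Count] : H) {
    S.Buckets += Count;
    S.Symbols += Len * Count;
  }
  if (S.Buckets != 0)
    S.LoadFactor = static_cast<double>(S.Symbols) / S.Buckets;
  return S;
}

// Cumulative share of symbols living in buckets whose chain length is at
// most the row's length (our reading of the "Coverage" column).
std::vector<double> coverage(const Histogram &H) {
  size_t Total = summarize(H).Symbols;
  assert(Total != 0 && "histogram contains no symbols");
  std::vector<double> Result;
  size_t Seen = 0;
  for (const auto &[Len, Count] : H) {
    Seen += Len * Count;
    Result.push_back(static_cast<double>(Seen) / Total);
  }
  return Result;
}

// Rows copied from the new-lld histogram.
const Histogram NewLld = {{0, 11}, {1, 35}, {2, 93}, {3, 108}, {4, 121}, {5, 86},
                          {6, 63}, {7, 38}, {8, 18}, {9, 6},   {10, 3}};
```

Applied to the four histograms, this gives about 4.0 symbols per bucket for the new lld output (2331 symbols in 582 buckets), 2.3 for bfd, 1.1 for gold and 1.0 for the old lld output.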

Differential Revision: https://reviews.llvm.org/D40683

llvm-svn: 319503
Rui Ueyama 2017-11-30 23:59:40 +00:00
parent 2c1e68237f
commit 1cf7f9cc80
2 changed files with 24 additions and 35 deletions

@@ -1775,22 +1775,6 @@ static uint32_t hashGnu(StringRef Name) {
   return H;
 }
 
-// Returns a number of hash buckets to accomodate given number of elements.
-// We want to choose a moderate number that is not too small (which
-// causes too many hash collisions) and not too large (which wastes
-// disk space.)
-//
-// We return a prime number because it (is believed to) achieve good
-// hash distribution.
-static size_t getBucketSize(size_t NumSymbols) {
-  // List of largest prime numbers that are not greater than 2^n + 1.
-  for (size_t N : {131071, 65521, 32749, 16381, 8191, 4093, 2039, 1021, 509,
-                   251, 127, 61, 31, 13, 7, 3, 1})
-    if (N <= NumSymbols)
-      return N;
-  return 0;
-}
-
 // Add symbols to this symbol hash table. Note that this function
 // destructively sort a given vector -- which is needed because
 // GNU-style hash table places some sorting requirements.
@@ -1813,7 +1797,12 @@ void GnuHashTableSection::addSymbols(std::vector<SymbolTableEntry> &V) {
     Symbols.push_back({B, Ent.StrTabOffset, hashGnu(B->getName())});
   }
 
-  NBuckets = getBucketSize(Symbols.size());
+  // We chose load factor 4 for the on-disk hash table. For each hash
+  // collision, the dynamic linker will compare a uint32_t hash value.
+  // Since the integer comparison is quite fast, we believe we can make
+  // the load factor even larger. 4 is just a conservative choice.
+  NBuckets = std::max<size_t>(Symbols.size() / 4, 1);
+
   std::stable_sort(Symbols.begin(), Symbols.end(),
                    [&](const Entry &L, const Entry &R) {
                      return L.Hash % NBuckets < R.Hash % NBuckets;
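The bucket-count change in this hunk can be tried in isolation. The sketch below restates the removed getBucketSize loop and the new std::max expression as free functions; the names oldBucketCount, newBucketCount and loadFactor are hypothetical and are not lld functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Removed policy: the largest listed prime not greater than NumSymbols,
// which yields roughly one to two symbols per bucket.
size_t oldBucketCount(size_t NumSymbols) {
  for (size_t N : {131071, 65521, 32749, 16381, 8191, 4093, 2039, 1021, 509,
                   251, 127, 61, 31, 13, 7, 3, 1})
    if (N <= NumSymbols)
      return N;
  return 0;
}

// New policy: load factor 4, but never fewer than one bucket.
size_t newBucketCount(size_t NumSymbols) {
  return std::max<size_t>(NumSymbols / 4, 1);
}

// Average chain length implied by a bucket count.
double loadFactor(size_t NumSymbols, size_t NBuckets) {
  assert(NBuckets != 0 && "a hash table needs at least one bucket");
  return static_cast<double>(NumSymbols) / NBuckets;
}
```

For the 2331 symbols counted in the new-lld histogram, the old policy would pick 2039 buckets and the new one picks 582. The new count is no longer required to be prime, and the stable_sort on Hash % NBuckets still keeps each bucket's symbols contiguous, as the chain layout requires.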

@@ -34,6 +34,15 @@
 # CHECK-NEXT: Section: .text
 # CHECK-NEXT: }
+# CHECK-NEXT: Symbol {
+# CHECK-NEXT: Name: baz
+# CHECK-NEXT: Value:
+# CHECK-NEXT: Size:
+# CHECK-NEXT: Binding: Global
+# CHECK-NEXT: Type:
+# CHECK-NEXT: Other:
+# CHECK-NEXT: Section: Undefined
+# CHECK-NEXT: }
 # CHECK-NEXT: Symbol {
 # CHECK-NEXT: Name: foo
 # CHECK-NEXT: Value:
 # CHECK-NEXT: Size:
@@ -51,15 +60,6 @@
 # CHECK-NEXT: Other:
 # CHECK-NEXT: Section: Undefined
 # CHECK-NEXT: }
-# CHECK-NEXT: Symbol {
-# CHECK-NEXT: Name: baz
-# CHECK-NEXT: Value:
-# CHECK-NEXT: Size:
-# CHECK-NEXT: Binding: Global
-# CHECK-NEXT: Type:
-# CHECK-NEXT: Other:
-# CHECK-NEXT: Section: Undefined
-# CHECK-NEXT: }
 # CHECK-NEXT: ]
 # CHECK-NOT: NEEDED
@@ -90,6 +90,15 @@
 # CHECK2-NEXT: Section: .text
 # CHECK2-NEXT: }
+# CHECK2-NEXT: Symbol {
+# CHECK2-NEXT: Name: baz
+# CHECK2-NEXT: Value:
+# CHECK2-NEXT: Size:
+# CHECK2-NEXT: Binding: Global
+# CHECK2-NEXT: Type:
+# CHECK2-NEXT: Other:
+# CHECK2-NEXT: Section: Undefined
+# CHECK2-NEXT: }
 # CHECK2-NEXT: Symbol {
 # CHECK2-NEXT: Name: qux
 # CHECK2-NEXT: Value:
 # CHECK2-NEXT: Size:
@@ -107,15 +116,6 @@
 # CHECK2-NEXT: Other:
 # CHECK2-NEXT: Section: .text
 # CHECK2-NEXT: }
-# CHECK2-NEXT: Symbol {
-# CHECK2-NEXT: Name: baz
-# CHECK2-NEXT: Value:
-# CHECK2-NEXT: Size:
-# CHECK2-NEXT: Binding: Global
-# CHECK2-NEXT: Type:
-# CHECK2-NEXT: Other:
-# CHECK2-NEXT: Section: Undefined
-# CHECK2-NEXT: }
 # CHECK2-NEXT: ]
 # CHECK2-NOT: NEEDED