[clangd] Extract scoring/ranking logic, and shave yaks.
Summary:
Code completion scoring was embedded in CodeComplete.cpp, which is bad:
- awkward to test. The mechanisms (extracting info from index/sema) can be
unit-tested well, the policy (scoring) should be quantitatively measured.
Neither was easily possible, and debugging was hard.
The intermediate signal struct makes this easier.
- hard to reuse. This is a bug in workspaceSymbols: it just presents the
results in index order, which in practice is not sorted; it needs to rank
them!
Also, index implementations care about scoring (both query-dependent and
query-independent) in order to truncate result lists appropriately.
The main yak shaved here: the build() function, which had 3 variants across
unit tests, is unified in TestTU.h (rather than adding a 4th variant).
Reviewers: ilya-biryukov
Subscribers: klimek, mgorny, ioeric, MaskRay, jkorous, mgrang, cfe-commits
Differential Revision: https://reviews.llvm.org/D46524
llvm-svn: 332378
2018-05-15 17:43:27 +00:00

//===--- Quality.cpp --------------------------------------------*- C++-*-===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===---------------------------------------------------------------------===//
#include "Quality.h"
#include "URI.h"
#include "index/Index.h"
#include "clang/AST/ASTContext.h"
#include "clang/AST/DeclVisitor.h"
#include "clang/Basic/CharInfo.h"
#include "clang/Basic/SourceManager.h"
#include "clang/Sema/CodeCompleteConsumer.h"
#include "llvm/ADT/StringExtras.h"
#include "llvm/Support/FormatVariadic.h"
#include "llvm/Support/MathExtras.h"
#include "llvm/Support/NativeFormatting.h"
#include "llvm/Support/raw_ostream.h"
#include <algorithm>
#include <cmath>
#include <limits>

namespace clang {
namespace clangd {
using namespace llvm;

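// Returns true if Name begins with an underscore followed by an uppercase
// letter or a second underscore; such identifiers are reserved for the
// implementation.
static bool IsReserved(StringRef Name) {
  // FIXME: Should we exclude _Bool and others recognized by the standard?
  return Name.size() >= 2 && Name[0] == '_' &&
         (isUppercase(Name[1]) || Name[1] == '_');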
}
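The rule that IsReserved encodes can be tried outside clang with a small standard-library sketch. The name isReservedName is hypothetical, and std::isupper (in the default "C" locale) stands in for clang's ASCII-only isUppercase; like the code above, it only inspects the first two characters.

```cpp
#include <cctype>
#include <string_view>

// Hypothetical standalone mirror of IsReserved: std::string_view replaces
// llvm::StringRef and std::isupper replaces clang::isUppercase.
bool isReservedName(std::string_view Name) {
  if (Name.size() < 2 || Name[0] != '_')
    return false;
  // std::isupper requires a value representable as unsigned char.
  unsigned char Second = static_cast<unsigned char>(Name[1]);
  return std::isupper(Second) || Second == '_';
}
```

So "_Foo" and "__x" are flagged, while "_foo" and "foo" are not.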

// Returns true if any redeclaration of \p D is spelled in the main file.
static bool hasDeclInMainFile(const Decl &D) {
  auto &SourceMgr = D.getASTContext().getSourceManager();
  for (auto *Redecl : D.redecls()) {
    auto Loc = SourceMgr.getSpellingLoc(Redecl->getLocation());
    if (SourceMgr.isWrittenInMainFile(Loc))
      return true;
  }
  return false;
}

static SymbolQualitySignals::SymbolCategory categorize(const NamedDecl &ND) {
  class Switch
      : public ConstDeclVisitor<Switch, SymbolQualitySignals::SymbolCategory> {
  public:
#define MAP(DeclType, Category)                                                \
  SymbolQualitySignals::SymbolCategory Visit##DeclType(const DeclType *) {     \
    return SymbolQualitySignals::Category;                                     \
  }
    MAP(NamespaceDecl, Namespace);
    MAP(NamespaceAliasDecl, Namespace);
    MAP(TypeDecl, Type);
    MAP(TypeAliasTemplateDecl, Type);
    MAP(ClassTemplateDecl, Type);
    MAP(ValueDecl, Variable);
    MAP(VarTemplateDecl, Variable);
    MAP(FunctionDecl, Function);
    MAP(FunctionTemplateDecl, Function);
    MAP(Decl, Unknown);
#undef MAP
  };
  return Switch().Visit(&ND);
}

static SymbolQualitySignals::SymbolCategory
categorize(const CodeCompletionResult &R) {
  if (R.Declaration)
    return categorize(*R.Declaration);
  if (R.Kind == CodeCompletionResult::RK_Macro)
    return SymbolQualitySignals::Macro;
  // Everything else is a keyword or a pattern. Patterns are mostly keywords
  // too, except a few which we recognize by cursor kind.
  switch (R.CursorKind) {
  case CXCursor_CXXMethod:
    return SymbolQualitySignals::Function;
  case CXCursor_ModuleImportDecl:
    return SymbolQualitySignals::Namespace;
  case CXCursor_MacroDefinition:
    return SymbolQualitySignals::Macro;
  case CXCursor_TypeRef:
    return SymbolQualitySignals::Type;
  case CXCursor_MemberRef:
    return SymbolQualitySignals::Variable;
  default:
    return SymbolQualitySignals::Keyword;
  }
}

static SymbolQualitySignals::SymbolCategory
categorize(const index::SymbolInfo &D) {
  switch (D.Kind) {
  case index::SymbolKind::Namespace:
  case index::SymbolKind::NamespaceAlias:
    return SymbolQualitySignals::Namespace;
  case index::SymbolKind::Macro:
    return SymbolQualitySignals::Macro;
  case index::SymbolKind::Enum:
  case index::SymbolKind::Struct:
  case index::SymbolKind::Class:
  case index::SymbolKind::Protocol:
  case index::SymbolKind::Extension:
  case index::SymbolKind::Union:
  case index::SymbolKind::TypeAlias:
    return SymbolQualitySignals::Type;
  case index::SymbolKind::Function:
  case index::SymbolKind::ClassMethod:
  case index::SymbolKind::InstanceMethod:
  case index::SymbolKind::StaticMethod:
  case index::SymbolKind::InstanceProperty:
  case index::SymbolKind::ClassProperty:
  case index::SymbolKind::StaticProperty:
  case index::SymbolKind::Constructor:
  case index::SymbolKind::Destructor:
  case index::SymbolKind::ConversionFunction:
    return SymbolQualitySignals::Function;
  case index::SymbolKind::Variable:
  case index::SymbolKind::Field:
  case index::SymbolKind::EnumConstant:
  case index::SymbolKind::Parameter:
    return SymbolQualitySignals::Variable;
  case index::SymbolKind::Using:
  case index::SymbolKind::Module:
  case index::SymbolKind::Unknown:
    return SymbolQualitySignals::Unknown;
  }
  llvm_unreachable("Unknown index::SymbolKind");
}

void SymbolQualitySignals::merge(const CodeCompletionResult &SemaCCResult) {
  if (SemaCCResult.Availability == CXAvailability_Deprecated)
    Deprecated = true;

  Category = categorize(SemaCCResult);

  if (SemaCCResult.Declaration) {
    if (auto *ID = SemaCCResult.Declaration->getIdentifier())
      ReservedName = ReservedName || IsReserved(ID->getName());
  } else if (SemaCCResult.Kind == CodeCompletionResult::RK_Macro)
    ReservedName = ReservedName || IsReserved(SemaCCResult.Macro->getName());
}

void SymbolQualitySignals::merge(const Symbol &IndexResult) {
  References = std::max(IndexResult.References, References);
  Category = categorize(IndexResult.SymInfo);
  ReservedName = ReservedName || IsReserved(IndexResult.Name);
}

float SymbolQualitySignals::evaluate() const {
  float Score = 1;

  // Boost by log10(References) once a symbol has at least 10 references
  // (10 -> x1, 100 -> x2, 1000 -> x3). Ignoring counts below 10 avoids a sharp
  // gradient for tail symbols, and also neatly avoids the question of whether
  // 0 references means a bad symbol or missing data.
  if (References >= 10)
    Score *= std::log10(References);

  if (Deprecated)
    Score *= 0.1f;
  if (ReservedName)
    Score *= 0.1f;

  switch (Category) {
  case Keyword: // Often relevant, but misses most signals.
    Score *= 4; // FIXME: important keywords should have specific boosts.
    break;
  case Type:
  case Function:
  case Variable:
    Score *= 1.1f;
    break;
  case Namespace:
    Score *= 0.8f;
    break;
  case Macro:
    Score *= 0.2f;
    break;
  case Unknown:
    break;
  }

  return Score;
}
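To see how the multipliers in SymbolQualitySignals::evaluate() combine, here is a self-contained sketch that copies its constants. QualitySketch and its Category enum are hypothetical stand-ins for the real types declared in Quality.h, which are not shown here.

```cpp
#include <cmath>

// Hypothetical stand-in for SymbolQualitySignals::SymbolCategory.
enum class Category { Unknown, Keyword, Namespace, Macro, Type, Function, Variable };

// Hypothetical mirror of SymbolQualitySignals; constants copied from evaluate().
struct QualitySketch {
  unsigned References = 0;
  bool Deprecated = false;
  bool ReservedName = false;
  Category Cat = Category::Unknown;

  float evaluate() const {
    float Score = 1;
    // The log10 boost only applies from 10 references (10 -> x1, 100 -> x2).
    if (References >= 10)
      Score *= std::log10(static_cast<float>(References));
    if (Deprecated)
      Score *= 0.1f;
    if (ReservedName)
      Score *= 0.1f;
    switch (Cat) {
    case Category::Keyword:
      Score *= 4;
      break;
    case Category::Type:
    case Category::Function:
    case Category::Variable:
      Score *= 1.1f;
      break;
    case Category::Namespace:
      Score *= 0.8f;
      break;
    case Category::Macro:
      Score *= 0.2f;
      break;
    case Category::Unknown:
      break;
    }
    return Score;
  }
};
```

For example, a function with 100 references scores log10(100) * 1.1 = 2.2, and marking it deprecated divides that by 10.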

raw_ostream &operator<<(raw_ostream &OS, const SymbolQualitySignals &S) {
  OS << formatv("=== Symbol quality: {0}\n", S.evaluate());
  OS << formatv("\tReferences: {0}\n", S.References);
  OS << formatv("\tDeprecated: {0}\n", S.Deprecated);
  OS << formatv("\tReserved name: {0}\n", S.ReservedName);
  OS << formatv("\tCategory: {0}\n", static_cast<int>(S.Category));
  return OS;
}

/// Calculates a proximity score from \p From and \p To, which are URI strings
/// that have the same scheme. This does not parse the URIs: the part after
/// "<scheme>:" is split into chunks by '/', and each chunk is treated as a
/// file or directory name. For example, "uri:///a/b/c" is treated as /a/b/c.
static float uriProximity(StringRef From, StringRef To) {
  auto SchemeSplitFrom = From.split(':');
  auto SchemeSplitTo = To.split(':');
  assert((SchemeSplitFrom.first == SchemeSplitTo.first) &&
         "URIs must have the same scheme in order to compute proximity.");
  auto Split = [](StringRef URIWithoutScheme) {
    SmallVector<StringRef, 8> Chunks;
    URIWithoutScheme.split(Chunks, '/', /*MaxSplit=*/-1, /*KeepEmpty=*/false);
    return Chunks;
  };
  SmallVector<StringRef, 8> Fs = Split(SchemeSplitFrom.second);
  SmallVector<StringRef, 8> Ts = Split(SchemeSplitTo.second);
  // Skip the common prefix of path chunks.
  auto F = Fs.begin(), T = Ts.begin(), FE = Fs.end(), TE = Ts.end();
  for (; F != FE && T != TE && *F == *T; ++F, ++T) {
  }
  // We penalize for traversing up and down from \p From to \p To, but penalize
  // less for traversing down because subprojects are more closely related than
  // superprojects. Note that DownDist / 2 is integer division, so a single
  // step down carries no penalty.
  int UpDist = FE - F;
  int DownDist = TE - T;
  return std::pow(0.7, UpDist + DownDist / 2);
}

FileProximityMatcher::FileProximityMatcher(ArrayRef<StringRef> ProximityPaths)
    : ProximityPaths(ProximityPaths.begin(), ProximityPaths.end()) {}

float FileProximityMatcher::uriProximity(StringRef SymbolURI) const {
  float Score = 0;
  if (!ProximityPaths.empty() && !SymbolURI.empty()) {
    for (const auto &Path : ProximityPaths) {
      // Only calculate proximity score for two URIs with the same scheme so
      // that the computation can be purely text-based and thus avoid expensive
      // URI encoding/decoding.
      if (auto U = URI::create(Path, SymbolURI.split(':').first)) {
        Score = std::max(Score, clangd::uriProximity(U->toString(), SymbolURI));
      } else {
        llvm::consumeError(U.takeError());
      }
    }
  }
  return Score;
}
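The proximity computation can be reproduced with plain strings. This sketch assumes well-formed "scheme:path" URIs that already share a scheme, splits the path on '/' while dropping empty chunks, and keeps the integer division of the original formula. pathChunks, uriProximitySketch and bestProximity are hypothetical names, and the URI::create conversion from file paths to URIs is left out.

```cpp
#include <algorithm>
#include <cmath>
#include <string>
#include <string_view>
#include <vector>

// Like StringRef::split(':'): keep everything after the first ':' (empty if
// there is none), then split on '/' and drop empty chunks.
std::vector<std::string_view> pathChunks(std::string_view URI) {
  auto Colon = URI.find(':');
  std::string_view Rest = Colon == std::string_view::npos
                              ? std::string_view()
                              : URI.substr(Colon + 1);
  std::vector<std::string_view> Chunks;
  while (!Rest.empty()) {
    auto Slash = Rest.find('/');
    std::string_view Chunk = Rest.substr(0, Slash);
    if (!Chunk.empty())
      Chunks.push_back(Chunk);
    if (Slash == std::string_view::npos)
      break;
    Rest.remove_prefix(Slash + 1);
  }
  return Chunks;
}

// 0.7^(UpDist + DownDist / 2), with integer division as in uriProximity.
float uriProximitySketch(std::string_view From, std::string_view To) {
  auto Fs = pathChunks(From), Ts = pathChunks(To);
  // First position where the chunk sequences differ.
  auto Mismatch = std::mismatch(Fs.begin(), Fs.end(), Ts.begin(), Ts.end());
  int UpDist = static_cast<int>(Fs.end() - Mismatch.first);
  int DownDist = static_cast<int>(Ts.end() - Mismatch.second);
  return static_cast<float>(std::pow(0.7, UpDist + DownDist / 2));
}

// Best score over all proximity URIs, like FileProximityMatcher::uriProximity.
float bestProximity(const std::vector<std::string> &ProximityURIs,
                    std::string_view SymbolURI) {
  float Score = 0;
  if (SymbolURI.empty())
    return Score;
  for (const auto &P : ProximityURIs)
    Score = std::max(Score, uriProximitySketch(P, SymbolURI));
  return Score;
}
```

With the integer division, a sibling file in the same directory scores 0.7 (one step up, one step down, and 1 / 2 == 0).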

llvm::raw_ostream &operator<<(llvm::raw_ostream &OS,
                              const FileProximityMatcher &M) {
  OS << formatv("File proximity matcher: ");
  OS << formatv("ProximityPaths[{0}]", llvm::join(M.ProximityPaths.begin(),
                                                  M.ProximityPaths.end(), ","));
  return OS;
}

// Classifies where \p D is visible: inside a function, inside a class, within
// one file (internal linkage), or globally.
static SymbolRelevanceSignals::AccessibleScope
ComputeScope(const NamedDecl *D) {
  // Injected "Foo" within the class "Foo" has file scope, not class scope.
  const DeclContext *DC = D->getDeclContext();
  if (auto *R = dyn_cast_or_null<RecordDecl>(D))
    if (R->isInjectedClassName())
      DC = DC->getParent();
  bool InClass = false;
  for (; !DC->isFileContext(); DC = DC->getParent()) {
    if (DC->isFunctionOrMethod())
      return SymbolRelevanceSignals::FunctionScope;
    InClass = InClass || DC->isRecord();
  }
  if (InClass)
    return SymbolRelevanceSignals::ClassScope;
  // This threshold could be tweaked, e.g. to treat module-visible as global.
  if (D->getLinkageInternal() < ExternalLinkage)
    return SymbolRelevanceSignals::FileScope;
  return SymbolRelevanceSignals::GlobalScope;
}

void SymbolRelevanceSignals::merge(const Symbol &IndexResult) {
  // FIXME: Index results always assumed to be at global scope. If Scope becomes
  // relevant to non-completion requests, we should recognize class members etc.

  SymbolURI = IndexResult.CanonicalDeclaration.FileURI;
}

void SymbolRelevanceSignals::merge(const CodeCompletionResult &SemaCCResult) {
  if (SemaCCResult.Availability == CXAvailability_NotAvailable ||
      SemaCCResult.Availability == CXAvailability_NotAccessible)
    Forbidden = true;

  if (SemaCCResult.Declaration) {
    // We boost things that have decls in the main file (score 1.0). All other
    // declarations seen by Sema get a fixed score (0.6), as they are already
    // included in the translation unit.
    float DeclProximity =
        hasDeclInMainFile(*SemaCCResult.Declaration) ? 1.0 : 0.6;
    SemaProximityScore = std::max(DeclProximity, SemaProximityScore);
  }

  // Declarations are scoped, others (like macros) are assumed global.
  if (SemaCCResult.Declaration)
    Scope = std::min(Scope, ComputeScope(SemaCCResult.Declaration));
}

float SymbolRelevanceSignals::evaluate() const {
  float Score = 1;

  if (Forbidden)
    return 0;

  Score *= NameMatch;

  float IndexProximityScore =
      FileProximityMatch ? FileProximityMatch->uriProximity(SymbolURI) : 0;
  // Proximity scores are in [0, 1]; translate them into a multiplier in the
  // range [1, 2].
  Score *= 1 + std::max(IndexProximityScore, SemaProximityScore);

  // Symbols like local variables may only be referenced within their scope.
  // Conversely if we're in that scope, it's likely we'll reference them.
  if (Query == CodeComplete) {
    // The narrower the scope where a symbol is visible, the more likely it is
    // to be relevant when it is available.
    switch (Scope) {
    case GlobalScope:
      break;
    case FileScope:
      Score *= 1.5;
      break;
    case ClassScope:
      Score *= 2;
      break;
    case FunctionScope:
      Score *= 4;
      break;
    }
  }

  return Score;
}
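The relevance multipliers can be exercised the same way. RelevanceSketch and its Scope enum are hypothetical; they copy the constants from SymbolRelevanceSignals::evaluate() and reduce the two proximity sources to plain floats assumed to be in [0, 1] already. combined() mirrors evaluateSymbolAndRelevance(), which multiplies quality by relevance.

```cpp
#include <algorithm>

// Hypothetical stand-in for SymbolRelevanceSignals::AccessibleScope.
enum class Scope { Function, Class, File, Global };

// Hypothetical mirror of SymbolRelevanceSignals; constants copied from
// evaluate().
struct RelevanceSketch {
  float NameMatch = 1;        // how well the typed text matches the name
  bool Forbidden = false;     // e.g. not available or not accessible
  float IndexProximity = 0;   // assumed in [0, 1]
  float SemaProximity = 0;    // assumed in [0, 1]
  bool IsCodeComplete = true; // stands in for Query == CodeComplete
  Scope Visibility = Scope::Global;

  float evaluate() const {
    if (Forbidden)
      return 0;
    float Score = NameMatch;
    // Map the best proximity in [0, 1] to a multiplier in [1, 2].
    Score *= 1 + std::max(IndexProximity, SemaProximity);
    if (IsCodeComplete) {
      switch (Visibility) {
      case Scope::Global:
        break;
      case Scope::File:
        Score *= 1.5f;
        break;
      case Scope::Class:
        Score *= 2;
        break;
      case Scope::Function:
        Score *= 4;
        break;
      }
    }
    return Score;
  }
};

// Final ranking score: quality times relevance.
float combined(float Quality, float Relevance) { return Quality * Relevance; }
```

A function-local symbol declared in the main file thus gets 1 * (1 + 1) * 4 = 8, while a global symbol with index proximity 0.5 gets 1.5.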

raw_ostream &operator<<(raw_ostream &OS, const SymbolRelevanceSignals &S) {
  OS << formatv("=== Symbol relevance: {0}\n", S.evaluate());
  OS << formatv("\tName match: {0}\n", S.NameMatch);
  OS << formatv("\tForbidden: {0}\n", S.Forbidden);
  OS << formatv("\tSymbol URI: {0}\n", S.SymbolURI);
  if (S.FileProximityMatch) {
    OS << "\tIndex proximity: "
       << S.FileProximityMatch->uriProximity(S.SymbolURI) << " ("
       << *S.FileProximityMatch << ")\n";
  }
  OS << formatv("\tSema proximity: {0}\n", S.SemaProximityScore);
  OS << formatv("\tQuery type: {0}\n", static_cast<int>(S.Query));
  OS << formatv("\tScope: {0}\n", static_cast<int>(S.Scope));
  return OS;
}

float evaluateSymbolAndRelevance(float SymbolQuality, float SymbolRelevance) {
  return SymbolQuality * SymbolRelevance;
}

// Produces an integer that sorts in the same order as F.
// That is: a < b <==> encodeFloat(a) < encodeFloat(b) (for non-NaN values).
static uint32_t encodeFloat(float F) {
  static_assert(std::numeric_limits<float>::is_iec559, "");
  constexpr uint32_t TopBit = ~(~uint32_t{0} >> 1); // 0x80000000

  // Get the bits of the float. Endianness is the same as for integers.
  uint32_t U = FloatToBits(F);
  // IEEE 754 floats compare like sign-magnitude integers.
  if (U & TopBit)    // Negative float.
    return 0 - U;    // Map onto the low half of integers, order reversed.
  return U + TopBit; // Positive floats map onto the high half of integers.
}
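The encodeFloat trick only needs 32-bit IEEE 754 floats. The sketch below is a standalone copy that uses std::memcpy where the original uses llvm::FloatToBits; encodeFloatSketch is a hypothetical name. NaN is excluded from the ordering guarantee, and -0 and +0 map to the same integer.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical standalone copy of encodeFloat: integer order matches float
// order for all non-NaN inputs.
uint32_t encodeFloatSketch(float F) {
  static_assert(std::numeric_limits<float>::is_iec559, "needs IEEE 754 floats");
  static_assert(sizeof(float) == sizeof(uint32_t), "needs 32-bit floats");
  constexpr uint32_t TopBit = ~(~uint32_t{0} >> 1); // 0x80000000
  uint32_t U;
  std::memcpy(&U, &F, sizeof(U)); // reinterpret the bits without UB
  if (U & TopBit)                 // Negative: larger magnitude maps lower.
    return uint32_t{0} - U;
  return U + TopBit;              // Non-negative: shift into the upper half.
}
```

For example, -0.5f has the bit pattern 0xbf000000, which encodes to 0x41000000, matching the "41000000foo" example in sortText below.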

std::string sortText(float Score, llvm::StringRef Name) {
  // Encode -Score so that higher scores produce smaller integers (and thus sort
  // first), then hex-encode with a fixed width of 8 digits for readability.
  // Example: [0.5, "foo"] -> "41000000foo"
  std::string S;
  llvm::raw_string_ostream OS(S);
  write_hex(OS, encodeFloat(-Score), llvm::HexPrintStyle::Lower,
            /*Width=*/2 * sizeof(Score));
  OS << Name;
  OS.flush();
  return S;
}
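A standard-library sketch of sortText: std::snprintf with "%08x" plays the role of write_hex with an 8-digit width, and encodeFloatBits repeats the encoding from encodeFloat. Both function names are hypothetical. Because the hex prefix has a fixed width, plain string comparison orders candidates by descending score first and by name second.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Same order-preserving encoding as encodeFloat (memcpy instead of
// llvm::FloatToBits).
static uint32_t encodeFloatBits(float F) {
  uint32_t U;
  std::memcpy(&U, &F, sizeof(U));
  return (U & 0x80000000u) ? uint32_t{0} - U : U + 0x80000000u;
}

// Hypothetical std-only sortText: 8 lowercase hex digits of encode(-Score),
// followed by the name.
std::string sortTextSketch(float Score, const std::string &Name) {
  char Hex[9]; // 8 hex digits + terminating NUL
  std::snprintf(Hex, sizeof(Hex), "%08x",
                static_cast<unsigned>(encodeFloatBits(-Score)));
  return std::string(Hex) + Name;
}
```

Here sortTextSketch(0.5f, "foo") yields "41000000foo", and a candidate scored 2.0 sorts before one scored 1.0 regardless of their names.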

} // namespace clangd
} // namespace clang