Make it possible to disable building the decision forest ranking
model for clangd. To unbreak build of Clangd on PPC32 in gentoo, see
https://bugs.gentoo.org/829602
Based on D138520.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D139107
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
I went over the output of the following mess of a command:
`(ulimit -m 2000000; ulimit -v 2000000; git ls-files -z | parallel --xargs -0 cat | aspell list --mode=none --ignore-case | grep -E '^[A-Za-z][a-z]*$' | sort | uniq -c | sort -n | grep -vE '.{25}' | aspell pipe -W3 | grep : | cut -d' ' -f2 | less)`
and proceeded to spend a few days looking at it to find probable typos
and fixed a few hundred of them in all of the llvm project (note, the
ones I found are not anywhere near all of them, but it seems like a
good start).
Reviewed By: kadircet
Differential Revision: https://reviews.llvm.org/D130826
Add support for concepts and requires expression in the clang index.
Genarate USRs for concepts.
Also change how `RecursiveASTVisitor` handles return type requirement in
requires expressions. The new code unpacks the synthetic template parameter
list used for storing the actual expression. This simplifies
implementation of the indexing. No code seems to depend on the original
traversal anyway and the synthesized template parameter list is easily
accessible from inside the requires expression if needed.
Add tests in the clangd codebase.
Fixes https://github.com/clangd/clangd/issues/1103.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D124441
Main use of these is in the standard library, where they generally clutter up
the index.
Certain macros are also common, we don't touch indexing of macros in this patch.
Differential Revision: https://reviews.llvm.org/D115301
CodeCompletionContext::Kind has 36 Kinds. The completion model currently
only handles categorical features of 32 cardinality.
Changing the datatype to uint64_t will solve the problem.
This reverts commit 438b5bb05a.
This makes code completion use a Decision Forest based ranking algorithm to rank
completion candidates. [Esitmated 6% accuracy boost]. This was
previously hidden behind the flag --ranking-model=decision_forest. This
patch makes it the default ranking algorithm.
Note: this is a generic model, not specialized for any particular
project. clangd does not collect or upload data to train code completion.
Also treat Keywords separately as they are not recorded by the training set generator.
Differential Revision: https://reviews.llvm.org/D96353
On second thought, this can't properly be reused for highlighting.
Consider this example, which Quality wants to consider function-scope,
but highlighting must consider class-scope:
void foo() {
class X {
int ^y;
};
}
This prepares for reuse from the semantic highlighting code.
There's a bit of yak-shaving here:
- when the enum is moved into the clangd namespace, promote it to a
scoped enum. This means teaching the decision forest infrastructure
to deal with scoped enums.
- AccessibleScope isn't quite the right name: e.g. public class members
are treated as accessible, but still have class scope. So rename to
SymbolScope.
- Rename some QualitySignals members to avoid name conflicts.
(the string) SymbolScope -> Scope
(the enum) Scope -> ScopeKind
This patch only introduces new signals but does not use their value
in scoring a CC candidate. Usage of these signals in CC ranking in both
heiristics and ML model will be introduced in later patches.
Differential Revision: https://reviews.llvm.org/D94473
A better sampling strategy was used to generate the dataset for this
model.
New signals introduced in this model:
- NumNameInContext: Number of words in the context that matches the name
of the candidate.
- FractionNameInContext: Fraction of the words in context matching the
name of the candidate.
We remove the signal `IsForbidden` from the model and down rank
forbidden signals aggresively.
Differential Revision: https://reviews.llvm.org/D94697
With every incremental change, one needs to check-in new model upstream.
This also significantly increases the size of the git repo with every
new model.
Testing and comparing the old and previous model is also not possible as
we run only a single model at any point.
One solution is to have a "staging" decision forest which can be
injected into clangd without pushing it to upstream. Compare the
performance of the staging model with the live model. After a couple of
enhancements have been done to staging model, we can then replace the
live model upstream with the staging model. This reduces upstream churn
and also allows us to compare models with current baseline model.
This is done by having a callback in CodeCompleteOptions which is called
only when we want to use a decision forest ranking model. This allows us
to inject different completion model internally.
Differential Revision: https://reviews.llvm.org/D90014
Since we have 2 scoring functions (heuristics and decision forest),
renaming the existing evaluate() function to be more descriptive of the
Heuristics being evaluated in it.
Differential Revision: https://reviews.llvm.org/D88431
By default clangd will score a code completion item using heuristics model.
Scoring can be done by Decision Forest model by passing `--ranking_model=decision_forest` to
clangd.
Features omitted from the model:
- `NameMatch` is excluded because the final score must be multiplicative in `NameMatch` to allow rescoring by the editor.
- `NeedsFixIts` is excluded because the generating dataset that needs 'fixits' is non-trivial.
There are multiple ways (heuristics) to combine the above two features with the prediction of the DF:
- `NeedsFixIts` is used as is with a penalty of `0.5`.
Various alternatives of combining NameMatch `N` and Decision forest Prediction `P`
- N * scale(P, 0, 1): Linearly scale the output of model to range [0, 1]
- N * a^P:
- More natural: Prediction of each Decision Tree can be considered as a multiplicative boost (like NameMatch)
- Ordering is independent of the absolute value of P. Order of two items is proportional to `a^{difference in model prediction score}`. Higher `a` gives higher weightage to model output as compared to NameMatch score.
Baseline MRR = 0.619
MRR for various combinations:
N * P = 0.6346, advantage%=2.5768
N * 1.1^P = 0.6600, advantage%=6.6853
N * **1.2**^P = 0.6669, advantage%=**7.8005**
N * **1.3**^P = 0.6668, advantage%=**7.7795**
N * **1.4**^P = 0.6659, advantage%=**7.6270**
N * 1.5^P = 0.6646, advantage%=7.4200
N * 1.6^P = 0.6636, advantage%=7.2671
N * 1.7^P = 0.6629, advantage%=7.1450
N * 2^P = 0.6612, advantage%=6.8673
N * 2.5^P = 0.6598, advantage%=6.6491
N * 3^P = 0.6590, advantage%=6.5242
N * scaled[0, 1] = 0.6465, advantage%=4.5054
Differential Revision: https://reviews.llvm.org/D88281
Current implementation of heuristic-based scoring function also contains
computation of derived signals (e.g. whether name contains a word from
context, computing file distances, scope distances.)
This is an attempt to separate out the logic for computation of derived
signals from the scoring function.
This will allow us to have a clean API for scoring functions that will
take only concrete code completion signals as input.
Differential Revision: https://reviews.llvm.org/D88146
Summary:
Currently template parameters has symbolkind `Unknown`. This patch
introduces a new kind `TemplateParm` for templatetemplate, templatetype and
nontypetemplate parameters.
Also adds tests in clangd hover feature.
Reviewers: sammccall
Subscribers: kristof.beyls, ilya-biryukov, jkorous, arphaman, usaxena95, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D73696
Summary:
This is a fairly ugly hack - we back off several features for any variable
whose type isn't deduced, to avoid computing/caching linkage.
Better suggestions welcome.
Fixes https://github.com/clangd/clangd/issues/274
Reviewers: kadircet, kbobyrev
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, usaxena95, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D73960
ContainsActiveParameter is not used anywhere, set incorrectly (see the
removed FIXME) and has no unit tests.
Removing it to simplify the code.
llvm-svn: 362686
Summary:
The hope is this will catch a few patterns with repetition:
SomeClass* S = ^SomeClass::Create()
int getFrobnicator() { return ^frobnicator_; }
// discard the factory, it's no longer valid.
^MyFactory.reset();
Without triggering antipatterns too often:
return Point(x.first, x.^second);
I'm going to gather some data on whether this turns out to be a win overall.
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, jfb, kadircet, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D61537
llvm-svn: 360030
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
The new guideline is to qualify with 'llvm::' explicitly both in
'.h' and '.cpp' files. This simplifies moving the code between
header and source files and is easier to keep consistent.
llvm-svn: 350531
Standardize on the most common namespace setup in our *.cpp files:
using namespace llvm;
namespace clang {
namespace clangd {
void foo(StringRef) { ... }
And remove redundant llvm:: qualifiers. (Except for cases like
make_unique where this causes problems with std:: and ADL).
This choice is pretty arbitrary, but some broad consistency is nice.
This is going to conflict with everything. Sorry :-/
Squash the other configurations:
A)
using namespace llvm;
using namespace clang;
using namespace clangd;
void clangd::foo(StringRef);
This is in some of the older files. (It prevents accidentally defining a
new function instead of one in the header file, for what that's worth).
B)
namespace clang {
namespace clangd {
void foo(llvm::StringRef) { ... }
This is fine, but in practice the using directive often gets added over time.
C)
namespace clang {
namespace clangd {
using namespace llvm; // inside the namespace
This was pretty common, but is a bit misleading: name lookup preferrs
clang::clangd::foo > clang::foo > llvm:: foo (no matter where the using
directive is).
llvm-svn: 344850
Summary:
These are often not expected to be used directly e.g.
```
TEST_F(Fixture, X) {
^ // "Fixture_X_Test" expanded in the macro should be down ranked.
}
```
Only doing this for sema for now, as such symbols are mostly coming from sema
e.g. gtest macros expanded in the main file. We could also add a similar field
for the index symbol.
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D53374
llvm-svn: 344736
Summary:
This should make all-scope completion more usable. Scope proximity for
indexes will be added in followup patch.
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D53131
llvm-svn: 344688