[analyzer][ctu] Fix wrong 'multiple definitions' errors caused by space characters in lookup names when parsing the ctu index file

This error was found when analyzing MySQL with CTU enabled.

When there are space characters in the lookup name, the current
delimiter searching strategy will make the file path wrongly parsed.
And when two lookup names have the same prefix before their first space
characters, a 'multiple definitions' error will be wrongly reported.

e.g. The lookup names for the two lambda exprs in the test case are
`c:@S@G@F@G#@Sa@F@operator int (*)(char)#1` and
`c:@S@G@F@G#@Sa@F@operator bool (*)(char)#1` respectively. And their
prefixes are both `c:@S@G@F@G#@Sa@F@operator` when using the first space
character as the delimiter.

Solving the problem by adding a length for the lookup name, making the
index items in the format of `<USR-Length>:<USR File> <Path>`.

---

In the test case of this patch, we found that it will trigger a "triple
mismatch" warning when using `clang -cc1` to analyze the source file
with CTU using the on-demand-parsing strategy in Darwin systems. And
this problem is also encountered in D75665, which is the patch
introducing the on-demand parsing strategy.
We temporarily bypass this problem by using the loading-ast-file
strategy.

Refer to the [discourse topic](https://discourse.llvm.org/t/60762) for
more details.

Differential Revision: https://reviews.llvm.org/D102669
This commit is contained in:
Ella Ma 2021-05-18 14:22:46 +08:00
parent e609417cdc
commit 9f90254286
12 changed files with 174 additions and 75 deletions

View File

@ -81,12 +81,12 @@ of `foo.cpp`:
$
The next step is to create a CTU index file which holds the `USR` name and location of external definitions in the
source files:
source files in format `<USR-Length>:<USR> <File-Path>`:
.. code-block:: bash
$ clang-extdef-mapping -p . foo.cpp
c:@F@foo# /path/to/your/project/foo.cpp
9:c:@F@foo# /path/to/your/project/foo.cpp
$ clang-extdef-mapping -p . foo.cpp > externalDefMap.txt
We have to modify `externalDefMap.txt` to contain the name of the `.ast` files instead of the source files:
@ -278,12 +278,12 @@ The `invocation list`:
We'd like to analyze `main.cpp` and discover the division by zero bug.
As we are using On-demand mode, we only need to create a CTU index file which holds the `USR` name and location of
external definitions in the source files:
external definitions in the source files in format `<USR-Length>:<USR> <File-Path>`:
.. code-block:: bash
$ clang-extdef-mapping -p . foo.cpp
c:@F@foo# /path/to/your/project/foo.cpp
9:c:@F@foo# /path/to/your/project/foo.cpp
$ clang-extdef-mapping -p . foo.cpp > externalDefMap.txt
Now everything is available for the CTU analysis.

View File

@ -12,8 +12,8 @@ def err_ctu_error_opening : Error<
"error opening '%0': required by the CrossTU functionality">;
def err_extdefmap_parsing : Error<
"error parsing index file: '%0' line: %1 'UniqueID filename' format "
"expected">;
"error parsing index file: '%0' line: %1 '<USR-Length>:<USR> <File-Path>' "
"format expected">;
def err_multiple_def_index : Error<
"multiple definitions are found for the same key in index ">;

View File

@ -149,6 +149,35 @@ std::error_code IndexError::convertToErrorCode() const {
return std::error_code(static_cast<int>(Code), *Category);
}
/// Parse one line of the input CTU index file.
///
/// @param[in] LineRef The input CTU index item in format
/// "<USR-Length>:<USR> <File-Path>".
/// @param[out] LookupName The lookup name in format "<USR-Length>:<USR>".
/// @param[out] FilePath The file path "<File-Path>".
static bool parseCrossTUIndexItem(StringRef LineRef, StringRef &LookupName,
StringRef &FilePath) {
// `LineRef` is "<USR-Length>:<USR> <File-Path>" now.
size_t USRLength = 0;
if (LineRef.consumeInteger(10, USRLength))
return false;
assert(USRLength && "USRLength should be greater than zero.");
if (!LineRef.consume_front(":"))
return false;
// `LineRef` is now just "<USR> <File-Path>".
// Check LookupName length out of bound and incorrect delimiter.
if (USRLength >= LineRef.size() || ' ' != LineRef[USRLength])
return false;
LookupName = LineRef.substr(0, USRLength);
FilePath = LineRef.substr(USRLength + 1);
return true;
}
llvm::Expected<llvm::StringMap<std::string>>
parseCrossTUIndex(StringRef IndexPath) {
std::ifstream ExternalMapFile{std::string(IndexPath)};
@ -160,24 +189,23 @@ parseCrossTUIndex(StringRef IndexPath) {
std::string Line;
unsigned LineNo = 1;
while (std::getline(ExternalMapFile, Line)) {
StringRef LineRef{Line};
const size_t Delimiter = LineRef.find(' ');
if (Delimiter > 0 && Delimiter != std::string::npos) {
StringRef LookupName = LineRef.substr(0, Delimiter);
// Store paths with posix-style directory separator.
SmallString<32> FilePath(LineRef.substr(Delimiter + 1));
llvm::sys::path::native(FilePath, llvm::sys::path::Style::posix);
bool InsertionOccured;
std::tie(std::ignore, InsertionOccured) =
Result.try_emplace(LookupName, FilePath.begin(), FilePath.end());
if (!InsertionOccured)
return llvm::make_error<IndexError>(
index_error_code::multiple_definitions, IndexPath.str(), LineNo);
} else
// Split lookup name and file path
StringRef LookupName, FilePathInIndex;
if (!parseCrossTUIndexItem(Line, LookupName, FilePathInIndex))
return llvm::make_error<IndexError>(
index_error_code::invalid_index_format, IndexPath.str(), LineNo);
// Store paths with posix-style directory separator.
SmallString<32> FilePath(FilePathInIndex);
llvm::sys::path::native(FilePath, llvm::sys::path::Style::posix);
bool InsertionOccured;
std::tie(std::ignore, InsertionOccured) =
Result.try_emplace(LookupName, FilePath.begin(), FilePath.end());
if (!InsertionOccured)
return llvm::make_error<IndexError>(
index_error_code::multiple_definitions, IndexPath.str(), LineNo);
++LineNo;
}
return Result;
@ -187,7 +215,8 @@ std::string
createCrossTUIndexString(const llvm::StringMap<std::string> &Index) {
std::ostringstream Result;
for (const auto &E : Index)
Result << E.getKey().str() << " " << E.getValue() << '\n';
Result << E.getKey().size() << ':' << E.getKey().str() << ' '
<< E.getValue() << '\n';
return Result.str();
}
@ -428,7 +457,7 @@ CrossTranslationUnitContext::ASTUnitStorage::getASTUnitForFunction(
ensureCTUIndexLoaded(CrossTUDir, IndexName))
return std::move(IndexLoadError);
// Check if there is and entry in the index for the function.
// Check if there is an entry in the index for the function.
if (!NameFileMap.count(FunctionName)) {
++NumNotInOtherTU;
return llvm::make_error<IndexError>(index_error_code::missing_definition);

View File

@ -1 +1 @@
c:@F@testStaticImplicit ctu-import.c.ast
23:c:@F@testStaticImplicit ctu-import.c.ast

View File

@ -0,0 +1,17 @@
void f(void (*)());
void f(void (*)(int));
struct G {
G() {
// multiple definitions are found for the same key in index
f([]() -> void {}); // USR: c:@S@G@F@G#@Sa@F@operator void (*)()#1
f([](int) -> void {}); // USR: c:@S@G@F@G#@Sa@F@operator void (*)(int)#1
// As both lambda exprs have the same prefix, if the CTU index parser uses
// the first space character as the delimiter between USR and file path, a
// "multiple definitions are found for the same key in index" error will
// be reported.
}
};
void importee() {}

View File

@ -1,7 +1,7 @@
c:@F@inlineAsm ctu-other.c.ast
c:@F@g ctu-other.c.ast
c:@F@f ctu-other.c.ast
c:@F@enumCheck ctu-other.c.ast
c:@F@identImplicit ctu-other.c.ast
c:@F@structInProto ctu-other.c.ast
c:@F@switchWithoutCases ctu-other.c.ast
14:c:@F@inlineAsm ctu-other.c.ast
6:c:@F@g ctu-other.c.ast
6:c:@F@f ctu-other.c.ast
14:c:@F@enumCheck ctu-other.c.ast
18:c:@F@identImplicit ctu-other.c.ast
18:c:@F@structInProto ctu-other.c.ast
23:c:@F@switchWithoutCases ctu-other.c.ast

View File

@ -1,30 +1,30 @@
c:@N@chns@F@chf1#I# ctu-other.cpp.ast
c:@N@myns@N@embed_ns@F@fens#I# ctu-other.cpp.ast
c:@F@g#I# ctu-other.cpp.ast
c:@S@mycls@F@fscl#I#S ctu-other.cpp.ast
c:@S@mycls@F@fcl#I# ctu-other.cpp.ast
c:@S@mycls@F@fvcl#I# ctu-other.cpp.ast
c:@N@myns@S@embed_cls@F@fecl#I# ctu-other.cpp.ast
c:@S@mycls@S@embed_cls2@F@fecl2#I# ctu-other.cpp.ast
c:@S@derived@F@fvcl#I# ctu-other.cpp.ast
c:@F@f#I# ctu-other.cpp.ast
c:@N@myns@F@fns#I# ctu-other.cpp.ast
c:@F@h#I# ctu-other.cpp.ast
c:@F@h_chain#I# ctu-chain.cpp.ast
c:@N@chns@S@chcls@F@chf4#I# ctu-chain.cpp.ast
c:@N@chns@F@chf2#I# ctu-chain.cpp.ast
c:@F@fun_using_anon_struct#I# ctu-other.cpp.ast
c:@F@other_macro_diag#I# ctu-other.cpp.ast
c:@extInt ctu-other.cpp.ast
c:@N@intns@extInt ctu-other.cpp.ast
c:@extS ctu-other.cpp.ast
c:@S@A@a ctu-other.cpp.ast
c:@extSC ctu-other.cpp.ast
c:@S@ST@sc ctu-other.cpp.ast
c:@extSCN ctu-other.cpp.ast
c:@extSubSCN ctu-other.cpp.ast
c:@extSCC ctu-other.cpp.ast
c:@extU ctu-other.cpp.ast
c:@S@TestAnonUnionUSR@Test ctu-other.cpp.ast
c:@F@testImportOfIncompleteDefaultParmDuringImport#I# ctu-other.cpp.ast
c:@F@testImportOfDelegateConstructor#I# ctu-other.cpp.ast
19:c:@N@chns@F@chf1#I# ctu-other.cpp.ast
30:c:@N@myns@N@embed_ns@F@fens#I# ctu-other.cpp.ast
9:c:@F@g#I# ctu-other.cpp.ast
21:c:@S@mycls@F@fscl#I#S ctu-other.cpp.ast
19:c:@S@mycls@F@fcl#I# ctu-other.cpp.ast
20:c:@S@mycls@F@fvcl#I# ctu-other.cpp.ast
31:c:@N@myns@S@embed_cls@F@fecl#I# ctu-other.cpp.ast
34:c:@S@mycls@S@embed_cls2@F@fecl2#I# ctu-other.cpp.ast
22:c:@S@derived@F@fvcl#I# ctu-other.cpp.ast
9:c:@F@f#I# ctu-other.cpp.ast
18:c:@N@myns@F@fns#I# ctu-other.cpp.ast
9:c:@F@h#I# ctu-other.cpp.ast
15:c:@F@h_chain#I# ctu-chain.cpp.ast
27:c:@N@chns@S@chcls@F@chf4#I# ctu-chain.cpp.ast
19:c:@N@chns@F@chf2#I# ctu-chain.cpp.ast
29:c:@F@fun_using_anon_struct#I# ctu-other.cpp.ast
24:c:@F@other_macro_diag#I# ctu-other.cpp.ast
9:c:@extInt ctu-other.cpp.ast
17:c:@N@intns@extInt ctu-other.cpp.ast
7:c:@extS ctu-other.cpp.ast
8:c:@S@A@a ctu-other.cpp.ast
8:c:@extSC ctu-other.cpp.ast
10:c:@S@ST@sc ctu-other.cpp.ast
9:c:@extSCN ctu-other.cpp.ast
12:c:@extSubSCN ctu-other.cpp.ast
9:c:@extSCC ctu-other.cpp.ast
7:c:@extU ctu-other.cpp.ast
26:c:@S@TestAnonUnionUSR@Test ctu-other.cpp.ast
53:c:@F@testImportOfIncompleteDefaultParmDuringImport#I# ctu-other.cpp.ast
39:c:@F@testImportOfDelegateConstructor#I# ctu-other.cpp.ast

View File

@ -1,4 +1,4 @@
c:@F@F1 plist-macros-ctu.c.ast
c:@F@F2 plist-macros-ctu.c.ast
c:@F@F3 plist-macros-ctu.c.ast
c:@F@F_H plist-macros-ctu.c.ast
7:c:@F@F1 plist-macros-ctu.c.ast
7:c:@F@F2 plist-macros-ctu.c.ast
7:c:@F@F3 plist-macros-ctu.c.ast
8:c:@F@F_H plist-macros-ctu.c.ast

View File

@ -4,7 +4,7 @@
// RUN: %clang_cc1 -std=c++14 -triple x86_64-pc-linux-gnu \
// RUN: -emit-pch -o %t/ctudir/ctu-inherited-default-ctor-other.cpp.ast \
// RUN: %S/Inputs/ctu-inherited-default-ctor-other.cpp
// RUN: echo "c:@N@clang@S@DeclContextLookupResult@SingleElementDummyList ctu-inherited-default-ctor-other.cpp.ast" \
// RUN: echo "59:c:@N@clang@S@DeclContextLookupResult@SingleElementDummyList ctu-inherited-default-ctor-other.cpp.ast" \
// RUN: > %t/ctudir/externalDefMap.txt
//
// RUN: %clang_analyze_cc1 -std=c++14 -triple x86_64-pc-linux-gnu \

View File

@ -0,0 +1,41 @@
// RUN: rm -rf %t
// RUN: mkdir %t
// RUN: echo '41:c:@S@G@F@G#@Sa@F@operator void (*)(int)#1 %/t/importee.ast' >> %t/externalDefMap.txt
// RUN: echo '38:c:@S@G@F@G#@Sa@F@operator void (*)()#1 %/t/importee.ast' >> %t/externalDefMap.txt
// RUN: echo '14:c:@F@importee# %/t/importee.ast' >> %t/externalDefMap.txt
// RUN: %clang_cc1 -emit-pch %/S/Inputs/ctu-lookup-name-with-space.cpp -o %t/importee.ast
// RUN: cd %t
// RUN: %clang_cc1 -fsyntax-only -analyze \
// RUN: -analyzer-checker=core \
// RUN: -analyzer-config experimental-enable-naive-ctu-analysis=true \
// RUN: -analyzer-config ctu-dir=. \
// RUN: -analyzer-config display-ctu-progress=true \
// RUN: -verify %s 2>&1 | FileCheck %s
// CHECK: CTU loaded AST file
// FIXME: In this test case, we cannot use the on-demand-parsing approach to
// load the external TU.
//
// In the Darwin system, the target triple is determined by the driver,
// rather than using the default one like other systems. However, when
// using bare `clang -cc1`, the adjustment is not done, which cannot
// match the one loaded with on-demand-parsing (adjusted triple).
// We bypass this problem by loading AST files, whose target triple is
// also unadjusted when generated via `clang -cc1 -emit-pch`.
//
// Refer to: https://discourse.llvm.org/t/60762
//
// This is also the reason why the test case of D75665 (introducing
// the on-demand-parsing feature) is enabled only on Linux.
void importee();
void trigger() {
// Call an external function to trigger the parsing process of CTU index.
// Refer to file Inputs/ctu-lookup-name-with-space.cpp for more details.
importee(); // expected-no-diagnostics
}

View File

@ -3,10 +3,10 @@
int f(int) {
return 0;
}
// CHECK-DAG: c:@F@f#I#
// CHECK-DAG: 9:c:@F@f#I#
extern const int x = 5;
// CHECK-DAG: c:@x
// CHECK-DAG: 4:c:@x
// Non-const variables should not be collected.
int y = 5;
@ -18,29 +18,29 @@ struct S {
int a;
};
extern S const s = {.a = 2};
// CHECK-DAG: c:@s
// CHECK-DAG: 4:c:@s
struct SF {
const int a;
};
SF sf = {.a = 2};
// CHECK-DAG: c:@sf
// CHECK-DAG: 5:c:@sf
struct SStatic {
static const int a = 4;
};
const int SStatic::a;
// CHECK-DAG: c:@S@SStatic@a
// CHECK-DAG: 14:c:@S@SStatic@a
extern int const arr[5] = { 0, 1 };
// CHECK-DAG: c:@arr
// CHECK-DAG: 6:c:@arr
union U {
const int a;
const unsigned int b;
};
U u = {.a = 6};
// CHECK-DAG: c:@u
// CHECK-DAG: 4:c:@u
// No USR can be generated for this.
// Check for no crash in this case.
@ -48,3 +48,15 @@ static union {
float uf;
const int ui;
};
void f(int (*)(char));
void f(bool (*)(char));
struct G {
G() {
f([](char) -> int { return 42; });
// CHECK-DAG: 41:c:@S@G@F@G#@Sa@F@operator int (*)(char)#1
f([](char) -> bool { return true; });
// CHECK-DAG: 42:c:@S@G@F@G#@Sa@F@operator bool (*)(char)#1
}
};

View File

@ -61,7 +61,7 @@ public:
ASSERT_FALSE(llvm::sys::fs::createTemporaryFile("index", "txt", IndexFD,
IndexFileName));
llvm::ToolOutputFile IndexFile(IndexFileName, IndexFD);
IndexFile.os() << "c:@F@f#I# " << ASTFileName << "\n";
IndexFile.os() << "9:c:@F@f#I# " << ASTFileName << "\n";
IndexFile.os().flush();
EXPECT_TRUE(llvm::sys::fs::exists(IndexFileName));