27 Commits

Author SHA1 Message Date
Chuanqi Xu
66351a501e [Serialization] Record whether the ODR is skipped (#82302)
Close https://github.com/llvm/llvm-project/issues/80570.

In a0b6747804, we skipped ODR checks for decls in GMF. Then it should be
natural to skip storing the ODR values in BMI.

Generally it should be fine as long as the writer and the reader keep
consistent.

However, the use of preamble in clangd shows the tricky part.

For example,

```
// test.cpp
module;

// any one of these is enough to crash clangd
// #include <iostream>
// #include <string_view>
// #include <cmath>
// #include <system_error>
// #include <new>
// #include <bit>
// probably many more

// only ok with libc++, not the system provided libstdc++ 13.2.1

// these are ok

export module test;
```

clangd will store the headers as a preamble to speed up parsing, and the
preamble reuses the serialization techniques. (Generally we'd call the
preamble a PCH. However, that is not strictly true. I've tested that a PCH
isn't problematic.) However, the tricky part is that the preamble
is not a module. It literally serializes and deserializes things. So
before clangd parses the above test module, clangd will serialize the
headers into the preamble. Note that there is no concept like GMF at that
point, so the ODR bits are stored. However, when clangd actually parses the
file, the decls from the preamble are literally treated as being in the GMF,
and the ODR bits are skipped. Then a mismatch happens.

To solve the problem, this patch adds another bit for decls to record
whether or not the ODR bits are skipped.
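
A toy sketch of the idea, not Clang's actual ASTWriter/ASTReader code: the
writer records one extra flag saying whether the ODR hash was omitted, and the
reader trusts that flag instead of re-deriving the decision from its own view
of the decl. The names `Record`, `writeDecl` and `readDecl` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical flat record: a sequence of 64-bit fields.
using Record = std::vector<std::uint64_t>;

// Writer: emit the "skipped" flag first, then the hash only if it is kept.
void writeDecl(Record &R, bool SkipODR, std::uint64_t ODRHash) {
  R.push_back(SkipODR ? 1 : 0); // the extra bit described above
  if (!SkipODR)
    R.push_back(ODRHash);
}

// Reader: consult the recorded flag, not the reader's own guess about
// whether the decl lives in the GMF.
std::optional<std::uint64_t> readDecl(const Record &R, std::size_t &Idx) {
  assert(Idx < R.size() && "truncated record");
  bool Skipped = R[Idx++] != 0;
  if (Skipped)
    return std::nullopt;
  assert(Idx < R.size() && "missing ODR hash");
  return R[Idx++];
}
```

Because the flag travels with the data, a reader and a writer that disagree
about GMF membership still consume the same number of fields.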

(cherry picked from commit 49775b1dc0cdb3a9d18811f67f268e3b3a381669)
2024-02-20 16:41:33 -08:00
Chuanqi Xu
d76b56fd28 [NFC] Eliminate warnings in SourceLocationEncodingTest.cpp
There are 2 dangling-else warnings and a warning about parentheses around bit operations.
This patch tries to eliminate them.
2023-11-02 14:01:17 +08:00
Benjamin Kramer
263fc4c79a Turn off memory leaks in unit test 2023-09-14 13:16:22 +02:00
Chuanqi Xu
45c160510f [C++20] [Modules] [Serialization] Check input file contents with option ForceCheckCXX20ModulesInputFiles enabled
With overridden input files, e.g., when the compiler gets the file from an
in-memory buffer, the compiler can't get correct modification time information
to indicate whether the input files have changed or not. Then the
semantics of ForceCheckCXX20ModulesInputFiles are broken.

In this patch, if both ForceCheckCXX20ModulesInputFiles and
ValidateASTInputFilesContent are enabled, the compiler will still check the
hash value of the contents even if their modification time is the same.
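
A minimal sketch of that decision, assuming a simplified model in which a
differing modification time always counts as a change. `InputFileInfo`,
`ValidationOptions` and `inputFileChanged` are hypothetical names; only the
two option names come from the message above.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical summary of one input file, as stored in a BMI or as seen now.
struct InputFileInfo {
  std::int64_t ModTime;
  std::optional<std::uint64_t> ContentHash; // absent if no hash was recorded
};

// Hypothetical stand-ins for the two options named in the message.
struct ValidationOptions {
  bool ForceCheckCXX20ModulesInputFiles = false;
  bool ValidateASTInputFilesContent = false;
};

bool inputFileChanged(const InputFileInfo &Stored, const InputFileInfo &Current,
                      const ValidationOptions &Opts) {
  // Simplification: a differing modification time always counts as a change.
  if (Stored.ModTime != Current.ModTime)
    return true;
  // Same mtime: look at the contents only when both options are enabled.
  if (Opts.ForceCheckCXX20ModulesInputFiles &&
      Opts.ValidateASTInputFilesContent) {
    assert(Stored.ContentHash && Current.ContentHash &&
           "content validation needs recorded hashes");
    return *Stored.ContentHash != *Current.ContentHash;
  }
  return false;
}
```

With an in-memory buffer the modification times can match even though the
bytes differ, which is the case the hash comparison is meant to catch.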
2023-09-14 11:27:50 +08:00
Mikhail Goncharov
4faa6c6699 [clang][test] Use a physical copy of FS
(missing piece from https://reviews.llvm.org/D152265)
476e7c49ecb762df1d68273696b06c36feb0fd96
2023-06-06 17:36:35 +02:00
Kadir Cetinkaya
476e7c49ec [clang][test] Use a physical copy of FS
Make use of a physical copy, rather than real FS in unittests that
change working-directory to get rid of the side effect of changing cwd for the
whole process. It's triggering crashes depending on the test order.

Differential Revision: https://reviews.llvm.org/D152265
2023-06-06 15:45:55 +02:00
Chuanqi Xu
c336c983bc [C++20] [Modules] [Serialization] Don't write comments to BMI for C++20 Named Modules
This patch stops writing comments to BMIs for C++20 Named Modules.
Originally I thought this was helpful for language services like clangd.
But I found clangd doesn't actually want the BMI to contain comments. So
it is meaningless for C++20 Named Modules to keep such comments in
their BMI.

It is simple to re-enable this if someday we find we actually want it.
2023-06-06 13:05:17 +08:00
Michael Liao
2daf91dae3 Fix shared library build again from 1c9a800. NFC 2023-05-24 14:09:38 -04:00
Amy Kwan
61262f9ef4 Fix shared library build from 1c9a800.
Fix the shared library build failure on clang-ppc64le-rhel from 1c9a800 as seen
in: https://lab.llvm.org/buildbot/#/builders/57/builds/27080/steps/6/logs/stdio
2023-05-24 12:29:23 -05:00
Chuanqi Xu
1c9a8004ed Recommit [C++20] [Modules] Serialize the evaluated constant values for VarDecl
Close https://github.com/llvm/llvm-project/issues/62796.

Previously, we didn't serialize the evaluated result for VarDecl. This
caused the compilation of template metaprogramming to become slower than
expected. This patch fixes the issue.

This is a recommit tested with asan built clang.
2023-05-24 15:45:16 +08:00
Chuanqi Xu
651b40e8ff Revert "[C++20] [Modules] Serialize the evaluated constant values for VarDecl"
This reverts commit c0d6f85e3ae8bcfdb7217d165314f01c1a4af9ae. The asan
bot detected a memory leak after this patch. Revert it for now.
2023-05-24 13:56:09 +08:00
Chuanqi Xu
597dd1f91d [NFC] Fix the warning for dangling pointer for c0d6f85e3ae8bc
The bot reports a warning-converted-to-error for the dangling pointer, and
this patch fixes that.
2023-05-24 11:42:55 +08:00
Chuanqi Xu
c0d6f85e3a [C++20] [Modules] Serialize the evaluated constant values for VarDecl
Close https://github.com/llvm/llvm-project/issues/62796.

Previously, we didn't serialize the evaluated result for VarDecl. This
caused the compilation of template metaprogramming to become slower than
expected. This patch fixes the issue.
2023-05-24 10:17:33 +08:00
Kazu Hirata
6ad0788c33 [clang] Use std::optional instead of llvm::Optional (NFC)
This patch replaces (llvm::|)Optional< with std::optional<.  I'll post
a separate patch to remove #include "llvm/ADT/Optional.h".
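
As an illustration of what the migrated code looks like (a made-up example,
not taken from the patch), the `std::optional` spellings line up with the
`llvm::Optional` ones named in the Discourse thread below. `findColumn`,
`columnOr` and `requireColumn` are hypothetical helpers.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Before the migration this would have returned llvm::Optional<unsigned>.
std::optional<unsigned> findColumn(const std::string &Line, char C) {
  std::string::size_type Pos = Line.find(C);
  if (Pos == std::string::npos)
    return std::nullopt; // was: return None;
  return static_cast<unsigned>(Pos);
}

unsigned columnOr(const std::string &Line, char C, unsigned Default) {
  return findColumn(Line, C).value_or(Default); // was: getValueOr(Default)
}

unsigned requireColumn(const std::string &Line, char C) {
  std::optional<unsigned> Col = findColumn(Line, C);
  assert(Col.has_value() && "character must be present"); // was: hasValue()
  return *Col; // was: getValue()
}
```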

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2023-01-14 12:31:01 -08:00
Kazu Hirata
a1580d7b59 [clang] Add #include <optional> (NFC)
This patch adds #include <optional> to those files containing
llvm::Optional<...> or Optional<...>.

I'll post a separate patch to actually replace llvm::Optional with
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2023-01-14 11:07:21 -08:00
Kazu Hirata
a41fbb1fc2 [clang/unittests] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-03 12:14:19 -08:00
Sam McCall
481691572d [Serialization] Add missing includes for CHAR_BIT 2022-05-19 10:04:25 +02:00
Sam McCall
4df795bff7 [Serialization] Delta-encode consecutive SourceLocations in TypeLoc
Much of the size of PCH/PCM files comes from stored SourceLocations.
These are encoded using (almost) their raw value, VBR-encoded. Absolute
SourceLocations can be relatively large numbers, so this commonly takes
20-30 bits per location.

We can reduce this by exploiting redundancy: many "nearby" SourceLocations are
stored differing only slightly and can be delta-encoded.
Random-access loading of AST nodes constrains how long these sequences
can be, but we can do it at least within a node that always gets
deserialized as an atomic unit.

TypeLoc is implemented in this patch as it's a relatively small change
that shows most of the API.
This saves ~3.5% of PCH size. I have local changes applying this technique
further that save another 3%; I think it's possible to get to 10% total.
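
The general technique can be sketched as follows. This is an illustration
only, not the encoding the patch implements: it assumes plain zigzag-coded
deltas, and `encodeLocs`, `decodeLocs` and `vbr6Bits` are hypothetical helpers.
The first location is stored as a delta from zero, i.e. at full size.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Zigzag maps small signed deltas to small unsigned values.
std::uint64_t zigzag(std::int64_t V) {
  return (static_cast<std::uint64_t>(V) << 1) ^
         static_cast<std::uint64_t>(V >> 63);
}
std::int64_t unzigzag(std::uint64_t V) {
  return static_cast<std::int64_t>(V >> 1) ^ -static_cast<std::int64_t>(V & 1);
}

// Store each raw location as the difference from the previous one.
std::vector<std::uint64_t> encodeLocs(const std::vector<std::uint32_t> &Locs) {
  std::vector<std::uint64_t> Out;
  std::int64_t Prev = 0;
  for (std::uint32_t L : Locs) {
    Out.push_back(zigzag(static_cast<std::int64_t>(L) - Prev));
    Prev = L;
  }
  return Out;
}

std::vector<std::uint32_t> decodeLocs(const std::vector<std::uint64_t> &Enc) {
  std::vector<std::uint32_t> Out;
  std::int64_t Prev = 0;
  for (std::uint64_t E : Enc) {
    Prev += unzigzag(E);
    assert(Prev >= 0 && Prev <= UINT32_MAX && "corrupt delta stream");
    Out.push_back(static_cast<std::uint32_t>(Prev));
  }
  return Out;
}

// Bits used by a VBR encoding with 6-bit chunks (5 payload bits per chunk).
unsigned vbr6Bits(std::uint64_t V) {
  unsigned Bits = 6;
  while (V >>= 5)
    Bits += 6;
  return Bits;
}
```

Under these assumptions a raw value near one million needs 24 bits as VBR6,
while a nearby location four units later needs a single 6-bit chunk.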

Differential Revision: https://reviews.llvm.org/D125403
2022-05-19 09:40:44 +02:00
Sam McCall
499d0b96cb [clang] createInvocationFromCommandLine -> createInvocation, delete former. NFC
(Followup from 40c13720a4b977d4347bbde53c52a4d0703823c2)

Differential Revision: https://reviews.llvm.org/D125012
2022-05-06 16:21:48 +02:00
Ben Barham
766a08df12 [Frontend] Only compile modules if not already finalized
It was possible to re-add a module to a shared in-memory module cache
when search paths are changed. This can eventually cause a crash if the
original module is referenced after this occurs.
  1. Module A depends on B
  2. B exists in two paths C and D
  3. First run only has C on the search path, finds A and B and loads
     them
  4. Second run adds D to the front of the search path. A is loaded and
     contains a reference to the already compiled module from C. But
     searching finds the module from D instead, causing a mismatch
  5. B and the modules that depend on it are considered out of date and
     thus rebuilt
  6. The recompiled module A is added to the in-memory cache, freeing
     the previously inserted one

This can never occur from a regular clang process, but is very easy to
do through the API - whether through the use of a shared cache or just
running multiple compilations from a single `CompilerInstance`. Update
the compilation to return early if a module is already finalized so that
the pre-condition in the in-memory module cache holds.
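
A toy sketch of the early return, assuming a much simplified cache.
`ToyCache` and `compileModuleIfNeeded` are hypothetical and do not mirror the
real `CompilerInstance` or module cache interfaces.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for the shared in-memory module cache.
struct ToyCache {
  std::set<std::string> Finalized;

  bool isFinal(const std::string &Name) const {
    return Finalized.count(Name) != 0;
  }

  void addBuilt(const std::string &Name) {
    // The pre-condition the message refers to: never replace a finalized module.
    assert(!isFinal(Name) && "module already finalized");
    Finalized.insert(Name);
  }
};

// Returns true if a build actually ran.
bool compileModuleIfNeeded(ToyCache &Cache, const std::string &Name) {
  if (Cache.isFinal(Name))
    return false; // early return keeps the cache pre-condition intact
  // ... the actual module compilation would happen here ...
  Cache.addBuilt(Name);
  return true;
}
```

A second request for the same module then becomes a no-op instead of freeing
a buffer that other modules may still reference.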

Resolves rdar://78180255

Differential Revision: https://reviews.llvm.org/D105328
2021-07-15 18:27:08 -07:00
Rumeet Dhindsa
57a2eaf3c1 Revert "[modules] Do not cache invalid state for modules that we attempted to load."
As per the comment on https://reviews.llvm.org/D72860, it is suggested to
revert this change in the meantime, since it has introduced a regression.

This reverts commit 83f4c3af021cd5322ea10fd1c4e839874c1dae49.
2020-03-10 10:59:26 -07:00
Volodymyr Sapsai
83f4c3af02 [modules] Do not cache invalid state for modules that we attempted to load.
Partially reverts 0a2be46cfdb698fefcc860a56b47dde0884d5335 as it turned
out to cause redundant module rebuilds in multi-process incremental builds.
When a module was getting out of date, all compilation processes started at the
same time were marking it as `ToBuild`. So each process was building the same
module instead of checking if it was built by someone else and using that
result. In addition to the work duplication, contention on the same .pcm file
wasn't making builds faster.

Note that for a single-process build this change would cause redundant module
reads and validations. But reading a module is faster than building it and
multi-process builds are more common than single-process. So I'm willing to
make such a trade-off.

rdar://problem/54395127

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D72860
2020-01-16 17:12:41 -08:00
Tom Stellard
2e97d2aa1b cmake: Add CLANG_LINK_CLANG_DYLIB option
Summary:
Setting CLANG_LINK_CLANG_DYLIB=ON causes clang tools to link against
libclang_shared.so instead of the individual component libraries.

Reviewers: mgorny, beanz, smeenai, phosek, sylvestre.ledru

Subscribers: arphaman, cfe-commits, llvm-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D63503

llvm-svn: 365092
2019-07-03 22:45:55 +00:00
Francis Visoiu Mistrih
e0308279cb [Bitcode] Move Bitstream to a separate library
This moves Bitcode/Bitstream*, Bitcode/BitCodes.h to Bitstream/.

This is needed to avoid a circular dependency when using the bitstream
code for parsing optimization remarks.

Since Bitcode uses Core for the IR part:

libLLVMRemarks -> Bitcode -> Core

and Core uses libLLVMRemarks to generate remarks (see
IR/RemarkStreamer.cpp):

Core -> libLLVMRemarks

we need to separate the Bitstream and Bitcode part.

For clang-doc, it seems that it doesn't need the whole bitcode layer, so
I updated the CMake to only use the bitstream part.

Differential Revision: https://reviews.llvm.org/D63899

llvm-svn: 365091
2019-07-03 22:40:07 +00:00
Duncan P. N. Exon Smith
b7db2e9f82 Stop relying on allocator behaviour in modules unit test
Another fixup for r355778 for Windows bots, this time to stop
accidentally relying on allocator behaviour for the test to pass.

llvm-svn: 355780
2019-03-09 20:15:01 +00:00
Duncan P. N. Exon Smith
0a2be46cfd Modules: Invalidate out-of-date PCMs as they're discovered
Leverage the InMemoryModuleCache to invalidate a module the first time
it fails to import (and to lock a module as soon as it's built or
imported successfully).  For implicit module builds, this optimizes
importing deep graphs where the leaf module is out-of-date; see example
near the end of the commit message.

Previously the cache finalized ("locked in") all modules imported so far
when starting a new module build.  This was sufficient to prevent
loading two versions of the same module, but was somewhat arbitrary and
hard to reason about.

Now the cache explicitly tracks module state, where each module must be
one of:

- Unknown: module not in the cache (yet).
- Tentative: module in the cache, but not yet fully imported.
- ToBuild: module found on disk could not be imported; need to build.
- Final: module in the cache has been successfully built or imported.
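
The four states can be pictured as a small state machine. The sketch below is
a toy model under assumed transition rules; `ToyModuleCache` and its method
names are hypothetical, not the real InMemoryModuleCache interface.

```cpp
#include <cassert>
#include <map>
#include <string>

enum class State { Unknown, Tentative, ToBuild, Final };

class ToyModuleCache {
  std::map<std::string, State> States;

public:
  State get(const std::string &File) const {
    auto It = States.find(File);
    return It == States.end() ? State::Unknown : It->second;
  }

  // A PCM was read from disk but is not yet fully imported.
  void addLoaded(const std::string &File) {
    assert(get(File) == State::Unknown && "already in the cache");
    States[File] = State::Tentative;
  }

  // The import failed: remember that this file has to be rebuilt.
  void markToBuild(const std::string &File) {
    assert(get(File) == State::Tentative && "only tentative PCMs can fail");
    States[File] = State::ToBuild;
  }

  // The import succeeded: lock the PCM in.
  void finalize(const std::string &File) {
    assert(get(File) == State::Tentative && "only tentative PCMs are locked");
    States[File] = State::Final;
  }

  // A freshly built PCM is final immediately.
  void addBuilt(const std::string &File) {
    assert(get(File) != State::Final && "cannot replace a finalized PCM");
    States[File] = State::Final;
  }
};
```

In this model an out-of-date PCM goes Unknown, Tentative, ToBuild, Final,
while a good one goes Unknown, Tentative, Final.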

Preventing repeated failed imports avoids variation in builds based on
shifting filesystem state.  Now it's guaranteed that a module is loaded
from disk exactly once.  It now seems safe to remove
FileManager::invalidateCache, but I'm leaving that for a later commit.

The new, precise logic uncovered a pre-existing problem in the cache:
the map key is the module filename, and different contexts use different
filenames for the same PCM file.  (In particular, the test
Modules/relative-import-path.c does not build without this commit.
r223577 started using a relative path to describe a module's base
directory when importing it within another module.  As a result, the
module cache sees an absolute path when (a) building the module or
importing it at the top-level, and a relative path when (b) importing
the module underneath another one.)

The "obvious" fix is to resolve paths using FileManager::getVirtualFile
and change the map key for the cache to a FileEntry, but some contexts
(particularly related to ASTUnit) have a shorter lifetime for their
FileManager than the InMemoryModuleCache.  This is worth pursuing
further in a later commit; perhaps by tying together the FileManager and
InMemoryModuleCache lifetime, or moving the in-memory PCM storage into a
VFS layer.

For now, use the PCM's base directory as-written for constructing the
filename to check the ModuleCache.

Example
=======

To understand the build optimization, first consider the build of a
module graph TU -> A -> B -> C -> D with an empty cache:

    TU builds A'
       A' builds B'
          B' builds C'
             C' builds D'
                imports D'
          B' imports C'
             imports D'
       A' imports B'
          imports C'
          imports D'
    TU imports A'
       imports B'
       imports C'
       imports D'

If we build TU again, where A, B, C, and D are in the cache and D is
out-of-date, we would previously get this build:

    TU imports A
       imports B
       imports C
       imports D (out-of-date)
    TU builds A'
       A' imports B
          imports C
          imports D (out-of-date)
          builds B'
          B' imports C
             imports D (out-of-date)
             builds C'
             C' imports D (out-of-date)
                builds D'
                imports D'
          B' imports C'
             imports D'
       A' imports B'
          imports C'
          imports D'
    TU imports A'
       imports B'
       imports C'
       imports D'

After this commit, we'll immediately invalidate A, B, C, and D when we
first observe that D is out-of-date, giving this build:

    TU imports A
       imports B
       imports C
       imports D (out-of-date)
    TU builds A' // The same graph as an empty cache.
       A' builds B'
          B' builds C'
             C' builds D'
                imports D'
          B' imports C'
             imports D'
       A' imports B'
          imports C'
          imports D'
    TU imports A'
       imports B'
       imports C'
       imports D'

The new build matches what we'd naively expect, pretty closely matching
the original build with the empty cache.

rdar://problem/48545366

llvm-svn: 355778
2019-03-09 17:44:01 +00:00
Duncan P. N. Exon Smith
8bef5cd49a Modules: Rename MemoryBufferCache to InMemoryModuleCache
Change MemoryBufferCache to InMemoryModuleCache, moving it from Basic to
Serialization.  Another patch will start using it to manage module build
more explicitly, but this is split out because it's mostly mechanical.

Because of the move to Serialization we can no longer abuse the
Preprocessor to forward it to the ASTReader.  Besides the rename and
file move, that means Preprocessor::Preprocessor has one fewer parameter
and ASTReader::ASTReader has one more.

llvm-svn: 355777
2019-03-09 17:33:56 +00:00