301eb6b68f
Platform-specific language extensions often want to provide a way of indicating that certain functions should be called in a different way, compiled in a different way, or otherwise treated differently from a “normal” function. Honoring these indications is often required for correctness, rather than being an optimization/QoI thing.

If a function declaration has a property P that matters for correctness, it will be ODR-incompatible with a function that does not have property P. If a function type has a property P that affects the calling convention, it will not be two-way compatible with a function type that does not have property P. These properties therefore affect language semantics. That in turn means that they cannot be treated as standard [[]] attributes.

Until now, many of these properties have been specified using GNU-style attributes instead. GNU attributes have traditionally been more lax than standard attributes, with many of them having semantic meaning. Examples include calling conventions and the vector_size attribute.

However, there is a big drawback to using GNU attributes for semantic information: compilers that don't understand the attributes will (by default) emit a warning rather than an error. They will go on to compile the code as though the attributes weren't present, which will inevitably lead to wrong code in most cases. For users who live dangerously and disable the warning, this wrong code could even be generated silently.

A more robust approach would be to specify the properties using keywords, which older compilers would then reject. Some vendor-specific extensions have already taken this approach. But traditionally, each such keyword has been treated as a language extension in its own right. This has three major drawbacks:

(1) The parsing rules need to be kept up-to-date as the language evolves.
(2) There are often corner cases that similar extensions handle differently.
(3) Each extension requires more custom code than a standard attribute.

The underlying problem for all three is that, unlike for true attributes, there is no established template that extensions can reuse. The purpose of this patch series is to try to provide such a template.

One option would have been to pick an existing keyword and do whatever that keyword does. The problem with that is that most keywords only apply to specific kinds of types, kinds of decls, etc., and so the parsing rules are (for good reason) not generally applicable to all types and decls.

Really, the “only” thing wrong with using standard attributes is that standard attributes cannot affect semantics. In all other respects they provide exactly what we need: a well-defined grammar that evolves with the language, clear rules about what an attribute appertains to, and so on.

This series therefore adds keyword “attributes” that can appear exactly where a standard attribute can appear and that appertain to exactly what a standard attribute would appertain to. The link is mechanical and no opt-outs or variations are allowed. This should make the keywords predictable for programmers who are already familiar with standard attributes.

This does mean that these keywords will be accepted for parsing purposes in many more places than necessary. Inappropriate uses will then be diagnosed during semantic analysis. However, the compiler would need to reject the keywords in those positions whatever happens, and treating them as ostensible attributes shouldn't be any worse than the alternative. In some cases it might even be better.

For example, SME's __arm_streaming attribute would make conceptual sense as a statement attribute, so someone who takes a “try-it-and-see” approach might write:

    __arm_streaming { …block-of-code…; }

In fact, we did consider supporting this originally. The reason for rejecting it was that it was too difficult to implement, rather than because it didn't make conceptual sense.

One slight disadvantage of the keyword-based approach is that it isn't possible to use #pragma clang attribute with the keywords. Perhaps we could add support for that in future, if it turns out to be useful.

For want of a better term, I've called the new attributes "regular" keyword attributes (in the sense that their parsing is regular wrt standard attributes), as opposed to "custom" keyword attributes that have their own parsing rules.

This patch adds the Attr.td support for regular keyword attributes. Adding an attribute with a RegularKeyword spelling causes tablegen to define the associated tokens and to record that attributes created with that syntax are regular keyword attributes rather than custom keyword attributes. A follow-on patch contains the main Parse and Sema support, which is enabled automatically by the Attr.td definition.

Other notes:

* The series does not allow regular keyword attributes to take arguments, but this could be added in future.
* I wondered about trying to use tablegen for TypePrinter::printAttributedAfter too, but decided against it. RegularKeyword is really a spelling-level classification rather than an attribute-level classification, and in general, an attribute could have both GNU and RegularKeyword spellings. In contrast, printAttributedAfter is only given the attribute kind and the type that results from applying the attribute. AFAIK, it doesn't have access to the original attribute spelling. This means that some attribute-specific or type-specific knowledge might be needed to print the attribute in the best way.
* Generating the tokens automatically from Attr.td means that pseudo's libgrammar does now depend on tablegen.
* The patch uses the SME __arm_streaming attribute as an example for testing purposes. The attribute does not do anything at this stage. Later SME-specific patches will add proper semantics for it, and add other SME-related keyword attributes.

Differential Revision: https://reviews.llvm.org/D148700
Directory contents: benchmarks, fuzzer, gen, include, lib, test, tool, unittests, CMakeLists.txt, DesignNotes.md, Disambiguation.md, README.md
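The commit message above says that a regular keyword attribute may appear exactly where a standard attribute may appear. The following is a minimal sketch of what those positions look like in standard C++. It is not taken from the patch: `DEMO_KEYWORD_ATTR` and the function names are hypothetical, and the macro expands to an empty standard attribute (`[[]]`) so that the sketch compiles without SME support. On a compiler that implements the keyword, `__arm_streaming` would presumably parse in the same spots, with semantic analysis then rejecting the positions that do not apply.

```cpp
#include <cassert>

// Hypothetical stand-in for a regular keyword attribute. It expands to an
// empty standard attribute so the sketch builds with any C++11 compiler.
#define DEMO_KEYWORD_ATTR [[]]

// At the start of a declaration: appertains to the declared entity.
DEMO_KEYWORD_ATTR int before_decl();

// After the declarator-id: also appertains to the declared entity.
int after_name DEMO_KEYWORD_ATTR ();

// After the parameter list: appertains to the function type.
int after_params() DEMO_KEYWORD_ATTR;

int before_decl() { return 1; }
int after_name() { return 2; }
int after_params() { return 3; }

int in_statement_position() {
  int total = 0;
  // Before a compound statement: appertains to the statement. The commit
  // message notes that __arm_streaming is diagnosed here rather than supported.
  DEMO_KEYWORD_ATTR { total = before_decl() + after_name() + after_params(); }
  return total;
}
```

Calling `in_statement_position()` exercises all four declarations. The point of the sketch is only that one grammar covers every position, which is the property the patch series reuses for keywords.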
clang pseudoparser
This directory implements an approximate heuristic parser for C++, based on the clang lexer, the C++ grammar, and the GLR parsing algorithm.
It parses a file in isolation, without reading its included headers. The result is a strict syntactic tree whose structure follows the C++ grammar. There is no semantic analysis, apart from guesses to disambiguate the parse. Disambiguation can optionally be guided by an AST or a symbol index.
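To illustrate why a parser without semantic analysis has to guess, here is a small sketch of a classic C++ ambiguity. It is not taken from the pseudoparser sources, and all names in it are hypothetical. The token sequence `a * b;` appears twice: once where `a` names a type, and once where `a` names an object. A compiler resolves this by name lookup, whereas a parser that reads a file in isolation may see both readings and must pick one heuristically.

```cpp
#include <cassert>

namespace type_reading {
struct a {};
// Here `a` names a type, so `a * b;` declares `b` as a pointer to `a`.
inline bool declares_pointer() {
  a * b;
  return sizeof(b) == sizeof(a*);
}
}  // namespace type_reading

namespace value_reading {
struct Operand {};
int multiplications = 0;
// Counts calls so that the expression reading has an observable effect.
inline Operand operator*(Operand, Operand) {
  ++multiplications;
  return {};
}
Operand a, b;
// Here `a` names an object, so `a * b;` is an expression statement that
// calls the overloaded operator*.
inline int runs_multiplication() {
  a * b;
  return multiplications;
}
}  // namespace value_reading
```

Both functions contain the same three tokens followed by a semicolon, yet one is a declaration and the other a function call. If `a` were declared in a header that is never read, nothing in the file itself would say which reading is correct.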
For now, the best reference on intended scope is the design proposal, with further discussion on the RFC.
Dependencies between pseudoparser and clang
Dependencies are limited, both because most of them would not make sense and to avoid placing a burden on clang maintainers.
The pseudoparser reuses the clang lexer (clangLex and clangBasic libraries) but not the higher-level libraries (Parse, Sema, AST, Frontend...).
When the pseudoparser needs to be used together with an AST (e.g. to guide disambiguation), this is done through a separate "bridge" library that depends on both.
Clang does not depend on the pseudoparser at all. If this seems useful in future it should be discussed by RFC.
Parity between pseudoparser and clang
The pseudoparser aims to understand real-world code, and particularly the languages and extensions supported by Clang.
However, we don't try to keep these in lockstep: there's no expectation that Clang parser changes are accompanied by pseudoparser changes or vice versa.