This patch plugs many holes in static initializer semantics, improves error messages for default initial values and other component properties in parameterized derived type instantiations, and cleans up several small issues noticed during development. We now do proper scalar expansion, folding, and type, rank, and shape conformance checking for component default initializers in derived types and PDT instantiations. The initial values of named constants are now guaranteed to have been folded when installed in the symbol table, and are no longer folded or scalar-expanded at each use in expression folding. Semantics documentation was extended with information about the various kinds of initializations in Fortran and when each of them is processed in the compiler.
Some necessary concomitant changes have bulked this patch out a bit:
* contextual message attachments, which are now produced for parameterized derived type instantiations so that the user can figure out which instance caused a problem with a component, have been added as part of ContextualMessages, and their implementation was debugged
* several APIs in evaluate::characteristics were changed so that a FoldingContext is passed as an argument rather than just its intrinsic procedure table; this affected client call sites in many files
* new tools in Evaluate/check-expression.cpp determine when an Expr actually is a single constant value and validate a non-pointer variable initializer or object component default value
* shape conformance checking has additional arguments that control whether scalar expansion is allowed
* several now-unused functions and data members were noticed and removed
* several crashes and bogus errors exposed by testing this new code were fixed
* a -fdebug-stack-trace option was added to enable LLVM's stack tracing on a crash, which might be useful in the future
TL;DR: Initialization processing does more and takes place at the right times for all of the various kinds of things that can be initialized.
Differential Review: https://reviews.llvm.org/D92783
Semantic Analysis
The semantic analysis pass determines if a syntactically correct Fortran program is legal by enforcing the constraints of the language.
The input is a parse tree with a Program node at the root and a "cooked" character stream, a contiguous stream of characters containing a normalized form of the Fortran source.
The semantic analysis pass takes a parse tree for a syntactically correct Fortran program and determines whether it is legal by enforcing the constraints of the language.
If the program is not legal, the results of the semantic pass will be a list of errors associated with the program.
If the program is legal, the semantic pass will produce a (possibly modified) parse tree for the semantically correct program with each name mapped to a symbol and each expression fully analyzed.
All user errors are detected either prior to or during semantic analysis. After it completes successfully the program should compile with no error messages. There may still be warnings or informational messages.
Phases of Semantic Analysis
1. Validate labels - Check all constraints on labels and branches
2. Rewrite DO loops - Convert all occurrences of LabelDoStmt to DoConstruct
3. Name resolution - Analyze names and declarations, build a tree of Scopes containing Symbols, and fill in the Name::symbol data member in the parse tree
4. Rewrite parse tree - Fix incorrect parses based on symbol information
5. Expression analysis - Analyze all expressions in the parse tree and fill in Expr::typedExpr and Variable::typedExpr with analyzed expressions; fix incorrect parses based on the result of this analysis
6. Statement semantics - Perform remaining semantic checks on the execution parts of subprograms
7. Write module files - If no errors have occurred, write out .mod files for modules and submodules
If phase 1 or phase 2 encounters an error in any of the program units, compilation terminates. Otherwise, phases 3-6 are all performed even if errors occur. Module files are written (phase 7) only if there are no errors.
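The error policy described above can be sketched as a small driver. This is an illustration only: Phase and RunSemantics are hypothetical names, not Flang APIs, and each phase is modeled as a callable that returns true when it reports an error. The sketch assumes the phase list has the module-file phase last.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for one semantic phase.
struct Phase {
  std::string name;
  std::function<bool()> run;  // returns true if the phase reported an error
};

// Runs the phases in order and returns the names of those that executed.
// Assumption: phases[0] and phases[1] are the two early phases, and the
// last entry is the module-file writer.
std::vector<std::string> RunSemantics(const std::vector<Phase> &phases) {
  std::vector<std::string> executed;
  bool anyErrors = false;
  for (std::size_t i = 0; i < phases.size(); ++i) {
    const bool isModuleFilePhase = (i + 1 == phases.size());
    if (isModuleFilePhase && anyErrors) {
      break;  // module files are written only after an error-free run
    }
    executed.push_back(phases[i].name);
    if (phases[i].run()) {
      anyErrors = true;
    }
    if (i < 2 && anyErrors) {
      break;  // an error in phase 1 or 2 terminates compilation
    }
  }
  return executed;
}
```

With seven phases, an error in phase 4 still lets phases 5 and 6 run but suppresses phase 7, while an error in phase 2 stops the run immediately.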
Validate labels
Perform semantic checks related to labels and branches:
- check that any labels that are referenced are defined and in scope
- check branches into loop bodies
- check that labeled DO loops are properly nested
- check labels in data transfer statements
Rewrite DO loops
This phase normalizes the parse tree by removing all unstructured DO loops and replacing them with DO constructs.
Name resolution
The name resolution phase walks the parse tree and constructs the symbol table.
The symbol table consists of a tree of Scope objects rooted at the global scope. The global scope is owned by the SemanticsContext object. It contains a Scope for each program unit in the compilation.
Each Scope in the scope tree contains child scopes representing other scopes lexically nested in it. Each Scope also contains a map of CharBlock to Symbol representing names declared in that scope. (All names in the symbol table are represented as CharBlock objects, i.e. as substrings of the cooked character stream.)
All Symbol objects are owned by the symbol table data structures. They should be accessed as Symbol * or Symbol & outside of the symbol table classes as they can't be created, copied, or moved. The Symbol class has functions and data common across all symbols, and a details field that contains more information specific to that type of symbol. Many symbols also have types, represented by DeclTypeSpec. Types are also owned by scopes.
Name resolution happens on the parse tree in this order:
- Process the specification of a program unit:
  - Create a new scope for the unit
  - Create a symbol for each contained subprogram containing just the name
  - Process the opening statement of the unit (ModuleStmt, FunctionStmt, etc.)
  - Process the specification part of the unit
- Apply the same process recursively to nested subprograms
- Process the execution part of the program unit
- Process the execution parts of nested subprograms recursively
After the completion of this phase, every Name corresponds to a Symbol unless an error occurred.
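One reading of the traversal order listed above is sketched below: all specification parts are handled first, outermost unit before nested ones, and execution parts follow in a second sweep. ProgramUnit, ResolveNames, and the log strings are hypothetical and exist only to make the ordering visible.

```cpp
#include <string>
#include <vector>

// Hypothetical model of a program unit and its contained subprograms.
struct ProgramUnit {
  std::string name;
  std::vector<ProgramUnit> nested;
};

// First sweep: scope, name-only symbols, opening statement, specification
// part, then the same steps for each nested subprogram.
void ResolveSpecification(const ProgramUnit &unit, std::vector<std::string> &log) {
  log.push_back("scope " + unit.name);
  for (const auto &sub : unit.nested) {
    log.push_back("stub " + sub.name);  // symbol with just the name
  }
  log.push_back("opening statement " + unit.name);
  log.push_back("specification part " + unit.name);
  for (const auto &sub : unit.nested) {
    ResolveSpecification(sub, log);
  }
}

// Second sweep: execution part of the unit, then of nested subprograms.
void ResolveExecution(const ProgramUnit &unit, std::vector<std::string> &log) {
  log.push_back("execution part " + unit.name);
  for (const auto &sub : unit.nested) {
    ResolveExecution(sub, log);
  }
}

std::vector<std::string> ResolveNames(const ProgramUnit &unit) {
  std::vector<std::string> log;
  ResolveSpecification(unit, log);
  ResolveExecution(unit, log);
  return log;
}
```

Creating the name-only symbols before the specification part is what lets declarations in a unit refer to its contained subprograms before those subprograms have been processed.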
Rewrite parse tree
The parser cannot build a completely correct parse tree without symbol information. This phase corrects mis-parses based on symbols:
- Array element assignments may be parsed as statement functions: a(i) = ...
- Namelist group names without NML= may be parsed as format expressions
- A file unit number expression may be parsed as a character variable
This phase also produces an internal error if it finds a Name that does not have its symbol data member filled in. This error is suppressed if other errors have occurred because in that case a Name corresponding to an erroneous symbol may not be resolved.
Expression analysis
Expressions that occur in the specification part are analyzed during name resolution, for example, initial values, array bounds, type parameters. Any remaining expressions are analyzed in this phase.
For each Variable and top-level Expr (i.e. one that is not nested below another Expr in the parse tree) the analyzed form of the expression is saved in the typedExpr data member. After this phase has completed, the analyzed expression can be accessed using semantics::GetExpr().
This phase also corrects mis-parses based on the result of expression analysis:
- An expression like a(b) is parsed as a function reference but may need to be rewritten to an array element reference (if a is an object entity) or to a structure constructor (if a is a derived type)
- An expression like a(b:c) is parsed as an array section but may need to be rewritten as a substring if a is an object with type CHARACTER
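The two corrections above amount to a decision based on what the name a turned out to be. The sketch below uses hypothetical types (SymbolInfo, Form) and is not the compiler's actual rewriting code. It also assumes that the substring reading applies only to a scalar CHARACTER object, since the same syntax applied to a CHARACTER array is a genuine array section.

```cpp
// Hypothetical summary of what name resolution learned about `a`.
struct SymbolInfo {
  enum class Category { Function, Object, DerivedType } category;
  bool isCharacter = false;
  int rank = 0;  // 0 means scalar
};

enum class Form {
  FunctionReference,
  ArrayElement,
  StructureConstructor,
  ArraySection,
  Substring
};

// a(b): the parser's initial guess is a function reference.
Form ResolveParenthesized(const SymbolInfo &a) {
  switch (a.category) {
  case SymbolInfo::Category::Object:
    return Form::ArrayElement;
  case SymbolInfo::Category::DerivedType:
    return Form::StructureConstructor;
  case SymbolInfo::Category::Function:
    break;
  }
  return Form::FunctionReference;
}

// a(b:c): the parser's initial guess is an array section.
Form ResolveRange(const SymbolInfo &a) {
  const bool scalarCharacter = a.category == SymbolInfo::Category::Object &&
      a.isCharacter && a.rank == 0;
  return scalarCharacter ? Form::Substring : Form::ArraySection;
}
```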
Statement semantics
Multiple independent checkers driven by the SemanticsVisitor framework perform the remaining semantic checks. By this phase, all names and expressions that can be successfully resolved have been. But there may be names without symbols or expressions without analyzed form if errors occurred earlier.
Initialization processing
Fortran supports many means of specifying static initializers for variables, object pointers, and procedure pointers, as well as default initializers for derived type object components, pointers, and type parameters.
Non-pointer static initializers of variables and named constants are scanned, analyzed, folded, scalar-expanded, and validated as they are traversed during declaration processing in name resolution. So are the default initializers of non-pointer object components in non-parameterized derived types. Named constant arrays with implied shapes take their actual shape from the initialization expression.
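Scalar expansion and shape conformance checking can be pictured with a toy model of folded constants. Everything here is an assumption made for illustration: Constant, ConformInitializer, and ImpliedShape are hypothetical names, element values are plain integers, and type checking is left out entirely.

```cpp
#include <cstddef>
#include <functional>
#include <numeric>
#include <optional>
#include <vector>

using Shape = std::vector<std::size_t>;  // extents; empty means scalar

// Toy folded constant: a shape plus its elements in array element order.
struct Constant {
  Shape shape;
  std::vector<int> values;
};

std::size_t ElementCount(const Shape &shape) {
  return std::accumulate(shape.begin(), shape.end(), std::size_t{1},
      std::multiplies<std::size_t>{});
}

// Returns the initializer conformed to the declared shape, or nullopt when
// the shapes do not conform. A scalar is expanded only if the caller allows.
std::optional<Constant> ConformInitializer(const Constant &init,
    const Shape &declared, bool allowScalarExpansion = true) {
  if (init.shape == declared) {
    return init;
  }
  if (init.shape.empty() && allowScalarExpansion) {
    return Constant{declared,
        std::vector<int>(ElementCount(declared), init.values.at(0))};
  }
  return std::nullopt;
}

// An implied-shape named constant takes its shape from the initializer.
Shape ImpliedShape(const Constant &init) { return init.shape; }
```

A scalar 7 used to initialize a 2-by-3 array becomes six copies of 7, whereas a three-element initializer for a four-element array is rejected.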
Default initializers of non-pointer components and type parameters in distinct parameterized derived type instantiations are similarly processed as those instances are created, as their expressions may depend on the values of type parameters. Error messages produced during parameterized derived type instantiation are decorated with contextual attachments that point to the declarations or other type specifications that caused the instantiation.
Static initializations in DATA statements are collected, validated, and converted into static initialization in the symbol table, as if the initialized objects had used the newer style of static initialization in their entity declarations.
All statically initialized pointers, and default component initializers for pointers, are processed late in name resolution after all specification parts have been traversed. This allows for forward references even in the presence of IMPLICIT NONE. Object pointer initializers in parameterized derived type instantiations are also cloned and folded at this late stage. Validation of pointer initializers takes place later in declaration checking (below).
Declaration checking
Whenever possible, the enforcement of constraints and "shalls" pertaining to properties of symbols is deferred to a single read-only pass over the symbol table that takes place after all name resolution and typing is complete.
Write module files
Separate compilation information is written out on successful compilation of modules and submodules. These are used as input to name resolution in program units that USE the modules.
Module files are stripped-down Fortran source for the module. Parts that aren't needed to compile dependent program units (e.g. action statements) are omitted.
The module file for module m is named m.mod and the module file for submodule s of module m is named m-s.mod.
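The naming rule can be written as a one-line helper. ModFileName is a hypothetical function introduced for this example. It covers only the naming convention stated above and ignores directories and any other details of how the compiler locates or writes the files.

```cpp
#include <optional>
#include <string>

// Builds the module file name for a module, or for a submodule of it when
// a submodule name is supplied.
std::string ModFileName(const std::string &module,
    const std::optional<std::string> &submodule = std::nullopt) {
  return submodule ? module + "-" + *submodule + ".mod" : module + ".mod";
}
```

For example, ModFileName("m") yields m.mod, and passing the submodule name s as the second argument yields m-s.mod.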