mirror of
https://github.com/capstone-engine/llvm-capstone.git
synced 2024-10-07 19:03:57 +00:00
[flang][NFC] Add F2023X documentation
Add a document that summarizes Fortran 202X's upcoming features and their urgency for implementation. Differential Revision: https://reviews.llvm.org/D153916
parent
8babebe8d5
commit
eb5ffa58f5
355
flang/docs/F202X.md
Normal file
@ -0,0 +1,355 @@
<!--===- docs/F202X.md

   Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
   See https://llvm.org/LICENSE.txt for license information.
   SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

-->

# A first take on Fortran 202X features for LLVM Flang

I (Peter Klausler) have been studying the draft PDF of the
[Fortran 202X standard](https://j3-fortran.org/doc/year/23/23-007r1.pdf),
which will soon be published as ISO Fortran 2023.
I have compiled this summary of its changes relative to
the current Fortran 2018 standard from the perspective
of a [Fortran compiler](https://github.com/llvm/llvm-project/tree/main/flang)
implementor.

## TL;DR

Fortran 202X doesn't make very many changes to the language
relative to Fortran 2018, which was itself a small increment
over Fortran 2008.
Apart from `REDUCE` clauses that were added to the
[still broken](https://github.com/llvm/llvm-project/blob/main/flang/docs/DoConcurrent.md)
`DO CONCURRENT` construct, there's little here for Fortran users
to get excited about.

## Priority of implementation in LLVM Flang

We are working hard to ensure that existing working applications will
port successfully to LLVM Flang with minimal effort.
I am not particularly concerned with conforming to a new
standard as an end in itself.

The only features below that appear to have already been implemented
in other compilers are the `REDUCE` clauses and the degree trigonometric
intrinsic functions, so those should have priority as an aid to
portability.
We would want to support them earlier even if they were not in a standard.

The `REDUCE` clause also merits early implementation due to
its potential for performance improvements in real codes.
I don't see any other feature here that would be relevant to
performance (maybe a weak argument could be made for `SIMPLE`).
The bulk of this revision unfortunately comprises changes to Fortran that
are neither performance-related, already available in
some compilers, nor (obviously) in use in existing codes.
I will not prioritize implementing them myself over
other work until they become portability concerns or are
requested by actual users.

Given Fortran's history of the latency between new
standards and the support for their features in real compilers,
and then the extra lag before the features are then actually used
in codes meant to be portable, I doubt that many of the items
below will have to be worked on any time soon due to user demand.

If J3 had chosen to add more features that were material improvements
to Fortran -- and there's quite a long list of worthy candidates that
were passed over, like read-only pointers -- it would have made sense
for me to prioritize their implementation in LLVM Flang more
urgently.

## Specific change descriptions

The individual features added to the language are summarized
in what I see as their order of significance to Fortran users.

### Alert: There's a breaking change!

The Fortran committee used to abhor making breaking changes,
apart from fixes, so that conforming codes could be portable across
time as well as across compilers.
Fortran 202X, however, uncharacteristically perpetrates one such
change to existing semantics that will silently cause existing
codes to work differently, if that change were to be implemented
and enabled by default.

Specifically, automatic reallocation of whole deferred-length character
allocatable scalars is now mandated when they appear for internal output
(e.g., `WRITE(A,*) ...`)
or as output arguments for some statements and intrinsic procedures
(e.g., `IOMSG=`, `ERRMSG=`).
So existing codes that allocate output buffers
for such things will, or would, now observe that their buffers are
silently changing their lengths during execution, rather than being
padded with blanks or being truncated. For example:

```
  character(:), allocatable :: buffer
  allocate(character(20)::buffer)
  write(buffer,'(F5.3)') 3.14159
  print *, len(buffer)
```

prints 20 with Fortran 2018 but would print 5 with Fortran 202X.

There would have been no problem with the new standard changing the
behavior in the current error case of an unallocated variable;
defining new semantics for old errors is a generally safe means
for extending a programming language.
However, in this case, we'll need to protect existing conforming
codes from the surprising new reallocation semantics, which
affect cases that are not errors.

When/if there are requests from real users to implement this breaking
change, and if it is implemented, I'll have to ensure that users
have the ability to control this change in behavior via an option &/or the
runtime environment, and when it's enabled, emit a warning at code
sites that are at risk.
This warning should mention a source change they can make to protect
themselves from this change by passing the complete substring (`A(:)`)
instead of a whole character allocatable.
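
A minimal sketch of that source change, reusing the buffer example from earlier in this section (the program name is made up for illustration).
A complete substring is not a whole allocatable variable, so as I read the draft it is not subject to reallocation, and the buffer should keep its allocated length under either standard:

```fortran
program keep_buffer_length
  implicit none
  character(:), allocatable :: buffer
  allocate(character(20) :: buffer)
  ! buffer(:) is a substring rather than a whole allocatable, so the
  ! internal WRITE blank-pads the result instead of reallocating it.
  write(buffer(:), '(F5.3)') 3.14159
  print *, len(buffer)   ! expected to remain 20 under both standards
end program keep_buffer_length
```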

This feature reminds me of Fortran 2003's change to whole
allocatable array assignment, although in that case users were
put at risk only of extra runtime overhead that was needless in
existing codes, not a change in behavior, and users learned to
assign to whole array sections (`A(:)=...`) rather than to whole
allocatable arrays where the performance hit mattered.

### Major Items

The features in this section are expensive to implement in
terms of engineering time to design, code, refactor, and test
(i.e., weeks or months, not days).

#### `DO CONCURRENT REDUCE`

J3 continues to ignore the
[serious semantic problems](https://github.com/llvm/llvm-project/blob/main/flang/docs/DoConcurrent.md)
with `DO CONCURRENT`, despite the simplicity of the necessary fix and their
admirable willingness to repair the standard to fix problems with
other features (e.g., plugging holes in `PURE` procedure requirements)
and their less admirable willingness to make breaking changes (see above).
They did add `REDUCE` clauses to `DO CONCURRENT`, and those seem to be
immediately useful to HPC codes and worth implementing soon.
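
For readers who have not seen the clause, here is a small sketch of a sum reduction, assuming the `REDUCE(operation:variable)` locality-spec syntax of the draft.
The program and variable names are illustrative only, and compiling it needs a compiler that already accepts Fortran 202X `REDUCE`:

```fortran
program reduce_sketch
  implicit none
  integer, parameter :: n = 1000
  real :: x(n), total
  integer :: i
  x = 1.0
  total = 0.0
  ! REDUCE(+:total) tells the compiler that total is only accumulated
  ! with +, so iterations may form partial sums that are combined when
  ! the construct completes.
  do concurrent (i = 1:n) reduce(+:total)
    total = total + x(i)
  end do
  print *, total
end program reduce_sketch
```

Without the clause, the update of `total` in every iteration would make the loop nonconforming as a `DO CONCURRENT`.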

#### `SIMPLE` procedures

The new `SIMPLE` procedures constitute a subset of F'95/HPF's `PURE`
procedures.
There are things that one can do in a `PURE` procedure
but cannot in a `SIMPLE` one. But the virtue of being `SIMPLE` seems
to be its own reward, not a requirement to access any other
feature.
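
As a rough illustration of the difference, assuming my reading of the draft is right that a `SIMPLE` procedure may touch nonlocal variables only through its dummy arguments (the module and procedure names below are invented for the sketch):

```fortran
module simple_sketch
  implicit none
  real :: scale = 2.0
contains
  ! PURE may read (but not modify) the module variable scale.
  pure function scaled(x) result(y)
    real, intent(in) :: x
    real :: y
    y = scale * x
  end function scaled

  ! SIMPLE may not reference scale by host association, so the
  ! factor has to be passed in explicitly.
  simple function scaled_by(x, factor) result(y)
    real, intent(in) :: x, factor
    real :: y
    y = factor * x
  end function scaled_by
end module simple_sketch
```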

`SIMPLE` procedures might have been more useful had `DO CONCURRENT` been
changed to require callees to be `SIMPLE`, not just `PURE`.

The implementation of `SIMPLE` will be nontrivial: it involves
some parsing and symbol table work, and some generalization of the
predicate function `IsPureProcedure()`, extending the semantic checking on
calls in `PURE` procedures to ensure that `SIMPLE` procedures
only call other `SIMPLE` procedures, and modifying the intrinsic
procedure table to note that most intrinsics are now `SIMPLE`
rather than just `PURE`.

I don't expect any codes to rush to change their `PURE` procedures
to be `SIMPLE`, since it buys little and reduces portability.
This makes `SIMPLE` a lower-priority feature.

#### Conditional expressions and actual arguments

Next on the list of "big ticket" items are C-style conditional
expressions. These come in two forms, each of which is a distinct
feature that would be nontrivial to implement, and I would not be
surprised to see some compilers implement one before the other.

The first form is a new parenthesized expression primary that any C programmer
would recognize. It has straightforward parsing and semantics,
but will require support in folding and all other code that
processes expressions. Lowering will be nontrivial due to
control flow.
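
A short sketch of the first form, assuming the draft's `(condition ? value : value)` spelling; the program name is illustrative, and a Fortran 202X compiler is needed to build it:

```fortran
program conditional_expression_sketch
  implicit none
  real :: x, y
  x = -2.5
  ! The parentheses are part of the syntax. Only the selected operand
  ! is evaluated, which is why lowering needs real control flow.
  y = (x >= 0.0 ? x : -x)
  print *, y
end program conditional_expression_sketch
```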

The second form is a conditional actual argument syntax
that allows runtime selection of argument associations, as well
as a `.NIL.` syntax for optional arguments to signify an absent actual
argument. This would have been more useful if it had also been
allowed as a pointer assignment statement right-hand side, and
that might be a worthwhile extension. As this form is essentially
a conditional variable reference it may be cleaner to have a
distinct representation from the conditional expression primary
in the parse tree and strongly-typed `Expr<T>` representations.
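
The second form might look like the following sketch, which assumes the same `? :` spelling in actual argument position.
The procedure names `report` and `caller` are hypothetical:

```fortran
module conditional_argument_sketch
  implicit none
contains
  subroutine report(value, scale)
    real, intent(in) :: value
    real, intent(in), optional :: scale
    if (present(scale)) then
      print *, value * scale
    else
      print *, value
    end if
  end subroutine report

  subroutine caller(a, b, factor)
    real, intent(in) :: a, b, factor
    ! First argument: associated with a or with b, chosen at run time.
    ! Second argument: factor when it is positive, otherwise absent.
    call report((a > b ? a : b), (factor > 0.0 ? factor : .nil.))
  end subroutine caller
end module conditional_argument_sketch
```

In Fortran 2018 the same effect needs an `IF` construct with two separate `CALL` statements for the optional argument alone.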

#### `ENUMERATION TYPE`

Fortran 202X has a new category of type. The new non-interoperable
`ENUMERATION TYPE` feature is like C++'s `enum class` -- not, unfortunately,
a powerful sum data type as in Haskell or Rust. Unlike the
current `ENUM, BIND(C)` feature, `ENUMERATION TYPE` defines a new
type name and its distinct values.
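
A sketch of what a definition and a use might look like.
The statement spellings and the `NEXT` intrinsic are written from my reading of the draft and should be checked against it; the type and procedure names are invented:

```fortran
module enumeration_sketch
  implicit none
  enumeration type :: colour
    enumerator :: red, orange, green
  end enumeration type
contains
  subroutine advance(light)
    type(colour), intent(in out) :: light
    ! Values of the new type are compared with each other, not with
    ! integers as ENUM, BIND(C) enumerators would be.
    if (light == green) then
      light = red
    else
      light = next(light)
    end if
  end subroutine advance
end module enumeration_sketch
```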

This feature may well be the item requiring the largest patch to
the compiler for its implementation, as it affects parsing,
type checking on assignment and argument association, generic
resolution, formatted I/O, NAMELIST, debugging symbols, &c.
It will indirectly affect every switch statement in the compiler
that switches over the six (now seven) type categories.
This will be a big project for little useful return to users.

#### `TYPEOF` and `CLASSOF`

Last on the list of "big ticket" items are the new `TYPEOF` and `CLASSOF`
type specifiers, which allow declarations to indirectly use the
types of previously-defined entities. These would have obvious utility
in a language with type polymorphism but aren't going to be very
useful yet in Fortran 202X (esp. `TYPEOF`), although they would be worth
supporting as a utility feature for a parametric module extension.

`CLASSOF` has implications for semantics and lowering that need to
be thought through as it seems to provide a means of
declaring polymorphic local variables and function results that are
neither allocatables nor pointers.
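
For concreteness, a small `TYPEOF` sketch (the subroutine name is illustrative); `CLASSOF` is deliberately left out, since its semantics are the open question raised above:

```fortran
subroutine swap_sketch(a, b)
  implicit none
  real(kind=kind(1.0d0)), intent(in out) :: a, b
  ! tmp takes its type and kind from a, so only the dummy argument
  ! declaration has to change if the kind ever does.
  typeof(a) :: tmp
  tmp = a
  a = b
  b = tmp
end subroutine swap_sketch
```

The saving over writing the type out again is small here, which matches the assessment that `TYPEOF` will matter more once the language has some form of generics.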

#### Coarray extensions:

* `NOTIFY_TYPE`, `NOTIFY WAIT` statement, `NOTIFY=` specifier on image selector
* Arrays with coarray components

#### "Rank Independent" Features

The `RANK(n)` attribute declaration syntax is equivalent to
`DIMENSION(:,:,...,:)` or an equivalent entity-decl containing `n` colons.
As `n` must be a constant expression, that's straightforward to implement,
though not terribly useful until the language acquires additional features.
(I can see some utility in being able to declare PDT components with a
`RANK` that depends on a `KIND` type parameter.)
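
A sketch of the equivalence, with an illustrative subroutine name; it needs a compiler that accepts the `RANK` attribute:

```fortran
subroutine rank_sketch(a, b)
  implicit none
  ! These two dummy arguments are both assumed-shape arrays of rank 3.
  real, intent(in) :: a(:,:,:)
  real, rank(3), intent(in) :: b
  ! With ALLOCATABLE, RANK(2) declares a deferred shape, as (:,:) would.
  real, allocatable, rank(2) :: work
  allocate(work(size(a, 1), size(b, 2)))
  work = 0.0
end subroutine rank_sketch
```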

It is now possible to declare the lower and upper bounds of an explicit
shape entity using a constant-length vector specification expression
in a declaration, `ALLOCATE` statement, or pointer assignment with
bounds remapping.
For example, `real A([2,3])` is equivalent to `real A(2,3)`.

The new `A(@V)` "multiple subscript" indexing syntax uses an integer
vector to supply a list of subscripts or of triplet bounds/strides. This one
has tough edge cases for lowering that need to be thought through;
for example, when the lengths of two or more of the vectors in
`A(@U,@V,@W)` are not known at compilation time, implementing the indexing
would be tricky in generated code and might just end up filling a
temporary with `[U,V,W]` first.
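
A sketch of the simple case, in which every vector has a length known at compilation time.
The names are illustrative and the syntax is taken from the `A(@V)` form described above:

```fortran
program multiple_subscript_sketch
  implicit none
  real :: a(4, 5, 6)
  integer :: v(3), u(2), w(1)
  a = 0.0
  v = [2, 3, 4]
  u = [1, 2]
  w = [6]
  ! One vector supplies all three subscripts at once.
  a(@v) = 1.0        ! intended to be the same element as a(2, 3, 4)
  ! Several vectors are concatenated to form the subscript list.
  a(@u, @w) = 2.0    ! intended to be the same element as a(1, 2, 6)
  print *, a(2, 3, 4), a(1, 2, 6)
end program multiple_subscript_sketch
```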

The obvious use case for "multiple subscripts" would be as a means to
index into an assumed-rank dummy argument without the bother of a `SELECT RANK`
construct, but that usage is not supported in Fortran 202X.

This feature may well turn out to be Fortran 202X's analog to Fortran 2003's
`LEN` derived type parameters.

### Minor Items

So much for the major features of Fortran 202X. The longer list
of minor features can be more briefly summarized.

#### New Edit Descriptors

Fortran 202X has some noncontroversial small tweaks to formatted output.
The `AT` edit descriptor automatically trims character output. The `LZP`,
`LZS`, and `LZ` control edit descriptors and `LEADING_ZERO=` specifier provide a
means for controlling the output of leading zero digits.
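
A sketch of how these might be used, assuming `LZP` and `LZS` act as "print" and "suppress" modes in the same way `SP` and `SS` do for plus signs; the program name is illustrative:

```fortran
program edit_descriptor_sketch
  implicit none
  character(10) :: name
  name = 'flang'
  ! A emits all ten characters, trailing blanks included;
  ! AT trims them, so the bracket follows the name directly.
  print '(A, "]")', name
  print '(AT, "]")', name
  ! LZP requests the optional leading zero before the decimal point;
  ! LZS suppresses it.
  print '(LZP, F5.2)', 0.5
  print '(LZS, F5.2)', 0.5
end program edit_descriptor_sketch
```

Today the first effect is usually spelled `trim(name)` at every output site, and the second is left to whatever the processor chooses.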

#### Intrinsic Module Extensions

Addressing some issues and omissions in intrinsic modules:

* LOGICAL8/16/32/64 and REAL16
* IEEE module facilities upgraded to match latest IEEE FP standard
* C_F_STRPOINTER, F_C_STRING for NUL-terminated strings
* C_F_POINTER(LOWER=)

#### Intrinsic Procedure Extensions

The `SYSTEM_CLOCK` intrinsic subroutine got some semantic tweaks.

There are new intrinsic functions for trigonometric functions in
units of degrees and half-circles.
GNU Fortran already supports the forms that use degree units.
These should call into math library implementations that are
specialized for those units rather than simply multiplying
arguments or results with conversion factors.

* `ACOSD`, `ASIND`, `ATAND`, `ATAN2D`, `COSD`, `SIND`, `TAND`
* `ACOSPI`, `ASINPI`, `ATANPI`, `ATAN2PI`, `COSPI`, `SINPI`, `TANPI`
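
A sketch comparing a degree-unit intrinsic with the explicit conversion it replaces.
It uses only the degree forms, which GNU Fortran already provides as an extension; the half-circle forms (`SINPI(X)` for sin(πX), and so on) follow the same pattern but are not shown:

```fortran
program degree_trig_sketch
  implicit none
  real, parameter :: pi = acos(-1.0)
  real :: angle
  angle = 30.0
  ! Degree-unit intrinsic versus explicit conversion to radians; the
  ! two results may differ in the last bits because of the rounded
  ! conversion factor.
  print *, sind(angle), sin(angle * pi / 180.0)
  ! The inverse functions return their results in degrees.
  print *, asind(0.5), atan2d(1.0, 1.0)
end program degree_trig_sketch
```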

`SELECTED_LOGICAL_KIND` maps a bit size to a kind of `LOGICAL`.

There are two new character utility intrinsic
procedures whose implementations have very low priority: `SPLIT` and `TOKENIZE`.
`TOKENIZE` requires memory allocation to return its results,
and could and should have been implemented once in some Fortran utility
library for those who need a slow tokenization facility rather than
requiring implementations in each vendor's runtime support library with
all the extra cost and compatibility risk that entails.

`SPLIT` is worse -- not only could it, like `TOKENIZE`,
have been supplied by a Fortran utility library rather than being
added to the standard, it's redundant;
it provides nothing that cannot be already accomplished by
composing today's `SCAN` intrinsic function with substring indexing:

```
module m
  interface split
    module procedure :: split
  end interface
  !instantiate for all possible ck/ik/lk combinations
  integer, parameter :: ck = kind(''), ik = kind(0), lk = kind(.true.)
 contains
  simple elemental subroutine split(string, set, pos, back)
    character(*, kind=ck), intent(in) :: string, set
    integer(kind=ik), intent(in out) :: pos
    logical(kind=lk), intent(in), optional :: back
    if (present(back)) then
      if (back) then
        pos = scan(string(:pos-1), set, .true.)
        return
      end if
    end if
    npos = scan(string(pos+1:), set)
    pos = merge(pos + npos, len(string) + 1, npos /= 0)
  end
end
```

(The code above isn't a proposed implementation for `SPLIT`, just a
demonstration of how programs could use `SCAN` to accomplish the same
results today.)
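
In the same spirit, here is a small self-contained Fortran 2018 sketch of a tokenizing loop built directly on `SCAN`; the program name, input string, and separator set are made up for illustration:

```fortran
program scan_split_sketch
  implicit none
  character(*), parameter :: line = 'alpha,beta;gamma'
  character(*), parameter :: separators = ',;'
  integer :: first, offset
  first = 1
  do
    ! SCAN returns the offset of the next separator within the
    ! remainder of the line, or zero when none is left.
    offset = scan(line(first:), separators)
    if (offset == 0) exit
    print '(A)', line(first:first + offset - 2)
    first = first + offset
  end do
  print '(A)', line(first:)   ! final token
end program scan_split_sketch
```

This prints `alpha`, `beta`, and `gamma` on separate lines, without any allocation.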

## Source limitations

Fortran 202X raises the maximum number of characters per free form
source line and the maximum total number of characters per statement.
Both of these have always been unlimited in this compiler (or
limited only by available memory, to be more accurate).

## More BOZ usage opportunities

BOZ literal constants (binary, octal, and hexadecimal constants,
also known as "typeless" values) have more conforming usage in the
new standard in contexts where the type is unambiguously known.
They may now appear as initializers, as right-hand sides of intrinsic
assignments to integer and real variables, in explicitly typed
array constructors, and in the definitions of enumerations.
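
A sketch of three of those contexts (the program name is illustrative).
Several compilers already accept some of these forms as extensions; the comment on the real assignment assumes that the BOZ constant is converted as if by the `REAL` intrinsic, i.e. as a bit pattern:

```fortran
program boz_sketch
  implicit none
  integer :: mask = z'FF'              ! initializer
  integer :: bits(2)
  real :: one
  ! Interpreted as a bit pattern; for a 32-bit IEEE real this is 1.0.
  one = z'3F800000'
  bits = [integer :: b'1010', o'17']   ! explicitly typed array constructor
  print *, mask, bits, one
end program boz_sketch
```

In Fortran 2018 each of these needs an explicit `INT(...)` or `REAL(...)` wrapper around the constant.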

## Citation updates

The source base contains hundreds of references to the subclauses,
requirements, and constraints of the Fortran 2018 standard, mostly in code comments.
These will need to be mapped to their Fortran 202X counterparts once the
new standard is published, as the Fortran committee does not provide a
means for citing these items by names that are fixed over time like the
C++ committee does.
If we had access to the LaTeX sources of the standard, we could generate
a mapping table and automate this update.