[flang][NFC] Add F2023X documentation

Add a document that summarizes Fortran 202X's upcoming features and their urgency for implementation. Differential Revision: https://reviews.llvm.org/D153916
2024-10-07 19:03:57 +00:00 · 2023-06-22 11:28:47 -07:00 · 2023-06-22 11:28:47 -07:00 · eb5ffa58f5
commit eb5ffa58f5
parent 8babebe8d5
1 changed files with 355 additions and 0 deletions
--- a/flang/docs/F202X.md
+++ b/flang/docs/F202X.md
@ -0,0 +1,355 @@
+<!--===- docs/F202X.md 
+  
+   Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+   See https://llvm.org/LICENSE.txt for license information.
+   SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+  
+-->
+
+# A first take on Fortran 202X features for LLVM Flang
+
+I (Peter Klausler) have been studying the draft PDF of the
+[Fortran 202X standard](https://j3-fortran.org/doc/year/23/23-007r1.pdf),
+which will soon be published as ISO Fortran 2023.
+I have compiled this summary of its changes relative to
+the current Fortran 2018 standard from the perspective
+of a [Fortran compiler](https://github.com/llvm/llvm-project/tree/main/flang)
+implementor.
+
+## TL;DR
+
+Fortran 202X doesn't make very many changes to the language
+relative to Fortran 2018, which was itself a small increment
+over Fortran 2008.
+Apart from `REDUCE` clauses that were added to the
+[still broken](https://github.com/llvm/llvm-project/blob/main/flang/docs/DoConcurrent.md)
+`DO CONCURRENT` construct, there's little here for Fortran users
+to get excited about.
+
+## Priority of implementation in LLVM Flang
+
+We are working hard to ensure that existing working applications will
+port successfully to LLVM Flang with minimal effort.
+I am not particularly concerned with conforming to a new
+standard as an end in itself.
+
+The only features below that appear to have already been implemented
+in other compilers are the `REDUCE` clauses and the degree trigonometric
+intrinsic functions, so those should have priority as an aid to
+portability.
+We would want to support them earlier even if they were not in a standard.
+
+The `REDUCE` clause also merits early implementation due to
+its potential for performance improvements in real codes.
+I don't see any other feature here that would be relevant to
+performance (maybe a weak argument could be made for `SIMPLE`).
+The bulk of this revision unfortunately comprises changes to Fortran that
+are neither performance-related, already available in
+some compilers, nor (obviously) in use in existing codes.
+I will not prioritize implementing them myself over
+other work until they become portability concerns or are
+requested by actual users.
+
+Given Fortran's history of the latency between new
+standards and the support for their features in real compilers,
+and then the extra lag before the features are then actually used
+in codes meant to be portable, I doubt that many of the items
+below will have to be worked on any time soon due to user demand.
+
+If J3 had chosen to add more features that were material improvements
+to Fortran -- and there's quite a long list of worthy candidates that
+were passed over, like read-only pointers -- it would have made sense
+for me to prioritize their implementation in LLVM Flang more
+urgently.
+
+## Specific change descriptions
+
+The individual features added to the language are summarized
+in what I see as their order of significance to Fortran users.
+
+### Alert: There's a breaking change!
+
+The Fortran committee used to abhor making breaking changes,
+apart from fixes, so that conforming codes could be portable across
+time as well as across compilers.
+Fortran 202X, however, uncharacteristically perpetrates one such
+change to existing semantics that will silently cause existing
+codes to work differently, if that change were to be implemented
+and enabled by default.
+
+Specifically, automatic reallocation of whole deferred-length character
+allocatable scalars is now mandated when they appear for internal output
+(e.g., `WRITE(A,*) ...`)
+or as output arguments for some statements and intrinsic procedures
+(e.g., `IOMSG=`, `ERRMSG=`).
+So existing codes that allocate output buffers
+for such things will, or would, now observe that their buffers are
+silently changing their lengths during execution, rather than being
+padded with blanks or being truncated.  For example:
+
+```
+  character(:), allocatable :: buffer
+  allocate(character(20)::buffer)
+  write(buffer,'F5.3') 3.14159
+  print *, len(buffer)
+```
+
+prints 20 with Fortran 2018 but would print 5 with Fortran 202X.
+
+There would have no problem with the new standard changing the
+behavior in the current error case of an unallocated variable;
+defining new semantics for old errors is a generally safe means
+for extending a programming language.
+However, in this case, we'll need to protect existing conforming
+codes from the surprising new reallocation semantics, which
+affect cases that are not errors.
+
+When/if there are requests from real users to implement this breaking
+change, and if it is implemented, I'll have to ensure that users
+have the ability to control this change in behavior via an option &/or the
+runtime environment, and when it's enabled, emit a warning at code
+sites that are at risk.
+This warning should mention a source change they can make to protect
+themselves from this change by passing the complete substring (`A(:)`)
+instead of a whole character allocatable.
+
+This feature reminds me of Fortran 2003's change to whole
+allocatable array assignment, although in that case users were
+put at risk only of extra runtime overhead that was needless in
+existing codes, not a change in behavior, and users learned to
+assign to whole array sections (`A(:)=...`) rather than to whole
+allocatable arrays where the performance hit mattered.
+
+### Major Items
+
+The features in this section are expensive to implement in
+terms of engineering time to design, code, refactor, and test
+(i.e., weeks or months, not days).
+
+#### `DO CONCURRENT REDUCE`
+
+J3 continues to ignore the
+[serious semantic problems](https://github.com/llvm/llvm-project/blob/main/flang/docs/DoConcurrent.md)
+with `DO CONCURRENT`, despite the simplicity of the necessary fix and their
+admirable willingness to repair the standard to fix problems with
+other features (e.g., plugging holes in `PURE` procedure requirements)
+and their less admirable willingness to make breaking changes (see above).
+They did add `REDUCE` clauses to `DO CONCURRENT`, and those seem to be
+immediately useful to HPC codes and worth implementing soon.
+
+#### `SIMPLE` procedures
+
+The new `SIMPLE` procedures constitute a subset of F'95/HPF's `PURE`
+procedures.
+There are things that one can do in a `PURE` procedure
+but cannot in a `SIMPLE` one.  But the virtue of being `SIMPLE` seems
+to be its own reward, not a requirement to access any other
+feature.
+
+`SIMPLE` procedures might have been more useful had `DO CONCURRENT` been
+changed to require callees to be `SIMPLE`, not just `PURE`.
+
+The implementation of `SIMPLE` will be nontrivial: it involves
+some parsing and symbol table work, and some generalization of the
+predicate function `IsPureProcedure()`, extending the semantic checking on
+calls in `PURE` procedures to ensure that `SIMPLE` procedures
+only call other `SIMPLE` procedures, and modifying the intrinsic
+procedure table to note that most intrinsics are now `SIMPLE`
+rather than just `PURE`.
+
+I don't expect any codes to rush to change their `PURE` procedures
+to be `SIMPLE`, since it buys little and reduces portability.
+This makes `SIMPLE` a lower-priority feature.
+
+#### Conditional expressions and actual arguments
+
+Next on the list of "big ticket" items are C-style conditional
+expressions.  These come in two forms, each of which is a distinct
+feature that would be nontrivial to implement, and I would not be
+surprised to see some compilers implement one before the other.
+
+The first form is a new parenthesized expression primary that any C programmer
+would recognize.  It has straightforward parsing and semantics,
+but will require support in folding and all other code that
+processes expressions.  Lowering will be nontrivial due to
+control flow.
+
+The second form is a conditional actual argument syntax
+that allows runtime selection of argument associations, as well
+as a `.NIL.` syntax for optional arguments to signify an absent actual
+argument.  This would have been more useful if it had also been
+allowed as a pointer assignment statement right-hand side, and
+that might be a worthwhile extension.  As this form is essentially
+a conditional variable reference it may be cleaner to have a
+distinct representation from the conditional expression primary
+in the parse tree and strongly-typed `Expr<T>` representations.
+
+#### `ENUMERATION TYPE`
+
+Fortran 202X has a new category of type.  The new non-interoperable
+`ENUMERATION TYPE` feature is like C++'s `enum class` -- not, unfortunately,
+a powerful sum data type as in Haskell or Rust.  Unlike the
+current `ENUM, BIND(C)` feature, `ENUMERATION TYPE` defines a new
+type name and its distinct values.
+
+This feature may well be the item requiring the largest patch to
+the compiler for its implementation, as it affects parsing,
+type checking on assignment and argument association, generic
+resolution, formatted I/O, NAMELIST, debugging symbols, &c.
+It will indirectly affect every switch statement in the compiler
+that switches over the six (now seven) type categories.
+This will be a big project for little useful return to users.
+
+#### `TYPEOF` and `CLASSOF`
+
+Last on the list of "big ticket" items are the new TYPEOF and CLASSOF
+type specifiers, which allow declarations to indirectly use the
+types of previously-defined entities.  These would have obvious utility
+in a language with type polymorphism but aren't going to be very
+useful yet in Fortran 202X (esp. `TYPEOF`), although they would be worth
+supporting as a utility feature for a parametric module extension.
+
+`CLASSOF` has implications for semantics and lowering that need to
+be thought through as it seems to provide a means of
+declaring polymorphic local variables and function results that are
+neither allocatables nor pointers.
+
+#### Coarray extensions:
+
+ * `NOTIFY_TYPE`, `NOTIFY WAIT` statement, `NOTIFY=` specifier on image selector
+ * Arrays with coarray components
+
+#### "Rank Independent" Features
+
+The `RANK(n)` attribute declaration syntax is equivalent to
+`DIMENSION(:,:,...,:)` or an equivalent entity-decl containing `n` colons.
+As `n` must be a constant expression, that's straightforward to implement,
+though not terribly useful until the language acquires additional features.
+(I can see some utility in being able to declare PDT components with a
+`RANK` that depends on a `KIND` type parameter.)
+
+It is now possible to declare the lower and upper bounds of an explicit
+shape entity using a constant-length vector specification expression
+in a declaration, `ALLOCATE` statement, or pointer assignment with
+bounds remapping.
+For example, `real A([2,3])` is equivalent to `real A(2,3)`.
+
+The new `A(@V)` "multiple subscript" indexing syntax uses an integer
+vector to supply a list of subscripts or of triplet bounds/strides.  This one
+has tough edge cases for lowering that need to be thought through;
+for example, when the lengths of two or more of the vectors in
+`A(@U,@V,@W)` are not known at compilation time, implementing the indexing
+would be tricky in generated code and might just end up filling a
+temporary with `[U,V,W]` first.
+
+The obvious use case for "multiple subscripts" would be as a means to
+index into an assumed-rank dummy argument without the bother of a `SELECT RANK`
+construct, but that usage is not supported in Fortran 202X.
+
+This feature may well turn out to be Fortran 202X's analog to Fortran 2003's
+`LEN` derived type parameters.
+
+### Minor Items
+
+So much for the major features of Fortran 202X.  The longer list
+of minor features can be more briefly summarized.
+
+#### New Edit Descriptors
+
+Fortran 202X has some noncontroversial small tweaks to formatted output.
+The `AT` edit descriptor automatically trims character output.  The `LZP`,
+`LZS`, and `LZ` control edit descriptors and `LEADING_ZERO=` specifier provide a
+means for controlling the output of leading zero digits.
+
+#### Intrinsic Module Extensions
+
+Addressing some issues and omissions in intrinsic modules:
+
+ * LOGICAL8/16/32/64 and REAL16
+ * IEEE module facilities upgraded to match latest IEEE FP standard
+ * C_F_STRPOINTER, F_C_STRING for NUL-terminated strings
+ * C_F_POINTER(LOWER=)
+
+#### Intrinsic Procedure Extensions
+
+The `SYSTEM_CLOCK` intrinsic function got some semantic tweaks.
+
+There are new intrinsic functions for trigonometric functions in
+units of degrees and half-circles.
+GNU Fortran already supports the forms that use degree units.
+These should call into math library implementations that are
+specialized for those units rather than simply multiplying
+arguments or results with conversion factors.
+ * `ACOSD`, `ASIND`, `ATAND`, `ATAN2D`, `COSD`, `SIND`, `TAND`
+ * `ACOSPI`, `ASINPI`, `ATANPI`, `ATAN2PI`, `COSPI`, `SINPI`, `TANPI`
+
+`SELECTED_LOGICAL_KIND` maps a bit size to a kind of `LOGICAL`
+
+There are two new character utility intrinsic
+functions whose implementations have very low priority: `SPLIT` and `TOKENIZE`.
+`TOKENIZE` requires memory allocation to return its results,
+and could and should have been implemented once in some Fortran utility
+library for those who need a slow tokenization facility rather than
+requiring implementations in each vendor's runtime support library with
+all the extra cost and compatibilty risk that entails.
+
+`SPLIT` is worse -- not only could it, like `TOKENIZE`,
+have been supplied by a Fortran utility library rather than being
+added to the standard, it's redundant;
+it provides nothing that cannot be already accomplished by
+composing today's `SCAN` intrinsic function with substring indexing:
+
+```
+module m
+  interface split
+    module procedure :: split
+  end interface
+  !instantiate for all possible ck/ik/lk combinations
+  integer, parameter :: ck = kind(''), ik = kind(0), lk = kind(.true.)
+ contains
+  simple elemental subroutine split(string, set, pos, back)
+    character(*, kind=ck), intent(in) :: string, set
+    integer(kind=ik), intent(in out) :: pos
+    logical(kind=lk), intent(in), optional :: back
+    if (present(back)) then
+      if (back) then
+        pos = scan(string(:pos-1), set, .true.)
+        return
+      end if
+    end if
+    npos = scan(string(pos+1:), set)
+    pos = merge(pos + npos, len(string) + 1, npos /= 0)
+  end
+end
+```
+
+(The code above isn't a proposed implementation for `SPLIT`, just a
+demonstration of how programs could use `SCAN` to accomplish the same
+results today.)
+
+## Source limitations
+
+Fortran 202X raises the maximum number of characters per free form
+source line and the maximum total number of characters per statement.
+Both of these have always been unlimited in this compiler (or
+limited only by available memory, to be more accurate.)
+
+## More BOZ usage opportunities
+
+BOZ literal constants (binary, octal, and hexadecimal constants,
+also known as "typeless" values) have more conforming usage in the
+new standard in contexts where the type is unambiguously known.
+They may now appear as initializers, as right-hand sides of intrinsic
+assignments to integer and real variables, in explicitly typed
+array constructors, and in the definitions of enumerations.
+
+## Citation updates
+
+The source base contains hundreds of references to the subclauses,
+requirements, and constraints of the Fortran 2018 standard, mostly in code comments.
+These will need to be mapped to their Fortran 202X counterparts once the
+new standard is published, as the Fortran committee does not provide a
+means for citing these items by names that are fixed over time like the
+C++ committee does.
+If we had access to the LaTeX sources of the standard, we could generate
+a mapping table and automate this update.