A first take on Fortran 202X features for LLVM Flang¶
I (Peter Klausler) have been studying the draft PDF of the Fortran 202X standard, which will soon be published as ISO Fortran 2023. I have compiled this summary of its changes relative to the current Fortran 2018 standard from the perspective of a Fortran compiler implementor.
TL;DR¶
Fortran 202X doesn’t make very many changes to the language
relative to Fortran 2018, which was itself a small increment
over Fortran 2008.
Apart from REDUCE
clauses that were added to the
still broken
DO CONCURRENT
construct, there’s little here for Fortran users
to get excited about.
Priority of implementation in LLVM Flang¶
We are working hard to ensure that existing working applications will port successfully to LLVM Flang with minimal effort. I am not particularly concerned with conforming to a new standard as an end in itself.
The only features below that appear to have already been implemented
in other compilers are the REDUCE
clauses and the degree trigonometric
intrinsic functions, so those should have priority as an aid to
portability.
We would want to support them earlier even if they were not in a standard.
The REDUCE
clause also merits early implementation due to
its potential for performance improvements in real codes.
I don’t see any other feature here that would be relevant to
performance (maybe a weak argument could be made for SIMPLE
).
The bulk of this revision unfortunately comprises changes to Fortran that
are neither performance-related, already available in
some compilers, nor (obviously) in use in existing codes.
I will not prioritize implementing them myself over
other work until they become portability concerns or are
requested by actual users.
Given Fortran’s history of the latency between new standards and the support for their features in real compilers, and then the extra lag before the features are then actually used in codes meant to be portable, I doubt that many of the items below will have to be worked on any time soon due to user demand.
If J3 had chosen to add more features that were material improvements to Fortran – and there’s quite a long list of worthy candidates that were passed over, like read-only pointers – it would have made sense for me to prioritize their implementation in LLVM Flang more urgently.
Specific change descriptions¶
The individual features added to the language are summarized in what I see as their order of significance to Fortran users.
Alert: There’s a breaking change!¶
The Fortran committee used to abhor making breaking changes, apart from fixes, so that conforming codes could be portable across time as well as across compilers. Fortran 202X, however, uncharacteristically perpetrates one such change to existing semantics that will silently cause existing codes to work differently, if that change were to be implemented and enabled by default.
Specifically, automatic reallocation of whole deferred-length character
allocatable scalars is now mandated when they appear for internal output
(e.g., WRITE(A,*) ...
)
or as output arguments for some statements and intrinsic procedures
(e.g., IOMSG=
, ERRMSG=
).
So existing codes that allocate output buffers
for such things will, or would, now observe that their buffers are
silently changing their lengths during execution, rather than being
padded with blanks or being truncated. For example:
character(:), allocatable :: buffer
allocate(character(20)::buffer)
write(buffer,'F5.3') 3.14159
print *, len(buffer)
prints 20 with Fortran 2018 but would print 5 with Fortran 202X.
There would have no problem with the new standard changing the behavior in the current error case of an unallocated variable; defining new semantics for old errors is a generally safe means for extending a programming language. However, in this case, we’ll need to protect existing conforming codes from the surprising new reallocation semantics, which affect cases that are not errors.
When/if there are requests from real users to implement this breaking
change, and if it is implemented, I’ll have to ensure that users
have the ability to control this change in behavior via an option &/or the
runtime environment, and when it’s enabled, emit a warning at code
sites that are at risk.
This warning should mention a source change they can make to protect
themselves from this change by passing the complete substring (A(:)
)
instead of a whole character allocatable.
This feature reminds me of Fortran 2003’s change to whole
allocatable array assignment, although in that case users were
put at risk only of extra runtime overhead that was needless in
existing codes, not a change in behavior, and users learned to
assign to whole array sections (A(:)=...
) rather than to whole
allocatable arrays where the performance hit mattered.
Major Items¶
The features in this section are expensive to implement in terms of engineering time to design, code, refactor, and test (i.e., weeks or months, not days).
DO CONCURRENT REDUCE
¶
J3 continues to ignore the
serious semantic problems
with DO CONCURRENT
, despite the simplicity of the necessary fix and their
admirable willingness to repair the standard to fix problems with
other features (e.g., plugging holes in PURE
procedure requirements)
and their less admirable willingness to make breaking changes (see above).
They did add REDUCE
clauses to DO CONCURRENT
, and those seem to be
immediately useful to HPC codes and worth implementing soon.
SIMPLE
procedures¶
The new SIMPLE
procedures constitute a subset of F’95/HPF’s PURE
procedures.
There are things that one can do in a PURE
procedure
but cannot in a SIMPLE
one. But the virtue of being SIMPLE
seems
to be its own reward, not a requirement to access any other
feature.
SIMPLE
procedures might have been more useful had DO CONCURRENT
been
changed to require callees to be SIMPLE
, not just PURE
.
The implementation of SIMPLE
will be nontrivial: it involves
some parsing and symbol table work, and some generalization of the
predicate function IsPureProcedure()
, extending the semantic checking on
calls in PURE
procedures to ensure that SIMPLE
procedures
only call other SIMPLE
procedures, and modifying the intrinsic
procedure table to note that most intrinsics are now SIMPLE
rather than just PURE
.
I don’t expect any codes to rush to change their PURE
procedures
to be SIMPLE
, since it buys little and reduces portability.
This makes SIMPLE
a lower-priority feature.
Conditional expressions and actual arguments¶
Next on the list of “big ticket” items are C-style conditional expressions. These come in two forms, each of which is a distinct feature that would be nontrivial to implement, and I would not be surprised to see some compilers implement one before the other.
The first form is a new parenthesized expression primary that any C programmer would recognize. It has straightforward parsing and semantics, but will require support in folding and all other code that processes expressions. Lowering will be nontrivial due to control flow.
The second form is a conditional actual argument syntax
that allows runtime selection of argument associations, as well
as a .NIL.
syntax for optional arguments to signify an absent actual
argument. This would have been more useful if it had also been
allowed as a pointer assignment statement right-hand side, and
that might be a worthwhile extension. As this form is essentially
a conditional variable reference it may be cleaner to have a
distinct representation from the conditional expression primary
in the parse tree and strongly-typed Expr<T>
representations.
ENUMERATION TYPE
¶
Fortran 202X has a new category of type. The new non-interoperable
ENUMERATION TYPE
feature is like C++’s enum class
– not, unfortunately,
a powerful sum data type as in Haskell or Rust. Unlike the
current ENUM, BIND(C)
feature, ENUMERATION TYPE
defines a new
type name and its distinct values.
This feature may well be the item requiring the largest patch to the compiler for its implementation, as it affects parsing, type checking on assignment and argument association, generic resolution, formatted I/O, NAMELIST, debugging symbols, &c. It will indirectly affect every switch statement in the compiler that switches over the six (now seven) type categories. This will be a big project for little useful return to users.
TYPEOF
and CLASSOF
¶
Last on the list of “big ticket” items are the new TYPEOF and CLASSOF
type specifiers, which allow declarations to indirectly use the
types of previously-defined entities. These would have obvious utility
in a language with type polymorphism but aren’t going to be very
useful yet in Fortran 202X (esp. TYPEOF
), although they would be worth
supporting as a utility feature for a parametric module extension.
CLASSOF
has implications for semantics and lowering that need to
be thought through as it seems to provide a means of
declaring polymorphic local variables and function results that are
neither allocatables nor pointers.
Coarray extensions:¶
NOTIFY_TYPE
,NOTIFY WAIT
statement,NOTIFY=
specifier on image selectorArrays with coarray components
“Rank Independent” Features¶
The RANK(n)
attribute declaration syntax is equivalent to
DIMENSION(:,:,...,:)
or an equivalent entity-decl containing n
colons.
As n
must be a constant expression, that’s straightforward to implement,
though not terribly useful until the language acquires additional features.
(I can see some utility in being able to declare PDT components with a
RANK
that depends on a KIND
type parameter.)
It is now possible to declare the lower and upper bounds of an explicit
shape entity using a constant-length vector specification expression
in a declaration, ALLOCATE
statement, or pointer assignment with
bounds remapping.
For example, real A([2,3])
is equivalent to real A(2,3)
.
The new A(@V)
“multiple subscript” indexing syntax uses an integer
vector to supply a list of subscripts or of triplet bounds/strides. This one
has tough edge cases for lowering that need to be thought through;
for example, when the lengths of two or more of the vectors in
A(@U,@V,@W)
are not known at compilation time, implementing the indexing
would be tricky in generated code and might just end up filling a
temporary with [U,V,W]
first.
The obvious use case for “multiple subscripts” would be as a means to
index into an assumed-rank dummy argument without the bother of a SELECT RANK
construct, but that usage is not supported in Fortran 202X.
This feature may well turn out to be Fortran 202X’s analog to Fortran 2003’s
LEN
derived type parameters.
Minor Items¶
So much for the major features of Fortran 202X. The longer list of minor features can be more briefly summarized.
New Edit Descriptors¶
Fortran 202X has some noncontroversial small tweaks to formatted output.
The AT
edit descriptor automatically trims character output. The LZP
,
LZS
, and LZ
control edit descriptors and LEADING_ZERO=
specifier provide a
means for controlling the output of leading zero digits.
Intrinsic Module Extensions¶
Addressing some issues and omissions in intrinsic modules:
LOGICAL8/16/32/64 and REAL16
IEEE module facilities upgraded to match latest IEEE FP standard
C_F_STRPOINTER, F_C_STRING for NUL-terminated strings
C_F_POINTER(LOWER=)
Intrinsic Procedure Extensions¶
The SYSTEM_CLOCK
intrinsic function got some semantic tweaks.
There are new intrinsic functions for trigonometric functions in units of degrees and half-circles. GNU Fortran already supports the forms that use degree units. These should call into math library implementations that are specialized for those units rather than simply multiplying arguments or results with conversion factors.
ACOSD
,ASIND
,ATAND
,ATAN2D
,COSD
,SIND
,TAND
ACOSPI
,ASINPI
,ATANPI
,ATAN2PI
,COSPI
,SINPI
,TANPI
SELECTED_LOGICAL_KIND
maps a bit size to a kind of LOGICAL
There are two new character utility intrinsic
functions whose implementations have very low priority: SPLIT
and TOKENIZE
.
TOKENIZE
requires memory allocation to return its results,
and could and should have been implemented once in some Fortran utility
library for those who need a slow tokenization facility rather than
requiring implementations in each vendor’s runtime support library with
all the extra cost and compatibilty risk that entails.
SPLIT
is worse – not only could it, like TOKENIZE
,
have been supplied by a Fortran utility library rather than being
added to the standard, it’s redundant;
it provides nothing that cannot be already accomplished by
composing today’s SCAN
intrinsic function with substring indexing:
module m
interface split
module procedure :: split
end interface
!instantiate for all possible ck/ik/lk combinations
integer, parameter :: ck = kind(''), ik = kind(0), lk = kind(.true.)
contains
simple elemental subroutine split(string, set, pos, back)
character(*, kind=ck), intent(in) :: string, set
integer(kind=ik), intent(in out) :: pos
logical(kind=lk), intent(in), optional :: back
if (present(back)) then
if (back) then
pos = scan(string(:pos-1), set, .true.)
return
end if
end if
npos = scan(string(pos+1:), set)
pos = merge(pos + npos, len(string) + 1, npos /= 0)
end
end
(The code above isn’t a proposed implementation for SPLIT
, just a
demonstration of how programs could use SCAN
to accomplish the same
results today.)
Source limitations¶
Fortran 202X raises the maximum number of characters per free form source line and the maximum total number of characters per statement. Both of these have always been unlimited in this compiler (or limited only by available memory, to be more accurate.)
More BOZ usage opportunities¶
BOZ literal constants (binary, octal, and hexadecimal constants, also known as “typeless” values) have more conforming usage in the new standard in contexts where the type is unambiguously known. They may now appear as initializers, as right-hand sides of intrinsic assignments to integer and real variables, in explicitly typed array constructors, and in the definitions of enumerations.
Citation updates¶
The source base contains hundreds of references to the subclauses, requirements, and constraints of the Fortran 2018 standard, mostly in code comments. These will need to be mapped to their Fortran 202X counterparts once the new standard is published, as the Fortran committee does not provide a means for citing these items by names that are fixed over time like the C++ committee does. If we had access to the LaTeX sources of the standard, we could generate a mapping table and automate this update.