Introduction to Declare Target
In OpenMP, declare target is a directive that can be applied to a function or
variable (primarily global) to tell the compiler that it should be generated
in a particular device’s environment. In essence, it states whether something
should be emitted for the host, the device, or both. Examples of its usage for
both data and functions can be seen below.
module test_0
  integer :: sp = 0
  !$omp declare target link(sp)
end module test_0

program main
  use test_0
  !$omp target map(tofrom:sp)
  sp = 1
  !$omp end target
end program
In the above example, we create a variable in a separate module, mark it
as declare target, and then map it, embedding it into the device IR and
assigning to it.
function func_t_device() result(i)
  !$omp declare target to(func_t_device) device_type(nohost)
  integer :: i
  i = 1
end function func_t_device

program main
  ! A function is invoked in an expression rather than via a call statement.
  integer :: func_t_device, res
  !$omp target
  res = func_t_device()
  !$omp end target
end program
In the above example, we state that a function is required on the device by
utilising declare target, and that we will not be utilising it on the host,
so we are in theory free to remove or ignore it there. In this case, a user
could also leave the declare target off the function and it would be
implicitly marked declare target any (for both host and device), as it has
been utilised within a target region.
Declare Target as represented in the OpenMP Dialect
In the OpenMP Dialect, declare target is not represented by a specific
operation. Instead, it is an OpenMP dialect specific attribute that can be
applied to any operation in any dialect, which helps to simplify its
utilisation. Rather than replacing or modifying existing global or function
operations in a dialect, it is attached to them as extra metadata that the
lowering can use in different ways as necessary.
The attribute is composed of multiple fields representing the clauses you
would find on the declare target directive, i.e. device type (nohost, any,
host) or the capture clause (link or to). A small example of declare target
applied to a Fortran real can be found below:
fir.global internal @_QFEi {omp.declare_target =
#omp.declaretarget<device_type = (any), capture_clause = (to)>} : f32 {
  %0 = fir.undefined f32
  fir.has_value %0 : f32
}
This would look similar for function style operations.
The application and access of this attribute is aided by an OpenMP Dialect
MLIR Interface named DeclareTargetInterface, which can be utilised on
operations to access the appropriate interface functions, e.g.:
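// dyn_cast yields a null interface if the operation does not implement it.
auto declareTargetGlobal =
    llvm::dyn_cast<mlir::omp::DeclareTargetInterface>(Op.getOperation());
if (declareTargetGlobal && declareTargetGlobal.isDeclareTarget()) {
  // The operation carries the declare target attribute.
}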
Declare Target Fortran OpenMP Lowering
The initial lowering of declare target to MLIR for both use-cases is done
inside the usual OpenMP lowering in flang/lib/Lower/OpenMP.cpp. However,
some direct calls to declare target related functions are made from Flang’s
lowering bridge in flang/lib/Lower/Bridge.cpp.
The marking of operations with the declare target attribute happens in two
phases, the second one optional and contingent on the first failing. The
initial phase happens when the declare target directive and its clauses
are initially processed, with the primary data gathering for the directive and
its clauses happening in a function called getDeclareTargetInfo. This is then
used to feed the markDeclareTarget function, which does the actual marking
utilising the DeclareTargetInterface. If it encounters a variable or function
that has been marked twice over multiple directives with two differing device
types (e.g. host, nohost), then it will swap the device type to any.
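The device type handling described above can be illustrated with a small standalone sketch. It does not use the MLIR or Flang APIs; the DeviceType enumeration and the mergeDeviceType helper are hypothetical names introduced here purely for illustration, and the real logic lives in markDeclareTarget.

```cpp
#include <optional>

// Hypothetical stand-in for the dialect's device type values.
enum class DeviceType { Host, NoHost, Any };

// Returns the device type to record for a symbol that is being marked with
// `requested`, given the marking it already carries (if any).
DeviceType mergeDeviceType(std::optional<DeviceType> existing,
                           DeviceType requested) {
  if (!existing)               // first marking: take the request as-is
    return requested;
  if (*existing == requested)  // same marking repeated: nothing changes
    return requested;
  return DeviceType::Any;      // differing markings are widened to any
}
```

Under these assumptions, a symbol first marked host and later marked nohost ends up as any, while repeating the same device type leaves the marking unchanged.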
Whenever we invoke genFIR on an OpenMPDeclarativeConstruct from the
lowering bridge, we also invoke another function called
gatherOpenMPDeferredDeclareTargets, which gathers information relevant to the
application of the declare target attribute. This information includes the
symbol that it should be applied to, the device type clause, and the capture
clause, and it is stored in a vector that is part of the lowering bridge’s
instantiation of the AbstractConverter. It is only stored if we encounter a
function or variable symbol that does not have an operation instantiated for
it yet. This cannot happen as part of the initial marking as we must store
this data in the lowering bridge and we only have access to the abstract
version of the converter via the OpenMP lowering.
The information produced by the first phase is used in the second phase,
which is a form of deferred processing of the declare target marked
operations that have delayed generation and cannot be processed in the
first phase. The main case where this currently occurs is when a Fortran
function interface has been marked. This is done via the function
markOpenMPDeferredDeclareTargetFunctions, which is called from the lowering
bridge at the end of the lowering process, allowing us to mark those where
possible. It iterates over the data previously gathered by
gatherOpenMPDeferredDeclareTargets, checking if any of the recorded symbols
have now had their corresponding operations instantiated and applying the
declare target attribute where possible utilising markDeclareTarget. However,
it must be noted that it is still possible for operations not to be generated
for certain symbols, in particular function interfaces that are not directly
used or defined within the current module. This means we cannot emit errors in
the case of left-over unmarked symbols. These must (and should) be caught
by the initial semantic analysis.
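The two-phase flow can be sketched as a standalone model. This is an illustrative simplification and not the Flang implementation: the DeferredInfo record, the Module map (symbol name to attribute text) and the markOrDefer and markDeferred helpers are all hypothetical names, standing in for the data gathered by gatherOpenMPDeferredDeclareTargets and the marking performed by markOpenMPDeferredDeclareTargetFunctions.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical record of one declare target request.
struct DeferredInfo {
  std::string symbol;
  std::string deviceType;    // "host", "nohost" or "any"
  std::string captureClause; // "to" or "link"
};

// Hypothetical module model: symbol name -> attribute text ("" = unmarked).
// A symbol is only present once an operation has been generated for it.
using Module = std::unordered_map<std::string, std::string>;

// Phase one: mark immediately if the operation exists, otherwise defer.
void markOrDefer(Module &module, std::vector<DeferredInfo> &deferred,
                 const DeferredInfo &info) {
  auto it = module.find(info.symbol);
  if (it != module.end())
    it->second = info.deviceType + "/" + info.captureClause;
  else
    deferred.push_back(info);
}

// Phase two: revisit the deferred requests at the end of lowering. Symbols
// that still have no operation are returned and otherwise ignored, since no
// error can be raised for them at this stage.
std::vector<std::string> markDeferred(Module &module,
                                      const std::vector<DeferredInfo> &deferred) {
  std::vector<std::string> unresolved;
  for (const DeferredInfo &info : deferred) {
    auto it = module.find(info.symbol);
    if (it != module.end())
      it->second = info.deviceType + "/" + info.captureClause;
    else
      unresolved.push_back(info.symbol);
  }
  return unresolved;
}
```

In this model, an interface that is later called gains an entry in the module and is marked in phase two, whereas an interface that is never used stays unresolved without producing a diagnostic.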
NOTE: declare target can be applied to implicit SAVE attributed variables.
However, by default Flang does not represent these as GlobalOps, which means
we cannot tag and lower them as declare target normally. Instead, similarly
to the way threadprivate handles these cases, we raise and initialize the
variable as an internal GlobalOp and apply the attribute. This occurs in the
flang/lib/Lower/OpenMP.cpp function genDeclareTargetIntGlobal.
Declare Target Transformation Passes for Flang
There are currently two passes within Flang that are related to the processing
of declare target:
- MarkDeclareTarget - This pass is in charge of marking functions captured in (called from) target regions or other declare target marked functions as declare target. It does so recursively, i.e. nested calls will also be implicitly marked. It currently tries to mark things as conservatively as possible, e.g. if captured in a target region it will apply nohost, unless it encounters a host declare target, in which case it will apply the any device type. Functions are handled similarly, except we utilise the parent’s device type where possible.
- FunctionFiltering - This is executed after the MarkDeclareTarget pass, and its job is to conservatively remove host functions from the module where possible when compiling for the device. This helps make sure that most host code that is incompatible with the device is not lowered for it. Host functions with target regions in them need to be preserved (e.g. for lowering the target region(s) inside). Otherwise, it removes any function marked as a declare target host function, and any uses will be replaced with undef values so that the remaining host code doesn’t become broken. Host functions with target regions are marked with a declare target host attribute so they will be removed after outlining the target regions contained inside.
While this infrastructure could be generally applicable to more than just Flang, it is only utilised in the Flang frontend, so it resides there rather than in the OpenMP dialect codebase.
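The recursive implicit marking performed by MarkDeclareTarget can be approximated with a standalone sketch over a plain call graph. This is a simplified model under stated assumptions rather than the pass itself: the CallGraph and Marks types and the markImplicit and markAll helpers are hypothetical, and target regions are represented only by the list of functions they call.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the dialect's device type values.
enum class DeviceType { Host, NoHost, Any };

using CallGraph = std::map<std::string, std::vector<std::string>>;
using Marks = std::map<std::string, DeviceType>;

// Marks `func` with the device type inherited from its caller and propagates
// the result to everything `func` calls. A marking only ever widens
// (unmarked -> host/nohost -> any), so recursion terminates on cyclic graphs.
void markImplicit(const CallGraph &calls, Marks &marks,
                  const std::string &func, DeviceType parentType) {
  DeviceType merged = parentType;
  auto found = marks.find(func);
  if (found != marks.end()) {
    if (found->second == parentType || found->second == DeviceType::Any)
      return;                 // nothing new to propagate
    merged = DeviceType::Any; // differing markings widen to any
  }
  marks[func] = merged;
  auto callees = calls.find(func);
  if (callees == calls.end())
    return;
  for (const std::string &callee : callees->second)
    markImplicit(calls, marks, callee, merged);
}

// Driver: explicitly marked functions pass their device type to their
// callees, then functions called from target regions are treated as nohost.
void markAll(const CallGraph &calls, Marks &marks,
             const std::vector<std::string> &targetRegionCallees) {
  const Marks explicitMarks = marks; // copy: marks changes while we iterate
  for (const auto &[func, type] : explicitMarks) {
    auto callees = calls.find(func);
    if (callees == calls.end())
      continue;
    for (const std::string &callee : callees->second)
      markImplicit(calls, marks, callee, type);
  }
  for (const std::string &callee : targetRegionCallees)
    markImplicit(calls, marks, callee, DeviceType::NoHost);
}
```

In this model, a function reachable both from a host marked function and from a target region ends up as any, while functions reachable only from a target region are marked nohost.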
Declare Target OpenMP Dialect To LLVM-IR Lowering
The OpenMP dialect lowering of declare target is done through the
amendOperation flow, as it is not an operation but rather an attribute. This
is triggered immediately after the corresponding operation has been lowered
to LLVM-IR. As it is applicable to different types of operations, we must
specialise this function for each operation type that we may encounter.
Currently, these are GlobalOps and FuncOps.
FuncOp processing is fairly simple. When compiling for the device, host
marked functions are removed, including those that could not be removed
earlier due to having target directives within. This leaves any, device
or indeterminable functions in the module to lower further. When compiling
for the host, no filtering is done because nohost functions must be available
as a fallback implementation.
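The FuncOp filtering rule above reduces to a small predicate, sketched here without the MLIR APIs. The DeviceType enumeration and the keepFunction helper are hypothetical names used only for illustration; an empty marking models a function whose device type could not be determined.

```cpp
#include <optional>

// Hypothetical stand-in for the dialect's device type values.
enum class DeviceType { Host, NoHost, Any };

// Returns true if a lowered function should stay in the module.
bool keepFunction(bool compilingForDevice, std::optional<DeviceType> mark) {
  if (!compilingForDevice)
    return true; // host keeps everything; nohost functions act as fallbacks
  // On the device, only functions explicitly marked host are dropped.
  return !(mark && *mark == DeviceType::Host);
}
```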
For GlobalOps, the processing is a little more complex. We currently leverage
the registerTargetGlobalVariable and getAddrOfDeclareTargetVar OMPIRBuilder
functions shared with Clang. These two functions invoke each other depending
on the clauses and options provided to the OMPIRBuilder (in particular,
unified shared memory). Their main purposes are the generation of a new
global device pointer with a “ref_” prefix on the device and enqueuing
metadata generation by the OMPIRBuilder to be produced at module finalization
time. This is done for both host and device, and it links the newly generated
device global pointer and the host pointer together across the two modules.
Similarly to other metadata (e.g. for TargetOp) that is shared across
both host and device modules, processing of GlobalOps in the device
needs access to the previously generated host IR file, which is done
through another attribute applied to the ModuleOp by the compiler
frontend. The file is loaded in and consumed by the OMPIRBuilder to
populate its OffloadInfoManager data structures, keeping host and
device appropriately synchronised.
Another (and more important) point to remember is that, as we effectively
replace the original LLVM-IR generated for the declare target marked GlobalOp,
we have some corrections to make for TargetOps (or other region operations
that use them directly) which still refer to the original lowered global
operation. This is done via handleDeclareTargetMapVar, which is invoked as the
final function and alteration to the lowered target region. It is only
invoked for the device, as it is only required in the case where we have
emitted the “ref” pointer, and it effectively replaces all uses of the
originally lowered global symbol with our new global ref pointer’s symbol.
Currently we do not remove or delete the old symbol. This is because the same
symbol can be utilised across multiple target regions; if we removed it, we
would risk breaking lowerings of target regions that will be processed at a
later time. To appropriately delete these no longer necessary symbols we would
need a deferred removal process at the end of the module, which is currently
not in place. It may be possible to store this information in the OMPIRBuilder
and then perform this cleanup process on finalization, but this is still open
for discussion and implementation.
Current Support
For the moment, declare target should work for:
- Marking functions/subroutines and function/subroutine interfaces for generation on host, device or both.
- Implicit function/subroutine capture for calls emitted in a target region or an explicitly marked declare target function/subroutine. Note: Calls made via arguments passed to other functions must still themselves be marked declare target, e.g. when passing a C function pointer and invoking it, the interface and the C function in the other module must be marked declare target, with the same type of marking as indicated by the specification.
- Marking global variables with declare target’s link clause and mapping the data to the device data environment utilising declare target. This may not work for all types yet, but for scalars and arrays of scalars, it should.
Doesn’t work for, or needs further testing for:
- Marking the following types with declare target link (needs further testing):
  - Descriptor based types, e.g. pointers/allocatables.
  - Derived types.
  - Members of derived types (use-case needs legality checking with OpenMP specification).
- Marking global variables with declare target’s to clause. A lot of the lowering should exist, but it needs further testing and likely some further changes to fully function.