(Closes #2716) Add support for module-inlining calls to polymorphic kernels/routines #2732

Open
wants to merge 68 commits into
base: master
3bbdef5
#2716 add initial fix and test [skip ci]
arporter Sep 17, 2024
459e26d
#2716 fix linting
arporter Sep 17, 2024
37188bf
#2716 rm check for polymorphic kernels, ensure renamed kern is public
arporter Sep 18, 2024
4aa1c0c
#2716 fix linting
arporter Sep 18, 2024
dd29ca3
Merge branch 'master' into 2716_transform_interface_bug
arporter Sep 27, 2024
4383c7c
#2716 WIP exploring options
arporter Sep 27, 2024
60f4add
Merge branch 'master' into 2716_transform_interface_bug
arporter Oct 2, 2024
62b5da5
#2716 WIP plumbing-in inlining of multiple kernel routines
arporter Oct 2, 2024
5a10b88
#2716 more fixes [skip ci]
arporter Oct 2, 2024
d217661
Merge branch 'master' into 2716_transform_interface_bug
arporter Oct 3, 2024
cc657a9
#2716 get KernelModuleInlineTrans tests working [skip ci]
arporter Oct 3, 2024
08f18c8
#2716 fix linting
arporter Oct 3, 2024
53147c6
#2716 more linting
arporter Oct 3, 2024
4198608
#2716 fix a lot of tests
arporter Oct 3, 2024
e079988
#2716 fix remaining tests
arporter Oct 3, 2024
ca153a4
#2716 fix examples
arporter Oct 3, 2024
e35bcb2
Merge branch 'master' into 2716_transform_interface_bug
arporter Oct 3, 2024
71a2630
#2716 revert some unnecessary changes
arporter Oct 3, 2024
ba58c82
#2716 tidying and improving comments/docstrings
arporter Oct 4, 2024
2ac83b5
#2716 add tests for KernelModuleInlineTrans
arporter Oct 7, 2024
e5d5699
#2716 fix coverage of gocean_move_iteration_boundaries_inside
arporter Oct 7, 2024
f026e21
#2716 rm need for polymorphic checks for GOcean Kernels
arporter Oct 7, 2024
6925697
#2716 improve coverage
arporter Oct 7, 2024
0b6ce32
#2716 improve _rm_imported_symbol and only attempt to add interface s…
arporter Oct 8, 2024
a00f367
#2716 add InterfaceDeclGen to f2pygen
arporter Oct 8, 2024
2b82201
#2716 fixes for the transformation in LFRic
arporter Oct 8, 2024
b9987fc
Merge branch 'master' into 2716_transform_interface_bug
arporter Oct 8, 2024
0fd9464
#2716 fix tests broken by merge
arporter Oct 8, 2024
2e76d3a
#2716 update opt script in repo and fix OMPDeclareTargetTrans
arporter Oct 8, 2024
721ff5d
#2716 mark MATMUL as available on GPU
arporter Oct 8, 2024
8319e95
#2716 fix test for matmul on gpu
arporter Oct 8, 2024
33376ff
#2716 ensure Kern points to inlined PSyIR after transformation [skip ci]
arporter Oct 9, 2024
e8b3c0b
#2716 improvements to validation of calls that resolve to multiple ro…
arporter Oct 10, 2024
0903b0a
#2716 add new inlining test
arporter Oct 10, 2024
a8d357d
#2716 add new test source file
arporter Oct 10, 2024
48bac41
#2716 return early if PSyKAl kernel already module inlined
arporter Oct 11, 2024
df74591
Merge branch 'master' into 2716_transform_interface_bug
arporter Oct 11, 2024
c26b8cc
#2716 improve apply() so that it returns early if routine already inl…
arporter Oct 11, 2024
4568467
#2716 update lfric inlining example (eg2)
arporter Oct 14, 2024
98daf23
Merge branch 'master' into 2716_transform_interface_bug
arporter Nov 4, 2024
bcb18ea
Merge branch 'master' into 2716_transform_interface_bug
arporter Nov 7, 2024
f11dbc9
Merge branch 'master' into 2716_transform_interface_bug
arporter Nov 7, 2024
e91a4e7
Merge branch 'master' into 2716_transform_interface_bug
arporter Nov 14, 2024
dbdec4f
#2716 tidying after merge
arporter Nov 14, 2024
a0ac65a
Merge branch 'master' into 2716_transform_interface_bug
arporter Nov 18, 2024
62b53d6
#2716 add RANDOM_NUMBER to intrinsics available on device
arporter Nov 18, 2024
8b025db
#2716 rename Kern._kern_schedule to plural and tidy
arporter Nov 18, 2024
3d5985b
Merge branch 'master' into 2716_transform_interface_bug
arporter Nov 19, 2024
7cdb373
#2716 add xfail for failure to compile case where only one kernel is …
arporter Nov 19, 2024
d18730f
#2716 improve xfailing test to be more specific
arporter Nov 20, 2024
029e647
Merge branch 'master' into 2716_transform_interface_bug
arporter Nov 21, 2024
be9d61a
#2716 improve comment
arporter Nov 22, 2024
25a3bc6
Merge branch 'master' into 2716_transform_interface_bug
arporter Nov 25, 2024
7dce555
Merge branch 'master' into 2716_transform_interface_bug
arporter Nov 27, 2024
b37f41a
Merge branch 'master' into 2716_transform_interface_bug
arporter Nov 28, 2024
ee64d54
#2732 mv fix for interface symbols into new lfric_psy.py file
arporter Nov 28, 2024
5289791
#2716 extend get_callees() to allow for private routines
arporter Nov 28, 2024
ce6f6a9
#2716 fix linting
arporter Nov 28, 2024
54b9a7f
#2716 experiment with recursive inlining
arporter Nov 28, 2024
c739468
#2716 fix bug in get_callees() when no Container is created
arporter Nov 29, 2024
622e015
#2716 improve robustness of Matmul2CodeTrans.validate()
arporter Nov 29, 2024
6f45c36
#2716 add Matmul2CodeTrans to gpu_offloading.py
arporter Nov 29, 2024
5fdf36d
#2716 fixes to KernelModuleInlineTrans to allow for previously-inline…
arporter Nov 29, 2024
b602078
Merge branch 'master' into 2716_transform_interface_bug [skip ci]
arporter Dec 2, 2024
bc31ad7
#2716 tidy after merge
arporter Dec 2, 2024
6b1af69
#2716 fix test failures after merge
arporter Dec 2, 2024
f1dc447
#2716 tidy updated gpu_offloading.py to fix linting errors
arporter Dec 10, 2024
5c72806
Merge branch 'master' into 2716_transform_interface_bug
arporter Dec 10, 2024
13 changes: 8 additions & 5 deletions examples/gocean/eg3/ocl_trans.py
@@ -37,10 +37,10 @@
 the first Invoke to use OpenCL. '''

 from psyclone.psyGen import InvokeSchedule
-from psyclone.psyir.transformations import \
-    FoldConditionalReturnExpressionsTrans
-from psyclone.domain.gocean.transformations import GOOpenCLTrans, \
-    GOMoveIterationBoundariesInsideKernelTrans
+from psyclone.psyir.transformations import (
+    FoldConditionalReturnExpressionsTrans)
+from psyclone.domain.gocean.transformations import (
+    GOOpenCLTrans, GOMoveIterationBoundariesInsideKernelTrans)


 def trans(psyir):
@@ -62,7 +62,10 @@ def trans(psyir):
         move_boundaries_trans.apply(kern)
         # Change the syntax to remove the return statements introduced by the
         # previous transformation
-        fold_trans.apply(kern.get_kernel_schedule())
+        _, kschedules = kern.get_kernel_schedule()
+        # NOTE: we assume the kernel is not polymorphic and thus there is
+        # only one schedule associated with it.
+        fold_trans.apply(kschedules[0])
         # Specify the OpenCL queue and workgroup size of the kernel
         # In this case we dispatch each kernel in a different queue to check
         # that the output code has the necessary barriers to guarantee the
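The ocl_trans.py change relies on `get_kernel_schedule()` returning a pair whose second element is a list of schedules, and it takes the first entry on the assumption that the kernel is not polymorphic. A minimal sketch of making that assumption explicit rather than silent is given below. `FakeKern` and `single_schedule` are hypothetical names written only for this illustration and are not part of PSyclone; the first element of the returned pair is treated as opaque because the diff discards it.

```python
class FakeKern:
    """Hypothetical stand-in for a kernel object (not PSyclone's class)."""

    def __init__(self, name, schedules):
        self.name = name
        self._schedules = list(schedules)

    def get_kernel_schedule(self):
        # Mirrors the shape used in the diff: (ignored value, list of
        # schedules). The real first element is not shown in the diff.
        return None, self._schedules


def single_schedule(kern):
    """Return the only schedule of `kern`, or raise if there is not
    exactly one (for example, if the kernel is polymorphic)."""
    _, schedules = kern.get_kernel_schedule()
    if len(schedules) != 1:
        raise ValueError(
            f"Kernel '{kern.name}' has {len(schedules)} schedules; "
            f"expected exactly one.")
    return schedules[0]
```

A script could call such a helper in place of indexing with `[0]`, so that a polymorphic kernel is reported instead of being partially transformed.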
5 changes: 4 additions & 1 deletion examples/lfric/eg15/matvec_opt.py
@@ -91,7 +91,10 @@ def trans(psyir):

     for kernel in psyir.coded_kernels():
         if kernel.name.lower() == "matrix_vector_kernel_code":
-            kernel_schedule = kernel.get_kernel_schedule()
+            _, kernel_schedules = kernel.get_kernel_schedule()
+            # For simplicity, ASSUME that the kernel is not polymorphic and
+            # thus only has one schedule.
+            kernel_schedule = kernel_schedules[0]
             # Replace matmul with inline code
             for icall in kernel_schedule.walk(IntrinsicCall):
                 if icall.intrinsic is IntrinsicCall.Intrinsic.MATMUL:
71 changes: 68 additions & 3 deletions examples/lfric/scripts/gpu_offloading.py
@@ -43,10 +43,14 @@
 '''
 import os
 import sys
+from psyclone.domain.common.transformations import KernelModuleInlineTrans
 from psyclone.domain.lfric import LFRicConstants
-from psyclone.psyir.nodes import Directive, Loop, Routine
+from psyclone.psyGen import CodedKern
+from psyclone.psyir.nodes import (
+    Call, Directive, IntrinsicCall, Loop, Routine, Schedule)
 from psyclone.psyir.transformations import (
-    ACCKernelsTrans, TransformationError, OMPTargetTrans)
+    ACCKernelsTrans, InlineTrans, Matmul2CodeTrans, OMPTargetTrans,
+    TransformationError)
 from psyclone.transformations import (
     Dynamo0p3ColourTrans, Dynamo0p3OMPLoopTrans,
     Dynamo0p3RedundantComputationTrans, OMPParallelTrans,
@@ -58,9 +62,54 @@
 INVOKE_EXCLUSIONS = [
 ]

+# We won't attempt to inline calls to routines with names that contain
+# these strings.
+INLINE_EXCLUSIONS = ["abort", "logging"]
+
 OFFLOAD_DIRECTIVES = os.getenv('LFRIC_OFFLOAD_DIRECTIVES', "none")


+def _inline_calls(kern):
+    '''
+    Recursively inline all calls within the supplied Kernel or Routine.
+
+    :param kern: the Kernel or Routine to inline any Calls into.
+
+    '''
+    mod_inline_trans = KernelModuleInlineTrans()
+    intrans = InlineTrans()
+    matrans = Matmul2CodeTrans()
+
+    if isinstance(kern, CodedKern):
+        _, scheds = kern.get_kernel_schedule()
+    else:
+        scheds = [kern]
+    for sched in scheds:
+        sched: Schedule
+        for call in sched.walk(Call):
+            call: Call
+            if isinstance(call, IntrinsicCall):
+                try:
+                    matrans.apply(call)
+                except TransformationError:
+                    pass
+                continue
+            if any(name in call.routine.name for name in INLINE_EXCLUSIONS):
+                continue
+            try:
+                for inner_call in call.get_callees():
+                    _inline_calls(inner_call)
+                mod_inline_trans.apply(call)
+                try:
+                    intrans.apply(call)
+                except TransformationError as err:
+                    print(f"Failed to inline call {call.debug_string()}:\n"
+                          f"{err}")
+            except (TransformationError, NotImplementedError) as err:
+                print(f"Failed to module-inline routine {call.routine.name}:\n"
+                      f"{err}")
+
+
 def trans(psyir):
     '''Applies PSyclone colouring and GPU offloading transformations. Any
     kernels that cannot be offloaded to GPU are parallelised using OpenMP
@@ -76,6 +125,7 @@ def trans(psyir):
     otrans = Dynamo0p3OMPLoopTrans()
     const = LFRicConstants()
     cpu_parallel = OMPParallelTrans()
+    mod_inline_trans = KernelModuleInlineTrans()

     if OFFLOAD_DIRECTIVES == "omp":
         # Use OpenMP offloading
@@ -120,7 +170,8 @@ def trans(psyir):
         else:
             offload = True

-        # Keep a record of any kernels we fail to offload
+        # Keep a record of any kernels we fail to offload.
+        failed_inline = set()
         failed_to_offload = set()

         # Colour loops over cells unless they are on discontinuous spaces
@@ -137,6 +188,20 @@ def trans(psyir):
             if loop.iteration_space.endswith("cell_column"):
                 if offload:
                     for kern in loop.kernels():
+                        try:
+                            mod_inline_trans.apply(kern)
+                            _inline_calls(kern)
+                            # At this point we would like to fully inline the
+                            # kernel but InlineTrans does not accept a
+                            # CodedKern. If we lower this kernel first then we
+                            # get errors later (at code-generation time).
+                            # Hopefully this will be resolved when we move
+                            # LFRic to use the PSyIR backend for code
+                            # generation.
+                        except TransformationError as err:
+                            failed_inline.add(kern.name.lower())
+                            print(f"Failed to module-inline kernel "
+                                  f"'{kern.name}' due to:\n{err.value}")
                         try:
                             gpu_annotation_trans.apply(kern)
                         except TransformationError as err:
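The `_inline_calls` helper added to gpu_offloading.py skips any call whose routine name contains one of the `INLINE_EXCLUSIONS` strings, and it recurses into the callees of a call before handling the call itself. The sketch below illustrates just those two aspects on a toy call graph, without applying any PSyclone transformation. The `call_graph` dictionary, `is_excluded`, `inline_order` and the `visited` set are assumptions made for this illustration; in particular the `visited` set is not in the diff and is added here only so that the toy recursion terminates on cyclic graphs.

```python
INLINE_EXCLUSIONS = ["abort", "logging"]


def is_excluded(routine_name, exclusions=INLINE_EXCLUSIONS):
    """True if any exclusion string occurs anywhere in the routine name.

    This is a plain, case-sensitive substring test, as in the diff.
    """
    return any(name in routine_name for name in exclusions)


def inline_order(routine, call_graph, visited=None):
    """Return the callee-first order in which routines would be handled.

    `call_graph` is a hypothetical dict mapping a routine name to the
    names of the routines it calls; it stands in for walking Call nodes.
    """
    if visited is None:
        visited = set()
    order = []
    for callee in call_graph.get(routine, []):
        if is_excluded(callee) or callee in visited:
            continue
        visited.add(callee)
        # Recurse first so that the innermost callees are handled before
        # the routine that calls them.
        order.extend(inline_order(callee, call_graph, visited))
        order.append(callee)
    return order


if __name__ == "__main__":
    graph = {
        "kern": ["helper", "log_abort"],
        "helper": ["leaf", "logging_mod_write"],
        "leaf": [],
    }
    print(inline_order("kern", graph))
```

Because the test is a case-sensitive substring match, a name such as `LOG_ABORT` would not be excluded by the lowercase strings in the list, whereas `log_abort` and `logging_mod_write` would be.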
13 changes: 7 additions & 6 deletions examples/lfric/scripts/kernel_print.py
@@ -55,11 +55,12 @@ def trans(psyir):
     # Loop over all of the Kernels Calls
     for kernel in psyir.coded_kernels():
         try:
-            kernel_schedule = kernel.get_kernel_schedule()
-            if kernel_schedule not in already_printed:
-                kern = fortran_writer(kernel_schedule)
-                print(kern)
-                already_printed.append(kernel_schedule)
+            _, kernel_schedules = kernel.get_kernel_schedule()
+            for ksched in kernel_schedules:
+                if ksched not in already_printed:
+                    kern = fortran_writer(ksched)
+                    print(kern)
+                    already_printed.append(ksched)
         except Exception as err:  # pylint: disable=broad-except
-            print(f"Code of '{kernel.name}' in "
+            print(f"Code of '{kernel.name}' "
                   f"cannot be printed because:\n{err}")
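The kernel_print.py change iterates over every schedule of a kernel and prints each one at most once, even when several kernel calls share a schedule. A minimal sketch of that collect-once pattern follows. `FakeKernel` and `unique_schedules` are hypothetical names used only here, and plain strings stand in for schedule objects.

```python
class FakeKernel:
    """Hypothetical stand-in exposing the tuple-returning accessor."""

    def __init__(self, name, schedules):
        self.name = name
        self._schedules = schedules

    def get_kernel_schedule(self):
        # Same shape as in the diff: (ignored value, list of schedules).
        return None, self._schedules


def unique_schedules(kernels):
    """Collect each schedule once, preserving first-seen order."""
    seen = []
    for kernel in kernels:
        _, schedules = kernel.get_kernel_schedule()
        for sched in schedules:
            # A list is used, as in the diff, so the schedule objects do
            # not need to be hashable.
            if sched not in seen:
                seen.append(sched)
    return seen
```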
4 changes: 3 additions & 1 deletion examples/xdsl/backend/xdsl.py
@@ -439,7 +439,9 @@ def checkIfStringIsType(self, string, typ):

     def nemokern_node(self, node):
         exec_statements = []
-        schedule = node.get_kernel_schedule()
+        _, schedules = node.get_kernel_schedule()
+        # IGNORE polymorphic routines.
+        schedule = schedules[0]
         for child in schedule.children:
             exec_statements.append(self._visit(child))
         return exec_statements