Attempt at including offsets in kernel launch #399

Open
wants to merge 10 commits into
base: main

simone-silvestri

This PR tries to add offsets to kernel launches, so that the global indices returned by
@index(Global, NTuple) and @index(Global, Linear) are shifted by an offset argument.

Example:

julia> @kernel function show_index()
             i, j = @index(Global, NTuple)
             @show i, j
       end
show_index (generic function with 6 methods)

julia> show_index(CPU(), (2, 2), (3, 3), (-1, -2))()
(i, j) = (0, -1)
(i, j) = (1, -1)
(i, j) = (0, 0)
(i, j) = (1, 0)
(i, j) = (2, -1)
(i, j) = (2, 0)
(i, j) = (0, 1)
(i, j) = (1, 1)
(i, j) = (2, 1)

where the last argument, (-1, -2), gives the offsets applied to the global indices.

This PR restricts the offsetting of global indices to kernels whose size is static at launch.

@vchuravy I found it a bit difficult to implement arbitrary indices because of the division into blocks, which would have to be rethought. In other words, this is the easiest (though probably not the most general) implementation of offsets. Let me know if you would rather it be implemented in another way.

@vchuravy
Member

Thanks for the initial implementation. I will have to think about this a bit.
I still feel like this may be better expressed as a projection f(Idx) -> Idx.

abstract type Projection end
struct Identity <: Projection end
(::Identity)(Idx) = Idx

struct Offset{Offsets} <: Projection end
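
One way the projection idea above could be completed is sketched below in plain Julia, without KernelAbstractions. The call method on `Offset` and the use of `CartesianIndex` are assumptions made for illustration; the original comment only declares the types, and this is not what the PR implements.

```julia
abstract type Projection end

# The identity projection leaves the index untouched.
struct Identity <: Projection end
(::Identity)(idx) = idx

# The offsets are stored in the type parameter, e.g. Offset{(-1, -2)}.
struct Offset{Offsets} <: Projection end

# Assumed call method (not in the original comment):
# shift a CartesianIndex by the static offsets.
(::Offset{Offsets})(idx::CartesianIndex) where {Offsets} =
    idx + CartesianIndex(Offsets)

# Project every index of a 3x3 range with offsets (-1, -2),
# the same sizes and offsets as in the PR description's example.
p = Offset{(-1, -2)}()
shifted = [p(I) for I in CartesianIndices((3, 3))]
```

With these definitions, `i` ranges over 0:2 and `j` over -1:1, the same index set printed in the PR description, although the iteration order here does not follow the workgroup blocks.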

@vchuravy
Member

@timholy might also be able to offer advice. IIUC, you are trying to implement an exterior/interior iteration split like EdgeIterator from https://github.com/JuliaArrays/TiledIteration.jl?

@luraess

luraess commented Jun 14, 2023

@vchuravy following this, as the ability to handle ranges passed to kernels is also a feature we would need (for FD MPI code) to overlap communication and computation (in a similar way as pointed out by @simone-silvestri).

@vchuravy
Member

You can do this right now, as you would with CUDA.jl/AMDGPU.jl, by projecting a smaller ndrange onto your custom index space. This is more about whether we can do something like that automatically.
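
The manual approach could look roughly like the following sketch. The kernel name `fill_interior!`, the array size, and the `offsets` argument are hypothetical; here `offsets` is an ordinary kernel argument that the kernel body applies by hand, not a KernelAbstractions feature.

```julia
using KernelAbstractions

# Hypothetical kernel: the launch index is projected into the
# custom index space manually inside the kernel body.
@kernel function fill_interior!(A, offsets, val)
    i, j = @index(Global, NTuple)   # runs over 1:ndrange[1], 1:ndrange[2]
    ii = i + offsets[1]
    jj = j + offsets[2]
    @inbounds A[ii, jj] = val
end

A = zeros(6, 6)
backend = CPU()

# Launch over a smaller 4x4 ndrange and shift it by (1, 1),
# so that only the interior A[2:5, 2:5] is written.
fill_interior!(backend, (2, 2))(A, (1, 1), 1.0; ndrange = (4, 4))
KernelAbstractions.synchronize(backend)
```

The boundary rows and columns of `A` are left untouched, which is the kind of interior/exterior split discussed in this thread.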

@lcw, I think, had some code that does this for his DG code.

@luraess

luraess commented Jun 14, 2023

Yeah, having something more automated could be a nice thing. @utkinis may have a small MWE of what we did recently, which would be handy to have in KA as well (similar to what is proposed here).
