Rules for sortslices, unique #546

Merged
merged 6 commits into from
Nov 30, 2021

Conversation

mcabbott
Member

Closes #392

@oxinabox
Member

ReversePropagation and Diffractor failures are unrelated

src/rulesets/Base/sort.jl (outdated)
function sortslices_pullback(dy)
# No actual need to zero this, and if you didn't, then you could widen eltype
# Also, you could use similar(dy) here not x, same size?
dx = _zerolike_writeat(x, unthunk(dy), (), inds...)
Member

Do we need the unthunk?
If so, should we push it down inside _zerolike_writeat?

Member Author

I think that ideally, _zerolike_writeat should be upgraded to return an InplaceableThunk. And eventually it should be called grad_getindex or something, too.

I'm not sure whether it should handle un-thunking. I guess it wouldn't hurt to add a method. But since most rules at present call unthunk explicitly, maybe it's clearer to call it here?

Member

Arguably we shouldn't be unthunking if the destination that we are writing into can accept Any.
(But in practice that case doesn't really matter, since performance is already shot, and Zygote will likely hate it.)

Member Author

Yes this is far from being a high-performance function!

If you don't take the shortcut above, then not all entries were unique, and thus _zerolike_writeat has to copy dy into dx at some nontrivial indices. So it has to slice up dy; I don't think it can write just one thunk anywhere.
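
To make "copy dy into dx at some nontrivial indices" concrete, here is a minimal sketch for unique on a plain vector. The helper name scatter_first and the hand-written keep vector are assumptions made for illustration; this is not the real _zerolike_writeat or its signature.

```julia
# Hypothetical stand-in for the internal helper discussed above; this is
# NOT the real `_zerolike_writeat`, only an illustration of the idea.
function scatter_first(x::AbstractVector, dy::AbstractVector, keep::AbstractVector{<:Integer})
    dx = zero(x)      # zero array with the same shape and eltype as x
    dx[keep] = dy     # write each cotangent at the first occurrence in x
    return dx
end

x = [1.0, 2.0, 1.0, 3.0, 2.0]
y = unique(x)                     # [1.0, 2.0, 3.0]
keep = [1, 2, 4]                  # position in x of the first occurrence of each entry of y
dy = [10.0, 20.0, 30.0]           # cotangent with respect to y
dx = scatter_first(x, dy, keep)   # [10.0, 20.0, 0.0, 30.0, 0.0]
```

Because the entries of dy land at scattered positions, dy has to be materialised and indexed, which is consistent with the point above that a single thunk cannot simply be passed through.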

if dims isa Colon
xs, ys = vec(x), y
else
xs, ys = collect(eachslice(x; dims=dims)), collect(eachslice(y; dims=dims))
Member

There is an issue open on BlueStyle to recommend against this:
JuliaDiff/BlueStyle#80

If we are going to do this then how do you feel about:

Suggested change
- xs, ys = collect(eachslice(x; dims=dims)), collect(eachslice(y; dims=dims))
+ xs, ys = collect.(eachslice((x, y); dims=dims))

Member Author

@mcabbott, Nov 30, 2021

We could avoid this for style. But I think the broadcast is confusing, and perhaps you do too, because I also think it's missing an easy-to-miss dot:

julia> x, y = rand(2,3), rand(2,3);

julia> collect.(eachslice((x, y); dims=1))
ERROR: MethodError: no method matching eachslice(::Tuple{Matrix{Float64}, Matrix{Float64}}; dims=1)

julia> collect.(eachslice.((x, y); dims=1))
(SubArray{Float64, 1, Matrix{Float64}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}}, true}[[0.5119304534786525, 0.6182654562598278, 0.16701230957752622], [0.7959010118362386, 0.9477852004109513, 0.3864
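
As a small sketch of the two spellings that do work, the snippet below slices both arrays along the same dimension. The variable names xs2 and ys2 are made up for the comparison, and the map form is only one possible alternative, not something proposed in the PR.

```julia
x, y = rand(2, 3), rand(2, 3)
dims = 1

# Both calls are dotted: eachslice is applied to x and to y separately,
# and the keyword argument is shared rather than broadcast over.
xs, ys = collect.(eachslice.((x, y); dims=dims))

# An equivalent spelling without the nested broadcast.
xs2, ys2 = map(a -> collect(eachslice(a; dims=dims)), (x, y))
```

Either form returns a 2-tuple of slice vectors, so the destructuring into xs and ys works as in the original two-call line.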

xs, ys = collect(eachslice(x; dims=dims)), collect(eachslice(y; dims=dims))
end
mask = isequal.(permutedims(ys), xs) # unique([0.0, -0.0, NaN, NaN])
mask .= (mask .== cumsum(mask, dims=1) .== true)
Member

Is the .== true for handling missing?

Member Author

No, it's that false == 0 satisfies the first ==.

This is a hacky way of writing findfirst(randn(3,3) .> 0; dims=1), as that method doesn't exist. I feel like there ought to be a cleverer way like accumulate(xor, mask; dims) or something, but I haven't found it yet.
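
The role of the trailing .== true can be checked on a small example. This is a sketch with a made-up 3×3 mask; the names running, first_only and too_many are introduced here for illustration only.

```julia
mask = Bool[0 1 0; 1 1 0; 1 0 0]

running = cumsum(mask, dims=1)   # running count of `true` down each column

# A chained dotted comparison means (mask .== running) .& (running .== true),
# so an entry survives only if it is `true` and is the first `true` in its column.
first_only = mask .== running .== true   # Bool[0 1 0; 1 0 0; 0 0 0]

# Dropping the last comparison also keeps entries where mask is false and the
# running count is still zero, because false == 0.
too_many = mask .== running
```

The third column, which contains no true at all, comes out entirely true in too_many and entirely false in first_only.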

Member Author

Maybe it wants a comment:

Suggested change
- mask .= (mask .== cumsum(mask, dims=1) .== true)
+ mask .= (mask .== cumsum(mask, dims=1) .== true) # this implements findfirst(mask; dims=1)

Member

Should we open an issue on JuliaLang/julia and link it here?

Member Author

I guess another way to write this is map(findfirst, eachcol(mask)), since we have many slices already.
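
A sketch of that alternative on a small vector, assuming every column of the mask contains at least one true (which should hold here, since each entry of ys comes from xs). The data values are invented for the example.

```julia
xs = [1.0, 2.0, 1.0, 3.0, 2.0]
ys = unique(xs)

# mask[i, j] is true when xs[i] matches ys[j]
mask = isequal.(permutedims(ys), xs)

# First matching row in each column, without the cumsum step.
keep = map(findfirst, eachcol(mask))   # [1, 2, 4]
```

A column with no true entry would yield nothing from findfirst, so this form would need extra handling if that case could occur.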

Member Author

I'm too lazy to wait for another round of CI, so I think I'll call it good enough for now.

Co-authored-by: Lyndon White <[email protected]>
Member

@oxinabox left a comment

Basically LGTM

end
mask = isequal.(permutedims(ys), xs) # unique([0.0, -0.0, NaN, NaN])
mask .= (mask .== cumsum(mask, dims=1) .== true)
keep = map(I -> I[1], findall(mask))
Member

Suggested change
- keep = map(I -> I[1], findall(mask))
+ keep = map(first, findall(mask))

Member Author

I don't think this will work:

julia> map(first, findall(randn(3,3) .> 0))
ERROR: iteration is deliberately unsupported for CartesianIndex. Use `I` rather than `I...`, or use `Tuple(I)...`

Member

wow, I hate it.

Member Author

I sort-of understand why you can't splat or broadcast it, but it's pretty weird that you can still index it.
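
For reference, a few ways to pull the first coordinate out of each CartesianIndex that avoid iteration. This is a sketch on a made-up 2×2 mask; the variable names are illustrative.

```julia
idx = findall(Bool[0 1; 1 0])      # CartesianIndex(2, 1), CartesianIndex(1, 2)

rows_a = map(I -> I[1], idx)       # direct indexing, as in the PR
rows_b = map(first ∘ Tuple, idx)   # convert to a tuple first, then take its first element
rows_c = first.(Tuple.(idx))       # the same idea written as a broadcast
```

All three give [2, 1] here; the Tuple conversion is the route that the error message itself points to.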

@mcabbott
Member Author

The argument against this treatment of unique, BTW, is that if the variables are "active", then perhaps they should all be considered unique. As e.g. here: JuliaDiff/ForwardDiff.jl#481 (comment)

The counter is that we don't know about activity. Zygote will want a rule for unique even if you aren't differentiating with respect to its argument, and at present this fails. Sadly, in this case, the calculation done here is quite complicated and would be better skipped. Maybe much more of it should be inside a @thunk... although at present that's ignored anyway.

@oxinabox
Member

> The argument against this treatment of unique, BTW, is that if the variables are "active", then perhaps they should all be considered unique. As e.g. here: JuliaDiff/ForwardDiff.jl#481 (comment)

This is an interesting topic, but I think it is beyond the scope of this PR. We can change it later.
We should cross-check against what JAX does.
I'm pretty sure they do this.

Co-authored-by: Lyndon White <[email protected]>
@mcabbott mcabbott merged commit ce78d3d into JuliaDiff:main Nov 30, 2021
@mcabbott mcabbott deleted the unique branch December 1, 2021 10:51
Development

Successfully merging this pull request may close these issues.

Rules for sort and friends