Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't specify AbstractArray for column types in DataFrame rcopy #528

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

asinghvi17
Copy link

This allows other packages to hook into the conversion system when loading R DataFrames.

My usecase here is to provide seamless interop between R and Julia for sf dataframes, so people can take advantage of all of R's tooling without having to re-implement it in Julia.

An example of the use is that by defining rcopytype and rcopy for sfc_MULTIPOLYGON, which is a simple features collection of multipolygons, I can get those geometries converted to the relevant GeoInterface.jl geometry representation for free.

This allows other packages to hook into the conversion system when loading R DataFrames.
@asinghvi17
Copy link
Author

asinghvi17 commented May 7, 2024

It looks like this change breaks single-row DataFrame conversion. Is there another way to get around this, maybe a maybe_wrap_in_array function or something? This functionality is pretty nice because it allows known types in R to be converted seamlessly if they're in data tables.

@mkitti
Copy link
Member

mkitti commented May 12, 2024

I'm looking. I can merge. Why is CI in such a poor state though?

@asinghvi17
Copy link
Author

asinghvi17 commented May 12, 2024

This particular change breaks one of the CI examples, specifically:

rcopy(R"data.frame(a=1,b=2)")

which internally receives a Float64 for each column rather than a vector for each column. I'm not 100% sure how to solve this (though it should definitely be solved before merging).

The issue is that since we aren't passing AbstractArray to rcopy as a defined type anymore, if there's only one element in a column it's interpreted as a single value, leading to the issue here.

Copy link
Collaborator

@palday palday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fundamental problem is that R doesn't actually have native scalar values -- a "scalar" in R is just a vector of length 1.

I'm not familiar with sf, but I'm guessing it's from https://r-spatial.github.io/sf/ ? The correct fix wouldn't be to change the behavior of a general R dataframe, but rather to change the conversion behavior for the relevant type. Notably, R objects can have multiple types, so I suspect sf objects are also data.frame? Let's see how we can implement this with a simpler example

We start by defining the conversion when given an R object and a target Julia type.

julia> using RCall                                                                            
struct                                                                                        
julia> struct Custom end                                                                      
                                                                                              
R> df <- data.frame(a=1)                                                                      
                                                                                              
julia> rcopy(R"df")                                                                           
1×1 DataFrame                                                                                  Row │ a                                                                                      
     │ Float64                                                                                ─────┼─────────                                                                               
   11.0                                                                                
                                                                                              
julia> RCall.rcopy(::Type{Custom}, s::Ptr{Sxp}) = Custom()                                    
                                                                                              
julia> rcopy(Custom, R"df")                                                                   
Custom()

This may already do what you need to do, but requires a bit more manual control for the caller. We can also define a default copy type:

julia> RCall.rcopytype(::Type{RCall.RClass{:custom}}, s::Ptr{VecSxp}) = Custom

The way to give an object multiple types in R is simply assigning a vector:

R> class(df) <- c("data.frame", "custom")

And now the default conversion just works:

julia> rcopy(R"df")
Custom()

We can also still force the conversion to DataFrame

julia> rcopy(RCall.DataFrame, R"df")
1×1 DataFrame
 Row │ a       
     │ Float64 
─────┼─────────
   11.0

These conversion methods would probably be great as a package extension somewhere (unsure in here in RCall or in the relevant spatial package would be better).

@asinghvi17
Copy link
Author

asinghvi17 commented May 16, 2024

Huh! OK, I didn't realize that other classes are able to override data.table even if it is first in the vector of classes. For my particular application, I can just implement a top-level conversion function, so that's my issue solved.

On a more general level, though, this does mean that anything within a data.table can't be copied by a custom function, unless I misunderstand? Consider a data table with 2 columns - one has real numbers, and one has a custom object defined in R. There seems to be no way (without this PR) to allow that custom object to be converted correctly within the data.table as it's turned into a Julia DataFrame?

@palday
Copy link
Collaborator

palday commented May 16, 2024

You can always define a custom rcopy method with the target datatype (as shown in my example) and call that explicitly. rcopy without the target datatype is just syntactic sugar that uses a registered default datatype.

@mkitti
Copy link
Member

mkitti commented May 16, 2024

I see. I understand better now.

@mkitti
Copy link
Member

mkitti commented May 16, 2024

Anyways, I want to see CI passing here. If you need to make the change to CI to make it pass, do it. We can then consider how much breakage is involved and consider if we need to jump major versions.

@palday
Copy link
Collaborator

palday commented May 16, 2024

Anyways, I want to see CI passing here. If you need to make the change to CI to make it pass, do it. We can then consider how much breakage is involved and consider if we need to jump major versions.

@mkitti

Please no, don't change the tests. The tests here caught the introduction of arguably incorrect behavior. The correct behavior is using the explicit conversion target type and defining an appropriate method.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants