Combine get_n_obs() and get_n_individuals() into new function n_observations() #237

Open
peterdesmet opened this issue Jul 13, 2023 · 1 comment

peterdesmet commented Jul 13, 2023

Suggested in camtraptor July 2023 coding sprint

get_n_obs() returns the number of observations per deployment and species (unless species = NULL)

library(camtraptor)
get_n_obs(mica, species = "Anas platyrhynchos")
#> There are 3 deployments without observations: 577b543a-2cf1-4b23-b6d2-cda7e2eac372, 62c200a9-0e03-4495-bcd8-032944f6f5a1 and 7ca633fa-64f8-4cfc-a628-6b0c419056d7
#> # A tibble: 4 × 3
#>   deploymentID                         scientificName         n
#>   <chr>                                <chr>              <int>
#> 1 29b7d356-4bb4-4ec4-b792-2af5cc32efa8 Anas platyrhynchos     4
#> 2 577b543a-2cf1-4b23-b6d2-cda7e2eac372 Anas platyrhynchos     0
#> 3 62c200a9-0e03-4495-bcd8-032944f6f5a1 Anas platyrhynchos     0
#> 4 7ca633fa-64f8-4cfc-a628-6b0c419056d7 Anas platyrhynchos     0

Created on 2023-07-13 with reprex v2.0.2

get_n_individuals() returns the number of individuals (count sum) per deployment and species (unless species = NULL)

library(camtraptor)
get_n_individuals(mica, species = "Anas platyrhynchos")
#> There are 3 deployments without observations: 577b543a-2cf1-4b23-b6d2-cda7e2eac372, 62c200a9-0e03-4495-bcd8-032944f6f5a1 and 7ca633fa-64f8-4cfc-a628-6b0c419056d7
#> # A tibble: 4 × 3
#>   deploymentID                         scientificName         n
#>   <chr>                                <chr>              <int>
#> 1 29b7d356-4bb4-4ec4-b792-2af5cc32efa8 Anas platyrhynchos    13
#> 2 577b543a-2cf1-4b23-b6d2-cda7e2eac372 Anas platyrhynchos     0
#> 3 62c200a9-0e03-4495-bcd8-032944f6f5a1 Anas platyrhynchos     0
#> 4 7ca633fa-64f8-4cfc-a628-6b0c419056d7 Anas platyrhynchos     0

Created on 2023-07-13 with reprex v2.0.2

We suggest combining this behaviour into a single function n_observations() that returns count characteristics per deployment and species. Filter arguments are removed from the function; filtering is handled by the filter_ functions instead.

n_observations(
  package = NULL,
  group_by = c("deploymentID", "scientificName") # this is the default, but can be changed by the user.
  # The options should probably be limited to this default and "deploymentID",
  # because the function needs to know which table to choose the column from
  # ... removed, see filters
  # species = "all" removed, see filters
  # sex = NULL removed, see filters
  # life_stage = NULL removed, see filters
  # datapkg removed
)

The returned information would be:

deploymentID
scientificName (if part of group_by)
n_events # also useful, number of sequences/events within the deployment (for this species)
n_observations # same as n of get_n_obs
n_individuals # same as n of get_n_individuals
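
To make the proposed return value concrete, here is a minimal base R sketch of the aggregation on a toy table. It is not the camtraptor implementation: the function name n_observations_sketch(), the toy data and the column names (deploymentID, eventID, scientificName, count) are assumptions made for illustration only. Unlike get_n_obs(), it does not add zero rows for deployments without observations.

```r
# Hypothetical sketch, not the camtraptor implementation.
# Toy observations table; the column names are assumed for illustration.
observations <- data.frame(
  deploymentID   = c("dep1", "dep1", "dep1", "dep2"),
  eventID        = c("ev1", "ev1", "ev2", "ev3"),
  scientificName = c("Anas platyrhynchos", "Anas platyrhynchos",
                     "Anas platyrhynchos", "Vulpes vulpes"),
  count          = c(2L, 3L, 8L, 1L),
  stringsAsFactors = FALSE
)

n_observations_sketch <- function(observations,
                                  group_by = c("deploymentID", "scientificName")) {
  # Only the two grouping columns discussed in the proposal are accepted.
  stopifnot(
    length(group_by) >= 1,
    all(group_by %in% c("deploymentID", "scientificName"))
  )

  # Split the table into one chunk per combination of grouping values.
  key <- interaction(observations[group_by], drop = TRUE)
  groups <- split(observations, key)

  # Summarise each chunk into a single row.
  rows <- lapply(groups, function(g) {
    data.frame(
      g[1, group_by, drop = FALSE],
      n_events       = length(unique(g$eventID)), # distinct events
      n_observations = nrow(g),                   # like n of get_n_obs()
      n_individuals  = sum(g$count),              # like n of get_n_individuals()
      row.names = NULL
    )
  })

  out <- do.call(rbind, unname(rows))
  rownames(out) <- NULL
  out
}

per_species    <- n_observations_sketch(observations)
per_deployment <- n_observations_sketch(observations, group_by = "deploymentID")
```

Restricting group_by to these two columns mirrors the concern in the signature above: the function needs to know which table each grouping column comes from.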

damianooldoni commented Jul 24, 2024

Some thoughts:

n_observations()

I had doubts about the new name, n_observations(), as it goes against the best practice we wrote down for naming functions: use verbs to name functions whenever possible. Why not get_n_observations()? Because it's too long 😄 So maybe we should indeed stick to our original plan from the 2023 coding sprint and use n_observations() as the new flagship function, together with rai() (#238) and n_species() (#243).

Return number of obs/individuals/events in one data.frame

Getting all this information in one go can be practical. Otherwise, users must run get_n_obs() and get_n_individuals() separately. A get_n_events() function doesn't exist, but it should be added if we opt to keep the functions separate. Also, merging the information into one data.frame follows from a logical thematic grouping: observations, RAI (#238) and species (#243). So, I agree with the approach described above. Now, it's important to take its consequences into account. See the sections below.

Deprecation vs defunct

I don't like making the get_* functions defunct right away, as good practice says: "make functions defunct only after a sufficiently long deprecation period", see the rOpenSci guide. I will deprecate them first and make them defunct when releasing a later 1.x version of the package. Same for the RAI-related functions.

Filtering

Yes, filtering on sex and life stage will happen beforehand via filter_observations(), so sex and life_stage will not be arguments of the new function n_observations(). But what about using sex and life_stage in the deprecated functions get_n_obs() and get_n_individuals()? Again, making arguments defunct is bad practice. I would return a deprecation warning, but still allow users to use them. A call like x <- filter_observations(x, sex == ... , life_stage == ...) will run behind the scenes.
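
A rough sketch of what such a deprecated wrapper could look like, in base R on a toy table. Everything here is hypothetical: the *_sketch functions are stand-ins rather than camtraptor code, the column names (deploymentID, sex, lifeStage) are assumed, and base R's .Deprecated() is used only to keep the example dependency-free; the package may well prefer another deprecation mechanism.

```r
# Hypothetical sketch: none of these functions are camtraptor's real code.

# Stand-in for filter_observations(): keeps rows matching the given values.
filter_observations_sketch <- function(obs, sex = NULL, life_stage = NULL) {
  keep <- rep(TRUE, nrow(obs))
  if (!is.null(sex)) keep <- keep & obs$sex %in% sex
  if (!is.null(life_stage)) keep <- keep & obs$lifeStage %in% life_stage
  obs[keep, , drop = FALSE]
}

# Stand-in for the new function: here it only counts rows per deployment.
n_observations_sketch <- function(obs) {
  counts <- table(obs$deploymentID)
  data.frame(
    deploymentID   = as.character(names(counts)),
    n_observations = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

# Deprecated wrapper: warns, applies the old filter arguments, then delegates.
get_n_obs_sketch <- function(obs, sex = NULL, life_stage = NULL) {
  .Deprecated("n_observations_sketch")
  obs <- filter_observations_sketch(obs, sex = sex, life_stage = life_stage)
  n_observations_sketch(obs)
}

# Toy data with assumed column names.
obs <- data.frame(
  deploymentID = c("dep1", "dep1", "dep2"),
  sex          = c("female", "male", "female"),
  lifeStage    = c("adult", "adult", "juvenile"),
  stringsAsFactors = FALSE
)

# The old-style call still works; the deprecation warning is silenced here
# only to keep the example output clean.
result <- suppressWarnings(get_n_obs_sketch(obs, sex = "female"))
```

The point of the pattern is that existing scripts calling the old function with sex or life_stage keep running unchanged, while the warning nudges users towards filtering first and then calling the new function.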

In my opinion, what I described in this comment is the best way to both advance the package development and provide a smooth experience for users.

@peterdesmet, @PietrH, @sannegovaert, @jimcasaer, @MartijnUH: any thoughts?
