"read_camtrap_dp" duplicates sequenceID when motionDetection and timeLapse are taken simultaneously #297

Open
lrdijkhuis opened this issue Feb 16, 2024 · 6 comments
Labels: blocked, bug (Something isn't working)

Comments

@lrdijkhuis

When a timeLapse photo is taken while a motion trigger is active, read_camtrap_dp() now duplicates the eventID of the timeLapse sequence and the activityDetection sequence. The issue seems to occur only when a timeLapse photo is taken during an activity trigger. It occurs here in the source code:

dplyr::full_join(event_obs, by) %>%
when event_obs is joined to the media. The join is based on deploymentID and a time interval, without specifying which type of trigger created the asset.
A proper fix would be to add the eventID identifier back to media.csv (it is now absent) and join on eventID. Alternatively, adding an extra grouping to the event_obs join that differentiates between timeLapse and activityDetection would likely solve the issue. However, it does not seem right to drop a key column like eventID from media.csv, because it holds the required connection between asset information and the photo sequences.
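
A minimal sketch of why an interval-based join can duplicate rows, using hypothetical toy tables. The tables `media` and `events`, the column `eventMethod`, and all IDs are made up for illustration, and the `join_by()` conditions are an assumption about what `by` contains, not code copied from camtraptor. It needs dplyr 1.1.0 or later. Two media files with the same timestamp each match both overlapping events; also matching on the capture method, as in the second option suggested above, removes the cross-matches:

```r
library(dplyr)

# Hypothetical toy data: one activity-triggered photo and one time-lapse photo
# taken at the same second in the same deployment.
t0 <- as.POSIXct("2019-06-25 09:00:34", tz = "UTC")
media <- tibble(
  mediaID = c("m1", "m2"),
  deploymentID = "d1",
  captureMethod = c("activityDetection", "timeLapse"),
  timestamp = t0
)
# Two events (sequences) whose time intervals overlap.
# eventMethod is a hypothetical column, added only for this sketch.
events <- tibble(
  eventID = c("e1", "e2"),
  deploymentID = "d1",
  eventMethod = c("activityDetection", "timeLapse"),
  eventStart = t0,
  eventEnd = t0
)

# Join on deployment and time interval only: every media row matches both events.
by_interval <- join_by(deploymentID, between(timestamp, eventStart, eventEnd))
dup <- inner_join(media, events, by = by_interval)
nrow(dup) # 4

# Also require the capture method to match: one event per media row.
by_method <- join_by(
  deploymentID,
  captureMethod == eventMethod,
  between(timestamp, eventStart, eventEnd)
)
fixed <- inner_join(media, events, by = by_method)
nrow(fixed) # 2
```

Joining on an explicit eventID column in media would avoid the interval logic altogether, which is the first option described above.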

A dummy project reproducing the issue is available, as is an example script.

An example of the issue is shown in two attached screenshots (thumbnail_image001, thumbnail_image002).

Kind regards,
Laurens Dijkhuis

@damianooldoni
Member

Thanks a lot, @lrdijkhuis, for reporting!

I will try to look at it this week.

@damianooldoni damianooldoni added the bug Something isn't working label Feb 19, 2024
@damianooldoni damianooldoni self-assigned this Feb 19, 2024
@lrdijkhuis
Author

@damianooldoni Here is an example to better understand the bug.

Here is the sample data:
datapackage.json
deployments.csv
media.csv
observations.csv

Here is code to inspect the bug:
` library(camtraptor)
library(tidyverse)

dat <- read_camtrap_dp("C:/data/sequencebug-20240213120120/datapackage.json")

dat$data$observations %>% select(sequenceID, observationType) %>% count(sequenceID) %>% arrange(desc(n))

# no duplicate sequenceID in observations

# count number of duplicates without losing captureMethod

dat$data$media %>% select(sequenceID, captureMethod) %>% distinct() %>% group_by(sequenceID) %>% mutate(n = n())
#> # A tibble: 314 × 3
#> # Groups: sequenceID [198]
#> sequenceID captureMethod n
#>
#> 1 73da6fa2-0b89-4caa-8ef8-c3b1006b6ebb motionDetection 2
#> 2 e1db19a4-0f63-4c3d-99a6-c9875e49e4f3 motionDetection 2
#> 3 73da6fa2-0b89-4caa-8ef8-c3b1006b6ebb timeLapse 2
#> 4 e1db19a4-0f63-4c3d-99a6-c9875e49e4f3 timeLapse 2
#> 5 e87d6963-ac41-47fa-ac12-04aa89801866 motionDetection 1
#> 6 e8f08db5-07f5-429f-a6f8-e58baa638057 motionDetection 2
#> 7 8298c4ad-1b89-4332-bbfe-44bdf005c588 motionDetection 2
#> 8 e8f08db5-07f5-429f-a6f8-e58baa638057 timeLapse 2
#> 9 8298c4ad-1b89-4332-bbfe-44bdf005c588 timeLapse 2
#> 10 38cc000f-7a87-4d7a-a858-3826d8977894 motionDetection 2
#> # ℹ 304 more rows

# many duplicate sequences in media

# filter some duplicate sequences from media

dat$data$media %>% filter(sequenceID %in% c("b87211da-aae9-4829-a5f3-ede037518617",
"5b27b112-749d-4df9-bf89-66e331150b8d",
"02e7d706-92a0-4937-ab7b-1dabf541a9a8")) %>% as.data.frame()
#> mediaID deploymentID
#> 1 73b66818-317c-4b2d-bbcc-d89ba7d02dbe 28906770-05e4-4427-932b-001618157c98
#> 2 caae59bb-e07f-4ddb-8d7d-7b703c11e124 28906770-05e4-4427-932b-001618157c98
#> 3 b917ad68-35ac-4a6b-9a12-a46eaf1c89d6 28906770-05e4-4427-932b-001618157c98
#> 4 dfd7ac4e-2e7b-4c4d-9499-ca7dfcdadc86 28906770-05e4-4427-932b-001618157c98
#> 5 881dd3b4-9916-4ce6-a870-c3187c6760f8 28906770-05e4-4427-932b-001618157c98
#> 6 63bceb5b-8ec7-40cb-bcaf-1a4380edf47a 28906770-05e4-4427-932b-001618157c98
#> 7 9822aab0-46de-4e27-a5ae-0e34a77307ba 28906770-05e4-4427-932b-001618157c98
#> sequenceID captureMethod timestamp
#> 1 02e7d706-92a0-4937-ab7b-1dabf541a9a8 motionDetection 2019-06-03 15:49:25
#> 2 02e7d706-92a0-4937-ab7b-1dabf541a9a8 timeLapse 2019-06-03 15:49:25
#> 3 5b27b112-749d-4df9-bf89-66e331150b8d motionDetection 2019-06-05 12:38:55
#> 4 5b27b112-749d-4df9-bf89-66e331150b8d motionDetection 2019-06-05 12:38:57
#> 5 5b27b112-749d-4df9-bf89-66e331150b8d timeLapse 2019-06-05 12:38:57
#> 6 b87211da-aae9-4829-a5f3-ede037518617 motionDetection 2019-06-25 09:00:34
#> 7 b87211da-aae9-4829-a5f3-ede037518617 timeLapse 2019-06-25 09:00:34
#> fileName fileMediatype exifData favourite comments _id
#> 1 20240213093957-IMG_0137.JPG image/jpeg FALSE NA
#> 2 20240213093957-IMG_0138.JPG image/jpeg FALSE NA
#> 3 20240213093950-IMG_0198.JPG image/jpeg FALSE NA
#> 4 20240213093950-IMG_0199.JPG image/jpeg FALSE NA
#> 5 20240213093950-IMG_0200.JPG image/jpeg FALSE NA
#> 6 20240213093826-IMG_1654.JPG image/jpeg FALSE NA
#> 7 20240213093826-IMG_1655.JPG image/jpeg FALSE NA

# inspect one sequence in media: a single sequenceID with more than one captureMethod, same metadata

dat$data$media %>% filter(sequenceID %in% c("b87211da-aae9-4829-a5f3-ede037518617"))
#> # A tibble: 2 × 12
#> mediaID deploymentID sequenceID captureMethod timestamp filePath
#>
#> 1 63bceb5b-8… 28906770-05… b87211da-… motionDetect… 2019-06-25 09:00:34 https:/…
#> 2 9822aab0-4… 28906770-05… b87211da-… timeLapse 2019-06-25 09:00:34 https:/…
#> # ℹ 6 more variables: fileName , fileMediatype , exifData ,
#> # favourite , comments , _id

# from the selection above: mediaID is not unique!

dat$data$media %>% filter(mediaID %in% c("63bceb5b-8ec7-40cb-bcaf-1a4380edf47a")) %>% as.data.frame()
#> mediaID deploymentID
#> 1 63bceb5b-8ec7-40cb-bcaf-1a4380edf47a 28906770-05e4-4427-932b-001618157c98
#> 2 63bceb5b-8ec7-40cb-bcaf-1a4380edf47a 28906770-05e4-4427-932b-001618157c98
#> sequenceID captureMethod timestamp
#> 1 b87211da-aae9-4829-a5f3-ede037518617 motionDetection 2019-06-25 09:00:34
#> 2 4d40751c-d17c-417e-953e-700935a4c5e2 motionDetection 2019-06-25 09:00:34
#> filePath
#> 1 https://multimedia.agouti.eu/assets/63bceb5b-8ec7-40cb-bcaf-1a4380edf47a/file
#> 2 https://multimedia.agouti.eu/assets/63bceb5b-8ec7-40cb-bcaf-1a4380edf47a/file
#> fileName fileMediatype exifData favourite comments _id
#> 1 20240213093826-IMG_1654.JPG image/jpeg FALSE NA
#> 2 20240213093826-IMG_1654.JPG image/jpeg FALSE NA
dat$data$media %>% filter(mediaID %in% c("9822aab0-46de-4e27-a5ae-0e34a77307ba")) %>% as.data.frame()
#> mediaID deploymentID
#> 1 9822aab0-46de-4e27-a5ae-0e34a77307ba 28906770-05e4-4427-932b-001618157c98
#> 2 9822aab0-46de-4e27-a5ae-0e34a77307ba 28906770-05e4-4427-932b-001618157c98
#> sequenceID captureMethod timestamp
#> 1 b87211da-aae9-4829-a5f3-ede037518617 timeLapse 2019-06-25 09:00:34
#> 2 4d40751c-d17c-417e-953e-700935a4c5e2 timeLapse 2019-06-25 09:00:34
#> filePath
#> 1 https://multimedia.agouti.eu/assets/9822aab0-46de-4e27-a5ae-0e34a77307ba/file
#> 2 https://multimedia.agouti.eu/assets/9822aab0-46de-4e27-a5ae-0e34a77307ba/file
#> fileName fileMediatype exifData favourite comments _id
#> 1 20240213093826-IMG_1655.JPG image/jpeg FALSE NA
#> 2 20240213093826-IMG_1655.JPG image/jpeg FALSE NA
`
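
A more compact way to list the affected sequences, sketched on a hypothetical toy media table (the IDs are made up, and only the columns needed for the check are included). It counts distinct captureMethod values per sequenceID and keeps the sequences that mix methods:

```r
library(dplyr)

# Hypothetical toy media table; the real table has more columns.
media <- tibble(
  mediaID = c("m1", "m2", "m3"),
  sequenceID = c("s1", "s1", "s2"),
  captureMethod = c("motionDetection", "timeLapse", "motionDetection")
)

# Keep only sequences that contain more than one capture method.
mixed <- media %>%
  group_by(sequenceID) %>%
  summarise(n_methods = n_distinct(captureMethod), .groups = "drop") %>%
  filter(n_methods > 1)

mixed
```

In this toy table only sequence `s1` is returned. On the real data the same pipeline could be applied to `dat$data$media`.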

@damianooldoni
Member

Hi @lrdijkhuis. Sorry for the delay. Today I will work on this, at last! Thanks for the example. Very much appreciated.

@damianooldoni
Member

We are moving the reading/writing functionality for Camtrap Data Packages to a dedicated package, camtrapdp (see also #298). This issue arises during a downconversion from v1.0 to v0.1.6, something we will stop supporting very soon. As the camtrapdp R package will support Camtrap Data Packages from v1.0 onwards, your issue will be solved automatically by using camtrapdp:

# install.packages("devtools")
devtools::install_github("inbo/camtrapdp")
library(camtrapdp)

dat <- read_camtrap_dp("C:/data/sequencebug-20240213120120/datapackage.json") # same function name, but coming from a different package

dat$observations will therefore not contain sequenceID anymore, but eventID.

@lrdijkhuis
Author

Hi @damianooldoni, thanks for your solution. However, it only partially fixes my issue. With the camtrapdp package there is no longer a link connecting the media information to the event (photo sequence). How do you plan to deal with this while keeping the hierarchical structure deployments > observations > media?

@damianooldoni
Member

Hi @lrdijkhuis. I think this has now been solved in the camtrapdp package. @peterdesmet added an eventID column to media 3 weeks ago. See the section about eventIDs in the documentation.
As mentioned there, mediaIDs can be duplicated, but only if they are associated with multiple events. This does happen in your data package, but notice that the pair mediaID - eventID is unique, and the presence of images with captureMethod = timeLapse has no influence on this behavior.

library(camtrapdp)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# read datapackage
dat <- read_camtrapdp("C://Documents and Settings/damiano_oldoni/Documents/sequencebug-20240213120120/datapackage.json")

# Some mediaIDs are duplicated
media(dat) %>%
  group_by(mediaID) %>%
  add_tally() %>%
  filter(n > 1)
#> # A tibble: 232 × 13
#> # Groups:   mediaID [116]
#>    mediaID    deploymentID captureMethod timestamp           filePath filePublic
#>    <chr>      <chr>        <fct>         <dttm>              <chr>    <lgl>     
#>  1 b8d325f4-… 28906770-05… activityDete… 2019-06-01 17:42:21 https:/… FALSE     
#>  2 b8d325f4-… 28906770-05… activityDete… 2019-06-01 17:42:21 https:/… FALSE     
#>  3 5e48f4a5-… 28906770-05… timeLapse     2019-06-01 17:42:21 https:/… FALSE     
#>  4 5e48f4a5-… 28906770-05… timeLapse     2019-06-01 17:42:21 https:/… FALSE     
#>  5 5518912c-… 28906770-05… activityDete… 2019-06-02 13:11:06 https:/… FALSE     
#>  6 5518912c-… 28906770-05… activityDete… 2019-06-02 13:11:06 https:/… FALSE     
#>  7 c2127096-… 28906770-05… timeLapse     2019-06-02 13:11:06 https:/… FALSE     
#>  8 c2127096-… 28906770-05… timeLapse     2019-06-02 13:11:06 https:/… FALSE     
#>  9 7d72c1c1-… 28906770-05… activityDete… 2019-06-02 13:11:08 https:/… FALSE     
#> 10 7d72c1c1-… 28906770-05… activityDete… 2019-06-02 13:11:08 https:/… FALSE     
#> # ℹ 222 more rows
#> # ℹ 7 more variables: fileName <chr>, fileMediatype <chr>, exifData <chr>,
#> #   favorite <lgl>, mediaComments <chr>, eventID <chr>, n <int>

# But the pair (mediaID - eventID) is unique: no duplicates!
media(dat) %>%
  group_by(mediaID, eventID) %>%
  add_tally() %>%
  filter(n > 1)
#> # A tibble: 0 × 13
#> # Groups:   mediaID, eventID [0]
#> # ℹ 13 variables: mediaID <chr>, deploymentID <chr>, captureMethod <fct>,
#> #   timestamp <dttm>, filePath <chr>, filePublic <lgl>, fileName <chr>,
#> #   fileMediatype <chr>, exifData <chr>, favorite <lgl>, mediaComments <chr>,
#> #   eventID <chr>, n <int>

Created on 2024-05-15 with reprex v2.1.0
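
With eventID present in media, event-based observations can be linked to their media by a plain equality join. A minimal sketch on hypothetical toy tables (all IDs and values are made up, and real tables have many more columns):

```r
library(dplyr)

# Hypothetical toy tables: two event-based observations and two media files
# that each belong to both events.
observations <- tibble(
  observationID = c("o1", "o2"),
  deploymentID = "d1",
  eventID = c("e1", "e2"),
  scientificName = c("Vulpes vulpes", NA)
)
media <- tibble(
  mediaID = c("m1", "m2", "m1", "m2"),
  deploymentID = "d1",
  eventID = c("e1", "e1", "e2", "e2")
)

# mediaID alone is duplicated, but the pair (mediaID, eventID) is unique.
stopifnot(!anyDuplicated(media[c("mediaID", "eventID")]))

# Equality join on the shared keys: each observation gets the media of its event.
obs_media <- observations %>%
  left_join(media, by = c("deploymentID", "eventID"))

obs_media
```

Each observation is matched only to the media rows that carry its eventID, so no time-interval matching is involved.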

In camtraptor we are going to read data packages using camtrapdp under the hood, so the same behavior can be expected once the new camtraptor is released. I will not fix this in the current version of camtraptor, as I would rather work on the refactoring to avoid any downconversion.

Please, @lrdijkhuis, let me know if having eventID in media and the behavior shown here will solve your issue.
