This repository has been archived by the owner on Nov 1, 2022. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 3
/
03-AccessingData.Rmd
591 lines (413 loc) · 34.4 KB
/
03-AccessingData.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
# Accessing and understanding detections data {#accessingData}
*This chapter was contributed by Tara L. Crewe, Zoe Crysler, and Philip Taylor. Revisions by Steffi LaZerte and Denis Lepage*
```{r tidyr3, include = FALSE}
library(knitr)
opts_chunk$set(tidy.opts=list(width.cutoff=50), tidy = FALSE)
# Run the tagme() here so we can avoid an overabundance of messages
library(motus)
motus:::sessionVariable(name = "userLogin", val = "motus.sample")
motus:::sessionVariable(name = "userPassword", val = "motus.sample")
sql.motus <- tagme(projRecv = 176, new = !file.exists("./data/project-176.motus"),
update = TRUE, dir = "./data/")
```
**Before downloading your detection data, please ensure that you have no pending metadata issues through the online [Data Issues page](https://motus.org/data/issues)**
This chapter will begin with an introduction to the structure of the detections database, followed by instructions on how to download and access the data. At the end, a summary section that includes a script to download, select variables, clean data, and export is provided (see \@ref(importSummary))
## Data structure {#databaseStructure}
Each tag detection database is stored as an SQLite (`dplyr::src_sqlite`) file with the extension '.motus'. The sqlite format was chosen because:
1. it is **flexible**, allowing for many data formats.
2. it is **accessible** from many software platforms (not just R).
3. it is **appendable**, meaning the database can be created and updated on disk without having to read in and resave the entire contents. This will save time and computer memory when searching to see if any new detections are available for your project or receiver.
The .motus file contains a series of interrelated tables where data are stored in a condensed format to save memory. The following tables are included in your .motus file;
1. **`activity`**: data related to radio activity for each hour period (`hourBin`) at each antenna, including a count of the number of short runs used in helping identify false detections.
2. **`admInfo`**: internal table used to keep track of your the motus package used to create your motus file, and the data version.
3. **`antDeps`**: metadata related to antenna deployments, e.g., deployment height, angle, antenna type.
4. **`batchRuns`**: metadata for runIDs and associated batchIDs
5. **`batches`**: detection data for a given receiver and boot number.
6. **`filters`**: metadata related to user created filters associated with the specified receiver.
7. **`gps`**: metadata related to Geographic Positioning System (GPS) position of receiver.
8. **`hits`**: detection data at the level of individual hits.
9. **`meta`**: metadata related to the project and datatype (tags vs. receivers) that are included in the .motus file
101. **`nodeData`**: data related to nodes by `batchID` and time (`ts`)
102. **`nodeDeps`**; metadata related to nodes
10. **`projAmbig`**: metadata related to what projects have ambiguous tag detections
11. **`projs`**: metadata related to projects, e.g., project name, principal investigator.
12. **`pulseCounts`**: number of radio pulses measured on each antenna over 1 hour periods (`hourBin`).
13. **`recvDeps`**: metadata related to receiver deployments, e.g., deployment date, location, receiver characteristics.
14. **`recvs`**: metadata related to receiver serial number and associated Motus deviceID
15. **`runs`**: detection data associated with a run (continuous detections of a unique tag on a given receiver).
16. **`runsFilters`**: a list of runIDs associated with user created filters and assigned probabilities.
17. **`species`**: metadata related to species, e.g., unique identifier, scientific name, common name.
18. **`tagAmbig`**: metadata related to ambiguous tags, e.g., ambigID and associated motusTagID
19. **`tagDeps`**: metadata related to tag deployments, e.g., deployment date, location, and species.
20. **`tagProp`**: metadata related to custom deployment properties entered by the principal investigator (e.g. body weight).
21. **`tags`**: metadata related to tags, e.g., unique identifier, tag characteristics (e.g., burst interval).
In addition to these tables, there are also 'virtual' tables or 'views', which have been created through queries that merge data from the various tables into a single convenient 'view' that contains all of the fields you are likely to need. The following views are currently included in each .motus file:
1. **`allambigs`**: lists in long-data format each motusTagID (up to 6) associated with each negative ambigID.
2. **`alltags`**: provides the full detection data for all tags, and all ambiguous (duplicate) tags, associated with your project. Ambiguous detections are repeated for each motusTagID represented by each ambigID.
3. **`alltagsGPS`**: same as `alltags` but includes GPS latitude, longitude and altitude (much slower to load on large databases).
Because the file is a `dplyr::src_sqlite` file, all of the `dplyr` functions can be used to filter and summarize the .motus database, without needing to first save the data as a *flat* file (a typical two-dimensional dataframe). The SQL format is very advantageous when you have a large file -- the queries using SQL will be substantially faster than those done on a flat dataframe.
## Database types {#databaseTypes}
There are two types of tag detection databases available for download:
1. **receiver database**: includes all detections of any registered tags from a single receiver. A receiver database has a name like SG-1234BBBK5678.motus, where the name is the serial number of the receiver.
2. **project database**: includes all detections of your registered tags from across the Motus network. A tag project database has a name like project-123.motus, where the number is the Motus project ID.
These two databases correspond to the basic model of data sharing:
1. you get all detections of *anyone's* tags by *your* receivers (i.e., one receiver tag database for each receiver you deploy).
2. you get all detections of *your* tags by *anyone's* receivers (i.e., one project tag database for each of your Motus projects).
## Load relevant R packages {#loadPackages}
Before we begin working with data, we need to load the required packages for this chapter. If you have not yet *installed* these packages (from github and CRAN) then please return to Chapter \@ref(loadingPackages) and do so.
```{r loadPackages, warning = FALSE, message = FALSE}
library(motus)
library(lubridate)
library(dplyr)
```
## Set system environment
Set the system environment time zone to Greenwich Mean Time (UTC), to ensure that you are always working in UTC. This is a very important step, and should be part of every working session. If you fail to do this, then two problems can arise. Times are stored in the Motus database in UTC, and if you do not keep your environment in UTC, then they can be inadvertently changed during import. Second, if tags have been detected across multiple time zones, then they can also inadvertently be changed.
```{r setTimeZone1}
Sys.setenv(TZ = "UTC")
```
## Downloading tag detections {#importDetections}
To import tag detections for your project or receiver, you need a numerical project id or character scalar receiver serial number.
The success of the Motus network is dependent on the timely upload of detection data from receivers, and on the maintenance of accurate and up to date tag and receiver metadata by collaborators. After downloading your data from the Motus server, users are encouraged to check for updated detection data (see sections \@ref(tagmeUpdate) and \@ref(tellme)) and metadata (see section \@ref(forceMeta)) each time they run an analysis, because collaborators can add detection data and metadata at any time, and these could influence the completeness of your own detections data.
Be warned that large datasets can take some time (sometimes a few hours) to download from the Motus server when downloading for the first time. After the initial download, loading a .motus file into R and updating for any new data will be near instantaneous.
### Download data for a project for the _first time_ {#downloadData}
All data downloads are completed using the `tagme()` function in the `motus` R package. This function will save an SQLite database to your computer with the extension ".motus"; further details on data structure are in section \@ref(databaseStructure). The following parameters are available for the `tagme()` function:
- **`projRecv`**: integer project number OR a character vector receiver serial number.
- **`new`**: if set to `TRUE`, it will create a new empty .motus file in your local directory. Do not use this parameter or set it to `FALSE` if you already have a .motus file.
- **`update`**: if set to `TRUE`, will download all available data to your existing .motus file. Must be set to `TRUE` on your first data download and any subsequent downloads if you wish to check for new data. Set to `FALSE` if you do not wish to check for new data (e.g., if working offline).
- **`dir`**: Your .motus data is automatically saved to your working directory, unless you specify a different location using this parameter.
- **`forceMeta`**: if set to `TRUE`, it will force an update of metadata to an existing .motus file.
Throughout this book we use sample data (see section \@ref(sampleData)) which has been assigned to project 176.
Let's get started by downloading data by project - this will include all detections of your tags on any receiver.
Note that when downloading data from the Motus server for the **first time**, you must specify `new = TRUE` and `update = TRUE`. You will also be prompted to login (see \@ref(userAuthentication) below).
**Unless the directory that you want your data saved in is stated explicitly within the function call, data will be downloaded to the current working directory.**
Lets start by determining what our working directory is so we know where our file will be saved.
```{r tagme0, eval = FALSE}
getwd()
```
Specify the project number you wish to download detections for, in this case the sample data project.
```{r, eval = TRUE}
proj.num <- 176
```
As this is the first time you are downloading data for project 176, set `new = TRUE` and `update = TRUE`. This will create a .motus file in your current working directory, which was shown above using `getwd()` This will also create an SQL object in your R environment called "sql.motus"
```{r, eval = FALSE}
sql.motus <- tagme(projRecv = proj.num, new = TRUE, update = TRUE)
```
Alternatively you can specify a different location to save the data by entering your preferred filepath. In this example we save to our data folder using the dir parameter. Note that "./" simply means 'relative to the current folder' (shown by `getwd()`).
```{r, eval = FALSE}
sql.motus <- tagme(projRecv = proj.num, new = TRUE, update = TRUE,
dir = "./data/")
```
> **Note:** You'll need to use the **username** `motus.sample` and the **password** `motus.sample` to access this data (see below for more details)!
Using `tagme()` as shown above will download a file to your working or specified directory called "project-176.motus" for the sample data (the number in the file name corresponds to the project number). The progress of the download process should print on the console; if you are not seeing it, try scrolling down your screen while `tagme()` is running.
In the event that your connection to the Motus server fails prior to a complete download (e.g., due to a poor internet connection), use `tagme(proj.num, update = TRUE)` to continue the download from where it left off, ensuring to specify a directory if it is saved outside the working directory.
### User Authentication {#userAuthentication}
#### Login {#login}
The first time you call a function using the Motus R package, you will be asked to enter your motus.org username and password in the R console to authenticate your access to project data. This will only happen once per R session. If you do not have a Motus username and password, you can sign up at <https://motus.org/data/user/new>. Permission to access project data will then be granted by Motus staff or the project principal investigator.
Throughout this book we use sample data (see section \@ref(sampleData)) which has been assigned to project 176. When accessing this data you will need to login using username and password 'motus.sample' in the R console when prompted by the `tagme()` function (see section \@ref(downloadData)). It will look like this:
```{r fig3.2, echo=FALSE, fig.align='center'}
knitr::include_graphics("images/SampleLogin.png")
```
To download data for one of your own projects, you simply need to change the project number to that of your own project in the `tagme()` call, and enter your own Motus login/password in the R console when prompted. If you are already logged in as the sample data user, you will need to first logout to download your own data (see \@ref(logout)).
#### Logging out {#logout}
Once you are logged in under one user account, you will not be able to access data from another account. If you need to logout of the current account to access other data, you can run the code below.
```{r logout, eval = FALSE}
motusLogout()
```
### Download data for a receiver for the _first time_
We could also download data by receiver through the same process as described above. This will provide you with all detections of any tags on the specified receiver. As there are no receivers registered to sample project 176, **this call will not work**. If you have a receiver registered to your own project, replace the receiver serial number in the tagme call below with the serial number for your own receiver, ensuring that you are logged in using your own credentials (see section \@ref(logout)).
```{r tagmeRecv, eval = FALSE}
proj.num <- "SG-123BBBK1234"
sql.motus <- tagme(projRecv = proj.num, new = TRUE, update = TRUE)
```
This will download a file to your working directory named 'SG-123BBBK1234.motus'.
Some users may wish to work directly with the .motus SQLite file. However, since many users are more familiar with a 'flat' dataframe format, instructions to view the the data as a flat dataframe within R, and on how to export the flat file to .csv or .rds format, are included below. Throughout the majority of this book, we use a flat dataframe format.
### Downloading multiple receivers at the same time
If you have a large number of receivers in your project, and wish to get receiver specific data for each one, rather than downloading them one by one as above, we can download them with a simple loop. Note that since the sample project doesn't have any receivers associated with it, this script will not result in a download but you can try it with your own project if you have receivers.
```{r tagmeAllRecvs, eval = FALSE}
# specify the project whose receivers you wish to download:
proj.num <- 176
# get a copy of the metadata only
sql.motus <- tagme(proj.num, new = TRUE, update = FALSE, dir = "./data/")
metadata(sql.motus, proj.num)
tbl.recvDeps <- tbl(sql.motus, "recvDeps")
df.serno <- tbl.recvDeps %>%
filter(projectID == proj.num) %>%
select(serno) %>%
distinct() %>%
collect() %>% as.data.frame()
# loop through each receiver (may take a while!)
for (row in 1:nrow(df.serno)) {
tagme(df.serno[row, "serno"], dir = "./data/", new = TRUE, update = TRUE)
}
# Note you can remove the dir argument if you want to save it to your working
# directory, just make sure that you use the same directory in both calls
```
You can also create a list of receivers you'd like to download if you don't want to download project-wide receivers:
```{r tagmeMultipleRecvs, eval = FALSE}
# create list of receivers you'd like to download
df.serno <- c("SG-AB12RPI3CD34", "SG-1234BBBK4321"))
# loop through each receiver (may take a while!), and save to the working directory
for (k in 1:length(df.serno)) {
tagme(df.serno[k], new = TRUE, update = TRUE)
}
# loop through each receiver (may take a while!), and save to a specified directory
for (k in 1:length(df.serno)) {
tagme(df.serno[k], dir = "/Users/zoecrysler/Downloads/",
new = TRUE, update = TRUE)
}
```
### Updating all .motus files within a directory
Once you have .motus files, you can also update them all by simply calling the `tagme()` function but leaving all arguments blank, apart from the directory:
```{r AllRecvsUpdate, eval = FALSE}
# If you have them saved your working directory:
tagme()
# If you have them saved in a different directory:
tagme(dir = "./data/")
```
### Accessing downloaded detection data {#accessSQL}
Now that we've downloaded our data as an SQLite database and loaded it into an R object called `sql.motus`, we want to access the tables stored within. Detailed descriptions of all the tables stored in the .motus file can be found in section \@ref(databaseStructure).
You can also view the list of tables, and variables contained within those tables, using the `DBI` and `RSQLite` packages (these are automatically installed when you install `motus`).
```{r sqlTables, eval = TRUE}
library(DBI)
library(RSQLite)
# specify the filepath where your .motus file is saved, and the file name.
file.name <- dbConnect(SQLite(), "./data/project-176.motus")
# get a list of tables in the .motus file specified above.
dbListTables(file.name)
# get a list of variables in the "species" table in the .motus file.
dbListFields(file.name, "species")
```
The *virtual* table `alltags` contains the detection data, along with all metadata variables that most users will ever need from the various underlying .motus tables (but now excludes GPS data).
We access the tables using the `tbl()` function from the dplyr package which we installed in Chapter 2:
```{r getAllTagsTable, eval = TRUE}
# this retrieves the "alltags" table from the "sql.motus" SQLite file we read in earlier
tbl.alltags <- tbl(sql.motus, "alltags") # virtual table
```
We now have a new `tbl.alltags` object in R. The underlying structure of these tables is a list of length 2:
```{r dfStructure, eval = TRUE}
str(tbl.alltags)
```
The first part of the list, `src`, is a list that provides details of the SQLiteConnection, including the directory where the database is stored. The second part is a list that includes the underlying table. Thus, the R object `alltags` is a *virtual* table that stores the database structure and information required to connect to the underlying data in the .motus file. As stated above, the advantage of storing the data in this way is that it saves memory when accessing very large databases, and functions within the dplyr package can be used to manipulate and summarize the tables before collecting the results into a typical 'flat' format dataframe.
If you want to use familiar functions to get access to components of the underlying data frame, then use the `collect()` function. For example, to look at the names of the variables in the alltags table:
```{r, eval = TRUE}
tbl.alltags %>%
collect() %>%
names() # list the variable names in the table
```
### Adding GPS Data {#addGPS}
If GPS data is required, users will either have to use the `alltagsGPS` view, or will have to add GPS values to a data subset.
To use the `alltagsGPS` view (note that this can be slow, which is why GPS data is excluded by default):
```{r getAllTagsGPSTable, eval = TRUE}
tbl.alltagsGPS <- tbl(sql.motus, "alltagsGPS")
```
Alternatively, to speed things up you can first filter your data, then use the `getGPS()` function to retrieve GPS values.
```{r}
# Filter the data
tbl.Dunlin <- tbl(sql.motus, "alltags") %>%
filter(speciesEN == "Dunlin") %>%
collect()
# Get the daily median location of GPS points for these data
Dunlin.GPS <- getGPS(src = sql.motus, data = tbl.Dunlin)
```
We match GPS locations to `hitID`s according to one of several different time values, specified by the `by` argument.
This can be one of three options:
1. the median location within **`by = X`** minutes of a `hitID`
- here, `X` can be any number greater than zero and represents the size of the time block in minutes over which to calculate a median location
- be aware that you should ideally not chose a period smaller than the frequency at which GPS fixes are recorded, or some hits will not be associated with GPS
2. **`by = "daily"`** median location (**default**, implied in the above example)
- similar to `by = X` except the duration is 24hr (same as `by = 1440`)
- this method is most suitable for receiver deployments at fixed location.
3. or the **`by = "closest"`** location in time
- individual GPS lat/lons are returned, matching the closest `hitID` timestamp
- this method is most accurate for mobile deployments, but is potentially slower than `by = X`.
- you can also specify a `cutoff` which will only match GPS records which are within `cutoff = X` minutes of the hit. This way you can avoid having situations where the 'closest' GPS record is actually days away.
```{r}
Dunlin.GPS <- getGPS(src = sql.motus, data = tbl.Dunlin, by = 60)
Dunlin.GPS <- getGPS(src = sql.motus, data = tbl.Dunlin, by = "closest", cutoff = 120)
```
To keep all `hitID`s, regardless of whether they GPS data or not, use the argument `keepAll = TRUE`.
```{r}
Dunlin.GPS <- getGPS(src = sql.motus, data = tbl.Dunlin, keepAll = TRUE)
```
Finally, use the `left_join()` function to add GPS to your data subset. Note that this sample data doesn't actually have GPS values so these examples are for illustration only.
```{r}
tbl.Dunlin.GPS <- left_join(tbl.Dunlin, Dunlin.GPS, by = "hitID")
```
As there is no GPS data in this example data set, we'll continue with our original `tbl.alltags` file.
### Converting to flat file {#convertToFlat}
To convert the `alltags` view or other table in the .motus file into a typical 'flat' format, i.e., with every record for each field filled in, use the `collect()` and `as.data.frame()` functions. The output can then be further manipulated, or used to generate a RDS file of your data for archiving or export.
```{r df.alltags, eval = FALSE}
df.alltags <- tbl.alltags %>%
collect() %>%
as.data.frame()
```
Now we have a flat dataframe of the alltags table called `df.alltags`. We can look at some metrics of the file:
```{r quickLook, eval = FALSE}
names(df.alltags) # field names
str(df.alltags) # structure of your data fields
head(df.alltags) # prints the first 6 rows of your df to the console
summary(df.alltags) # summary of each column in your df
```
Note that the format of the time stamp (ts) field is numeric and represents seconds since January 1 1970. We recommend that when you transform your tables into flat dataframes, that you format the time stamp using the lubridate package at that time, e.g.:
```{r collect_TimeStamp, eval = TRUE}
df.alltags <- tbl.alltags %>%
collect() %>%
as.data.frame() %>% # for all fields in the df (data frame)
mutate(ts = as_datetime(ts, tz = "UTC", origin = "1970-01-01"))
# the tz = "UTC" is not necessary here, provided you have set your system time to UTC
# ... but it serves as a useful reminder!
```
Note that time stamps can only be manipulated in this way *after* collecting the data into a flat dataframe.
If you want to load only part of your entire virtual table (e.g. certain fields, certain tags, or all tags from a specified project or species), you can use `dplyr` functions to filter the data before collecting into a dataframe. Some examples are below:
1. To select certain variables:
```{r collect1, eval = TRUE}
# to grab a subset of variables, in this case a unique list of Motus tag IDs at
# each receiver and antenna.
df.alltagsSub <- tbl.alltags %>%
select(recv, port, motusTagID) %>%
distinct() %>%
collect() %>%
as.data.frame()
```
2. To select certain tag IDs:
```{r collect2, eval = TRUE}
# filter to include only motusTagIDs 16011, 23316
df.alltagsSub <- tbl.alltags %>%
filter(motusTagID %in% c(16011, 23316)) %>%
collect() %>%
as.data.frame() %>%
mutate(ts = as_datetime(ts, tz = "UTC", origin = "1970-01-01"))
```
3. To select a specific species:
```{r collect3, eval = TRUE}
# filter to only Red Knot (using speciesID)
df.4670 <- tbl.alltags %>%
filter(speciesID == 4670) %>%
collect() %>%
as.data.frame() %>%
mutate(ts = as_datetime(ts, tz = "UTC", origin = "1970-01-01"))
# filter to only Red Knot (using English name)
df.redKnot <- tbl.alltags %>%
filter(speciesEN == "Red Knot") %>%
collect() %>%
as.data.frame() %>%
mutate(ts = as_datetime(ts, tz = "UTC", origin = "1970-01-01"))
```
Using `dplyr`, your virtual table can also be summarized before converting to a flat file. For example, to find the number of different detections for each tag at each receiver:
```{r collectSum, eval = TRUE}
df.detectSum <- tbl.alltags %>%
count(motusTagID, recv) %>%
collect() %>%
as.data.frame()
```
In later chapter(s) we will show you additional ways of summarizing and working with your data.
## Export your 'flat' dataframe to CSV or RDS file {#exportDetections}
A good workflow is to create a script that deals with all your data issues (as described in later chapters), and then saves the resulting dataframe for re-use. If you do this, you can quickly start an analysis or visualization session from a known (and consistent) starting point. We use an .rds file, which preserves all of the associated R data structures (such as time stamps).
```{r createRDS, eval = TRUE, message = FALSE, warning = FALSE}
saveRDS(df.alltags, "./data/df_alltags.rds")
```
Some users may also want to export the flat dataframe into a .csv file for analysis in other programs. This can easily be done with the following code. Note that it **does not** preserve time stamps:
```{r createCSV, eval = TRUE, message = FALSE, warning = FALSE}
write.csv(df.alltags, "./data/df_alltags.csv")
```
## Update and/or open an existing database {#tagmeUpdate}
As you or other users upload data to our server, you may have additional tag detections that weren't present in your initial data download. Since the .motus file is a SQLite database, you can update your existing file with any newly available data, rather than doing a complete new download of the entire database. To open and update a detections database that already exists (has been downloaded previously), we use the `tagme()` function but set `new = FALSE`:
```{r tagme3, eval = FALSE}
sql.motus <- tagme(projRecv = proj.num, new = FALSE, update = TRUE, dir = "./data/")
```
If you are working offline, and simply want to open an already downloaded database without connecting to the server to update, use `new = FALSE` and `update = FALSE`:
```{r tagme3.1, eval = FALSE}
# use dir = to specify a directory
sql.motus <- tagme(projRecv = proj.num, new = FALSE, update = FALSE)
```
## Check if new data are available {#tellme}
To check if new data are available for your project or receiver without downloading the data, you can use the `tellme()` function, which returns a list with:
- **`numHits`**: number of new tag detections.
- **`numBytes`**: approximate uncompressed size of data transfer required, in megabytes.
- **`numRuns`**: number of runs of new tag detections, where a run is a series of continuous detections for a tag on a given antenna.
- **`numBatches`**: number of batches of new data.
- **`numGPS`**: number of GPS records of new data.
The following assumes that a local copy of the database already exists:
```{r tagme5, eval = FALSE}
tellme(projRecv = proj.num) # If db is in the working directory
tellme(projRecv = proj.num, dir = "./data/") # To specify a different directory
```
To check how much data is available for a project but you _do not_ have a database for it, use the 'new' parameter:
```{r tagme6, eval = FALSE}
tellme(projRecv = proj.num, new = TRUE)
```
## Force an update/re-import of tag and receiver deployment metadata {#forceMeta}
Tag and receiver metadata are automatically merged with tag detections when data are downloaded. However, if metadata have been updated since your initial download, you can force re-import of the metadata when updating a database by running:
```{r tagme7, eval = FALSE}
sql.motus <- tagme(projRecv = proj.num, forceMeta = TRUE)
```
## Import full tag and receiver metadata {#metadata}
When you use `tagme()` to download or update your .motus file, you are provided with the metadata for:
1. any tags registered to your project which have detections;
2. tags from other projects which are associated with ambiguous detections (see Chapter \@ref(dataCleaning)) in your data;
3. receivers that your tags and any ambiguous tags were detected on.
In many instances, you will want access to the full metadata for all tags and receivers across the network, e.g., to determine how many of your deployed tags were not detected, or to plot the location of stations with and without detections. The `metadata()` function can be used to add the complete Motus metadata to your .motus file. The metadata function only needs to be run once, but we suggest that you re-import the metadata occasionally to ensure that you have the most recent and up-to-date information.
Running the metadata function as follows will add the appropriate metadata from across the network (all tags and all receivers) to the `recvDeps` and `tagDeps` tables in your .motus file:
```{r metadata1, eval = FALSE}
# access all tag and receiver metadata for all projects in the network.
metadata(sql.motus)
```
Alternatively, you can load metadata for a specific project(s) using:
```{r metadata2, eval = FALSE}
# access tag and receiver metadata associated with project 176
metadata(sql.motus, projectIDs = 176)
# access tag and receiver metadata associated with projects 176 and 1
metadata(sql.motus, projectIDs = c(176, 1))
```
## Ensure that you have the correct database version {#checkVersion}
When you call the tagme function to load the sqlite database, the version of the R package used to download the data is stored in an admInfo table. Over time, changes will be made to the functionality of the R package that may require adding new tables, views or fields to the database. If your version of the database does not match the version of the R package, some of the examples contained in this book may not work. The following call will check that your database has been updated to the version matching the current version of the `motus` R package. If your database does not match the most current version of the R package, use `tagme()` with `update = TRUE` to update your database to the correct format. Refer to Appendix B if the `checkVersion()` call returns a warning.
```{r dbVersion, eval = FALSE}
checkVersion(sql.motus)
```
## R object naming convention
Throughout this chapter and the rest of the book, we name R objects according to their structure and the source of the data contained in the object. So, SQLite objects will be prefixed with `sql.`, virtual table objects will be prefixed with `tbl.`, and dataframe objects will be prefixed with `df.`; the rest of the name will include the name of the .motus table that the data originates from. Throughout the rest of the book we will be relying on and referencing the naming formats below; please ensure that you are familiar with these before continuing to the next chapter. The following code assumes you have already downloaded the sample data and do not need to update it; if you have not downloaded the data, see section \@ref(downloadData) for instructions on initial download:
```{r namingConvention, eval = FALSE}
# SQLite R object, which links to the .motus file:
sql.motus <- tagme(176, update = TRUE, dir = "./data")
# virtual table object of the alltags table in the sample.motus file:
tbl.alltags <- tbl(sql.motus, "alltags")
df.alltags <- tbl.alltags %>%
collect() %>%
as.data.frame() %>% # dataframe ("flat") object of alltags table
mutate(ts = as_datetime(ts, tz = "UTC", origin = "1970-01-01"))
```
## Summary: download, access, and export data {#importSummary}
Throughout the book we will predominantly be working with the `alltags` table in flat format. Although described in detail above, here we include a quick summary of how you download, access, and convert the sample data for use in the rest of the book. In later sections of the book, we include additional recommended modifications and filtering of unnecessary variables, more information is available in section \@ref(loadDetectionsCleaning) in Chapter \@ref(dataCleaning).
```{r summaryDownload, eval = FALSE}
# set proj.num to 176 for sample data
proj.num <- 176
# - download and load detection data from the sample project and save in data folder
# - login and password for sample data is "motus.sample"
# - if you are accessing already-downloaded data, use new = FALSE; if you don't
# want to update your data, also set update = FALSE
sql.motus <- tagme(proj.num, new = TRUE, update = TRUE, dir = "./data/")
# access the "alltags" table within the SQLite file
tbl.alltags <- tbl(sql.motus, "alltags")
# convert "tbl.alltags" to a flat dataframe and change numeric time to a datetime object
df.alltags <- tbl.alltags %>%
collect() %>%
as.data.frame() %>%
mutate(ts = as_datetime(ts, tz = "UTC", origin = "1970-01-01"))
```
For your own data we suggest creating a script with the following workflow:
1. download and/or update your data
2. select variables of interest for the table you are working with (typically `alltags`)
3. include any initial cleaning
4. save the resulting data as an .rds file as described in section \@ref(exportDetections). We suggest using RDS instead of CSV, because the RDS format preserves the underlying structure of the data (e.g. POSIX times stay as POSIX times). If you want to export your data to another program, then a CSV format might be preferred.
We caution that producing a flat file using the full suite of fields can use a lot of memory, and can slow R down considerably when dealing with large datasets. For some combinations of data sets and computers, it may be impossible to directly use data frames in R. If that is the case, then this is the point in your workflow where you should carefully consider the information you need from within your data set (for example, how it is aggregated) and simplify it. You can always return to this script and creating a new RDS file with different variables, or aggregated at a different scale.
To read in a saved RDS file you can run:
```{r readRDS1, eval = TRUE}
# reads in your file "df.alltags.rds" saved in the data folder
df.alltags.saved <- readRDS("./data/df_alltags.rds")
```
In the next chapter we will check for missing metadata.
```{r, include = FALSE}
# Clean up!
file.remove("./data/df_alltags.rds")
file.remove("./data/df_alltags.csv")
```