-
Notifications
You must be signed in to change notification settings - Fork 2
/
Gundog.GPS.Filter.R
694 lines (603 loc) · 47.4 KB
/
Gundog.GPS.Filter.R
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
#############################################Gundog.GPS.Filter#############################################
#https://www.researchgate.net/profile/Richard-Gunner-2/research
#https://orcid.org/0000-0002-2054-9944
####A R function to screen for GPS anomalies###
##Key features##
##Specifically, this script is divided into 2 main components -->
###Component 1 = User defined movement thresholds to identify anomalous fixes###
#There are three user-defined thresholds: i) [R1] = A threshold value of distance (units in meters) between two sets of coordinates - the raw fixes the and median filtered equivalent. Locational data above this threshold are deemed outliers
# ii) [R2] = Erroneous 'spikes'- These are identified by computing the outgoing and incoming speed (units in meters per second) and absolute turning angle (units ranging 0-180°) between every combination of three consecutive GPS fixes. High outgoing and incoming speed coupled with an acute (small) vertex angle indicates erronous fixes ('spikes')
# iii) [R3] = A threshold value of maximum speed using a user-defined stepping range - defined as the interval between each retained fix: A five-fix stepping range is thus the distance and time difference (s) computed between every fifth fix prior to calculating the resultant speed value (m/s). If this value exceeds the threshold, then in this example, it is the fifth fix that is depicted as the outlier (usually, the stepping range just needs to be left at the default value of 1)
#This means there are 3 important thresholds - i) [R1] Instantaneous distance (m) away from surrounding general fixes (median position)
# - ii) [R2] The tandem thresholds of outgoing and incoming speeds (m/s) and absolute turning angle (0-180°) between 3 consecutive fixes.
# - iii) [R3] Instantaneous speed threshold (m/s), using a given stepping range
##Unique to this script?##
#Some previous methods for GPS filtering have calculated metrics such as the above (distance moved, speed and turn angles) using various pre-defined static windows (e.g., Bjorneraas et al. 2010) and the user chooses sutibale thresholds to evaluate GPS fix uncertainty.
#This script is unique in that the window resets per user-defined maximum GPS drop-out. For example, if GPS was scheduled as 1 fix/5 mins, and there was a period of 20 minutes without a fix, and the drop-out threshold was set as 600 s (10 mins), then the window and the various running computations would not overlap either side of the drop-out. This ensures measures of fix inaccuracy are more standardized in time and space.
#Moreover, some GPS devices records fixes in pre-set 'bursts'. For example, every five minutes, there could be a burst of six consecutive fixes at 1 Hz; one fix per second for six seconds
#This script offers four options to pre-process coordinates within each burst:
# i) Retain one median longitude and latitude value per burst
# ii) Retain one mean longitude and latitude value per burst
# iii) Only keep the last value per burst
# iv) Do nothing (use this option if GPS does not record in bursts or if the fixes within each burst are not standardised to 1 s apart)
#Component 1 method order:
#1) The script starts by pre-processing burst data if required (see above). Then the user-defined maximum GPS drop-out time (units in seconds) triggers an increment in group number each time the GPS drop-out time is reached/surpassed
#2) Per group number, a user-defined sliding window (number of consecutive fixes) is used to compute a center-aligned running median of the longitude and latitude coordinates (following processing of burst data if required) and then the Haversine distance between each 'current' GPS fix and the equivalent running median value is computed. Values are extended at the tails of the window (fill = "extend). If the length of the running median window is less than the length of the given group number, then the running median window.length will automatically change to the length of the given group number, unless the group number length is less than 3, in which case, no median fixes are computed.
#3) R1 computed
#4) R2 computed between every combination of three consecutive GPS fixes per group number
#5) R3 computed (this will be the same as R2's outgoing speed if the stepping range is left at the default value of one). The stepping range resets per unique group number.
###Component 2 = Isolation forest models as a method of unsupervised anomaly detection###
#This script also implements Isolation forest models based on the above metrics of; [R1] Distance from median fixes , [R2] Outgoing/incoming speed and absolute turning angle between 3 consecutive fixes, and [R3] Maximum speed. For explanation and examples, see https://sealavi.github.io/Outliers-in-animal-movement-data/. Shauhin Alavi [https://github.com/sealavi] conceived this idea for use in GPS filtering.
#Isolation forest runs multiple times, once as a multidimensional case including all metrics [R1,R2 & R3], and once with each metric as single dimensions. Essentially, this approach grows random decision trees, and increasingly partitions and isolates data using random threshold values. More isolated observations are identified as anomalous.
#The user supplies a quantile anomaly score (default = 99.5% quantile) The distribution of anomaly scores at or higher than this quantile are deemed outliers for each dimensional case.
#The user can then inspect the resultant data frame of potential outliers determined from the user-defined threshold limits and the Isolation forest approach to evaluate filtering performance
################################################################################
##User-defined inputs -->
############################################################
#TS --> Timestamp (as.POSIXct object) (e.g., '2022-08-06 21:46:12')
#Longitude --> GPS longitude coordinates (decimal format)
#Latitude --> GPS latitude coordinates (decimal format)
#Drop.out --> Maximum GPS drop-out time (units in s) used to compute group numbers from which the various running windows/stepping ranges listed above resets per increment (default value = 300)
#Window.length --> Sliding window length (number of consecutive fixes) used to compute center-aligned running median values of longitude and latitude values (for the R1 threshold) (default value = 10)
#Burst.method --> Option to pre-process burst data: "median", "mean", "last", or "none" (default = "last")
#Dist.thresh --> Threshold value (m) for R1
#Angle.speed --> Threshold value (m/s) for both incoming and outgoing speeds between 3 consecutive fixes used in evaluating R2
#Angle.thresh --> The absolute angle vertex between 3 consecutive fixes used in evaluating R2, for this, the 'Ang.vertex' threshold is passed if <= to the 'Angle.thresh' value
#Max.speed --> Threshold value (m/s) for R3
#Speed.step --> The stepping range between fixes for 'Max.speed' calculation (e.g., if default value of 1 is changed to 5, then speed is computed between every 5 fixes per group number)
#I.F_conf = --> Quantile anomaly score (default = 0.995)
#plot --> If set to TRUE (default), then the following plots are given:
#plot 1 = 'Thresholds' --> Four histograms showing the density distributions of metrics used in evaluating R1, R2 & R3 thresholds. Dashed ablines denote the 95 (green) and 99 (red) percentile, relative to the user-defined threshold values (blue)
#plot 2 = 'Plot' --> Two side-by-side ggplotly interactive plots of GPS track:
#[left plot] coloured according to whether fix was determined 'Not anomalous' (no thresholds surpassed), 'Possible outlier' (one or two of R1, R2 & R3 were surpassed), or 'Unanimous outlier' (all thresholds surpassed) --> Based on component 1 user-defined thresholds
#[right plot] coloured according to whether fix was determined 'Not anomalous' 'Possible outlier' or 'Unanimous outlier' --> Based on component 2 user-defined thresholds Isolation forest models. If quantile anomaly scores from all metrics considered together are above the 'I.F_conf value', then fix considered 'Unanimous outlier', for single dimensional cases; 'Possible outlier'. Otherwise, 'Not anomalous'.
#If plot = TRUE, then a 3-component list is returned, with 'Thresholds', 'Plot', and 'df' (the latter being the resultant data frame).
#If plot = FALSE, then just 'df' is returned as a data frame
################################################################################
##Data frame outputs -->
############################################################
#Observation --> The row number number of the original input data (subsequent to filtering e.g., of the burst sequences)
#Timestamp --> Input timestamp
#Time.diff --> Time difference (s)
#Longitude
#Latitude
#Fix.number --> Consecutive numbers of retained fixes
#Window.group --> Integer value increasing by one for each 'new group' made according to the 'Drop.out' value
#Dist.from.median --> Distance that each fix is a way from the median fix value (calculated according to the 'Window.length' input)
#R1 --> TRUE or FALSE (if TRUE, 'Dist.from.median' >= 'Dist.thresh')
#Ang.vertex --> The absolute vertex angle between 3 fixes (°)
#Outgoing.speed --> Outgoing speed of the 3 fixes (m/s)
#Incoming.speed --> Incoming speed of the 3 fixes (m/s)
#R2 --> TRUE or FALSE (if TRUE, then both 'Outgoing.speed' and 'Incoming.speed' are >= 'Angle.speed', and the 'Ang.vertex' <= 'Angle.thresh')
#Maximum.speed --> Speed (m/s) between fixes of the given user-defined stepping range (the 'Speed.step')
#R3 --> TRUE or FALSE (if TRUE, Maximum.speed >= 'Max.speed')
#Verdict --> 'Not anomalous' (no thresholds of R1, R2, or R3 are reached/surpassed), 'Possible outlier' (one or two thresholds of R1, R2, or R3 are reached/surpassed), 'Unanimous outlier' (all thresholds reached/surpassed)
#I.F_overall_anomaly_score --> Overall Isolation Forest anomaly score from multidimensional case including all metrics in tandem [R1,R2 & R3]
#I.F_anomaly_score_R1 --> Isolation Forest anomaly score just considering single case of R1
#I.F_anomaly_score_R2 --> Isolation Forest anomaly score just considering single case of R2
#I.F_anomaly_score_R3 --> Isolation Forest anomaly score just considering single case of R3
#Verdict_IF --> 'Not anomalous' (anomaly scores, both in single cases, and multidimensional case of R1, R2, and R3, are less than the input quantile ('I.F_conf' - default = 0.995) of there respective distributions), 'Possible outlier' (One or a combination of R1, R2, or R3 anomaly scores are equal to or greater than the input quantile ('I.F_conf' - default = 0.995) of there respective distributions), 'Unanimous outlier' (the distribution of anomaly scores at or higher than the input quantile ('I.F_conf' - default = 0.995) when considered as the multidimensional case [R1, R2, & R3 considered together]
##Based on the verdicts within the returned data frame (and plots, if plot = TRUE), the user can then decide to filter out the necessary fixes. The 'observation' column is given, if ever resultant columns are required to be merged back into the 'original' df (based on matching row numbers)
#############################################Gundog.GPS.Filter#############################################
###START OF FUNCTION###
Gundog.GPS.Filter = function(TS, Longitude, Latitude, Drop.out = 300, Window.length = 10, Burst.method = "last", Dist.thresh = 50, Max.speed = 1, Speed.step = 1, Angle.speed = 1, Angle.thresh = 30, I.F_conf = 0.995, plot = TRUE){
################################################################################################################################################
###Required packages###
if (!require('zoo')){ install.packages('zoo', dependencies = TRUE, type="source")} ; suppressMessages(require("zoo"))
if (!require('dplyr')){ install.packages('dplyr', dependencies = TRUE, type="source")} ; suppressMessages(require("dplyr"))
if (!require('tidyr')){ install.packages('tidyr', dependencies = TRUE, type="source")} ; suppressMessages(require("tidyr"))
if (!require('ggplot2')){ install.packages('ggplot2', dependencies = TRUE, type="source")} ; suppressMessages(require("ggplot2"))
if (!require('plotly')){ install.packages('plotly', dependencies = TRUE, type="source")} ; suppressMessages(require("plotly"))
if (!require('solitude')){ install.packages('solitude', dependencies = TRUE, type="source")} ; suppressMessages(require("solitude"))
#Check that required packages are installed on the system
areinstaled=data.frame(installed.packages())
if(all(c("zoo","dplyr","tidyr","ggplot2", "plotly", "solitude")%in%areinstaled$Package)==FALSE){
required_packages=c("zoo","dplyr","tidyr","ggplot2", "plotly", "solitude")
missing_packages=c("zoo","dplyr","tidyr","ggplot2", "plotly", "solitude")%in%areinstaled$Package
stop(paste("The following packages are not installed:", required_packages[which(missing_packages==FALSE)], sep = " "))
}
###Input argument checking###
options(digits.secs = 3) #Specify the number of decimal places of the fractional seconds to show if relevant
is.POSIXct = function(x) inherits(x, "POSIXct") #Function to check variable is of POSIXct type
if(is.POSIXct(TS) == FALSE) { #Ensure TS is of type POSIXct (otherwise terminate function)
stop("TS must be of type POSIXct", .call = FALSE)
}
if(length(unique(TS)) != length(TS)){ #Ensure TS does not contain duplicate timestamps (otherwise terminate function)
stop("TS must not contain duplicates", .call = FALSE)
}
if(min(Longitude, na.rm = TRUE) < -180 || max(Longitude, na.rm = TRUE) > 180) { #Ensure coordinates are in decimal format
stop("Longitude must be between -180 and 180 degrees", call. = FALSE)
}
if(min(Latitude, na.rm = TRUE) < -180 || max(Latitude, na.rm = TRUE) > 180) { #Ensure coordinates are in decimal format
stop("Latitude must be between -90 and 90 degrees", call. = FALSE)
}
if(length(Drop.out) > 1 || is.numeric(Drop.out) == FALSE || Drop.out != round(Drop.out)){ #Ensure a single (whole number) numeric value is supplied
stop("Drop.out threshold must be one numeric value", .call = FALSE)
}
if(length(Window.length) > 1 || is.numeric(Window.length) == FALSE || Window.length != round(Window.length)){ #Ensure a single (whole number) numeric value is supplied
stop("Window.length threshold must be one numeric value", .call = FALSE)
}
if(length(Burst.method) > 1 || Burst.method %in% c("median", "mean", "last", "none") == FALSE ){ #Ensure a valid option is selected
stop("Pick one of the pre-select options for Burst.method", .call = FALSE)
}
if(length(Dist.thresh) > 1 || is.numeric(Dist.thresh) == FALSE){ #Ensure a single numeric value is supplied
stop("Dist.thresh threshold must be one numeric value", .call = FALSE)
}
if(length(Max.speed) > 1 || is.numeric(Max.speed) == FALSE){ #Ensure a single numeric value is supplied
stop("Max.speed threshold must be one numeric value", .call = FALSE)
}
if(length(Angle.speed) > 1 || is.numeric(Angle.speed) == FALSE){ #Ensure a single numeric value is supplied
stop("Angle.speed threshold must be one numeric value", .call = FALSE)
}
if(length(Angle.thresh) > 1 || is.numeric(Angle.thresh) == FALSE){ #Ensure a single numeric value is supplied
stop("Angle.thresh threshold must be one numeric value", .call = FALSE)
}
################################################################################################################################################
####Required functions###
#Haversine distance formula
disty = function(long1, lat1, long2, lat2) { #Longitude and Latitude supplied in degrees
long1 = long1 * pi/180 ; long2 = long2 * pi/180 ; lat1 = lat1 * pi/180 ; lat2 = lat2 * pi/180 #Function converts to radians
a = sin((lat2 - lat1) / 2) * sin((lat2 - lat1) / 2) + cos(lat1) * cos(lat2) * sin((long2 - long1) / 2) * sin((long2 - long1) / 2)
c = 2 * atan2(sqrt(a), sqrt(1 - a))
d1 = 6378137 * c
return(d1)
}
#Bearing function --> returns degrees - Great circular bearing between 2D positions
beary = function(long1, lat1, long2, lat2) { #Longitude and Latitude supplied in degrees
long1 = long1 * pi/180 ; long2 = long2 * pi/180 ; lat1 = lat1 * pi/180 ; lat2 = lat2 * pi/180 #Function converts to radians
a = sin(long2 - long1) * cos(lat2)
b = cos(lat1) * sin(lat2) - sin(lat1) * cos(lat2) * cos(long2 - long1)
c = ((atan2(a, b) / pi) * 180) #Units returned in degrees (-180 to +180 degree scale)
return(c)
}
#Set to 5 decimal places (for plotting long/lat coords purposes)
scaleFUN <- function(x) sprintf("%.5f", x)
################################################################################################################################################
###Create the initial data frame and pre-process the data###
Observation = rep(1:length(TS)) #Row number (used for indexing and merging data frames)
df = data.frame(Observation, TS, Longitude, Latitude) ; colnames(df) = c("Observation", "Timestamp", "Longitude", "Latitude")
#Remove NA fixes
df$Longitude = ifelse(df$Longitude == 0, NA, df$Longitude) #In case missing coordinates are filled as zeros, replace with NA's
df$Latitude = ifelse(df$Latitude == 0, NA, df$Latitude) #In case missing coordinates are filled as zeros, replace with NA's
df = df[!with(df, is.na(Longitude) | is.na(Latitude)) ,]
#Create a time difference (s) between values
df = df %>% mutate(Time.diff = as.numeric(c(0, difftime(Timestamp, dplyr::lag(Timestamp), units = "secs")[-1]))) %>% ungroup()
#Make a group column which increments by one each GPS burst. This is going to be used to pre-process each burst
df$Fix.number = 1
x = 1
for(i in 1:nrow(df)){
if(df$Time.diff[i] > 1){
x = x+1
}
df$Fix.number[i] = x
}
#Method to process each burst
if(Burst.method != "none"){
if(Burst.method == "median"){
df = df %>% group_by(Fix.number) %>% mutate(Latitude = median(Latitude)) %>% ungroup()
df = df %>% group_by(Fix.number) %>% mutate(Longitude = median(Longitude)) %>% ungroup()
}
if(Burst.method == "mean"){
df = df %>% group_by(Fix.number) %>% mutate(Latitude = mean(Latitude)) %>% ungroup()
df = df %>% group_by(Fix.number) %>% mutate(Longitude = mean(Longitude)) %>% ungroup()
}
if(Burst.method == "last"){
df = df %>% group_by(Fix.number) %>% mutate(Latitude = last(Latitude)) %>% ungroup()
df = df %>% group_by(Fix.number) %>% mutate(Longitude = last(Longitude)) %>% ungroup()
}
}
#Remove duplicated processed GPS fixes per burst
if(Burst.method != "none"){
df.sub = df[!duplicated(df$Fix.number), ]
df.sub = df.sub %>% mutate(Time.diff = as.numeric(c(0, difftime(Timestamp, dplyr::lag(Timestamp), units = "secs")[-1]))) %>% ungroup() #Redo time difference between rows after subset
}else{
df.sub = df
}
#Make a group column ('Window.group') which increments by one each time the GPS drops-out (missing fixes) >= the user-defined 'Drop.out' threshold (s)
x = 1
Time.diff = df.sub$Time.diff
Window.group = rep(0, length(Time.diff))
for (i in 1:length(Window.group)) {
if(Time.diff[i] >= Drop.out){
x = x+1
}
Window.group[i] = x
}
df.sub$Window.group = Window.group
################################################################################################################################################
#User-defined thresholds component of script
###Calculate distance between raw fixes and the median equivalent --> R1###
#Median Longitude and Latitude values per Window.group. If the length of the Window.group < Window.length, then the window.length will change to the length of the Window.group, unless the Window.group < 3, in which case, no median fixes are computed.
df.sub = df.sub %>% group_by(Window.group) %>% mutate(Window.group.length = max(sequence(rle(Window.group)$lengths))) %>% ungroup() #Number of consecutive rows per Window.group
df.sub$Window.group.length = ifelse(df.sub$Window.group.length >= Window.length, Window.length, df.sub$Window.group.length) #If the number of consecutive rows per Window.group is >= Window.length, then make this value the window.length
df.sub = df.sub %>% group_by(Window.group) %>% mutate(Longitude.med = ifelse(Window.group.length < 3, NA, #If the number of consecutive rows per Window.group is not a minimum of three then the median fix is not calculated. If the number of consecutive rows per Window.group is > 3 and less than the Window.length, then the running median window shortens to this value. Otherwise, the user-defined Window.length is used
rollapply(Longitude, align = "center", width = Window.group.length, FUN = median, fill = NA))) %>% tidyr::fill(Longitude.med, .direction = "updown") %>% ungroup() #Replace last Non-NA value per Window.group in both directions
df.sub = df.sub %>% group_by(Window.group) %>% mutate(Latitude.med = ifelse(Window.group.length < 3, NA, #If the number of consecutive rows per Window.group is not a minimum of three then the median fix is not calculated. If the number of consecutive rows per Window.group is > 3 and less than the Window.length, then the running median window shortens to this value. Otherwise, the user-defined Window.length is used
rollapply(Latitude, align = "center", width = Window.group.length, FUN = median, fill = NA))) %>% tidyr::fill(Latitude.med, .direction = "updown") %>% ungroup() #Replace last Non-NA value per Window.group in both directions
df.sub$Dist.from.median = with(df.sub, disty(Longitude, Latitude, Longitude.med, Latitude.med)) #Distance between raw fixes and the median equivalent
#Evaluating the threshold
df.sub$R1 = ifelse(df.sub$Dist.from.median >= Dist.thresh, TRUE, FALSE) #R1 = TRUE if GPS fix identified as erroneous
###Calculate Erroneous 'spikes' by identifying outgoing and incoming speed (m/s) and absolute turning angle (0-180°) between every combination of three consecutive GPS fixes within the user-defined sliding window length per group --> R2###
#Shift longitude and latitude rows forwards (lag) and backwards (lead) by one row per group
df.sub = df.sub %>% group_by(Window.group) %>% mutate(Longitude.lag = dplyr::lag(Longitude, n = 1, default = NA)) %>% ungroup()
df.sub = df.sub %>% group_by(Window.group) %>% mutate(Latitude.lag = dplyr::lag(Latitude, n = 1, default = NA)) %>% ungroup()
df.sub = df.sub %>% group_by(Window.group) %>% mutate(Longitude.lead = dplyr::lead(Longitude, n = 1, default = NA)) %>% ungroup()
df.sub = df.sub %>% group_by(Window.group) %>% mutate(Latitude.lead = dplyr::lead(Latitude, n = 1, default = NA)) %>% ungroup()
#Calculate bearing (angle) between 'Longitude'/'Latitude', 'Longitude.lag'/'Latitude.lag', and 'Longitude.lead'/'Latitude.lead' (angle from central fix to 'pre-fix' and from central fix to 'post-fix' across the three candidate fixes)
df.sub = df.sub %>% group_by(Window.group) %>% mutate(Ang.lag = beary(Longitude, Latitude, Longitude.lag, Latitude.lag)) %>% ungroup()
df.sub = df.sub %>% group_by(Window.group) %>% mutate(Ang.lead = beary(Longitude, Latitude, Longitude.lead, Latitude.lead)) %>% ungroup()
df.sub$Ang.lag = ifelse(df.sub$Ang.lag < 0, df.sub$Ang.lag + 360, df.sub$Ang.lag) #Because above formula outputs within the scale of -180 to +180 degrees --> This ensures output is 0 to 360 degrees
df.sub$Ang.lead = ifelse(df.sub$Ang.lead < 0, df.sub$Ang.lead + 360, df.sub$Ang.lead) #Because above formula outputs within the scale of -180 to +180 degrees --> This ensures output is 0 to 360 degrees
df.sub$Ang.vertex = df.sub$Ang.lead - df.sub$Ang.lag #Turning angle (0-180°) between every combination of three consecutive GPS fixes
df.sub$Ang.vertex = ifelse(df.sub$Ang.vertex < -180, (df.sub$Ang.vertex + 360), df.sub$Ang.vertex) #Ensure difference does not exceed 180 degrees in either circular direction
df.sub$Ang.vertex = ifelse(df.sub$Ang.vertex > 180, (df.sub$Ang.vertex - 360), df.sub$Ang.vertex) #Ensure difference does not exceed 180 degrees in either circular direction
df.sub$Ang.vertex = abs(df.sub$Ang.vertex) #Make angle absolute
#Calculate the incoming and outgoing speeds either side of angle vertex
df.sub = df.sub %>% group_by(Window.group) %>% mutate(Dist.lag = disty(Longitude, Latitude, Longitude.lag, Latitude.lag)) %>% ungroup()
df.sub = df.sub %>% group_by(Window.group) %>% mutate(Dist.lead = disty(Longitude, Latitude, Longitude.lead, Latitude.lead)) %>% ungroup()
#Lead of time difference
#Create lag and lead time difference (s) between values
df.sub = df.sub %>% group_by(Window.group) %>% mutate(Time.diff.lag = as.numeric(c(NA, difftime(Timestamp, dplyr::lag(Timestamp), units = "secs")[-1]))) %>% ungroup()
df.sub = df.sub %>% group_by(Window.group) %>% mutate(Time.diff.lead = dplyr::lead(Time.diff.lag, n = 1, default = NA)) %>% ungroup()
#Outgoing and incoming speed (m/s)
df.sub = df.sub %>% group_by(Window.group) %>% mutate(Outgoing.speed = Dist.lag / Time.diff.lag) %>% ungroup()
df.sub = df.sub %>% group_by(Window.group) %>% mutate(Incoming.speed = Dist.lead / Time.diff.lead) %>% ungroup()
#Evaluating the threshold
df.sub$R2 = ifelse(df.sub$Outgoing.speed >= Angle.speed & df.sub$Incoming.speed >= Angle.speed & df.sub$Ang.vertex <= Angle.thresh, TRUE, FALSE) #R2 = TRUE if GPS fix identified as erroneous
##Calculate Erroneous fixes if maximum speed threshold calculated using a user-defined stepping range is exceed --> R3###
#Shift longitude and latitude values backwards by the specified stepping range (Speed.step)
df.sub = df.sub %>% group_by(Window.group) %>% mutate(Longitude.lag.2 = dplyr::lag(Longitude, n = Speed.step, default = NA)) %>% ungroup()
df.sub = df.sub %>% group_by(Window.group) %>% mutate(Latitude.lag.2 = dplyr::lag(Latitude, n = Speed.step, default = NA)) %>% ungroup()
#Calculate time difference between the specified stepping range (Speed.step)
df.sub = df.sub %>% group_by(Window.group) %>% mutate(Time.diff.lag.2 = as.numeric(c(NA, difftime(Timestamp, dplyr::lag(Timestamp, n = Speed.step, default = NA), units = "secs")[-1]))) %>% ungroup()
#Calculate the speed across the specified 'Speed.step' stepping range
df.sub$Maximum.speed = with(df.sub, disty(Longitude, Latitude, Longitude.lag.2, Latitude.lag.2) / Time.diff.lag.2)
#Evaluating the threshold
df.sub$R3 = ifelse(df.sub$Maximum.speed >= Max.speed, TRUE, FALSE) #R3 = TRUE if GPS fix identified as erroneous
#Replace possible NAs in R1, R2 & R3 to FALSE
df.sub$R1 = ifelse(is.na(df.sub$R1) == TRUE, FALSE, df.sub$R1)
df.sub$R2 = ifelse(is.na(df.sub$R2) == TRUE, FALSE, df.sub$R2)
df.sub$R3 = ifelse(is.na(df.sub$R3) == TRUE, FALSE, df.sub$R3)
#If at least one threshold is passed, then 'Possible outlier', if all thresholds surpassed, then 'Unanimous outlier', if no thresholds surpassed, then 'Not anomalous'
df.sub$Verdict = ifelse(df.sub$R1 == TRUE & df.sub$R2 == TRUE & df.sub$R3 == TRUE, "Unanimous outlier", ifelse(df.sub$R1 == TRUE | df.sub$R2 == TRUE | df.sub$R3 == TRUE, "Possible outlier", "Not anomalous"))
#df.sub$Verdict = as.character(factor(df.sub$Verdict, levels = c("Not anomalous", "Possible outlier", "Unanimous outlier"))) #Ensure factor levels are ordered for plotting purposes
#Remove redundant columns / vectors
rm(Observation, x, Window.group)
rm(df)
df.sub = df.sub[, c('Observation', 'Timestamp', 'Time.diff', 'Longitude', 'Latitude', 'Fix.number', 'Window.group', 'Dist.from.median', 'R1', 'Ang.vertex', 'Outgoing.speed', 'Incoming.speed', 'R2', 'Maximum.speed', 'R3', 'Verdict')]
################################################################################################################################################
################################################################################################################################################
###Isolation forest component of function###
##check if default isolation forest sample size settings need to be adjusted
if(nrow(df.sub) < 256){
sample_size = nrow(df.sub)
}else{
sample_size = 256
}
#Begin anomaly detection
message("Initiating Isolation forest anomaly detection")
if(all(diff((df.sub$Timestamp)) > 0) == FALSE){
stop("Timestamps may be out of order - Ensure that timestamps are in ascending order")
}
#All variables together
indexs = with(df.sub, which(complete.cases(Dist.from.median, Ang.vertex, Outgoing.speed, Incoming.speed, Maximum.speed) == TRUE)) #Row numbers with complete cases across the columns of interest (non-NA values)
COLS = c("Dist.from.median", "Ang.vertex", "Outgoing.speed", "Incoming.speed", "Maximum.speed") #Columns of interest
COLS = which(names(df.sub) %in% COLS == TRUE) #Index of where columns occur in 'df.sub'
subset = df.sub[indexs, COLS] #Subset just columns of interest
iforest = isolationForest$new(sample_size = sample_size)
invisible(capture.output(iforest$fit(subset)))
scores = iforest$predict(subset)
df.sub$I.F_overall_anomaly_score = NA
df.sub$I.F_overall_anomaly_score[indexs] = scores$anomaly_score
#Just 'Dist.from.median' [R1]
indexs = with(df.sub, which(complete.cases(Dist.from.median) == TRUE)) #Row numbers with non-NA values
subset = as.data.frame(df.sub$Dist.from.median[indexs]) #Subset just column of interest
iforest = isolationForest$new(sample_size = sample_size)
invisible(capture.output(iforest$fit(subset)))
scores = iforest$predict(subset)
df.sub$I.F_anomaly_score_R1 = NA
df.sub$I.F_anomaly_score_R1[indexs] = scores$anomaly_score
#Just 'Ang.vertex', 'Outgoing.speed' and 'Incoming.speed' [R2]
indexs = with(df.sub, which(complete.cases(Ang.vertex, Outgoing.speed, Incoming.speed) == TRUE)) #Row numbers with complete cases across the columns of interest (non-NA values)
COLS = c("Ang.vertex", "Outgoing.speed", "Incoming.speed") #Columns of interest
COLS = which(names(df.sub) %in% COLS == TRUE) #Index of where columns occur in 'df.sub'
subset = df.sub[indexs, COLS] #Subset just columns of interest
iforest = isolationForest$new(sample_size = sample_size)
invisible(capture.output(iforest$fit(subset)))
scores = iforest$predict(subset)
df.sub$I.F_anomaly_score_R2 = NA
df.sub$I.F_anomaly_score_R2[indexs] = scores$anomaly_score
#Just 'Maximum.speed' [R3]
indexs = with(df.sub, which(complete.cases(Maximum.speed) == TRUE)) #Row numbers with non-NA values
subset = as.data.frame(df.sub$Maximum.speed[indexs]) #Subset just column of interest
iforest = isolationForest$new(sample_size = sample_size)
invisible(capture.output(iforest$fit(subset)))
scores = iforest$predict(subset)
df.sub$I.F_anomaly_score_R3 = NA
df.sub$I.F_anomaly_score_R3[indexs] = scores$anomaly_score
#Which rows correspond to being >= I.F_conf quantile (for R1, R2, R3, and combined)
R1_outliers = which(df.sub$I.F_anomaly_score_R1 >= as.numeric(quantile(df.sub$I.F_anomaly_score_R1, I.F_conf, na.rm = TRUE)))
R2_outliers = which(df.sub$I.F_anomaly_score_R2 >= as.numeric(quantile(df.sub$I.F_anomaly_score_R2, I.F_conf, na.rm = TRUE)))
R3_outliers = which(df.sub$I.F_anomaly_score_R3 >= as.numeric(quantile(df.sub$I.F_anomaly_score_R3, I.F_conf, na.rm = TRUE)))
overall_outliers = which(df.sub$I.F_overall_anomaly_score >= as.numeric(quantile(df.sub$I.F_overall_anomaly_score, I.F_conf, na.rm = TRUE)))
R1_outliers = R1_outliers[-which(R1_outliers %in% overall_outliers == TRUE)]
R2_outliers = R2_outliers[-which(R2_outliers %in% overall_outliers == TRUE)]
R3_outliers = R3_outliers[-which(R3_outliers %in% overall_outliers == TRUE)]
#Label as "Not anomalous", "Possible outlier", "Unanimous outlier", according to whether none, some or all thresholds were surpassed
df.sub$Verdict_IF = "Not anomalous"
df.sub$Verdict_IF[R1_outliers] = "Possible outlier"
df.sub$Verdict_IF[R2_outliers] = "Possible outlier"
df.sub$Verdict_IF[R3_outliers] = "Possible outlier"
df.sub$Verdict_IF[overall_outliers] = "Unanimous outlier"
###Plot results###
if(plot == TRUE){
#summary plots - 'Thresholds'
old.par <- par(mar = c(0, 0, 0, 0))
#(1) Distribution of Distance from median (R1)
par(old.par)
par(mfrow = c(2,2))
h = hist(df.sub$Dist.from.median, breaks="Scott", plot=TRUE, main = "Distance from median (m)",
xlab = "Distance from median (m) [R1]", ylab = "Density", cex.lab = 1.2, freq = FALSE)
text(Dist.thresh, max(h$density), labels = paste("R1 =", Dist.thresh), col = "blue", cex = 0.8) # User defined 'Dist.thresh [R1]'
text(quantile(df.sub$Dist.from.median, 0.95, na.rm = TRUE), max(h$density)/1.7, labels = "95%", col = "green", cex = 0.8) # 95%
text(quantile(df.sub$Dist.from.median, 0.99, na.rm = TRUE), max(h$density)/3.2, labels = "99%", col = "red", cex = 0.8) # 99%
clip(x1 = min(h$breaks),
x2 = max(h$breaks),
y1 = min(h$density),
y2 = max(h$density) - max(h$density)/10)
abline(v = Dist.thresh, col="blue", lwd=1.5, lty=2) # User defined 'Dist.thresh [R1]'
clip(x1 = 0,
x2 = max(h$breaks),
y1 = 0,
y2 = max(h$density)/2)
abline(v = quantile(df.sub$Dist.from.median, 0.95, na.rm = TRUE), col="green", lwd=1.5, lty=2) #95% quantile
clip(x1 = 0,
x2 = max(h$breaks),
y1 = 0,
y2 = max(h$density)/4)
abline(v = quantile(df.sub$Dist.from.median, 0.99, na.rm = TRUE), col="red", lwd=1.5, lty=2) #99 % quantile
#Zoom in
par(fig = c(0.2, 0.49, 0.65, 0.99), new = T)
h = hist(df.sub$Dist.from.median, breaks="Scott", plot=TRUE, main = "",
xlab = "", ylab = "", freq = FALSE, xlim = c(0, quantile(df.sub$Dist.from.median, 0.99, na.rm = TRUE)))
if(Dist.thresh < quantile(df.sub$Dist.from.median, 0.99, na.rm = TRUE)){
text(Dist.thresh, max(h$density), labels = paste("R1 =", Dist.thresh), col = "blue", cex = 0.8) # User defined 'Dist.thresh [R1]'
}
text(quantile(df.sub$Dist.from.median, 0.95, na.rm = TRUE), max(h$density)/1.7, labels = "95%", col = "green", cex = 0.8) # 95%
text(quantile(df.sub$Dist.from.median, 0.99, na.rm = TRUE), max(h$density)/3.2, labels = "99%", col = "red", cex = 0.8) # 99%
if(Dist.thresh < quantile(df.sub$Dist.from.median, 0.99, na.rm = TRUE)){
clip(x1 = 0,
x2 = max(h$breaks),
y1 = 0,
y2 = max(h$density) - max(h$density)/10)
}
abline(v = Dist.thresh, col="blue", lwd=1.5, lty=2) # User defined 'Dist.thresh [R1]'
clip(x1 = 0,
x2 = max(h$breaks),
y1 = 0,
y2 = max(h$density)/2)
abline(v = quantile(df.sub$Dist.from.median, 0.95, na.rm = TRUE), col="green", lwd=1.5, lty=2) #95% quantile
clip(x1 = 0,
x2 = max(h$breaks),
y1 = 0,
y2 = max(h$density)/4)
abline(v = quantile(df.sub$Dist.from.median, 0.99, na.rm = TRUE), col="red", lwd=1.5, lty=2) #99 % quantile
#(2) Distribution of Angle between 3 fixes (R2)
par(fig = c(0.5, 1, 0.5, 1), new = T)
h = hist(df.sub$Ang.vertex, breaks = 60, plot=TRUE, main = "Angle between 3 fixes (°)",
xlab = "Angle between 3 fixes (°) [R2]", ylab = "Density", cex.lab = 1.2, freq = FALSE, xaxt = "n")
axis(1, at=seq(0,180,by=20), labels=seq(0,180,by=20))
text(Angle.thresh, max(h$density), labels = paste("R2 =", Angle.thresh), col = "blue", cex = 0.8) # User defined 'Angle.thresh [R1]'
text(quantile(df.sub$Ang.vertex, 0.05, na.rm = TRUE), max(h$density)/1.7, labels = "0.5%", col = "green", cex = 0.8) # 0.5%
text(quantile(df.sub$Ang.vertex, 0.01, na.rm = TRUE), max(h$density)/3.2, labels = "0.01%", col = "red", cex = 0.8) # 0.01%
clip(x1 = 0,
x2 = max(h$breaks),
y1 = 0,
y2 = max(h$density) - max(h$density)/10)
abline(v = Angle.thresh, col="blue", lwd=1.5, lty=2) # User defined 'Angle.thresh [R2]'
clip(x1 = 0,
x2 = max(h$breaks),
y1 = 0,
y2 = max(h$density)/2)
abline(v = quantile(df.sub$Ang.vertex, 0.05, na.rm = TRUE), col="green", lwd=1.5, lty=2) #95% quantile
clip(x1 = 0,
x2 = max(h$breaks),
y1 = 0,
y2 = max(h$density)/4)
abline(v = quantile(df.sub$Ang.vertex, 0.01, na.rm = TRUE), col="red", lwd=1.5, lty=2) #99 % quantile
#(3) Distribution of Outgoing/Incoming speed (R2)
par(fig = c(0, 0.5, 0, 0.5), new = T)
h = hist(c(df.sub$Outgoing.speed, df.sub$Incoming.speed), breaks="Scott", plot=TRUE, main = "Outgoing/Incoming speed (m/s)",
xlab = "Outgoing/Incoming speed (m/s) [R2]", ylab = "Density", cex.lab = 1.2, freq = FALSE)
text(Angle.speed, max(h$density), labels = paste("R2 =", Angle.speed), col = "blue", cex = 0.8) # User defined 'Angle.speed [R2]'
text(quantile(c(df.sub$Outgoing.speed, df.sub$Incoming.speed), 0.95, na.rm = TRUE), max(h$density)/1.7, labels = "95%", col = "green", cex = 0.8) # 95%
text(quantile(c(df.sub$Outgoing.speed, df.sub$Incoming.speed), 0.99, na.rm = TRUE), max(h$density)/3.2, labels = "99%", col = "red", cex = 0.8) # 99%
clip(x1 = 0,
x2 = max(h$breaks),
y1 = 0,
y2 = max(h$density) - max(h$density)/10)
abline(v = Angle.speed, col="blue", lwd=1.5, lty=2) # User defined 'Dist.thresh [R1]'
clip(x1 = 0,
x2 = max(h$breaks),
y1 = 0,
y2 = max(h$density)/2)
abline(v = quantile(c(df.sub$Outgoing.speed, df.sub$Incoming.speed), 0.95, na.rm = TRUE), col="green", lwd=1.5, lty=2) #95% quantile
clip(x1 = 0,
x2 = max(h$breaks),
y1 = 0,
y2 = max(h$density)/4)
abline(v = quantile(c(df.sub$Outgoing.speed, df.sub$Incoming.speed), 0.99, na.rm = TRUE), col="red", lwd=1.5, lty=2) #99 % quantile
#Zoom in
par(fig = c(0.2, 0.49, 0.15, 0.49), new = T)
h = hist(c(df.sub$Outgoing.speed, df.sub$Incoming.speed), breaks="Scott", plot=TRUE, main = "",
xlab = "", ylab = "", freq = FALSE, xlim = c(0, quantile(c(df.sub$Outgoing.speed, df.sub$Incoming.speed), 0.99, na.rm = TRUE)))
if(Angle.speed < quantile(c(df.sub$Outgoing.speed, df.sub$Incoming.speed), 0.99, na.rm = TRUE)){
text(Angle.speed, max(h$density), labels = paste("R2 =", Angle.speed), col = "blue", cex = 0.8) # User defined 'Angle.speed [R2]'
}
text(quantile(c(df.sub$Outgoing.speed, df.sub$Incoming.speed), 0.95, na.rm = TRUE), max(h$density)/1.7, labels = "95%", col = "green", cex = 0.8) # 95%
text(quantile(c(df.sub$Outgoing.speed, df.sub$Incoming.speed), 0.99, na.rm = TRUE), max(h$density)/3.2, labels = "99%", col = "red", cex = 0.8) # 99%
if(Angle.speed < quantile(c(df.sub$Outgoing.speed, df.sub$Incoming.speed), 0.99, na.rm = TRUE)){
clip(x1 = 0,
x2 = max(h$breaks),
y1 = 0,
y2 = max(h$density) - max(h$density)/10)
abline(v = Angle.speed, col="blue", lwd=1.5, lty=2) # User defined 'Dist.thresh [R1]'
}
clip(x1 = 0,
x2 = max(h$breaks),
y1 = 0,
y2 = max(h$density)/2)
abline(v = quantile(c(df.sub$Outgoing.speed, df.sub$Incoming.speed), 0.95, na.rm = TRUE), col="green", lwd=1.5, lty=2) #95% quantile
clip(x1 = 0,
x2 = max(h$breaks),
y1 = 0,
y2 = max(h$density)/4)
abline(v = quantile(c(df.sub$Outgoing.speed, df.sub$Incoming.speed), 0.99, na.rm = TRUE), col="red", lwd=1.5, lty=2) #99 % quantile
#(4) Distribution of Maximum speed (R3)
par(fig = c(0.5, 1, 0, 0.5), new = T)
h = hist(df.sub$Maximum.speed, breaks="Scott", plot=TRUE, main = "Maximum speed (m/s)",
xlab = "Maximum speed (m/s) [R3]", ylab = "Density", cex.lab = 1.2, freq = FALSE)
text(Max.speed, max(h$density), labels = paste("R3 =", Max.speed), col = "blue", cex = 0.8) # User defined 'Max.speed [R3]'
text(quantile(df.sub$Maximum.speed, 0.95, na.rm = TRUE), max(h$density)/1.7, labels = "95%", col = "green", cex = 0.8) # 95%
text(quantile(df.sub$Maximum.speed, 0.99, na.rm = TRUE), max(h$density)/3.2, labels = "99%", col = "red", cex = 0.8) # 99%
clip(x1 = 0,
x2 = max(h$breaks),
y1 = 0,
y2 = max(h$density) - max(h$density)/10)
abline(v = Max.speed, col="blue", lwd=1.5, lty=2) # User defined 'Dist.thresh [R1]'
clip(x1 = 0,
x2 = max(h$breaks),
y1 = 0,
y2 = max(h$density)/2)
abline(v = quantile(c(df.sub$Outgoing.speed, df.sub$Incoming.speed), 0.95, na.rm = TRUE), col="green", lwd=1.5, lty=2) #95% quantile
clip(x1 = 0,
x2 = max(h$breaks),
y1 = 0,
y2 = max(h$density)/4)
abline(v = quantile(c(df.sub$Outgoing.speed, df.sub$Incoming.speed), 0.99, na.rm = TRUE), col="red", lwd=1.5, lty=2) #99 % quantile
#Zoom in
par(fig = c(0.7, 0.99, 0.15, 0.49), new = T)
h = hist(c(df.sub$Outgoing.speed, df.sub$Incoming.speed), breaks="Scott", plot=TRUE, main = "",
xlab = "", ylab = "", freq = FALSE, xlim = c(0, quantile(df.sub$Maximum.speed, 0.99, na.rm = TRUE)))
if(Max.speed < quantile(df.sub$Maximum.speed, 0.99, na.rm = TRUE)){
text(Max.speed, max(h$density), labels = paste("R3 =", Max.speed), col = "blue", cex = 0.8) # User defined 'Angle.speed [R2]'
}
text(quantile(df.sub$Maximum.speed, 0.95, na.rm = TRUE), max(h$density)/1.7, labels = "95%", col = "green", cex = 0.8) # 95%
text(quantile(df.sub$Maximum.speed, 0.99, na.rm = TRUE), max(h$density)/3.2, labels = "99%", col = "red", cex = 0.8) # 99%
if(Max.speed < quantile(df.sub$Maximum.speed, 0.99, na.rm = TRUE)){
clip(x1 = 0,
x2 = max(h$breaks),
y1 = 0,
y2 = max(h$density) - max(h$density)/10)
abline(v = Angle.speed, col="blue", lwd=1.5, lty=2) # User defined 'Dist.thresh [R1]'
}
clip(x1 = 0,
x2 = max(h$breaks),
y1 = 0,
y2 = max(h$density)/2)
abline(v = quantile(df.sub$Maximum.speed, 0.95, na.rm = TRUE), col="green", lwd=1.5, lty=2) #95% quantile
clip(x1 = 0,
x2 = max(h$breaks),
y1 = 0,
y2 = max(h$density)/4)
abline(v = quantile(df.sub$Maximum.speed, 0.99, na.rm = TRUE), col="red", lwd=1.5, lty=2) #99 % quantile
Thresholds <- recordPlot() #Save plot
###ggplot Results of filtering (user-defined thresholds)###
p1 = ggplot(df.sub, aes(x = Longitude, y = Latitude,
text = paste('Timestamp: ', Timestamp,
'</br> Fix number: ', Fix.number,
'</br> Window group: ', Window.group,
'</br> Distance from median (m): ', round(Dist.from.median, 1),
'</br> Angle between 3 fixes (°): ', round(Ang.vertex, 1),
'</br> Outgoing speed of 3 fixes (m/s): ', round(Outgoing.speed, 2),
'</br> Incoming speed of 3 fixes (m/s): ', round(Incoming.speed, 2),
'</br> Maximum speed (m/s): ', round(Maximum.speed, 2),
'</br> R1: ', R1,
'</br> R2: ', R2,
'</br> R3: ', R3)))+
geom_path(aes(group=1), size = 0.3, alpha = 0.5, color = "grey30")+
geom_point(aes(color = Verdict), alpha = 0.8)+
ggtitle("Gundog GPS filter")+
xlab("Longitude")+
ylab("Latitude")+
scale_color_manual(name = "Outlier detection:",
labels = c("Not anomalous" ,"Possible outlier", "Unanimous outlier"),
values = c("green", "darkorange2" ,"red"),
na.translate = FALSE,
drop = FALSE)+
coord_equal(ratio = 1) + #+ scale_x_continuous(labels=scaleFUN)+scale_y_continuous(labels=scaleFUN)
theme_bw()+
theme(axis.text.x = element_text(color = "black", size = 12),
axis.text.y = element_text(color = "black", size = 12),
axis.title.x = element_text(color = "black", size = 16),
axis.title.y = element_text(color = "black", size = 16),
plot.title = element_text(hjust = 0.5),
legend.position = "none")
###ggplot Results of filtering (Isolation forest)####
p2 = ggplot(df.sub, aes(x = Longitude, y = Latitude,
text = paste('Timestamp: ', Timestamp,
'</br> Fix number: ', Fix.number,
'</br> Window group: ', Window.group,
'</br> Distance from median (m): ', round(Dist.from.median, 1),
'</br> Angle between 3 fixes (°): ', round(Ang.vertex, 1),
'</br> Outgoing speed of 3 fixes (m/s): ', round(Outgoing.speed, 2),
'</br> Incoming speed of 3 fixes (m/s): ', round(Incoming.speed, 2),
'</br> Maximum speed (m/s): ', round(Maximum.speed, 2),
'</br> I.F_anomaly_score_R1: ', round(I.F_anomaly_score_R1, 2),
'</br> I.F_anomaly_score_R2: ', round(I.F_anomaly_score_R2, 2),
'</br> I.F_anomaly_score_R3: ', round(I.F_anomaly_score_R3, 2))))+
geom_path(aes(group=1), size = 0.3, alpha = 0.5, color = "grey30")+
geom_point(aes(color = Verdict_IF), alpha = 0.8)+
ggtitle("Gundog GPS filter")+
xlab("Longitude")+
ylab("Latitude")+
scale_color_manual(name = "Outlier detection:",
labels = c("Not anomalous" ,"Possible outlier", "Unanimous outlier"),
values = c("green", "darkorange2" ,"red"),
na.translate = FALSE,
drop = FALSE)+
coord_equal(ratio = 1) + #+ scale_x_continuous(labels=scaleFUN)+scale_y_continuous(labels=scaleFUN)
theme_bw()+
theme(axis.text.x = element_text(color = "black", size = 12),
axis.text.y = element_text(color = "black", size = 12),
axis.title.x = element_text(color = "black", size = 16),
axis.title.y = element_text(color = "black", size = 16),
plot.title = element_text(hjust = 0.5),
legend.position = "none")
#Make Interactive and plot user-defined, and Isolation Forests results side-by-side
fig <- subplot(ggplotly(p1), ggplotly(p2))
#Add sub headings
annotations = list(
list(
x = 0.2,
y = 1.0,
text = "User-defined threholds",
xref = "paper",
yref = "paper",
xanchor = "center",
yanchor = "bottom",
showarrow = FALSE
),
list(
x = 0.8,
y = 1,
text = "Isolation forest",
xref = "paper",
yref = "paper",
xanchor = "center",
yanchor = "bottom",
showarrow = FALSE
))
fig <- fig %>%layout(annotations = annotations)
print(fig)
return(list(Thresholds = Thresholds, Plot = fig, df = as.data.frame(df.sub)))
par(mfrow = c(1,1)) #Return plotting parameters back
}else{ return(df = as.data.frame(df.sub))
} #If no plotting, just return the data frame
}
#END OF FUNCTION
#For example.... (assuming original GPS data set is called 'Drogon')
#df.sub = Gundog.GPS.Filter(TS = Drogon$study.local.timestamp, Longitude = Drogon$location.long, Latitude = Drogon$location.lat, Drop.out = 300, Window.length = 10, Burst.method = "median", Dist.thresh = 50, Max.speed = 0.85, Speed.step = 1, Angle.speed = 0.85, Angle.thresh = 30, plot = TRUE, I.F_conf = 0.995)
#And to recover the data frame from the returned list...
#GPS.df = df.sub[["df"]]