
Performance Evaluation

Stephanie Tsuei edited this page Jan 31, 2022 · 2 revisions

We benchmarked our system in terms of ATE (Absolute Trajectory Error), RPE (Relative Pose Error), and computational cost against other top-performing open-source implementations, namely OKVIS [Leutenegger et al.], VINS-Mono [Qin et al.], and ROVIO [Bloesch et al.], on publicly available datasets. Our implementation achieves comparable accuracy at a fraction of the computational cost. On a desktop PC equipped with an Intel Core i7 CPU @ 3.6 GHz, our system runs at around 140 Hz with low CPU usage. By comparison, OKVIS and VINS-Mono run at around 20 Hz, and ROVIO runs at around 60 Hz. The runtime of our system could be improved further by making better use of the CPU cache and memory.

Algorithm Categories

OKVIS and VINS-Mono are optimization-based: they operate on keyframes in an iterative manner, which in general yields more accurate pose estimates at the price of higher latency and computational cost. ROVIO and XIVO are filtering-based: they are causal and much cheaper computationally, yet they produce pose estimates comparable to those of optimization-based methods.

In addition, OKVIS runs on stereo images, whereas the other three methods use only monocular images.

Computational Cost

We benchmarked the runtime of OKVIS, VINS-Mono, ROVIO and XIVO on a desktop machine equipped with an Intel Core i7 CPU @ 3.6 GHz. The table below shows the runtime of the feature processing and state update modules.

| Module | OKVIS (Stereo+Keyframe) | VINS-Mono (Keyframe) | ROVIO | XIVO |
|---|---|---|---|---|
| Feature detection & matching | 15 ms | 20 ms | 1 ms* | 3 ms |
| State update | 42 ms | 50 ms | 13 ms | 4 ms |

* ROVIO is a 'direct' method that skips the feature matching step and uses the photometric error directly as the innovation term in the EKF update step. Because it uses an Iterated Extended Kalman Filter (IEKF) for the state update, it is slower than our EKF-based method.
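To illustrate why an iterated update costs more than a single EKF update, here is a minimal generic sketch. It is not taken from ROVIO or XIVO; the function names, the measurement function `h`, and the Jacobian callback `H_of` are hypothetical placeholders. The iterated variant relinearizes the measurement model at every pass, so the Jacobian and gain are recomputed `iters` times instead of once.

```python
import numpy as np

def ekf_update(x, P, z, h, H_of, R):
    """Standard EKF measurement update: linearize once at the prior mean x."""
    H = H_of(x)
    S = H @ P @ H.T + R                 # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x + K @ (z - h(x))
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new

def iekf_update(x, P, z, h, H_of, R, iters=5):
    """Iterated EKF update: relinearize around the current iterate x_i on each pass."""
    x_i = x.copy()
    for _ in range(iters):
        H = H_of(x_i)                   # Jacobian at the current iterate
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        # Gauss-Newton style step, always expressed relative to the prior x
        x_i = x + K @ (z - h(x_i) - H @ (x - x_i))
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_i, P_new
```

With `iters=1`, or with a linear measurement model, the iterated update reduces to the standard one. The extra passes only change the result when `h` is nonlinear.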

OKVIS and VINS-Mono (marked with Keyframe) perform iterative nonlinear least-squares optimization on keyframes for state estimation, and are thus much slower in the state update step.
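As a rough sanity check, the per-module runtimes in the table can be converted into an upper bound on frame rate. The sketch below assumes the two modules run back to back with no other overhead, and reads the VINS-Mono state update entry as 50 ms. Both are assumptions made for illustration, so the resulting rates are estimates and need not match the measured end-to-end rates exactly.

```python
# Per-frame module runtimes in milliseconds, copied from the table above.
# The VINS-Mono state update value is read as 50 ms (assumption).
runtimes_ms = {
    "OKVIS": {"features": 15, "update": 42},
    "VINS-Mono": {"features": 20, "update": 50},
    "ROVIO": {"features": 1, "update": 13},
    "XIVO": {"features": 3, "update": 4},
}

def max_rate_hz(features_ms, update_ms):
    """Frame-rate bound if the two modules were the only per-frame work."""
    return 1000.0 / (features_ms + update_ms)

rates = {name: max_rate_hz(t["features"], t["update"])
         for name, t in runtimes_ms.items()}

for name, hz in rates.items():
    print(f"{name}: {hz:.1f} Hz")
```

For XIVO this gives 1000 / 7, or about 143 Hz, in line with the roughly 140 Hz quoted on this page.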

Accuracy

We compared the performance of our system in terms of ATE and RPE on two publicly available datasets: TUM-VI and EuRoC. We achieve comparable pose estimation accuracy at a fraction of the computational cost of the top-performing open-source implementations.

TUM-VI

The following table shows the performance on 6 indoor sequences where ground-truth poses are available. The numbers for OKVIS, VINS-Mono, and ROVIO are taken from the TUM-VI benchmark paper. The evaluation script of XIVO can be found in misc/run_all.sh.

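| Sequence | Length | OKVIS (Stereo+Keyframe) | VINS-Mono (Keyframe) | ROVIO | XIVO |
|---|---|---|---|---|---|
| room1 | 156 m | 0.06 m | 0.07 m | 0.16 m | 0.06 m |
| room2 | 142 m | 0.11 m | 0.07 m | 0.33 m | 0.11 m |
| room3 | 135 m | 0.07 m | 0.11 m | 0.15 m | 0.16 m |
| room4 | 68 m | 0.03 m | 0.04 m | 0.09 m | 0.07 m |
| room5 | 131 m | 0.07 m | 0.20 m | 0.12 m | 0.11 m |
| room6 | 67 m | 0.04 m | 0.08 m | 0.05 m | 0.05 m |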

Table 1. RMSE ATE in meters. Methods marked with Keyframe are keyframe-based, others are recursive approaches.
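For readers unfamiliar with the metric, the following is a minimal sketch of how an RMSE ATE can be computed. It assumes the estimated and ground-truth positions are already associated by timestamp and that a rigid (rotation plus translation) alignment is used. The exact protocol behind the numbers above is not stated on this page, and the function names are hypothetical; this is not the code in misc/run_all.sh.

```python
import numpy as np

def align_rigid(est, gt):
    """Least-squares R, t such that R @ est_i + t approximates gt_i (Kabsch/Horn)."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    E, G = est - mu_e, gt - mu_g
    U, _, Vt = np.linalg.svd(G.T @ E)   # SVD of the 3x3 cross-covariance
    # Flip the last axis if needed so that R is a proper rotation (det = +1)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    R = U @ D @ Vt
    t = mu_g - R @ mu_e
    return R, t

def ate_rmse(est, gt):
    """RMSE of position errors after rigid alignment; est and gt have shape (N, 3)."""
    R, t = align_rigid(est, gt)
    aligned = est @ R.T + t
    err = np.linalg.norm(aligned - gt, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))
```

Because of the alignment step, a trajectory that differs from the ground truth only by a global rotation and translation has an ATE of zero.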

| Sequence | OKVIS (Stereo+Keyframe) | VINS-Mono (Keyframe) | ROVIO | XIVO |
|---|---|---|---|---|
| room1 | 0.013 m / 0.43° | 0.015 m / 0.44° | 0.029 m / 0.53° | 0.020 m / 0.53° |
| room2 | 0.015 m / 0.62° | 0.017 m / 0.63° | 0.030 m / 0.67° | 0.048 m / 0.72° |
| room3 | 0.012 m / 0.64° | 0.023 m / 0.63° | 0.027 m / 0.66° | 0.069 m / 0.74° |
| room4 | 0.012 m / 0.57° | 0.015 m / 0.41° | 0.022 m / 0.61° | 0.022 m / 0.64° |
| room5 | 0.012 m / 0.47° | 0.026 m / 0.47° | 0.031 m / 0.60° | 0.025 m / 0.57° |
| room6 | 0.012 m / 0.49° | 0.014 m / 0.44° | 0.019 m / 0.50° | 0.022 m / 0.53° |

Table 2. RMSE RPE in translation (meters) and rotation (degrees). Methods marked with Keyframe are keyframe-based, others are recursive approaches.
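A minimal sketch of an RMSE RPE computation is given below. It assumes time-aligned poses stored as 4x4 homogeneous transforms and a fixed frame offset `delta`. The interval used for the table above is not stated on this page, and the function name is hypothetical; this is not the code in misc/run_all.sh.

```python
import numpy as np

def rpe_rmse(est_poses, gt_poses, delta=1):
    """RMSE relative pose error over pairs (i, i + delta).

    est_poses, gt_poses: arrays of shape (N, 4, 4).
    Returns (translation RMSE in pose units, rotation RMSE in degrees).
    """
    trans_err, rot_err = [], []
    for i in range(len(gt_poses) - delta):
        # Relative motion over the interval, for ground truth and estimate
        d_gt = np.linalg.inv(gt_poses[i]) @ gt_poses[i + delta]
        d_est = np.linalg.inv(est_poses[i]) @ est_poses[i + delta]
        # Residual transform between the two relative motions
        E = np.linalg.inv(d_gt) @ d_est
        trans_err.append(np.linalg.norm(E[:3, 3]))
        cos_angle = np.clip((np.trace(E[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
        rot_err.append(np.degrees(np.arccos(cos_angle)))
    trans_rmse = np.sqrt(np.mean(np.square(trans_err)))
    rot_rmse = np.sqrt(np.mean(np.square(rot_err)))
    return float(trans_rmse), float(rot_rmse)
```

Since only relative motions are compared, RPE needs no global alignment: applying the same rigid transform to every estimated pose leaves the result unchanged.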

EuRoC

Benchmark results on the EuRoC dataset will be available soon.