Performance Evaluation

We benchmarked our system against other top-performing open-source implementations, namely OKVIS [Leutenegger et al.], VINS-Mono [Qin et al.] and ROVIO [Bloesch et al.], on publicly available datasets, in terms of ATE (Absolute Trajectory Error), RPE (Relative Pose Error) and computational cost. Our implementation achieves comparable accuracy at a fraction of the computational cost. On a desktop PC equipped with an Intel Core i7 CPU @ 3.6 GHz, our system runs at around 140 Hz with low CPU usage. By comparison, OKVIS and VINS-Mono run at around 20 Hz, and ROVIO runs at around 60 Hz. The runtime of our system can be improved further by making better use of the CPU cache and memory.

Algorithm Categories

OKVIS and VINS-Mono are optimization-based: they iteratively optimize over keyframes, which generally yields more accurate pose estimates at the price of higher latency and computational cost. ROVIO and XIVO are filtering-based: they are causal and much cheaper computationally. Nevertheless, they produce pose estimates comparable to those of the optimization-based methods.
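To illustrate what "causal" and "cheap" mean for a filter, here is a toy scalar Kalman filter. It is a minimal sketch assuming a 1-D constant-position model with made-up noise parameters (`q`, `r`); the class and function names are hypothetical, and this is not XIVO's EKF, which estimates a full 3-D visual-inertial state. Each measurement is folded into the estimate once, using only past data, at a fixed cost per step, whereas an optimization-based method re-solves over its keyframes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ScalarKalmanFilter:
    """Toy 1-D Kalman filter with a constant-position model (illustration only)."""
    x: float  # state estimate (mean)
    p: float  # state variance
    q: float  # process-noise variance added at each prediction (assumed value)
    r: float  # measurement-noise variance (assumed value)

    def predict(self) -> None:
        # Constant-position model: the mean is unchanged, the uncertainty grows.
        self.p += self.q

    def update(self, z: float) -> None:
        # Kalman gain weighs the prior variance against the measurement noise.
        k = self.p / (self.p + self.r)
        self.x += k * (z - self.x)  # correct the mean with the innovation
        self.p *= 1.0 - k           # the variance shrinks after the update


def run_filter(measurements: List[float], x0: float = 0.0, p0: float = 1.0,
               q: float = 0.01, r: float = 0.1) -> Tuple[List[float], float]:
    """Fold measurements in one at a time; step k only sees measurements 0..k."""
    kf = ScalarKalmanFilter(x=x0, p=p0, q=q, r=r)
    estimates = []
    for z in measurements:
        kf.predict()
        kf.update(z)
        estimates.append(kf.x)
    return estimates, kf.p


if __name__ == "__main__":
    est, var = run_filter([1.0] * 50)
    print(f"final estimate {est[-1]:.4f}, variance {var:.4f}")
```

Because the filter is causal, running it on a prefix of the measurements reproduces the first estimates of the full run exactly; nothing already computed is ever revisited.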

In addition, OKVIS uses stereo images, whereas the other three methods use only monocular images.

Computational Cost

We benchmarked the runtime of OKVIS, VINS-Mono, ROVIO and XIVO on a desktop machine equipped with an Intel Core i7 CPU @ 3.6 GHz. The table below shows the runtime of the feature processing and state update modules.

Module	OKVIS (Stereo+Keyframe)	VINS-Mono (Keyframe)	ROVIO	XIVO
Feature detection & matching	15 ms	20 ms	1 ms*	3 ms
State update	42 ms	50 ms	13 ms	4 ms

* ROVIO is a "direct" method: it skips the feature matching step and directly uses the photometric error as the innovation term in the EKF update step. Because it uses an Iterated Extended Kalman Filter (IEKF) for the state update, it is slower than our EKF-based method in the state update step.
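The difference between an EKF and an IEKF update can be seen in a scalar sketch, assuming a hypothetical nonlinear measurement model `h` with derivative `dh` (here h(x) = x², not ROVIO's photometric model). The EKF linearizes `h` once at the prior mean; the IEKF re-linearizes at each new iterate, which amounts to Gauss-Newton steps on the update. Every extra iteration re-evaluates the measurement model and its Jacobian, which is where the additional cost comes from. Function names and the iteration count are illustrative assumptions.

```python
from typing import Callable, Tuple

Fn = Callable[[float], float]


def ekf_update(x: float, p: float, z: float, r: float,
               h: Fn, dh: Fn) -> Tuple[float, float]:
    """EKF measurement update: linearize h once, at the prior mean x."""
    H = dh(x)
    k = p * H / (H * p * H + r)
    return x + k * (z - h(x)), (1.0 - k * H) * p


def iekf_update(x: float, p: float, z: float, r: float,
                h: Fn, dh: Fn, iters: int = 5) -> Tuple[float, float]:
    """Iterated EKF update: re-linearize h at each new iterate (Gauss-Newton steps)."""
    xi, H, k = x, 0.0, 0.0
    for _ in range(iters):
        H = dh(xi)  # Jacobian at the current linearization point
        k = p * H / (H * p * H + r)
        # The -H * (x - xi) term accounts for linearizing away from the prior mean.
        xi = x + k * (z - h(xi) - H * (x - xi))
    # Covariance uses the gain and Jacobian from the last linearization.
    return xi, (1.0 - k * H) * p


def square(s: float) -> float:
    """Hypothetical nonlinear measurement model h(x) = x^2."""
    return s * s


def d_square(s: float) -> float:
    """Derivative of square()."""
    return 2.0 * s


if __name__ == "__main__":
    # Prior x = 1 with variance 0.5; measurement z = 4 (true x = 2), noise 0.01.
    print(ekf_update(1.0, 0.5, 4.0, 0.01, square, d_square))
    print(iekf_update(1.0, 0.5, 4.0, 0.01, square, d_square))
```

With this prior and measurement, the single EKF step overshoots to about 2.49, while the iterated update settles close to 2. With `iters=1` the IEKF reduces exactly to the EKF.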

OKVIS and VINS-Mono (marked Keyframe) solve an iterative nonlinear least-squares problem over keyframes for state estimation, and are therefore much slower in the state update step.
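As a back-of-the-envelope check, the per-module times in the table above can be turned into frame rates. This sketch assumes that the two modules run sequentially on every frame and that nothing else contributes to the per-frame cost; the helper name is hypothetical.

```python
from typing import Dict, Tuple

# Per-frame runtimes in ms (feature detection & matching, state update),
# copied from the table above.
RUNTIME_MS: Dict[str, Tuple[float, float]] = {
    "OKVIS": (15.0, 42.0),
    "VINS-Mono": (20.0, 50.0),
    "ROVIO": (1.0, 13.0),
    "XIVO": (3.0, 4.0),
}


def frame_rate_hz(feature_ms: float, update_ms: float) -> float:
    """Frames per second if both modules run back to back (assumption)."""
    return 1000.0 / (feature_ms + update_ms)


if __name__ == "__main__":
    for name, (feat, upd) in RUNTIME_MS.items():
        print(f"{name:<10} {feat + upd:5.1f} ms/frame -> {frame_rate_hz(feat, upd):6.1f} Hz")
```

This gives about 17.5 Hz for OKVIS, 14.3 Hz for VINS-Mono, 71.4 Hz for ROVIO and 142.9 Hz for XIVO, roughly in line with the rates quoted at the top of this page. Since any other per-frame work is ignored, these figures are optimistic estimates, not measurements.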

Accuracy

We compared the performance of our system in terms of ATE and RPE on two publicly available datasets: TUM-VI and EuRoC. We achieve comparable pose estimation accuracy at a fraction of the computational cost of the top-performing open-source implementations.

TUM-VI

The following tables show the performance on the 6 indoor sequences (room1 to room6) for which ground-truth poses are available. The numbers for OKVIS, VINS-Mono and ROVIO are taken from the TUM-VI benchmark paper. The script used to evaluate XIVO can be found in misc/run_all.sh.

Sequence	Length	OKVIS (Stereo+Keyframe)	VINS-Mono (Keyframe)	ROVIO	XIVO
room1	156m	0.06m	0.07m	0.16m	0.06m
room2	142m	0.11m	0.07m	0.33m	0.11m
room3	135m	0.07m	0.11m	0.15m	0.16m
room4	68m	0.03m	0.04m	0.09m	0.07m
room5	131m	0.07m	0.20m	0.12m	0.11m
room6	67m	0.04m	0.08m	0.05m	0.05m

Table 1. RMSE ATE in meters. Methods marked Keyframe are keyframe-based; the others are recursive (filtering-based) approaches.
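ATE compares estimated and ground-truth positions after rigidly aligning the two trajectories, then reports the RMSE of the remaining position differences. The sketch below is a planar simplification (2-D positions, closed-form rotation-plus-translation alignment); actual evaluations align full 3-D trajectories, and the function names are illustrative, not taken from misc/run_all.sh or the TUM-VI tools.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def align_2d(est: List[Point], gt: List[Point]) -> List[Point]:
    """Least-squares rigid alignment (rotation + translation) of est onto gt in 2-D."""
    n = len(est)
    ex, ey = sum(p[0] for p in est) / n, sum(p[1] for p in est) / n
    gx, gy = sum(p[0] for p in gt) / n, sum(p[1] for p in gt) / n
    # Closed-form optimal rotation angle from the centered point sets.
    s_dot = s_cross = 0.0
    for (x, y), (u, v) in zip(est, gt):
        dx, dy, du, dv = x - ex, y - ey, u - gx, v - gy
        s_dot += dx * du + dy * dv
        s_cross += dx * dv - dy * du
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    # Rotate the centered estimate, then move it onto the ground-truth centroid.
    return [(c * (x - ex) - s * (y - ey) + gx, s * (x - ex) + c * (y - ey) + gy)
            for x, y in est]


def ate_rmse(est: List[Point], gt: List[Point]) -> float:
    """RMSE of position error after alignment (planar simplification of ATE)."""
    aligned = align_2d(est, gt)
    sq = [(x - u) ** 2 + (y - v) ** 2 for (x, y), (u, v) in zip(aligned, gt)]
    return math.sqrt(sum(sq) / len(sq))
```

Because of the alignment step, an estimate that differs from ground truth only by a global rotation and translation has an ATE of (numerically) zero.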

Sequence	OKVIS (Stereo+Keyframe)	VINS-Mono (Keyframe)	ROVIO	XIVO
room1	0.013m/0.43°	0.015m/0.44°	0.029m/0.53°	0.020m/0.53°
room2	0.015m/0.62°	0.017m/0.63°	0.030m/0.67°	0.048m/0.72°
room3	0.012m/0.64°	0.023m/0.63°	0.027m/0.66°	0.069m/0.74°
room4	0.012m/0.57°	0.015m/0.41°	0.022m/0.61°	0.022m/0.64°
room5	0.012m/0.47°	0.026m/0.47°	0.031m/0.60°	0.025m/0.57°
room6	0.012m/0.49°	0.014m/0.44°	0.019m/0.50°	0.022m/0.53°

Table 2. RMSE RPE in translation (meters) and rotation (degrees). Methods marked Keyframe are keyframe-based; the others are recursive (filtering-based) approaches.
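RPE instead measures drift over a fixed interval: for each frame pair (i, i + delta) it compares the estimated relative motion with the ground-truth relative motion, and reports the RMSE of the translation and rotation errors. The sketch below assumes planar SE(2) poses (x, y, heading) and a hypothetical frame interval `delta`; real evaluations use SE(3) poses and may choose the interval in time or distance rather than frames. All names are illustrative.

```python
import math
from typing import List, Tuple

Pose2 = Tuple[float, float, float]  # (x, y, heading in radians)


def compose(a: Pose2, b: Pose2) -> Pose2:
    """SE(2) composition a * b: apply b in the frame of a."""
    x, y, t = a
    c, s = math.cos(t), math.sin(t)
    return (x + c * b[0] - s * b[1], y + s * b[0] + c * b[1], t + b[2])


def inverse(a: Pose2) -> Pose2:
    """SE(2) inverse: (R^T * -p, -theta)."""
    x, y, t = a
    c, s = math.cos(t), math.sin(t)
    return (-c * x - s * y, s * x - c * y, -t)


def wrap(angle: float) -> float:
    """Wrap an angle into [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def rpe_rmse(est: List[Pose2], gt: List[Pose2], delta: int = 1) -> Tuple[float, float]:
    """RMSE of relative pose error over pairs (i, i + delta): (meters, degrees)."""
    t_sq, r_sq = [], []
    for i in range(len(gt) - delta):
        rel_gt = compose(inverse(gt[i]), gt[i + delta])     # true motion i -> i+delta
        rel_est = compose(inverse(est[i]), est[i + delta])  # estimated motion
        err = compose(inverse(rel_gt), rel_est)             # residual motion
        t_sq.append(err[0] ** 2 + err[1] ** 2)
        r_sq.append(wrap(err[2]) ** 2)
    n = len(t_sq)
    return math.sqrt(sum(t_sq) / n), math.degrees(math.sqrt(sum(r_sq) / n))
```

Because a global rigid offset cancels in the relative motions, RPE (unlike ATE) needs no alignment step, and it isolates local drift such as a small per-frame heading bias.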

EuRoC

Benchmark results on the EuRoC dataset will be available soon.
