Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(doTransform): 10-100x faster transform of pointcloud #432

Open
wants to merge 1 commit into
base: noetic-devel
Choose a base branch
from

Conversation

steinmn
Copy link

@steinmn steinmn commented Dec 5, 2019

It seems like generating an Eigen::Vector3f and transforming it for each iteration is very inefficient. In our case, this mod took transform-time of a point-cloud from 90-100ms to 1-2ms.

Verified to give same result as old doTransform using:

sensor_msgs::PointCloud2ConstIterator<float> x_old(cloud1, "x");
sensor_msgs::PointCloud2ConstIterator<float> y_old(cloud1, "y");
sensor_msgs::PointCloud2ConstIterator<float> z_old(cloud1, "z");

sensor_msgs::PointCloud2ConstIterator<float> x_new(cloud2, "x");
sensor_msgs::PointCloud2ConstIterator<float> y_new(cloud2, "y");
sensor_msgs::PointCloud2ConstIterator<float> z_new(cloud2, "z");

std::vector<double> compare_vector;
for (; x_new != x_new.end(); ++x_new, ++y_new, ++z_new, ++x_old, ++y_old, ++z_old) {
compare_vector.push_back(*x_new - *x_old);
compare_vector.push_back(*y_new - *y_old);
compare_vector.push_back(*z_new - *z_old);
}
double max_diff = *max_element(compare_vector.begin(), compare_vector.end());
double min_diff = *min_element(compare_vector.begin(), compare_vector.end());

ROS_INFO("Biggest differences: %f, %f", max_diff, min_diff);

Not sure how/where to put this test-code in a proper test, so I'll leave it here until I get some feedback.

It seems like generating an Eigen::Vector3f and transforming it for each iteration is very inefficient. In our case, this mod took transform-time of a point-cloud from 90-100ms to 1-2ms.

Verified to give same result as old doTransform using:

```
sensor_msgs::PointCloud2ConstIterator<float> x_old(cloud1, "x");
sensor_msgs::PointCloud2ConstIterator<float> y_old(cloud1, "y");
sensor_msgs::PointCloud2ConstIterator<float> z_old(cloud1, "z");

sensor_msgs::PointCloud2ConstIterator<float> x_new(cloud2, "x");
sensor_msgs::PointCloud2ConstIterator<float> y_new(cloud2, "y");
sensor_msgs::PointCloud2ConstIterator<float> z_new(cloud2, "z");

std::vector<double> compare_vector;
for (; x_new != x_new.end(); ++x_new, ++y_new, ++z_new, ++x_old, ++y_old, ++z_old) {
compare_vector.push_back(*x_new - *x_old);
compare_vector.push_back(*y_new - *y_old);
compare_vector.push_back(*z_new - *z_old);
}
double max_diff = *max_element(compare_vector.begin(), compare_vector.end());
double min_diff = *min_element(compare_vector.begin(), compare_vector.end());

ROS_INFO("Biggest differences: %f, %f", max_diff, min_diff);
```

Not sure how/where to put this test-code in a proper test, so I'll leave it here until I get some feedback.
@yoshipon
Copy link

Hi :) Thank you for making this PR.
I'm not a maintainer of this repository but I have a suggestion about this PR.

The main problem of the original code is that Eigen::Vector3f is generated and initialized inside the iteration.

It may be possible to speed up the original code by initializing the Eigen::Vector3f outside the iterations:

Eigen::Vector3f point, point_in;
for(; x_in != x_in.end(); ++x_in, ++y_in, ++z_in, ++x_out, ++y_out, ++z_out) {
    point_in.x() = *x_in;
    point_in.y() = *y_in;
    point_in.z() = *z_in;
    point = t * point_in;

Using Eigen::Vector3f could be faster than using std::vector or writing directory because the Eigen supports automatic SIMD (e.g., SSE, AVX) operation.

@steinmn
Copy link
Author

steinmn commented Dec 10, 2019

Hi
Thanks for the feedback :)
I did a quick test with your suggestion, and this yielded very similar results to the old code (90-100ms). (used code:)

Eigen::Vector3f point, point_in;
for (; x_in != x_in.end(); ++x_in, ++y_in, ++z_in, ++x_out, ++y_out, ++z_out) {
    point_in.x() = *x_in;
    point_in.y() = *y_in;
    point_in.z() = *z_in;
    point = t * point_in;
    *x_out = point.x();
    *y_out = point.y();
    *z_out = point.z();
}

It seems the "point = t * point_in"-line is the bottleneck, clocking in at ~5000ns, with the other lines inside the loop around 100-200ns each. In comparison, each of the three codelines in the loop in my committed code take around 50-100ns. (Of course, individual time measurements should probably be taken with a grain of salt when the durations are this small, but the orders of magnitude should be reliable)

I'm not familiar enough with parallel computing to say anything about how that might affect the performance of the different approaches, so I'll leave that discussion for someone who unlike me knows what they're talking about.

@tfoote tfoote self-requested a review January 4, 2020 01:42
@yoshipon
Copy link

I'm very sorry for my late response.

Thank you for your benchmark.
I didn't know the significant overhead of Eigen.

@steinmn steinmn changed the base branch from melodic-devel to noetic-devel September 27, 2021 07:40
ooeygui pushed a commit to ms-iot/geometry2 that referenced this pull request Oct 12, 2022
@ahcorde ahcorde added the noetic label Feb 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants