refactor(doTransform): 10-100x faster transform of pointcloud #432

steinmn · 2019-12-05T10:50:48Z

It seems like generating an Eigen::Vector3f and transforming it for each iteration is very inefficient. In our case, this mod took transform-time of a point-cloud from 90-100ms to 1-2ms.

Verified to give same result as old doTransform using:

sensor_msgs::PointCloud2ConstIterator<float> x_old(cloud1, "x");
sensor_msgs::PointCloud2ConstIterator<float> y_old(cloud1, "y");
sensor_msgs::PointCloud2ConstIterator<float> z_old(cloud1, "z");

sensor_msgs::PointCloud2ConstIterator<float> x_new(cloud2, "x");
sensor_msgs::PointCloud2ConstIterator<float> y_new(cloud2, "y");
sensor_msgs::PointCloud2ConstIterator<float> z_new(cloud2, "z");

std::vector<double> compare_vector;
for (; x_new != x_new.end(); ++x_new, ++y_new, ++z_new, ++x_old, ++y_old, ++z_old) {
compare_vector.push_back(*x_new - *x_old);
compare_vector.push_back(*y_new - *y_old);
compare_vector.push_back(*z_new - *z_old);
}
double max_diff = *max_element(compare_vector.begin(), compare_vector.end());
double min_diff = *min_element(compare_vector.begin(), compare_vector.end());

ROS_INFO("Biggest differences: %f, %f", max_diff, min_diff);

Not sure how/where to put this test-code in a proper test, so I'll leave it here until I get some feedback.

It seems like generating an Eigen::Vector3f and transforming it for each iteration is very inefficient. In our case, this mod took transform-time of a point-cloud from 90-100ms to 1-2ms. Verified to give same result as old doTransform using: ``` sensor_msgs::PointCloud2ConstIterator<float> x_old(cloud1, "x"); sensor_msgs::PointCloud2ConstIterator<float> y_old(cloud1, "y"); sensor_msgs::PointCloud2ConstIterator<float> z_old(cloud1, "z"); sensor_msgs::PointCloud2ConstIterator<float> x_new(cloud2, "x"); sensor_msgs::PointCloud2ConstIterator<float> y_new(cloud2, "y"); sensor_msgs::PointCloud2ConstIterator<float> z_new(cloud2, "z"); std::vector<double> compare_vector; for (; x_new != x_new.end(); ++x_new, ++y_new, ++z_new, ++x_old, ++y_old, ++z_old) { compare_vector.push_back(*x_new - *x_old); compare_vector.push_back(*y_new - *y_old); compare_vector.push_back(*z_new - *z_old); } double max_diff = *max_element(compare_vector.begin(), compare_vector.end()); double min_diff = *min_element(compare_vector.begin(), compare_vector.end()); ROS_INFO("Biggest differences: %f, %f", max_diff, min_diff); ``` Not sure how/where to put this test-code in a proper test, so I'll leave it here until I get some feedback.

yoshipon · 2019-12-10T03:12:54Z

Hi :) Thank you for making this PR.
I'm not a maintainer of this repository but I have a suggestion about this PR.

The main problem of the original code is that Eigen::Vector3f is generated and initialized inside the iteration.

It may be possible to speed up the original code by initializing the Eigen::Vector3f outside the iterations:

Eigen::Vector3f point, point_in;
for(; x_in != x_in.end(); ++x_in, ++y_in, ++z_in, ++x_out, ++y_out, ++z_out) {
    point_in.x() = *x_in;
    point_in.y() = *y_in;
    point_in.z() = *z_in;
    point = t * point_in;

Using Eigen::Vector3f could be faster than using std::vector or writing directory because the Eigen supports automatic SIMD (e.g., SSE, AVX) operation.

steinmn · 2019-12-10T09:10:28Z

Hi
Thanks for the feedback :)
I did a quick test with your suggestion, and this yielded very similar results to the old code (90-100ms). (used code:)

Eigen::Vector3f point, point_in;
for (; x_in != x_in.end(); ++x_in, ++y_in, ++z_in, ++x_out, ++y_out, ++z_out) {
    point_in.x() = *x_in;
    point_in.y() = *y_in;
    point_in.z() = *z_in;
    point = t * point_in;
    *x_out = point.x();
    *y_out = point.y();
    *z_out = point.z();
}

It seems the "point = t * point_in"-line is the bottleneck, clocking in at ~5000ns, with the other lines inside the loop around 100-200ns each. In comparison, each of the three codelines in the loop in my committed code take around 50-100ns. (Of course, individual time measurements should probably be taken with a grain of salt when the durations are this small, but the orders of magnitude should be reliable)

I'm not familiar enough with parallel computing to say anything about how that might affect the performance of the different approaches, so I'll leave that discussion for someone who unlike me knows what they're talking about.

yoshipon · 2020-05-20T09:50:13Z

I'm very sorry for my late response.

Thank you for your benchmark.
I didn't know the significant overhead of Eigen.

Signed-off-by: Shivam Pandey <[email protected]>

tfoote self-requested a review January 4, 2020 01:42

steinmn changed the base branch from melodic-devel to noetic-devel September 27, 2021 07:40

ooeygui pushed a commit to ms-iot/geometry2 that referenced this pull request Oct 12, 2022

Fixed errors due to missing header link. (ros#432)

c36999b

Signed-off-by: Shivam Pandey <[email protected]>

ahcorde added the noetic label Feb 2, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(doTransform): 10-100x faster transform of pointcloud #432

refactor(doTransform): 10-100x faster transform of pointcloud #432

steinmn commented Dec 5, 2019

yoshipon commented Dec 10, 2019

steinmn commented Dec 10, 2019

yoshipon commented May 20, 2020

refactor(doTransform): 10-100x faster transform of pointcloud #432

Are you sure you want to change the base?

refactor(doTransform): 10-100x faster transform of pointcloud #432

Conversation

steinmn commented Dec 5, 2019

yoshipon commented Dec 10, 2019

steinmn commented Dec 10, 2019

yoshipon commented May 20, 2020