Why is stride not needed when calling dataset.write()? #3

pwm1234 · 2015-10-02T21:54:07Z

I am impressed by this project. You have a nice clean implementation, and I look forward to trying it out. But in studying the code and comparing it with other code (in particular the opencv mat to eigen conversion function), I am wondering if there is a problem with your save method where you call dataset.write. When using the .data() buffer of an eigen object, I think one needs to take into account the strides, both the inner and outer strides, but particularly the outer. The particular code from opencv I am referring to is at opencv/eigen.hpp at master · Itseez/opencv in the eigen2cv function. Here is the relevant snippet:

    Mat _src(src.rows(), src.cols(), DataType<_Tp>::type,
             (void*)src.data(), src.stride()*sizeof(_Tp));
    _src.copyTo(dst);

I appreciate your time and feedback.

The text was updated successfully, but these errors were encountered:

garrison · 2015-10-02T22:00:55Z

Thanks for taking the time to get in touch, and for your valuable feedback (and for the kind words!).

This is actually why I first copy the matrix to row-major form before attempting to save. That way, I believe I am in control of the strides. Do you see a specific reason why this may not work in general? Obviously, copying the matrix before saving is suboptimal (it would be nice to be able to save without making a copy), but I have not figured out a simple way to do this in general.

pwm1234 · 2015-10-02T22:07:36Z

Even though you copy, you could still have alignment issues that will cause padding to be placed at the end of each row. It may not be likely, but I think this is certainly possible. As for avoiding the copy, the eigen2cv method avoids it by using stride and swapping rows and columns.

Of course, my knowledge at this point is theoretical. I have not played with any code, and I do not have concrete examples or demonstrations to discuss yet.

garrison · 2015-10-02T22:13:15Z

Very good to know. I will leave this issue open for now. Any additional investigation (whether or not it includes an actual demonstration of failure) would be most welcome.

pwm1234 · 2015-10-05T12:53:29Z

Where you will encounter an outer stride that is not "natural" (a term I found wading through some eigen code, which I think means the data is stored in contiguous memory; i.e., the outer stride is number of columns for row major and number of rows for column major), is when using a block submatrix or a map. Either case may give you a matrix whose elements are not stored contiguously. When this occurs, you can copy the input matrix to get its storage into contiguous memory.

Yes, copying is inefficient, but since you are copying the input matrix before saving, I think your code is correct. I looked at eigen a little more closely, and from what I was able to discover, when allocating a memory block they do not pad a matrix between rows (or columns) to keep them on a particular boundary. So the outerStride appears to be the number of columns (or rows) and your code that uses .data() along with .rows() and .columns() for the copy is okay.

You could avoid the copy but it would take more care. If .Flags say the input matrix row major, the inner stride is 1 and outer stride is the number of columns, the the .data() is ready for memcpy. (If column major, then switch rows and columns.) This would likely be a worthwhile optimization since it would work common inputs.

If I have time to play with it I may give you a pull request.
Thanks,
Phil

pwm1234 · 2015-10-05T23:39:37Z

I had some spare time and I looked through the Eigen forum and found several posts by moderators that said the memory layout of a Matrix is always contiguous. Here is one of them:

Re: Several questions: memory layout, R-value, and back-end.
Fri Jul 08, 2011 6:53 am

Yes, the memory layout of a Matrix<> object is always contiguous. .data() gives you the address of the first element, and .outerStride() the number of elements between two columns (or rows) which is always equal to either rows() of cols() (depends on the storage order of course, the default is column major). data() and data()+size() are valid begin/end iterators.

So, I think your code, while inefficient because it is making a copy, is correct.

If/when I get more spare time, I may play around with avoiding the copy.

garrison · 2015-10-16T02:42:07Z

I'm going to close this, since I believe you have solved this issue :)

pwm1234 · 2015-10-16T05:43:31Z

Not solved entirely. I would like to fix that compilation error and all of those warnings I introduced. I only built with Visual Studio 2013, so I did not see them. I have easy access to gcc 4.8.2, so I will try that and see if I can clean up. Thanks for your time.

garrison · 2015-10-17T15:53:13Z

That would be awesome, thanks again.

pwm1234 · 2015-10-20T13:20:36Z

I have addressed the compilation error and warnings. I also have a preliminary CMakeLists.txt for the unit tests. I am cleaning it up and should have a new pull request for you in the next day or two.

garrison closed this as completed Oct 16, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Why is stride not needed when calling dataset.write()? #3

Why is stride not needed when calling dataset.write()? #3

pwm1234 commented Oct 2, 2015

garrison commented Oct 2, 2015

pwm1234 commented Oct 2, 2015

garrison commented Oct 2, 2015

pwm1234 commented Oct 5, 2015

pwm1234 commented Oct 5, 2015

garrison commented Oct 16, 2015

pwm1234 commented Oct 16, 2015

garrison commented Oct 17, 2015

pwm1234 commented Oct 20, 2015

Why is stride not needed when calling dataset.write()? #3

Why is stride not needed when calling dataset.write()? #3

Comments

pwm1234 commented Oct 2, 2015

garrison commented Oct 2, 2015

pwm1234 commented Oct 2, 2015

garrison commented Oct 2, 2015

pwm1234 commented Oct 5, 2015

pwm1234 commented Oct 5, 2015

garrison commented Oct 16, 2015

pwm1234 commented Oct 16, 2015

garrison commented Oct 17, 2015

pwm1234 commented Oct 20, 2015