
Extensions facilitating flexible usage #984

Closed
tdegeus opened this issue Jul 19, 2018 · 4 comments
Comments

@tdegeus
Member

tdegeus commented Jul 19, 2018

I am planning to stop developing multi-dimensional arrays in my own library cppmat and to transfer to xtensor. (I will continue the vector/tensor functionality, but as an extension to xtensor; this could partly solve issue #561.)

As an 'advanced' user who has had to develop his own class, I have some ideas that xtensor could benefit from. We could iterate on these ideas here, and potentially merge some of cppmat's features into xtensor.

Negative (periodic) indices

I have enabled negative indices in cppmat, and in fact even periodic indexing, which is really nice in the quick-development phase. The way to (potentially) do this is:

I = (n+(i%n)) % n;

where n should in this case be shape[0].

Since this is a 'costly' operation that could be avoided, I have overloaded operator(): for ints it does the conversion, while for size_t, for ordinary use, it avoids the conversion. See cppmat, lines 514-822.

Omitting periodic indices (and supporting only negative ones) could save one modulo. Periodic indices are nice though, so one could support them by requiring the user to enable them manually.

Negative axes references

In addition to the previous point, it would be nice to be able to do, for example:

xt::mean(...,{-1});

to average over the last axis. This should be simple, and should (almost) never hurt efficiency. But it does make end code more readable: it focuses on the details that are important in that context. It also makes it easy to keep end code flexible.

Quick shape access

A common operation is this:

#include <iostream>
#include <xtensor/xarray.hpp>
#include <xtensor/xio.hpp>

int main()
{
  xt::xarray<double> a = xt::zeros<double>({3,3,3});

  for ( size_t i = 0 ; i < a.shape()[0] ; ++i )
    for ( size_t j = 0 ; j < a.shape()[1] ; ++j )
      for ( size_t k = 0 ; k < a.shape()[2] ; ++k )
        std::cout << a(i,j,k) << std::endl;
}

while it would be simpler to do:

#include <iostream>
#include <xtensor/xarray.hpp>
#include <xtensor/xio.hpp>

int main()
{
  xt::xarray<double> a = xt::zeros<double>({3,3,3});

  for ( size_t i = 0 ; i < a.shape(0) ; ++i )
    for ( size_t j = 0 ; j < a.shape(1) ; ++j )
      for ( size_t k = 0 ; k < a.shape(2) ; ++k )
        std::cout << a(i,j,k) << std::endl;
}

This can be achieved with a simple overload of the shape function:

size_t xarray<T>::shape(size_t axis);

See cppmat, lines 351-393. Here you can see that I have two more overloads:

(1) For int axis, allowing a.shape(-1). (See previous point.)

(2) A templated overload, which is quite nice. It allows:

#include <iostream>
#include <xtensor/xarray.hpp>
#include <xtensor/xio.hpp>

int main()
{
  xt::xarray<double> a = xt::zeros<double>({3,3,3});

  for ( int i = 0 ; i < a.shape<int>(0) ; ++i )
    for ( int j = 0 ; j < a.shape<int>(1) ; ++j )
      for ( int k = 0 ; k < a.shape<int>(2) ; ++k )
        std::cout << a(i,j,k) << std::endl;
}

This is again quite useful if one works with periodicity.

One could take this even further and have an overload:

std::vector<size_t> shape = a.shape<std::vector<size_t>>();

Overloading based on rank/size

It is often quite handy and efficient to have overloads based on rank. For example, for a tensor contraction:

xt::xtensor<double, 0> ddot(const xt::xtensor<double, 2> &a, const xt::xtensor<double, 2> &b);
xt::xtensor<double, 2> ddot(const xt::xtensor<double, 4> &a, const xt::xtensor<double, 2> &b);
xt::xtensor<double, 2> ddot(const xt::xtensor<double, 2> &a, const xt::xtensor<double, 4> &b);

More clever overloads (allowing, for example, partial unrolling of loops) could also be based on whether or not the size is fixed.

Missing functions

What I missed at first sight is:

  • xt::average(...): to compute a weighted average, see numpy.average and cppmat, lines 3044-3081; there you can see that it can simply build on sum.

  • xt::argsort(...): to return the indices that would sort the array, see numpy.argsort.

  • size_t a.rank(): get the rank of the array.

Sparse matrices

Sparse matrices are often needed in scientific computing, frequently paired with the need to either diagonalise a matrix or solve a linear system. In my opinion, this would really close the circle.

One could provide interfaces to existing libraries here, potentially via extension modules that library writers or users could provide.

Symmetric and diagonal matrices

In cppmat I have symmetric and diagonal square matrices. This offers great speed-ups in tensor algebra, as you feed the program quite a bit of knowledge. But at the same time, the library's maintenance burden becomes much more significant.

Single include

It would be much easier to

#include <xtensor/xtensor.hpp>

to get most of the functionality.

Documentation

Although there is already quite a bit of documentation, there are a couple of things that could be improved:

  • A quick-start guide with small, fully working examples, complete with explicit compiler instructions and CMakeLists.txt files. This would get a potential user over the first bump, and significantly increase the chance of them staying committed.

  • A cheat sheet. From numpy to xtensor is really great, but somehow it took me a long time to find. In addition, a small cheat sheet just for C++, à la Eigen, would help in getting an overview; this is currently still quite a challenge.

  • One of the most frequent things you do is allocate an array. Currently this is not very broadly documented. It would help to have some description of zeros, empty, ..., and of how to get those for fixed-size arrays. (Could go on the cheat sheet.)

  • At a quick glance I don't understand casting between fixed size, fixed rank, and dynamic arrays. (Could go on the cheat sheet.)

  • Currently some of the examples on the site don't work. In fact even the very first doesn't compile after downloading xtensor from conda.

Side remarks / wishes

  • For cases where efficiency is not important and you don't want to think too much, a bool array<T>::inBounds(i,j,...) function would be nice, allowing you to test whether an index is within bounds.

  • I would by default allocate shape, strides, ... on the stack by hard-coding a maximum number of dimensions (6-10 or so). This would prevent naive users from using the least efficient array by default.

Again, we should iterate here. I am willing to help where I can, as switching to xtensor would make my research more robust.

@tdegeus tdegeus changed the title Extension for more flexible usage Extensions for more flexible usage Jul 19, 2018
@tdegeus tdegeus changed the title Extensions for more flexible usage Extensions facilitating flexible usage Jul 19, 2018
@wolfv
Member

wolfv commented Jul 19, 2018

Just quick answers:

  • Periodicity: would you mind having a special operator (such as a.periodic(1,-3,4) or another name) for this functionality? Otherwise, yes, we should have an int overload for this kind of operation that either does an ix < 0 ? shape + ix : ix or enables the modulo access at compile time.
  • shape: totally agree, we should add at least your first suggestion (the shape(i) accessors).
  • overloading based on shape: actually, your example should already work (though not for the general case, which would also support rank-based overloading of xtensor_fixed); for that we need another base class (prototyped here: add xexpression_shaped experiment #867).
  • argsort is in the process of being implemented (Argsort from mmatl #977); it currently fails on MSVC :/
  • average: agreed, we need to do this.
  • rank() == dimension() in this library (similar to ndim in numpy, I think).

Thanks for all your feedback so far! Looking forward to working with you.

@tdegeus
Member Author

tdegeus commented Jul 19, 2018

I forgot one point:

Constructor

What confuses me a lot is the constructor. I get that it is really great and matches what NumPy does. But still...

What is allowed is

std::vector<size_t> shape = { 3, 2, 4 };
xt::xarray<double> a(shape);

Now, since {...} initialises the std::vector, I thought (in line with the syntax of reshape): let's do the same in one line:

xt::xarray<double> a({ 3, 2, 4 });

But that is really not the same thing! It confuses me, as I would expect the {...} initialisation of xarray to work the same as for std::vector, i.e. only like this:

xt::xarray<double> a = { 3, 2, 4 };

This could be dealt with in the documentation. Still, being able to allocate in only one line would help. But I guess this is:

xt::xarray<double> a = xt::empty<double>({ 3, 2, 4 });

This is one of the first things a user deals with, so it would be great to have it on the cheat sheet.

Personally I would have used the following syntax

// allocate an 'empty' array of shape [3,2,4]
xt::xarray<double> a({ 3, 2, 4 });

// initialize a 'list' [3.0, 2.0, 4.0]
xt::xarray<double> a = xt::array({ 3, 2, 4 });
// or of course
xt::xarray<double> a = {3, 2., 4.};

which, in my opinion, would be equally close to NumPy's syntax.

@tdegeus
Member Author

tdegeus commented Jul 19, 2018

Great to hear your feedback so far. To answer the only part that needs answering:

  • An a.periodic(1,-3,4) operator is perfectly acceptable, also because periodicity is more of an edge case, while negative indices are quite common.
  • Overloading based on shape: great, I guess we should document an example.

@wolfv
Member

wolfv commented Jul 20, 2018

I've started continuing the work from #867

  • it will allow for two types of overloads:

    • xexpression_dimensioned<X, N>, where N selects the number of compile-time dimensions;
    • xexpression_shaped<X, shape_type>, where shape_type selects a specific shape (and could, for example, be used to select a fixed_shape).
  • with this refactoring we can also more easily support new shape-accessing functions (such as your proposed shape(size_t index) or shape<xxx>(size_t index)), as we'll have one base class (xexpression_shaped) that contains all shape functions.

The PR needs a bit more love though (it will totally fail all tests right now); I just wanted to give you a quick update.
