Assignment of xt::zeros needs major speed-up #702
Hi,

Assigning to a container or a view after it has been initialized involves a temporary to avoid aliasing. Could you test with the following?

```cpp
template <class V>
void xarray_init_zeros(benchmark::State& state)
{
    auto array = xt::xarray<V>::from_shape({SIZE});
    for (auto _ : state)
    {
        xt::noalias(array) = xt::zeros<V>({SIZE});
        benchmark::DoNotOptimize(array.raw_data());
    }
}
BENCHMARK_TEMPLATE(xarray_init_zeros, float);

template <class V>
void dynamic_view_init_zeros(benchmark::State& state)
{
    auto array = xt::xarray<V>::from_shape({SIZE});
    auto view = xt::dynamic_view(array, xt::slice_vector{xt::all()});
    for (auto _ : state)
    {
        xt::noalias(view) = xt::zeros<V>({SIZE});
        benchmark::DoNotOptimize(array.raw_data());
    }
}
BENCHMARK_TEMPLATE(dynamic_view_init_zeros, float);
```

Regarding the assignment
On the contrary, I was shocked when I first learned that assignment of an
In fact, the rhs can be a complicated expression involving views, concatenations, computations, and views on combinations of those. The current mechanism is very general, and there is no systematic way of ruling out aliasing. Even the lhs can have view semantics, which complicates the picture. There are cases where we can probably rule out aliasing, like when the rhs is a generator like
In xtensor, special cases aren't special enough to break the rules. Now, the assignment behavior depends on what the
The fact that assignment from a generator goes through the whole broadcasting logic (compared to a simple `std::fill`) still comes at a cost, but we can certainly improve this.
Thanks for the quick answer, I really appreciate this. I understand the logic behind your assignment semantics, but I'm sorry to say that I'm not convinced of the premises. Vigra fundamentally rests on the assumption that algorithms cannot know (and should not care) if they are called with arrays or views. This is important because views are very common (for example, many algorithms are applied to each channel individually). This premise has two implications:
In vigra, these requirements are fulfilled by construction, because arrays are implemented as subclasses of views. In contrast, xtensor fulfills neither, and this is a big problem for me. Do you have any suggestion for how to resolve this dilemma?
Please cite practical use cases where a resize to 0-d is actually the desired outcome of a scalar assignment. Otherwise, "practicality beats purity" kicks in (at least this is how I understand the Zen).
Here is how I do it in vigra: all arrays, views, and expressions provide a function
The problem is the remaining 20%. You end up with a scheme where the user does not have a simple rule to determine whether he has to use
If assigning a 0d array / scalar to an array were to broadcast into the array instead of resizing it to a 0d array, the copy constructor and the assignment operator would have different behaviors:

```cpp
double s = 1.;
xt::xarray<double> a(s);
// a is a 0d array
xt::xarray<double> b = {{0., 1., 2.}, {3., 4., 5.}};
b = s;
// b is still a 2d array
```

And that is a really bug-prone behavior.
I agree with you on the second point (and we are working hard to fix that); on the first point, I think
I don't see the problem.
Can you give specific use cases where the programmer knows that
For that reason, vigra doesn't provide a constructor taking a scalar. The constructor's first argument is always the shape, or an array expression that provides a method

`array_t<double> a(2.0);` is an error, whereas `array_t<double> a{2.0};` would invoke an initializer_list constructor that creates an array of size 1 -- but this syntax cannot be confused with assignment. For the record: anyone I asked wants the assignment
numpy:

```python
>>> import numpy as np
>>> x = np.array([[[1, 2], [3, 4]], [[10, 20], [30, 40]]])
>>> x[:, :, 1] = [-1, -2]
>>> x
array([[[ 1, -1],
        [ 3, -2]],

       [[10, -1],
        [30, -2]]])
```

xtensor:

```cpp
auto x = xt::xarray<double>({{{1, 2}, {3, 4}}, {{10, 20}, {30, 40}}});
auto v = xt::view(x, xt::all(), xt::all(), 1);
xt::xtensor<double, 1> t = {-1, -2};
v = t;
std::cout << x << std::endl;
// {{{ 1., -1.},
//   { 3., -2.}},
//  {{ 10., -1.},
//   { 30., -2.}}}
```
It would basically break xtensor's consistency to treat scalars differently for assignment, so I don't think that this can change.
With that solution, the inconsistency is now between scalars and 0-d expressions:

```cpp
xarray<double> a = {{0., 1., 2.}, {3., 4., 5.}};
xarray<double> b = a;
b = sum(a);
// => b is a 0d array
b = a;
double s = sum(a);
b = s;
// => b is still a 2d array
```

This is still bug-prone. This decision (i.e. assigning scalars as 0d expressions) was not taken lightly; we want to ensure consistency between different ways of writing the "same" code and avoid vicious bugs when refactoring (e.g. I want to cache the result of a 0d expression in a scalar because it is now used many times, while before lazy computing was fine). So this is something that we won't change. I'll add a paragraph explaining all of this in the xtensor documentation. However, we can think of a way to simplify the filling of an N-dimensional expression without resizing it:

```cpp
xarray<double> a = {{0., 1., 2.}, {3., 4., 5.}};
a = 1_br;
// => a is a 2d array containing 1 everywhere
```
I implemented @JohanMabille's use case and printed the shapes:

```cpp
xarray<double> a = {{0., 1., 2.}, {3., 4., 5.}};
xarray<double> b = a;
std::cerr << "shape before: (" << b.shape()[0] << ", " << b.shape()[1] <<
    ")\ncontents:\n" << b << "\n";
b = sum(a);
std::cerr << "shape after: (" << b.shape()[0] << ", " << b.shape()[1] <<
    ")\ncontents:\n" << b << "\n";
```

The output is
Continuing on @JohanMabille's use case, when I use a view instead of an array:

```cpp
xarray<double> a = {{0., 1., 2.}, {3., 4., 5.}};
xarray<double> b = a;
auto c = view(b, all(), all());
std::cerr << "shape before: (" << c.shape()[0] << ", " << c.shape()[1] <<
    ")\ncontents:\n" << c << "\n";
c = sum(a);
std::cerr << "shape after: (" << c.shape()[0] << ", " << c.shape()[1] <<
    ")\ncontents:\n" << c << "\n";
```

I get (as expected)

but the behavior differs between arrays and views. I absolutely cannot live with this inconsistency, because vigra's generic algorithms must work identically regardless of whether their arguments are arrays or views. I see only one (admittedly radical) solution: arrays adopt the assignment semantics of views, i.e. assignments cannot resize the LHS (with one exception: when the LHS does not yet contain data, e.g. after default construction).
Don't hold your breath though 😄. Regarding the different behavior of views and containers, we don't consider this an inconsistency; from the very start we have separated the two semantics (just like

One thing you can probably do if you want view semantics is take a view on an array with no slice. Soon(ish), strided views should have exactly the same performance as arrays.
That means every algorithm must begin with `auto in_view = make_view(in);` etc. -- not a pretty design, and easy to forget.
Yeah, we were posting an answer at the same time. So no bug, phew.
It is technically impossible in Python to assign a scalar to an array, because the expression

Off the top of my head, I can't come up with other semantic differences between numpy's arrays and views. What differences are you referring to?
Containers should behave like

Assigning or moving any expression to an xarray should behave like assigning an xarray to an xarray: it should resize. Otherwise it would be inconsistent.
That's because you're reading the arguments one by one instead of taking them as a whole thing. So to be clear:
Then you have 3 means to handle that last point:
So the analogy with numpy justifies broadcasting when assigning to a view and resizing when assigning to an array, while consistency between scalars and 0-D expressions justifies the way we handle scalar assignment. Thus, as already said before, we won't change this behavior. That being said, we can consider ways of simplifying the filling of an N-dimensional expression without resizing it:
You forgot to mention the fourth possibility: Change arrays to adopt view semantics (i.e. scalars and 0-D expressions are always broadcast to the LHS), and the system is consistent. |
No, it breaks consistency with STL containers, as explained two comments above.
When you work on speeding up expressions on views, also include view creation itself, i.e. functions
No, I don't want to do this:
I think this is not violated when arrays have view semantics.
This was a generic "you". There are other users. If you want something like an array that has view semantics, you can make one that is a valid xtensor expression by inheriting from xview_semantics and xstrided_container. There are only a couple of constructors to implement.
We strongly disagree. This was a deliberate, conscious choice.
I repeatedly asked for use cases from your side, but got none.
I think that we are in the land of sea-lioning here.
This looks like what I'll try next. I didn't know that it would guarantee view semantics as opposed to array semantics.
Another note on this issue: for #666 I was planning to overload assign for the special case of a RHS of a single reducer, and then reroute the reducer to use the faster, strided-loop reducers instead of the lazy ones. |
So I've added a benchmark for `xt::zeros`, and assignment to a container is now as fast as using `std::fill` directly:

As you can see, on `arange` there is still some work outstanding. We'll spend some cycles to add
* state.reset() now also resets pose and speed and its xtensors
* iterate over control_history_ and call reset()
* use xt::noalias() when zeroing in reset() funcs to actually assign zero and clean out old state.

"Assigning to a container or a view after it has been initialized involves a temporary to avoid aliasing." See xtensor-stack/xtensor#702 (comment)

Signed-off-by: Mike Wake <[email protected]>
I benchmarked resetting an `xarray` to zero and found that `array = xt::zeros<float>(shape)` is 27x slower than `std::fill()`, whereas I expected to see no difference. Here are the numbers for gcc-7:

(Edit: I added results for dynamic view, which are even worse - 54x slower.)

(Edit2: in releases 0.10 to 0.14, `array = xt::zeros<float>(shape)` was "only" 16x slower.)

Code:

(BTW, why is straightforward `array = 0` not supported?)