add gather/scatter methods for simd #27

Open · wants to merge 1 commit into base: master
51 changes: 51 additions & 0 deletions experimental/bits/simd.h
@@ -32,6 +32,7 @@

#include "simd_detail.h"
#include "numeric_traits.h"
#include <array>
#include <bit>
#include <bitset>
#ifdef _GLIBCXX_DEBUG_UB
@@ -4966,6 +4967,15 @@ template <typename _Tp, typename _Abi>
_Impl::_S_load(_Flags::template _S_apply<simd>(__mem), _S_type_tag))
{}

// gather constructor
template <typename _Up, typename _Flags>
_GLIBCXX_SIMD_ALWAYS_INLINE
simd(const _Up* __mem, const __int_for_sizeof_t<_Up>* __idx, _Flags)
Member commented:

  • This is a non-standard (as not specified in an ISO standard) interface. I believe we therefore should implement gather & scatter interfaces only as free functions (possibly hidden friends) or static member functions. And the function name must be a reserved name.
  • I think we want the index parameter to always be of type simd. Turning an array into simd isn't too hard. And the expectation for gather & scatter is that indexes are computed via simd objects.
  • Do you know whether alignment of the memory pointer matters for any gather / scatter hardware? Because if not then the _Flags parameter is unnecessary. I'd drop it.
  • A constexpr scale parameter has been useful in Vc 1. I'm not sure whether we want it as public API at some point, though.
template <size_t _Scale = 1, typename _From, typename _Idx, typename _IdxAbi>
  _GLIBCXX_SIMD_ALWAYS_INLINE static _GLIBCXX_SIMD_CONSTEXPR
  simd
  __gather(const _From* __mem, const simd<_Idx, _IdxAbi>& __idx,
           _SizeConstant<_Scale> __scale = {})
  {
    return simd([&](auto __i) {
      return static_cast<value_type>(__mem[__idx[__i] * __scale]);
    });
  }

I also just implemented a complete _S_gather implementation for _SimdImplX86 (AVX512 and AVX2) and fallback in _SimdImplBuiltin. I didn't implement anything else, because I only use it internally to implement a lookup table for my std::exp(simd) implementation. It's not pushed anywhere yet. But I'm closing in on getting the patch ready.
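Since that patch isn't pushed anywhere yet, the semantics of a scaled gather can be modeled in plain C++ (a minimal sketch only; `gather_scaled` is a hypothetical stand-in for illustration, not the library's `_S_gather`):

```cpp
#include <array>
#include <cstddef>

// Element-wise model of the proposed __gather: out[i] = mem[idx[i] * Scale].
// A constexpr Scale folds the address arithmetic into the gather itself,
// mirroring the scale operand of hardware gather instructions.
template <std::size_t Scale = 1, typename T, typename Idx, std::size_t N>
std::array<T, N> gather_scaled(const T* mem, const std::array<Idx, N>& idx)
{
    std::array<T, N> out{};
    for (std::size_t i = 0; i < N; ++i)
        out[i] = mem[idx[i] * Scale];  // one independent load per lane
    return out;
}
```

An `std::exp`-style implementation would then gather table entries whose slot indices are computed per lane; with `Scale > 1` an index can address rows of a multi-entry table without a separate multiply.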

Author commented:

  1. The simd constructor that loads from memory is already supported; does the gather constructor also need to be an extension?
  2. In my mind, a _Flags parameter for the __mem argument of gather is necessary.

Author (@yaozhongxiao) commented on Feb 23, 2021:

Since you have already implemented _S_gather, would it be better for me to base my work on your implementation?

: _M_data(
_Impl::_S_gather(_Flags::template _S_apply<simd>(__mem),
__idx, _S_type_tag))
{}

// loads [simd.load]
template <typename _Up, typename _Flags>
_GLIBCXX_SIMD_ALWAYS_INLINE void
@@ -4984,6 +4994,47 @@ template <typename _Tp, typename _Abi>
_S_type_tag);
}

// gather [simd.gather]
template <typename _Up, typename _Flags>
_GLIBCXX_SIMD_ALWAYS_INLINE void
gather(const _Vectorizable<_Up>* __mem,
const std::array<int, size()>& __idx, _Flags)
{
_M_data = static_cast<decltype(_M_data)>(
_Impl::_S_gather(_Flags::template _S_apply<simd>(__mem), __idx.data(),
_S_type_tag));
}

template <typename _Up, typename _Flags>
_GLIBCXX_SIMD_ALWAYS_INLINE void
gather(const _Vectorizable<_Up>* __mem,
const __int_for_sizeof_t<_Up>* __idx, _Flags)
{
_M_data = static_cast<decltype(_M_data)>(
_Impl::_S_gather(_Flags::template _S_apply<simd>(__mem), __idx,
_S_type_tag));
}

// scatter [simd.scatter]
template <typename _Up, typename _Flags>
_GLIBCXX_SIMD_ALWAYS_INLINE void
scatter(_Vectorizable<_Up>* __mem, std::array<int, size()>& __idx,
_Flags) const
{
_Impl::_S_scatter(_M_data, _Flags::template _S_apply<simd>(__mem),
__idx.data(), _S_type_tag);
}

// scatter [simd.scatter]
template <typename _Up, typename _Flags>
_GLIBCXX_SIMD_ALWAYS_INLINE void
scatter(_Vectorizable<_Up>* __mem, const __int_for_sizeof_t<_Up>* __idx,
_Flags) const
{
_Impl::_S_scatter(_M_data, _Flags::template _S_apply<simd>(__mem),
__idx, _S_type_tag);
}

// scalar access
_GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR reference
operator[](size_t __i)
59 changes: 55 additions & 4 deletions experimental/bits/simd_builtin.h
@@ -1431,6 +1431,54 @@ template <typename _Abi>
});
}

// _S_gather {{{2
template <typename _Tp, typename _Up>
_GLIBCXX_SIMD_INTRINSIC static _SimdMember<_Tp>
_S_gather(const _Up* __mem, const __int_for_sizeof_t<_Up>* __idx,
_TypeTag<_Tp>) noexcept
{
constexpr size_t _Np = _S_size<_Tp>;
return __generate_vector<_Tp, _SimdMember<_Tp>::_S_full_size>([&](
auto __i) constexpr {
return static_cast<_Tp>(__i < _Np ? __mem[__idx[__i]] : 0);
});
}
// _S_gather
template <typename _Tp, typename _Up>
_GLIBCXX_SIMD_INTRINSIC static _SimdMember<_Tp>
_S_gather(const _Up* __mem, const _SimdMember<_Tp>& __idx,
_TypeTag<_Tp>) noexcept
{
constexpr size_t _Np = _S_size<_Tp>;
return __generate_vector<_Tp, _SimdMember<_Tp>::_S_full_size>([&](
auto __i) constexpr {
return static_cast<_Tp>(__i < _Np ? __mem[__idx[__i]] : 0);
});
} // }}}

// _S_scatter {{{2
template <typename _Tp, typename _Up>
_GLIBCXX_SIMD_INTRINSIC static void
_S_scatter(_SimdMember<_Tp> __v, _Up* __mem,
const __int_for_sizeof_t<_Up>* __idx, _TypeTag<_Tp>) noexcept
{
constexpr size_t _Np = _S_size<_Tp>;
__execute_n_times<_Np>([&](auto __i) constexpr {
__mem[__idx[__i]] = static_cast<_Up>(__v[__i]);
});
}
// _S_scatter
template <typename _Tp, typename _Up>
_GLIBCXX_SIMD_INTRINSIC static void
_S_scatter(_SimdMember<_Tp> __v, _Up* __mem,
const _SimdMember<_Tp>& __idx, _TypeTag<_Tp>) noexcept
{
constexpr size_t _Np = _S_size<_Tp>;
__execute_n_times<_Np>([&](auto __i) constexpr {
__mem[__idx[__i]] = static_cast<_Up>(__v[__i]);
});
} // }}}

// _S_load {{{2
template <typename _Tp, typename _Up>
_GLIBCXX_SIMD_INTRINSIC static _SimdMember<_Tp>
@@ -2284,7 +2332,8 @@ template <typename _Abi>
const auto __absn = __vector_bitcast<_Ip>(_SuperImpl::_S_abs(__x));
const auto __maxn
= __vector_bitcast<_Ip>(__vector_broadcast<_Np>(__finite_max_v<_Tp>));
return __absn <= __maxn;
return _MaskImpl::template _S_convert<_Tp>(
_MaskImpl::_S_to_bits(__as_wrapper<_Np>(__absn <= __maxn)));
yaozhongxiao marked this conversation as resolved.
#endif
}

@@ -2342,11 +2391,13 @@ template <typename _Abi>
const auto __minn
= __vector_bitcast<_Ip>(__vector_broadcast<_Np>(__norm_min_v<_Tp>));
#if __FINITE_MATH_ONLY__
return __absn >= __minn;
return _MaskImpl::template _S_convert<_Tp>(
_MaskImpl::_S_to_bits(__as_wrapper<_Np>(__absn >= __minn)));
#else
const auto __maxn
= __vector_bitcast<_Ip>(__vector_broadcast<_Np>(__finite_max_v<_Tp>));
return __minn <= __absn && __absn <= __maxn;
return _MaskImpl::template _S_convert<_Tp>(_MaskImpl::_S_to_bits(
__as_wrapper<_Np>(__minn <= __absn && __absn <= __maxn)));
yaozhongxiao marked this conversation as resolved.
#endif
}

@@ -2837,7 +2888,7 @@ template <typename _Abi>

// smart_reference access {{{2
template <typename _Tp, size_t _Np>
static constexpr void _S_set(_SimdWrapper<_Tp, _Np>& __k, int __i,
static constexpr void _S_set(_SimdWrapper<_Tp, _Np>& __k, size_t __i,
yaozhongxiao marked this conversation as resolved.
bool __x) noexcept
{
if constexpr (is_same_v<_Tp, bool>)
59 changes: 59 additions & 0 deletions experimental/bits/simd_fixed_size.h
@@ -395,6 +395,15 @@ template <typename _Tp, typename _Abi0, typename... _Abis>
return second.template _M_simd_at<_Np - 1>();
}

template <size_t _Offset>
_GLIBCXX_SIMD_INTRINSIC constexpr auto _M_tuple_at() const
{
if constexpr (_Offset == 0)
return first;
else
return second.template _M_tuple_at<_Offset - simd_size_v<_Tp, _Abi0>>();
}

Member commented on lines +398 to +406:
From the name I expected it to do the same as _M_at. Maybe call it _M_tuple_with_element<N>. However, I think you can't use this function anyway; see below.
Also, what if _Offset > 0 but _Offset < simd_size_v<_Tp, _Abi0>?
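The termination concern can be modeled with a plain-C++ stand-in for the recursion (a sketch under stated assumptions; `offset_hits_boundary` is hypothetical, not library code):

```cpp
#include <cstddef>

// Models _M_tuple_at's termination condition: the tuple is a chain of chunks
// of known lane counts, and each recursion step subtracts the first chunk's
// size until _Offset reaches 0. An offset that falls *inside* a chunk never
// hits the base case (with size_t the subtraction would wrap around instead).
constexpr bool offset_hits_boundary(std::size_t offset,
                                    const std::size_t* sizes, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        if (offset == 0)
            return true;   // the _Offset == 0 base case
        if (offset < sizes[i])
            return false;  // mid-chunk: the real recursion would not terminate
        offset -= sizes[i]; // descend into `second`
    }
    return offset == 0;
}
```

For a fixed_size tuple decomposed into chunks of 4, 2, and 1 lanes, only offsets 0, 4, 6, and 7 reach a boundary; an offset of 2 falls inside the first chunk.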

template <size_t _Offset = 0, typename _Fp>
_GLIBCXX_SIMD_INTRINSIC static constexpr _SimdTuple
_S_generate(_Fp&& __gen, _SizeConstant<_Offset> = {})
@@ -1331,6 +1340,56 @@ template <int _Np>
});
}

// _S_gather {{{2
template <typename _Tp, typename _Up>
static inline _SimdMember<_Tp>
_S_gather(const _Up* __mem, const __int_for_sizeof_t<_Up>* __idx,
_TypeTag<_Tp>) noexcept
{
return _SimdMember<_Tp>::_S_generate([&](auto __meta) {
return __meta._S_gather(__mem, &__idx[__meta._S_offset],
_TypeTag<_Tp>());
});
}

// _S_gather {{{2
template <typename _Tp, typename _Up>
static inline _SimdMember<_Tp>
_S_gather(const _Up* __mem, const _SimdMember<_Tp>& __idx,
_TypeTag<_Tp>) noexcept
{
return _SimdMember<_Tp>::_S_generate([&](auto __meta) {
return __meta._S_gather(
__mem, __idx.template _M_tuple_at<__meta._S_offset>(),
_TypeTag<_Tp>());
});
}
Member commented on lines +1356 to +1366:
Did you intend the types like this? I.e. the value_type of the index vector equals the value_type of the returned / gathered vector? I.e. you can't gather _SimdMember<float> unless __idx is of type _SimdMember<float>. I'd expect int for the latter.
Once you allow different types for gathered vector and index vector you will get size mismatches: e.g. gather floats using 64-bit int index vector. Or double / int, or float / int on AVX w/o AVX2. Then the _M_tuple_at access will fail. I hate this case, but I already handled it for some of the math functions. You should be able to find inspiration there. 😉
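The size mismatch being warned about can be made concrete with register-width arithmetic (assuming 256-bit AVX2 registers; the numbers are illustrative, not library internals):

```cpp
#include <cstddef>
#include <cstdint>

// One 256-bit (32-byte) AVX2 register:
constexpr std::size_t reg_bytes   = 32;
constexpr std::size_t float_lanes = reg_bytes / sizeof(float);        // 8
constexpr std::size_t i64_lanes   = reg_bytes / sizeof(std::int64_t); // 4

// Gathering 8 floats with 64-bit indices needs two index registers, so the
// index tuple and the value tuple decompose into different chunk layouts and
// a single _M_tuple_at<_S_offset> access cannot line up chunk-for-chunk.
static_assert(float_lanes == 2 * i64_lanes,
              "a 64-bit index vector covers only half the float lanes");
```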


// _S_scatter {{{2
template <typename _Tp, typename _Up>
static inline void
_S_scatter(const _SimdMember<_Tp>& __v, _Up* __mem,
const __int_for_sizeof_t<_Up>* __idx, _TypeTag<_Tp>) noexcept
{
__for_each(__v, [&](auto __meta, auto __native) {
__meta._S_scatter(__native, __mem, &__idx[__meta._S_offset],
_TypeTag<_Tp>());
});
}

// _S_scatter {{{2
template <typename _Tp, typename _Up>
static inline void
_S_scatter(const _SimdMember<_Tp>& __v, _Up* __mem,
const _SimdMember<_Tp>& __idx, _TypeTag<_Tp>) noexcept
Member commented on lines +1383 to +1384:
Same type question as above. Did you want different types for __v and __idx?

Author (@yaozhongxiao) commented on Feb 23, 2021:
Yes, I tried to support different types in the following manner, but after reading your comments it seems that is not the best way:
static inline _SimdMember<_Tp>
_S_gather(const _Up* __mem, const _SimdMember<__int_for_sizeof_t<_Tp>>& __idx,
_TypeTag<_Tp>) noexcept

{
__for_each(__v, __idx,
[&](auto __meta, auto __v_tuple, auto __idx_tuple) {
__meta._S_scatter(__v_tuple, __mem, __idx_tuple,
_TypeTag<_Tp>());
});
}

// _S_load {{{2
template <typename _Tp, typename _Up>
static inline _SimdMember<_Tp> _S_load(const _Up* __mem,
36 changes: 36 additions & 0 deletions experimental/bits/simd_scalar.h
@@ -150,6 +150,42 @@ struct _SimdImplScalar
_TypeTag<_Tp>)
{ return __gen(_SizeConstant<0>()); }

// _S_gather {{{2
template <typename _Tp, typename _Up>
_GLIBCXX_SIMD_INTRINSIC static _Tp
_S_gather(const _Up* __mem, const __int_for_sizeof_t<_Up>* __idx,
_TypeTag<_Tp>) noexcept
{
return static_cast<_Tp>(__mem[__idx[0]]);
}

// _S_gather
template <typename _Tp, typename _Up>
_GLIBCXX_SIMD_INTRINSIC static _Tp
_S_gather(const _Up* __mem, const _Tp& __idx, _TypeTag<_Tp>) noexcept
{
return static_cast<_Tp>(__mem[__idx]);
} // }}}

// _S_scatter {{{2
template <typename _Tp, typename _Up>
_GLIBCXX_SIMD_INTRINSIC static void
_S_scatter(const _Tp& __v, _Up* __mem,
[[maybe_unused]] const __int_for_sizeof_t<_Up>* __idx,
_TypeTag<_Tp>) noexcept
{
__mem[__idx[0]] = static_cast<_Up>(__v);
}

// _S_scatter
template <typename _Tp, typename _Up>
_GLIBCXX_SIMD_INTRINSIC static void
_S_scatter(_Tp& __v, _Up* __mem, [[maybe_unused]] const _Tp& __idx,
_TypeTag<_Tp>) noexcept
{
__mem[__idx] = static_cast<_Up>(__v);
} // }}}

// _S_load {{{2
template <typename _Tp, typename _Up>
_GLIBCXX_SIMD_INTRINSIC static _Tp _S_load(const _Up* __mem,
45 changes: 45 additions & 0 deletions tests/simd.cpp
@@ -1273,3 +1273,48 @@ TEST_TYPES(V, algorithms, all_test_types)
COMPARE(min(a, b), V{0});
COMPARE(max(a, b), V{1});
}


TEST_TYPES(V, gather, all_test_types)
{
using namespace std::experimental::parallelism_v2;
using T = typename V::value_type;

// generate simd value data
std::vector<T> vec_data(V::size(), 0);
__execute_n_times<V::size()>(
[&](size_t __i) { vec_data[__i] = static_cast<T>(__i); });
// generate index
using index_type = __int_for_sizeof_t<T>;
std::vector<index_type> index(V::size(), 0);
__execute_n_times<V::size()>(
[&](auto __i) { index[__i] = ((__i + 3) % V::size()); });

V x(vec_data.data(), index.data(), vector_aligned);
V y([&](auto __i) { return vec_data[index[__i]]; });

COMPARE(x, y);
}

TEST_TYPES(V, scatter, all_test_types)
{
using namespace std::experimental::parallelism_v2;
using T = typename V::value_type;

// generate simd value data
std::vector<T> vec_data(V::size(), 0);
__execute_n_times<V::size()>(
[&](size_t __i) { vec_data[__i] = static_cast<T>(__i); });
// generate index
using index_type = __int_for_sizeof_t<T>;
std::vector<index_type> index(V::size(), 0);
__execute_n_times<V::size()>(
[&](auto __i) { index[__i] = ((__i + 3) % V::size()); });

V x(vec_data.data(), index.data(), vector_aligned);

std::vector<T> vec_out(V::size(), 0);
x.scatter(&vec_out[0], index.data(), vector_aligned);
V y(vec_out.data(), vector_aligned);
COMPARE(vec_data, vec_out) << x << " => " << y;
}