Description
I am dealing with a codebase that copies data from a float_v object into a float[] array. The code checks whether enough elements remain in the array (for example, 4 for __m128). If so, it casts the array location to float_v; otherwise it uses a masked gather for the partial load.
A simplified version of that implementation looks like this:
const std::size_t data_size{7} ;
float data[ data_size ] = {0.0f, 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f} ;
std::size_t index{5} ;
const float_v tmp_data = float_v::Zero();
const std::size_t float_vLen = float_v::Size;
if( (index+float_vLen) < data_size) {
reinterpret_cast<float_v&>(data[index]) = tmp_data;
} else {
const uint_v indices(uint_v::IndexesFromZero());
(reinterpret_cast<float_v&>(data[index])).gather(reinterpret_cast<const float*>(&tmp_data), indices, simd_cast<float_m>(indices<(data_size - index)));
}
Problem
Is the gather implementation in the else branch safe? I have seen it mentioned that casting from float* to __m128* is okay, but that the other way around is not safe. For reference, see this Stack Overflow answer and this Stack Overflow post.
I feel like I should scatter or store to a float[4] array and then pass its address instead. Or is the current implementation safe?
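A minimal sketch of the temporary-array idea, written in plain C++ so that it does not depend on Vc. `Vec4` and `store_partial` are hypothetical names introduced only for illustration; `Vec4` stands in for float_v, and the first `std::memcpy` stands in for whatever store call the real vector type provides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for float_v: four packed floats, 16-byte aligned.
struct alignas(16) Vec4 {
    float lane[4];
};

// Copies the lanes of `v` into dst[index...], writing at most 4 floats
// and never writing past dst[dst_size - 1].
// Returns the number of floats actually written.
std::size_t store_partial(const Vec4& v, float* dst,
                          std::size_t dst_size, std::size_t index) {
    assert(index <= dst_size);
    const std::size_t remaining = dst_size - index;
    const std::size_t count = remaining < 4 ? remaining : 4;

    // Step 1: store the whole vector into an aligned temporary array.
    // (Assumption: with a real SIMD type this would be its store call.)
    alignas(16) float tmp[4];
    std::memcpy(tmp, v.lane, sizeof tmp);

    // Step 2: copy only the lanes that fit into the destination.
    std::memcpy(dst + index, tmp, count * sizeof(float));
    return count;
}
```

With the sample above (data_size of 7 and index of 5), this writes only data[5] and data[6], and the destination is never accessed through a vector-typed reference.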
Another question, which I would really appreciate an answer to:
I am not sure about the cast to reference type float_v& in the above implementation. It doesn't "feel" right. I can think of four different ways to load:
1.
alignas(16) float float_arr[4] = {0, 1, 2, 3} ;
float_v load_simd ;
load_simd.load(float_arr) ;
2.
alignas(16) float float_arr[4] = {0, 1, 2, 3} ;
float_v gather_simd ;
gather_simd.gather(float_arr, uint_v::IndexesFromZero()) ;
3.
alignas(16) float float_arr[4] = {0, 1, 2, 3} ;
float_v* cast_ptr_simd = reinterpret_cast<float_v*>(float_arr) ;
4.
alignas(16) float float_arr[4] = {0, 1, 2, 3} ;
float_v& cast_ref_simd = reinterpret_cast<float_v&>(float_arr) ;
Which would be the "best way" or good practice from your expert perspective?
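One difference between these options is that 1 and 2 copy the values into a separate vector object, while 3 and 4 reinterpret the array's storage in place. The following is a minimal sketch of the copying behaviour in plain C++, assuming a hypothetical `Float4` type as a stand-in for float_v. It is not Vc's implementation, and `Float4::load` is an illustrative name only.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for float_v (not the Vc type).
struct alignas(16) Float4 {
    float lane[4];

    // Explicit load: copies four floats starting at `src`.
    // Because it copies bytes, `src` does not need to be 16-byte aligned,
    // and the result is independent of the source array afterwards.
    static Float4 load(const float* src) {
        Float4 v;
        std::memcpy(v.lane, src, sizeof v.lane);
        return v;
    }
};

// Small helper so the loaded value is actually used.
float sum_lanes(const Float4& v) {
    float total = 0.0f;
    for (std::size_t i = 0; i < 4; ++i) {
        total += v.lane[i];
    }
    return total;
}
```

Here `Float4::load(arr + 1)` is valid even though `arr + 1` is not 16-byte aligned, and later writes to `arr` do not change the loaded value. A pointer or reference cast, by contrast, keeps referring to the array's memory.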