load_[un]aligned and store_[un]aligned free functions only work on the largest vector width
I am not sure if this is a bug or a feature, but there is no way to load or store a vector that is narrower than the native vector width (e.g. a 128-bit batch<float, 4> on AVX) using the free-function interface from xsimd/xsimd.hpp.
As the interface currently stands, one has to use the constructor/method syntax for these kinds of loads and stores.
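To make the distinction between the two call styles concrete, here is a small library-independent sketch. Everything in the `mini` namespace is a hypothetical toy stand-in, not xsimd's actual types or signatures; it only illustrates that a method call gets the width from the batch type, while a free function needs the width passed in some other way (here, as an assumed explicit template argument).

```cpp
#include <array>
#include <cstddef>
#include <cstring>

namespace mini {  // hypothetical toy types, NOT part of xsimd

template <class T, std::size_t N>
struct batch {
    std::array<T, N> lanes{};

    // Method syntax: the batch type itself already fixes the width N.
    void load_unaligned(const T* src) {
        std::memcpy(lanes.data(), src, N * sizeof(T));
    }
    void store_unaligned(T* dst) const {
        std::memcpy(dst, lanes.data(), N * sizeof(T));
    }
};

// Free-function syntax: the caller supplies the width explicitly,
// so the function is not tied to the widest available register.
template <std::size_t N, class T>
batch<T, N> load_unaligned(const T* src) {
    batch<T, N> b;
    b.load_unaligned(src);
    return b;
}

// For stores, the width can be deduced from the batch argument.
template <class T, std::size_t N>
void store_unaligned(T* dst, const batch<T, N>& b) {
    b.store_unaligned(dst);
}

}  // namespace mini

// Copies four floats through a 4-lane batch using both call styles.
inline void copy4(const float* src, float* dst_method, float* dst_free) {
    mini::batch<float, 4> a;
    a.load_unaligned(src);           // method syntax
    a.store_unaligned(dst_method);

    auto b = mini::load_unaligned<4>(src);  // free function, explicit width
    mini::store_unaligned(dst_free, b);
}
```

The store side needs no extra information because the batch argument carries its width; only the load side has nothing to deduce the width from, which is why a free-function load would need some interface change to support narrower batches.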
It's something we've been thinking about for a while, but we haven't had the time to implement it yet. It does require some changes to the interface of xsimd/xsimd.hpp.
We plan to implement it for the next release.
With explicit architecture parameters, it is now possible to create a batch<float, sse4_2> even if AVX is actually available (because AVX implies SSE). There is no high-level support for converting those to batch<double, avx2>, but it can be done with intrinsics such as _mm256_cvtps_pd, passing batches as arguments (batches have an implicit conversion to their register type).
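As a rough illustration of the intrinsic route, the sketch below widens four floats to four doubles with _mm256_cvtps_pd. It works on raw registers rather than xsimd batches so that it stays self-contained; per the comment above, a batch could be passed where the `__m128` appears, thanks to the implicit conversion. The helper name `widen4` and the scalar fallback are assumptions added for illustration.

```cpp
#include <cstddef>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical helper: widens four floats to four doubles.
inline void widen4(const float* src, double* dst) {
#if defined(__AVX__)
    __m128 narrow = _mm_loadu_ps(src);       // 4 x float, 128-bit register
    __m256d wide = _mm256_cvtps_pd(narrow);  // 4 x double, 256-bit register
    _mm256_storeu_pd(dst, wide);
#else
    // Scalar fallback so the sketch still builds when AVX is not enabled.
    for (std::size_t i = 0; i < 4; ++i) {
        dst[i] = static_cast<double>(src[i]);
    }
#endif
}
```

The intrinsic path is only compiled when the compiler is told to target AVX (for example with -mavx on GCC or Clang); otherwise the scalar loop produces the same values.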