Jackalope / jen (public) (License: GPLv3 or later version) (since 2018-10-24) (hash sha1)
----> ABOUT:

3D rendering and computing framework based on the Vulkan API.

Libraries:
- simdcpp submodule (see my simdcpp repo)
- jmath submodule (see my jmath repo)
- mesh (constexpr generation of cubes, spheres and icosahedron subdivisions)
- atlas (cutting of 1D lines and 2D rectangles)
- jlib submodule (see my jlib repo)
- jrf submodule (see my jrf repo)
- vkw (Vulkan API C++ wrapper)
Modules:
- compute (run compute shaders on the GPU)
- graphics (draw models with clustered forward rendering, plus on-screen text)
- resource manager (load meshes, models, textures and scene data from
files and create the related objects in the graphics module)

----> INSTALLING:

To download all parts of this framework, it is enough to run
git clone with the recursive flag:

$ git clone --recursive ssh://rocketgit@ssh.rocketgit.com/user/Jackalope/jen

Then look at the git tags:

$ git tag

It is recommended to use a tagged version instead of the latest commit,
because the first commits after a tag usually contain incomplete,
incompatible parts of changes planned for the next version.

$ git checkout v0.1.0

----> DEPENDENCIES:

To use JEN as a CMake subdirectory and successfully build programs with it,
make sure you have all of its dependencies:
- compiler: Clang or GCC with C++17 support; Clang 10+ or GCC 9+ is recommended.
Compiling on Windows is tricky: it requires something like MinGW with MSYS,
and some extra work is needed to get the dependencies working;
- GLFW3 library, supported version is 3.2.1;
- FreeType library, if the graphics module is used;
- Vulkan API headers, and optionally the validation layers to debug subtle problems;
your graphics driver must also support Vulkan to run compiled programs;
- LibZip may be required if JRF is used to read zip files;
- CMake, for obvious reasons;
- glslangValidator to compile shaders for the graphics module.

CMake must be able to find GLFW3, Vulkan and FreeType (for graphics)
with find_package().
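
If you are unsure whether your toolchain meets the compiler requirement above, a small header can catch it at compile time. This is a hypothetical helper and not part of JEN (the jen_check namespace is made up for this sketch). It only checks for the C++17 language mode and the recommended Clang/GCC versions. It says nothing about GLFW3, FreeType, Vulkan or the other dependencies.

```cpp
// Hypothetical compile-time check for the compiler requirements listed above.
#if __cplusplus < 201703L
#error "JEN needs C++17 support (e.g. -std=c++17)"
#endif

namespace jen_check {

// True when the compiler meets the recommended minimum versions
// (Clang 10+ or GCC 9+). Clang is tested first because it also defines
// __GNUC__. Apple Clang uses its own version numbering, so the check is
// only approximate there.
constexpr bool recommended_compiler() {
#if defined(__clang__)
    return __clang_major__ >= 10;
#elif defined(__GNUC__)
    return __GNUC__ >= 9;
#else
    return false; // only Clang and GCC are listed as supported
#endif
}

} // namespace jen_check
```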

----> HOW TO USE IT:

To use JEN, add it as a subdirectory:

add_subdirectory(${PATH_TO_JEN})

There are several configuration options:
- JEN_MODULE_COMPUTE - compile and include the compute module;
- JEN_MODULE_GRAPHICS - compile and include the graphics module;
- JEN_MULTITHREADED_DRAW_FRAME - the draw_frame function uses a thread pool queue
instead of executing sequentially;
- JEN_MODULE_RESOURCE_MANAGER - enable the resource manager module (only if graphics is ON);
- JEN_VLK_VALIDATION - enable Vulkan validation layers to debug errors
related to JEN. This often produces false positives
as well as true positives.

See CMakeLists.txt in the JenExamples repo for details on how to use and
configure JEN automatically:

$ git clone ssh://rocketgit@ssh.rocketgit.com/user/Jackalope/JenExamples

I also recommend compiling and running the examples to make sure everything works correctly.

----> SUPPORTED HARDWARE:

JEN has not been tested thoroughly, because that requires running it on a large
variety of hardware. It should work with the Mesa driver and modern
Intel i965 GPUs, as well as with AMD GPUs.


----> DOCUMENTATION:

You can generate Doxygen documentation: turn on any of the JEN_DOXYGEN_*
options and build the documentation target with CMake:

$ cmake -G <generator> -DJEN_DOXYGEN_HTML=ON -DJEN_DOXYGEN_LATEX=ON
$ cmake --build . --target documentation

The resource manager is not documented because it still needs major improvements.
List of commits:
Subject Hash Author Date (UTC)
simd library moved to simdcpp submodule, also updates to match new version 36fa65052847fbb258e0ceaf2a2c3fab40e5c3a7 Jackalope 2020-04-21 23:19:24
jlib update d7d711f8b289b2a84e58c66d20c0871c21d17350 Jackalope 2020-04-15 09:54:57
hiding clang-10 new warnings 6d3a1a1dbc928d9ed75024848f06f541d55e1580 Jackalope 2020-04-15 09:54:39
vkw new Vulkan API result values 0eed6a08e659a2e58ed35d9870022d235f0d87d4 Jackalope 2020-04-15 09:54:04
removed temporary fence, validation layers still complaining ba2b7277f0db6e549e68619f8f61ecc50066343d Jackalope 2020-04-15 09:41:02
device queues had incorrect orders in memory layout 28bf6b0e793ae988c390571baa5e8654200f6b42 Jackalope 2020-04-15 07:10:19
updated new vulkan enum names a7f4e2fdd01884df5469abda3520ca3901a44532 Jackalope 2020-04-15 07:09:55
fix gradient values in simplex noise 75ee283d9f06b69936c9db3d1eabe5c876b418cf Jackalope 2020-04-03 04:00:15
conditional device queues loading based on ModulesMask 952bb3a4d1ae3c867309b136543bbc7a072684bb Jackalope 2020-04-02 02:44:36
fixed issue when gpu_transfer is sleeping while jobs queued 7af01807ffbc1ffae785745378f832303bcf79dc Jackalope 2020-03-31 07:40:10
real_coordinate_to_face_simd correct face values 7fb655197ff078fcb11b144025e08f9bedaa3b5d Jackalope 2020-03-29 14:41:29
chenged cube faces enumeration order a02f523b115a2ada261f6b4e65042925ad1b042c Jackalope 2020-03-29 14:24:52
gpu data destroy queue infinite loop bug fix eae02d74cb1c0cfa780d2038c12c6c568e9845d4 Jackalope 2020-03-29 09:05:16
jlib update 3137705c28c8dcbda3c742361cb31a040cf172cd Jackalope 2020-03-29 07:07:37
cube TOP and BOTTOM faces correct coordinates ad27c4322f9103d03891aadee452f52c7ec7eed1 Jackalope 2020-03-29 07:04:29
wrong atlas result value in assertion fixed a4776dc8935865640a12036be41b79084d762f82 Jackalope 2020-03-28 12:56:37
removed #pragma once from main file 1054379a52420eba68f62305675c3928dbd6a4ed Jackalope 2020-03-28 12:56:02
some identation fixes a7f2ba135bb6c8195682564727c1bd39721b8b17 Jackalope 2020-03-25 13:25:29
mesh library refactoring 4dd908c51d97035ed26cb1c413685c1a8ab15c77 Jackalope 2020-03-25 13:24:43
removed old unmaintained directory 75b147dd4dc5d3592da21a707ac301cd46eaaf3d Jackalope 2020-03-19 19:35:03
Commit 36fa65052847fbb258e0ceaf2a2c3fab40e5c3a7 - simd library moved to simdcpp submodule, also updates to match new version
Author: Jackalope
Author date (UTC): 2020-04-21 23:19
Committer name: Jackalope
Committer date (UTC): 2020-04-21 23:19
Parent(s): d7d711f8b289b2a84e58c66d20c0871c21d17350
Signer:
Signing key:
Signing status: N
Tree: 3f0acfa3a7acd6c5c3610e40f6fe462776f33709
File Lines added Lines deleted
.gitmodules 3 0
CMakeLists.txt 2 1
libs/mesh/cube.h 15 22
libs/mesh/polyhedron/icosahedron_quad_tesselated.h 5 5
libs/simd/CMakeLists.txt 0 11
libs/simd/def.h 0 219
libs/simd/simd.cpp 0 85
libs/simd/simd.h 0 31
libs/simd/simd_avx2.inl 0 168
libs/simd/simd_no.inl 0 144
libs/simd/simd_sse4.2.inl 0 142
libs/simdcpp 1 0
src/CMakeLists.txt 1 1
File .gitmodules changed (mode: 100755) (index 99ab83c..d3ad60b)
16 16 [submodule "libs/jrf"] [submodule "libs/jrf"]
17 17 path = libs/jrf path = libs/jrf
18 18 url = ssh://rocketgit@ssh.rocketgit.com/user/Jackalope/jrf url = ssh://rocketgit@ssh.rocketgit.com/user/Jackalope/jrf
19 [submodule "libs/simdcpp"]
20 path = libs/simdcpp
21 url = ssh://rocketgit@ssh.rocketgit.com/user/Jackalope/simdcpp
File CMakeLists.txt changed (mode: 100644) (index fced97a..d8511c8)
... ... configure_file(resources/fonts/IBMPlexMono.ttf
73 73 ${DST}/fonts/IBMPlexMono.ttf COPYONLY) ${DST}/fonts/IBMPlexMono.ttf COPYONLY)
74 74
75 75 add_subdirectory(resources/shaders) add_subdirectory(resources/shaders)
76 add_subdirectory(libs/simd)
76 add_subdirectory(libs/simdcpp)
77 77 add_subdirectory(libs/math) add_subdirectory(libs/math)
78 78 add_subdirectory(libs/mesh) add_subdirectory(libs/mesh)
79 79 add_subdirectory(libs/jrf) add_subdirectory(libs/jrf)
 
... ... set(JEN_INCLUDE_DIRS
92 92 ${GLFW_INCLUDE_DIRS} ${GLFW_INCLUDE_DIRS}
93 93 ${FREETYPE_INCLUDE_DIRS} ${FREETYPE_INCLUDE_DIRS}
94 94
95 ${SIMDCPP_INCLUDE_DIRS}
95 96 ${MATH_INCLUDE_DIRS} ${MATH_INCLUDE_DIRS}
96 97 ${JRF_INCLUDE_DIRS} ${JRF_INCLUDE_DIRS}
97 98 ${JLIB_INCLUDE_DIRS} ${JLIB_INCLUDE_DIRS}
File libs/mesh/cube.h changed (mode: 100644) (index 3f02c65..2f316a5)
3 3 #include "base.h" #include "base.h"
4 4 #include <math/vector.h> #include <math/vector.h>
5 5 #include <math/matrix.h> #include <math/matrix.h>
6 #include <simd/simd.h>
6 #include <simdcpp/simd.h>
7 7 #include <limits> #include <limits>
8 8
9 9 #define CONDITION_ARGSF bool texture_coord, bool normals #define CONDITION_ARGSF bool texture_coord, bool normals
 
... ... namespace mesh::cube
71 71
72 72 namespace mesh::cube namespace mesh::cube
73 73 { {
74 template<typename float_t> [[nodiscard]] constexpr inline
74 template<typename float_t> [[nodiscard]] constexpr
75 75 math::vec3<float_t> face_coordinate_to_real(Face face, float_t x, float_t y) { math::vec3<float_t> face_coordinate_to_real(Face face, float_t x, float_t y) {
76 76 using vec3 = math::vec3<float_t>; using vec3 = math::vec3<float_t>;
77 77 switch (face) { switch (face) {
 
... ... namespace mesh::cube
85 85 } }
86 86 } }
87 87
88 template<typename T>
89 [[nodiscard]] constexpr inline Face real_coordinate_to_face(T x, T y, T z) {
88 template<typename T> [[nodiscard]] constexpr
89 Face real_coordinate_to_face(T x, T y, T z) {
90 90 const Face faceX = x >= 0 ? Face::EAST : Face::WEST; const Face faceX = x >= 0 ? Face::EAST : Face::WEST;
91 91 const Face faceY = y >= 0 ? Face::BOTTOM: Face::TOP; const Face faceY = y >= 0 ? Face::BOTTOM: Face::TOP;
92 92 const Face faceZ = z >= 0 ? Face::NORTH : Face::SOUTH; const Face faceZ = z >= 0 ? Face::NORTH : Face::SOUTH;
 
... ... namespace mesh::cube
100 100 return abs_z > abs_y ? faceZ : faceY; return abs_z > abs_y ? faceZ : faceY;
101 101 } }
102 102
103 template<simd::Level level> [[nodiscard]] constexpr inline
104 simd::pint_simd<level> real_coordinate_to_face_simd(
105 simd::pint_simd<level> x,
106 simd::pint_simd<level> y,
107 simd::pint_simd<level> z)
108 {
109 using pi = simd::pint_simd<level>;
110
103 template<simd::Level L> [[nodiscard]] constexpr
104 simd::pint<L> real_coordinate_to_face_simd(
105 simd::pint<L> x, simd::pint<L> y, simd::pint<L> z) {
106 using pi = simd::pint<L>;
111 107 const pi zero = 0; const pi zero = 0;
112
113 const auto faceX = (x < zero).blendv(pi(EAST ), pi(WEST ));
114 const auto faceY = (y < zero).blendv(pi(BOTTOM), pi(TOP ));
115 const auto faceZ = (z < zero).blendv(pi(NORTH ), pi(SOUTH));
116
108 const auto faceX = (x < zero).blend(pi(WEST ), pi(EAST ));
109 const auto faceY = (y < zero).blend(pi(TOP ), pi(BOTTOM));
110 const auto faceZ = (z < zero).blend(pi(SOUTH), pi(NORTH ));
117 111 const auto abs_x = x.abs(); const auto abs_x = x.abs();
118 112 const auto abs_y = y.abs(); const auto abs_y = y.abs();
119 113 const auto abs_z = z.abs(); const auto abs_z = z.abs();
120
121 const auto faceZX = (abs_z > abs_x).blendv(faceX,faceZ);
122 const auto faceZY = (abs_z > abs_y).blendv(faceY,faceZ);
123 return (abs_x > abs_y).blendv(faceZY,faceZX);
114 const auto faceZX = (abs_z > abs_x).blend(faceZ,faceX);
115 const auto faceZY = (abs_z > abs_y).blend(faceZ,faceY);
116 return (abs_x > abs_y).blend(faceZX,faceZY);
124 117 } }
125 118
126 template<typename T> constexpr inline
119 template<typename T> constexpr
127 120 math::matrix<3, 3, T> rotation(Face face) { math::matrix<3, 3, T> rotation(Face face) {
128 121 if (face == 0) if (face == 0)
129 122 return math::scale<3>(1); return math::scale<3>(1);
File libs/mesh/polyhedron/icosahedron_quad_tesselated.h changed (mode: 100644) (index 0a2bf24..7fc143d)
1 1 #pragma once #pragma once
2 2 #include "icosahedron_quad.h" #include "icosahedron_quad.h"
3 3
4 #include <simd/simd.h>
4 #include <simdcpp/simd.h>
5 5 #include <templates/tree/red_black_key.h> #include <templates/tree/red_black_key.h>
6 6
7 7 #define TEMPLATE_ARGS template<bool texture_coord, bool normals, typename floating_t, typename index_t> #define TEMPLATE_ARGS template<bool texture_coord, bool normals, typename floating_t, typename index_t>
 
... ... namespace mesh::icosahedron_quad
46 46 return normalize(quads<T>[quad_i].vertices[0] + dH + dD + dUP); return normalize(quads<T>[quad_i].vertices[0] + dH + dD + dUP);
47 47 } }
48 48
49 template<simd::Level l> [[nodiscard]] inline
50 math::vec3<simd::pfloat_simd<l>> quad_coordinate_to_real(
51 unsigned int quad_i, simd::pfloat_simd<l> u, simd::pfloat_simd<l> v)
49 template<simd::Level l> [[nodiscard]]
50 math::vec3<simd::pfloat<l>> quad_coordinate_to_real(
51 unsigned int quad_i, simd::pfloat<l> u, simd::pfloat<l> v)
52 52 { {
53 53 using namespace math; using namespace math;
54 54 using namespace simd; using namespace simd;
55 using pf = simd::pfloat_simd<l>;
55 using pf = simd::pfloat<l>;
56 56
57 57 pf diagonal_ratio_x2 = (u + v); pf diagonal_ratio_x2 = (u + v);
58 58 pf horizontal_ratio = (v - u) / pf(2.f); pf horizontal_ratio = (v - u) / pf(2.f);
File libs/simd/CMakeLists.txt deleted (index eb5e4c7..0000000)
1 cmake_minimum_required(VERSION 3.5)
2
3 add_library(SIMD STATIC
4
5 def.h
6 simd.cpp
7 simd_avx2.inl
8 simd_sse4.2.inl
9 simd_no.inl
10 simd.h
11 )
File libs/simd/def.h deleted (index af786ef..0000000)
1 #pragma once
2
3 #include <cstdint>
4 #include <type_traits>
5 #include <cstddef>
6 #include <mm_malloc.h>
7 #include <limits>
8
9 namespace simd
10 {
11 enum Level
12 {
13 NO_SIMD,
14 SSE2, SSE4_1, SSE4_2,
15 AVX2, AVX512, MAX = AVX512
16 };
17
18 enum Alignment
19 {
20 NO_SIMD_ALIGNMENT = 4,
21 SSE2_ALIGNMENT = 16, SSE4_1_ALIGNMENT = 16, SSE4_2_ALIGNMENT = 16,
22 #ifdef __AVX2__
23 AVX2_ALIGNMENT = 32, AVX512_ALIGNMENT = 32
24 #else
25 AVX2_ALIGNMENT = NO_SIMD_ALIGNMENT, AVX512_ALIGNMENT = NO_SIMD_ALIGNMENT
26 #endif
27 };
28
29
30 Level level();
31
32 inline Level LEVEL()
33 {
34 static Level l = level();
35 #ifdef __MINGW64__ //FIXME mingw does something wrong
36 if (l >= simd::Level::AVX2)
37 l = SSE4_2;
38 #endif
39 return l;
40 }
41
42 template<typename MASK, int alignment>
43 struct mask;
44
45 template<typename MASK, int alignment>
46 struct get_mask
47 {
48 using T = mask<int32_t, alignment>;
49 };
50
51 template<typename PACKED, typename single, int alignment>
52 struct alignas(alignment) packed
53 {
54 constexpr static const unsigned int COUNT = sizeof (PACKED) / sizeof (single);
55
56 using P = PACKED;
57
58 inline packed();
59 inline packed(const single&);
60 inline explicit packed(const single*);
61
62 template<typename U = PACKED>
63 packed(const PACKED &p, typename std::enable_if<not std::is_same<U, single>::value>::type* = nullptr) : _(p) {}
64
65 [[nodiscard]] inline void *operator new(size_t size) { return _mm_malloc(size, alignment); }
66
67 void operator delete (void *p) { _mm_free(p); }
68
69 inline void extract_u(single *dst) const;
70 inline void extract (single *dst) const { extract_u(dst); }
71
72 template<typename P_O, typename single_other>
73 [[nodiscard]] inline static packed convert(const packed<P_O, single_other, alignment>&);
74
75 template<typename P_O, typename single_other>
76 [[nodiscard]] inline static packed cast (const packed<P_O, single_other, alignment>&);
77
78 [[nodiscard]] inline packed operator ~ () const;
79 [[nodiscard]] inline packed operator & (const packed&) const;
80 [[nodiscard]] inline packed operator ^ (const packed&) const;
81 [[nodiscard]] inline packed operator | (const packed&) const;
82
83 [[nodiscard]] inline packed and_not (const packed&) const;
84
85 [[nodiscard]] inline packed operator + (const packed&) const;
86 [[nodiscard]] inline packed operator - (const packed&) const;
87 [[nodiscard]] inline packed operator * (const packed&) const;
88 [[nodiscard]] inline packed operator / (const packed&) const;
89
90 inline packed& operator += (const packed &p) { *this = *this + p; return *this; }
91 inline packed& operator -= (const packed &p) { *this = *this - p; return *this; }
92 inline packed& operator *= (const packed &p) { *this = *this * p; return *this; }
93 inline packed& operator /= (const packed &p) { *this = *this / p; return *this; }
94
95 template<typename int_t>
96 [[nodiscard]] inline packed operator << (int_t) const;
97 template<typename int_t>
98 [[nodiscard]] inline packed operator >> (int_t) const;
99
100 //auto = mask
101 [[nodiscard]] inline typename get_mask<PACKED, alignment>::T operator == (const packed&) const;
102 [[nodiscard]] inline typename get_mask<PACKED, alignment>::T operator <= (const packed&) const;
103 [[nodiscard]] inline typename get_mask<PACKED, alignment>::T operator >= (const packed&) const;
104 [[nodiscard]] inline typename get_mask<PACKED, alignment>::T operator < (const packed&) const;
105 [[nodiscard]] inline typename get_mask<PACKED, alignment>::T operator > (const packed&) const;
106
107 [[nodiscard]] static inline packed mul_add (const packed&, const packed&, const packed&);
108 [[nodiscard]] static inline packed mul_sub (const packed&, const packed&, const packed&);
109 [[nodiscard]] static inline packed nmul_add (const packed&, const packed&, const packed&);
110
111 [[nodiscard]] inline packed mul_add (const packed &a, const packed &b) const { return mul_add(a,b,*this); }
112 [[nodiscard]] inline packed mul_sub (const packed &a, const packed &b) const { return mul_sub(a,b,*this); }
113 [[nodiscard]] inline packed nmul_add (const packed &a, const packed &b) const { return nmul_add(a,b,*this); }
114
115 [[nodiscard]] inline packed floor() const;
116 [[nodiscard]] inline packed rsqrt() const;
117
118 [[nodiscard]] inline packed abs() const;
119
120 [[nodiscard]] inline single row_max() const
121 {
122 //TODO
123
124 if (COUNT == 1)
125 {
126 float max;
127 extract(&max);
128 return max;
129 }
130
131 float v[COUNT];
132 extract(v);
133 for (unsigned int i = 1; i < COUNT; ++i)
134 if (v[0] < v[i])
135 v[0] = v[i];
136
137 return v[0];
138 }
139
140 [[nodiscard]] inline single row_min() const
141 {
142 //TODO
143
144 if (COUNT == 1)
145 {
146 float min;
147 extract(&min);
148 return min;
149 }
150
151 float v[COUNT];
152 extract(v);
153 for (unsigned int i = 1; i < COUNT; ++i)
154 if (v[0] > v[i])
155 v[0] = v[i];
156
157 return v[0];
158 }
159
160
161 template<typename PACKED_INT>
162 [[nodiscard]] static inline packed permute(const packed &a, const PACKED_INT &i);
163
164 [[nodiscard]] static inline packed hash(const packed &seed, const packed &x, const packed &y, const packed &z)
165 {
166 packed hash = z ^ (y ^ (x ^ seed));
167 hash = (hash * hash * packed(60493)) * hash;
168 return (hash >> 13) ^ hash;
169 }
170
171 PACKED _;
172 } __attribute__((aligned(alignment)));
173
174 template<typename PACKED_INT, int alignment> using pint = simd::packed<PACKED_INT, int32_t, alignment>;
175 template<typename PACKED_FLOAT, int alignment> using pfloat = simd::packed<PACKED_FLOAT, float, alignment>;
176
177 template<typename MASK, int alignment>
178 struct alignas(alignment) mask
179 {
180 using pint = packed<MASK, int32_t, alignment>;
181
182 inline mask(const MASK& m) : _(m) {}
183 inline mask(const pint &i) : _(i._) {}
184
185 [[nodiscard]] inline mask operator & (const mask &m) const { return pint(_) & pint(m._); }
186 [[nodiscard]] inline mask operator | (const mask &m) const { return pint(_) | pint(m._); }
187 [[nodiscard]] inline mask and_not (const mask &m) const { return pint(_).and_not(pint(m._)); }
188
189 inline mask operator ~ () const { return ~pint(_); }
190
191
192 template<typename P_T, typename S_T>
193 [[nodiscard]] inline packed<P_T, S_T, alignment> operator & (const packed<P_T,S_T, alignment> &p) const;
194
195 template<typename PACKED>
196 using pfloat = packed<PACKED, float, alignment>;
197
198 template<typename P>
199 [[nodiscard]] inline pfloat<P> add (const pfloat<P> &a, const pfloat<P> &b) const
200 {
201 return a + (pfloat<P>::cast(pint(_)) & b);
202 }
203
204 template<typename P>
205 [[nodiscard]] inline pfloat<P> sub (const pfloat<P> &a, const pfloat<P> &b) const
206 {
207 return a - (pfloat<P>::cast(pint(_)) & b);
208 }
209
210 [[nodiscard]] inline pint add (const pint &a, const pint &b) const { return a + (pint{_} & b); }
211 [[nodiscard]] inline pint sub (const pint &a, const pint &b) const { return a - (pint{_} & b); }
212
213 template<typename PF>
214 [[nodiscard]] inline pfloat<PF> blendv (const pfloat<PF> &if_false, const pfloat<PF> &if_true);
215 [[nodiscard]] inline pint blendv (const pint &if_false, const pint &if_true);
216
217 MASK _;
218 } __attribute__((aligned(alignment)));
219 }
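
packed::hash above does three things. It mixes the coordinates and the seed with XOR, multiplies the result by itself twice and by 60493, and then XORs in a right-shifted copy. In the AVX2 backend the multiplication wraps (_mm256_mullo_epi32) and >> is an arithmetic shift (_mm256_srai_epi32).

A scalar reference can be handy for checking SIMD lanes in tests. The one below is a sketch, not code from JEN, and the function names are made up. It uses unsigned arithmetic because signed overflow is undefined behavior in C++.

```cpp
#include <cstdint>
#include <cstring>

// Arithmetic shift right, as _mm256_srai_epi32 does per lane, written so
// that it is also well defined for negative inputs.
std::int32_t sra32(std::int32_t v, int n) {
    return v >= 0 ? (v >> n) : ~(~v >> n);
}

// Scalar sketch of packed::hash from def.h. Products wrap modulo 2^32 like
// _mm256_mullo_epi32; unsigned math avoids signed-overflow UB.
std::int32_t hash_scalar(std::int32_t seed, std::int32_t x,
                         std::int32_t y, std::int32_t z) {
    std::uint32_t h = static_cast<std::uint32_t>(z ^ (y ^ (x ^ seed)));
    h = (h * h * 60493u) * h;
    std::int32_t s;
    std::memcpy(&s, &h, sizeof s);  // reinterpret the wrapped bits as int32
    return sra32(s, 13) ^ s;
}
```

For example, hash_scalar(0, 0, 0, 1) evaluates 60493 ^ (60493 >> 13) = 60493 ^ 7 = 60490.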
File libs/simd/simd.cpp deleted (index 72b03ed..0000000)
1 #include "def.h"
2
3 namespace simd
4 {
5 #ifdef _WIN32
6
7 #include <intrin.h>
8
9 inline void cpuid(int32_t out[4], int32_t x)
10 {
11 __cpuidex(out, x, 0);
12 }
13 inline uint64_t xgetbv(unsigned int x)
14 {
15 return _xgetbv(x);
16 }
17
18 #else
19
20 #include <cpuid.h>
21
22 inline void cpuid(int32_t out[4], int32_t x)
23 {
24 __cpuid_count(x, 0, out[0], out[1], out[2], out[3]);
25 }
26 inline uint64_t xgetbv(unsigned int index)
27 {
28 uint32_t eax, edx;
29 __asm__ __volatile__("xgetbv" : "=a"(eax), "=d"(edx) : "c"(index));
30 return (uint64_t(edx) << 32) | eax;
31 }
32 #endif
33
34
35 Level level()
36 {
37 //https://github.com/Mysticial/FeatureDetector
38
39 int cpuInfo[4];
40 cpuid(cpuInfo, 0);
41
42 int &nIds = cpuInfo[0];
43
44 if (nIds < 0x00000001)
45 return NO_SIMD;
46
47 cpuid(cpuInfo, 0x00000001);
48
49 if ((cpuInfo[3] & 1 << 26) == 0) // SSE2
50 return NO_SIMD;
51
52 if ((cpuInfo[2] & 1 << 19) == 0) // SSE41
53 return SSE2;
54 // AVX
55 bool cpuXSaveSuport = (cpuInfo[2] & 1 << 26) != 0;
56 bool osAVXSuport = (cpuInfo[2] & 1 << 27) != 0;
57 bool cpuAVXSuport = (cpuInfo[2] & 1 << 28) != 0;
58
59 if (cpuXSaveSuport && osAVXSuport && cpuAVXSuport)
60 {
61 uint64_t xcrFeatureMask = xgetbv(0 /* = XCR_XFEATURE_ENABLED*/);
62 if ((xcrFeatureMask & 0x6) != 0x6)
63 return SSE4_1;
64 }
65 else return SSE4_1;
66
67 if (nIds < 0x00000007) // AVX2 FMA3
68 return SSE4_1;
69
70 cpuid(cpuInfo, 0x00000007);
71
72 bool cpuAVX2Support = (cpuInfo[1] & 1 << 5) != 0;
73
74 if (!cpuAVX2Support)
75 return SSE4_1;
76 // AVX512
77 bool cpuAVX512Support = (cpuInfo[1] & 1 << 16) != 0;
78 bool oxAVX512Support = (xgetbv(0 /* = XCR_XFEATURE_ENABLED*/) & 0xe6) == 0xe6;
79
80 if (!cpuAVX512Support || !oxAVX512Support)
81 return AVX2;
82
83 return AVX512;
84 }
85 }
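
On GCC and Clang, the CPUID/XGETBV probing in level() can be written more briefly with the __builtin_cpu_supports builtin. This is an alternative sketch, not what JEN does. The Level enum is copied from the deleted def.h, and the simd_sketch namespace is made up.

It also differs from level() in two ways. level() never reports SSE4_2, while this version does. This version also requires FMA for the AVX2 level, since the AVX2 backend calls _mm256_fmadd_ps.

```cpp
namespace simd_sketch {

// Same enumerators as the deleted def.h.
enum Level { NO_SIMD, SSE2, SSE4_1, SSE4_2, AVX2, AVX512, MAX = AVX512 };

// Runtime SIMD level detection using GCC/Clang builtins; on non-x86
// targets (or other compilers) it simply reports NO_SIMD.
inline Level detect_level() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) return AVX512;
    // The AVX2 backend also uses FMA intrinsics, so require both.
    if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma")) return AVX2;
    if (__builtin_cpu_supports("sse4.2")) return SSE4_2;
    if (__builtin_cpu_supports("sse4.1")) return SSE4_1;
    if (__builtin_cpu_supports("sse2")) return SSE2;
#endif
    return NO_SIMD;
}

} // namespace simd_sketch
```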
File libs/simd/simd.h deleted (index 48a3965..0000000)
1 #pragma once
2
3 #include "def.h"
4 #include "simd_avx2.inl"
5 #include "simd_sse4.2.inl"
6 #include "simd_no.inl"
7
8 namespace simd
9 {
10 template<Level level>
11 using pfloat_simd = typename
12 std::conditional<Level::AVX2 <= level, pfloat_avx2, typename
13 std::conditional<Level::SSE4_1 <= level, pfloat_sse4, pfloat_nosimd>::type>::type;
14
15 template<Level level>
16 using pint_simd = typename
17 std::conditional<Level::AVX2 <= level, pint_avx2, typename
18 std::conditional<Level::SSE4_1 <= level, pint_sse4, pint_nosimd>::type>::type;
19
20 template<Level level>
21 using mask_simd = typename
22 std::conditional<Level::AVX2 <= level, mask_avx2, typename
23 std::conditional<Level::SSE4_1 <= level, mask_sse4, mask_nosimd>::type>::type;
24
25 constexpr const int ALIGNMENT[] =
26 {
27 NO_SIMD_ALIGNMENT,
28 SSE2_ALIGNMENT, SSE4_1_ALIGNMENT, SSE4_2_ALIGNMENT,
29 AVX2_ALIGNMENT, AVX512_ALIGNMENT
30 };
31 }
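
simd.h picks the widest packed type for a given Level at compile time, using nested std::conditional. The stand-in structs below are hypothetical, with a lanes count added only for illustration, and they show the mapping:

- AVX2 and above select the 8-lane type.
- SSE4_1 and SSE4_2 select the 4-lane type.
- NO_SIMD and SSE2 fall back to the scalar type.

```cpp
#include <type_traits>

namespace sketch {

enum Level { NO_SIMD, SSE2, SSE4_1, SSE4_2, AVX2, AVX512 };

// Hypothetical stand-ins for pfloat_nosimd / pfloat_sse4 / pfloat_avx2.
struct pfloat_nosimd { static constexpr int lanes = 1; };
struct pfloat_sse4   { static constexpr int lanes = 4; };
struct pfloat_avx2   { static constexpr int lanes = 8; };

// Same selection as simd.h: the first matching branch wins.
template<Level level>
using pfloat_simd = std::conditional_t<(AVX2 <= level), pfloat_avx2,
                    std::conditional_t<(SSE4_1 <= level), pfloat_sse4, pfloat_nosimd>>;

} // namespace sketch
```

In the original, pfloat_avx2 is itself an alias of pfloat_nosimd when __AVX2__ is not defined at compile time. A runtime level check alone therefore does not guarantee that AVX2 code was compiled in.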
File libs/simd/simd_avx2.inl deleted (index 1efceee..0000000)
1 #pragma once
2
3 #ifdef __AVX2__
4
5 #include "def.h"
6
7 #include <immintrin.h>
8
9 #include <cstring>
10
11 namespace simd
12 {
13 using mask_avx2 = mask <__m256i, AVX2_ALIGNMENT>;
14 using pfloat_avx2 = packed<__m256, float, AVX2_ALIGNMENT>;
15 using pint_avx2 = packed<__m256i, int32_t, AVX2_ALIGNMENT>;
16
17 template<> struct get_mask<__m256, AVX2_ALIGNMENT> { using T = mask_avx2; };
18 template<> struct get_mask<__m256i, AVX2_ALIGNMENT> { using T = mask_avx2; };
19
20 #define PFLOAT(return) template<> inline return pfloat_avx2::
21 #define PFLOAT2(return) template<> template<> [[nodiscard]] inline return pfloat_avx2::
22
23 #define PINT(return) template<> inline return pint_avx2::
24 #define PINT2(return) template<> template<> [[nodiscard]] inline return pint_avx2::
25
26
27 PFLOAT() packed() : _(_mm256_setzero_ps()) {}
28 PFLOAT() packed(const float &f) : _(_mm256_set1_ps(f)) {}
29 #if 0
30 PFLOAT() packed(const float *p) : _(_mm256_loadu_ps(p)) {}
31 PFLOAT(void) extract(float *p) const { _mm256_storeu_ps(p, _); }
32 #else
33 PFLOAT() packed(const float *p) : _(_mm256_load_ps(p)) {}
34 PFLOAT(void) extract(float *p) const { _mm256_store_ps(p, _); }
35 #endif
36 PFLOAT(void) extract_u(float *p) const { _mm256_storeu_ps(p, _); }
37
38
39 PINT() packed(const int32_t &i) : _(_mm256_set1_epi32(i)) {}
40
41 PFLOAT2(pfloat_avx2) convert(const pint_avx2 &v) { return _mm256_cvtepi32_ps (v._); }
42 PFLOAT2(pfloat_avx2) cast (const pint_avx2 &v) { return _mm256_castsi256_ps(v._); }
43
44 PINT2(pint_avx2) convert(const pfloat_avx2 &v) { return _mm256_cvtps_epi32 (v._); }
45 PINT2(pint_avx2) cast (const pfloat_avx2 &v) { return _mm256_castps_si256(v._); }
46
47 PFLOAT(mask_avx2) operator < (const pfloat_avx2 &v) const
48 {
49 return pint_avx2::cast(pfloat_avx2(_mm256_cmp_ps(_, v._, _CMP_LT_OS)));
50 }
51 PFLOAT(mask_avx2) operator > (const pfloat_avx2 &v) const
52 {
53 return pint_avx2::cast(pfloat_avx2(_mm256_cmp_ps(_, v._, _CMP_GT_OS)));
54 }
55 PFLOAT(mask_avx2) operator <= (const pfloat_avx2 &v) const
56 {
57 return pint_avx2::cast(pfloat_avx2(_mm256_cmp_ps(_, v._, _CMP_LE_OS)));
58 }
59 PFLOAT(mask_avx2) operator >= (const pfloat_avx2 &v) const
60 {
61 return pint_avx2::cast(pfloat_avx2(_mm256_cmp_ps(_, v._, _CMP_GE_OS)));
62 }
63 PFLOAT(mask_avx2) operator == (const pfloat_avx2 &v) const
64 {
65 return pint_avx2::cast(pfloat_avx2(_mm256_cmp_ps(_, v._, _CMP_EQ_OS)));
66 }
67
68 PFLOAT(pfloat_avx2) operator + (const pfloat_avx2 &f) const { return _mm256_add_ps(_, f._); }
69 PFLOAT(pfloat_avx2) operator - (const pfloat_avx2 &f) const { return _mm256_sub_ps(_, f._); }
70 PFLOAT(pfloat_avx2) operator * (const pfloat_avx2 &f) const { return _mm256_mul_ps(_, f._); }
71 PFLOAT(pfloat_avx2) operator / (const pfloat_avx2 &f) const { return _mm256_div_ps(_, f._); }
72
73 PFLOAT(pfloat_avx2) operator & (const pfloat_avx2 &f) const { return _mm256_and_ps(_,f._); }
74 PFLOAT(pfloat_avx2) operator ^ (const pfloat_avx2 &f) const { return _mm256_xor_ps(_,f._); }
75
76 PFLOAT(pfloat_avx2) and_not (const pfloat_avx2 &f) const { return _mm256_andnot_ps(_,f._); }
77
78 PFLOAT(pfloat_avx2) mul_add (const pfloat_avx2 &a, const pfloat_avx2 &b, const pfloat_avx2 &c)
79 {
80 return _mm256_fmadd_ps (a._,b._,c._);
81 }
82 PFLOAT(pfloat_avx2) mul_sub (const pfloat_avx2 &a, const pfloat_avx2 &b, const pfloat_avx2 &c)
83 {
84 return _mm256_fmsub_ps (a._,b._,c._);
85 }
86 PFLOAT(pfloat_avx2) nmul_add (const pfloat_avx2 &a, const pfloat_avx2 &b, const pfloat_avx2 &c)
87 {
88 return _mm256_fnmadd_ps(a._,b._,c._);
89 }
90
91 PFLOAT2(pfloat_avx2) permute(const pfloat_avx2 &f, const pint_avx2 &i)
92 {
93 return _mm256_permutevar8x32_ps(f._, i._);
94 }
95 PFLOAT(pfloat_avx2) floor() const { return _mm256_floor_ps(_); }
96
97 PFLOAT(pfloat_avx2) rsqrt() const { return _mm256_rsqrt_ps(_); }
98
99 PFLOAT(pfloat_avx2) abs() const { return pfloat_avx2(-0.f).and_not(*this); }
100
101 #undef PFLOAT
102 #undef PFLOAT2
103
104
105 PINT() packed() : _(_mm256_setzero_si256()) {}
106 PINT() packed(const int32_t *p) : _(__m256i(_mm256_load_ps(reinterpret_cast<const float*>(p)))) {}
107
108 PINT(void) extract(int32_t *p) const { memcpy(p, &_, sizeof (_)); }
109
110 PINT(pint_avx2) operator + (const pint_avx2 &v) const { return _mm256_add_epi32(_, v._); }
111 PINT(pint_avx2) operator - (const pint_avx2 &v) const { return _mm256_sub_epi32(_, v._); }
112 PINT(pint_avx2) operator * (const pint_avx2 &v) const { return _mm256_mullo_epi32(_, v._); }
113
114 PINT(pint_avx2) operator ^ (const pint_avx2 &i) const { return _mm256_xor_si256(_,i._); }
115
116 PINT(pint_avx2) operator | (const pint_avx2 &i) const { return _mm256_or_si256 (_,i._); }
117 PINT(pint_avx2) operator & (const pint_avx2 &i) const { return _mm256_and_si256(_,i._); }
118 PINT(pint_avx2) and_not (const pint_avx2 &i) const { return _mm256_andnot_si256(_,i._); }
119
120 static_assert (0xffffffff == uint32_t(-1));
121 PINT(pint_avx2) operator ~ () const { return pint_avx2(_) ^ pint_avx2(-1); }
122
123 PINT(mask_avx2) operator > (const pint_avx2 &i) const { return _mm256_cmpgt_epi32(_, i._); }
124 PINT(mask_avx2) operator < (const pint_avx2 &i) const { return _mm256_cmpgt_epi32(i._, _); }
125 PINT(mask_avx2) operator == (const pint_avx2 &i) const { return _mm256_cmpeq_epi32(_, i._); }
126
127
128 PINT2(pint_avx2) operator << (int count) const { return _mm256_slli_epi32(_, count); }
129 PINT2(pint_avx2) operator >> (int count) const { return _mm256_srai_epi32(_, count); }
130
131 PINT(pint_avx2) abs() const { return _mm256_abs_epi32(_); }
132
133 #undef PINT
134 #undef PINT2
135
136 #define MASK(return) template<> [[nodiscard]] inline return mask_avx2::
137 #define MASK2(return) template<> template<> [[nodiscard]] inline return mask_avx2::
138
139 MASK2(pfloat_avx2) operator &(const pfloat_avx2 &f) const { return pfloat_avx2::cast(pint(_)) & f; }
140
141 MASK2(pfloat_avx2) blendv (const pfloat_avx2 &a, const pfloat_avx2 &b)
142 {
143 return _mm256_blendv_ps (a._, b._, __m256(_));
144 }
145
146 MASK(pint_avx2) blendv (const pint_avx2 &a, const pint_avx2 &b ) { return _mm256_blendv_epi8(a._, b._, _); }
147
148 #undef MASK
149 #undef MASK2
150
151 }
152
153 #else
154
155 #include "simd_no.inl"
156 namespace simd
157 {
158 using mask_avx2 = mask_nosimd;
159 using pfloat_avx2 = pfloat_nosimd;
160 using pint_avx2 = pint_nosimd;
161 }
162 #endif
163
164
165 #ifdef FN_COMPILE_AVX512
166 #error TODO AVX512
167 #include <x86intrin.h>
168 #endif
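
Both the AVX2 abs() above and the scalar fallback in simd_no.inl below work directly on float bit patterns. simd_no.inl does this through the float_int union. Reading a union member other than the one last written is undefined behavior in ISO C++, although GCC explicitly documents it as supported.

A portable scalar sketch of the same two tricks copies the bits with std::memcpy instead. The helper names are made up for this example.

```cpp
#include <cstdint>
#include <cstring>

// memcpy-based bit casts: well defined in ISO C++, unlike reading the
// inactive member of a union.
inline std::uint32_t to_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}
inline float from_bits(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Like the AVX2 abs(): and_not with -0.f clears only the sign bit.
inline float abs_bits(float f) {
    return from_bits(~to_bits(-0.f) & to_bits(f));
}

// Like the scalar rsqrt(): 0x5f3759df initial guess plus one
// Newton-Raphson step (about 0.2% maximum relative error for
// positive normal inputs).
inline float rsqrt_approx(float x) {
    float y = from_bits(0x5f3759dfu - (to_bits(x) >> 1));
    return y * (1.5f - 0.5f * x * y * y);
}
```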
File libs/simd/simd_no.inl deleted (index 4a63826..0000000)
1 #pragma once
2
3 #include "def.h"
4 #include <cstring>
5 #include <cmath>
6
7 namespace simd
8 {
9 using mask_nosimd = mask<int32_t, NO_SIMD_ALIGNMENT>;
10 using pfloat_nosimd = packed<float, float, NO_SIMD_ALIGNMENT>;
11 using pint_nosimd = packed<int32_t, int32_t, NO_SIMD_ALIGNMENT >;
12
13 #define PFLOAT(return) template<> inline return pfloat_nosimd::
14 #define PFLOAT2(return) template<> template<> [[nodiscard]] inline return pfloat_nosimd::
15
16 #define PINT(return) template<> inline return pint_nosimd::
17 #define PINT2(return) template<> template<> [[nodiscard]] inline return pint_nosimd::
18
19
20 PFLOAT() packed() : _(0) {}
21 PFLOAT() packed(const float &f) : _(f) {}
22 PFLOAT() packed(const float *p) : _(*p) {}
23
24 PFLOAT(void) extract_u(float *p) const { *p = _; }
25
26 PFLOAT2(pfloat_nosimd) convert(const pint_nosimd &v) { return float(v._); }
27 PFLOAT2(pfloat_nosimd) cast (const pint_nosimd &v) { float f; memcpy(&f, &v, sizeof(float)); return f; }
28
29 PINT() packed(const int32_t &i) : _(i) {}
30
31 PINT2(pint_nosimd) convert(const pfloat_nosimd &v) { return int32_t (v._); }
32 PINT2(pint_nosimd) cast (const pfloat_nosimd &v) { int32_t i; memcpy(&i, &v, sizeof(float)); return i; }
33
34 PFLOAT(mask_nosimd) operator < (const pfloat_nosimd &v) const { return _ < v._ ? -1 : 0; }
35 PFLOAT(mask_nosimd) operator > (const pfloat_nosimd &v) const { return _ > v._ ? -1 : 0; }
36 PFLOAT(mask_nosimd) operator <= (const pfloat_nosimd &v) const { return _ <= v._ ? -1 : 0; }
37 PFLOAT(mask_nosimd) operator >= (const pfloat_nosimd &v) const { return _ >= v._ ? -1 : 0; }
38 PFLOAT(mask_nosimd) operator == (const pfloat_nosimd &v) const { return memcmp(&_, &v._, sizeof(_)) == 0 ? -1 : 0; }
39
40 PFLOAT(pfloat_nosimd) operator + (const pfloat_nosimd &f) const { return _ + f._; }
41 PFLOAT(pfloat_nosimd) operator - (const pfloat_nosimd &f) const { return _ - f._; }
42 PFLOAT(pfloat_nosimd) operator * (const pfloat_nosimd &f) const { return _ * f._; }
43 PFLOAT(pfloat_nosimd) operator / (const pfloat_nosimd &f) const { return _ / f._; }
44
45 union float_int { float f; int32_t i; };
46
47 PFLOAT(pfloat_nosimd) operator & (const pfloat_nosimd &f) const
48 {
49 float_int a{_}, b{f._};
50 float_int r;
51 r.i = a.i & b.i;
52 return r.f;
53 }
54 PFLOAT(pfloat_nosimd) operator ^ (const pfloat_nosimd &f) const
55 {
56 float_int a{_}, b{f._};
57 float_int r;
58 r.i = a.i ^ b.i;
59 return r.f;
60 }
61
62 PFLOAT(pfloat_nosimd) and_not (const pfloat_nosimd &f) const
63 {
64 float_int a{_}, b{f._};
65 float_int r;
66 r.i = ~ a.i & b.i;
67 return r.f;
68 }
69
70 PFLOAT(pfloat_nosimd) mul_add (const pfloat_nosimd &a, const pfloat_nosimd &b, const pfloat_nosimd &c)
71 {
72 return a * b + c;
73 }
74 PFLOAT(pfloat_nosimd) mul_sub (const pfloat_nosimd &a, const pfloat_nosimd &b, const pfloat_nosimd &c)
75 {
76 return a * b - c;
77 }
78 PFLOAT(pfloat_nosimd) nmul_add (const pfloat_nosimd &a, const pfloat_nosimd &b, const pfloat_nosimd &c)
79 {
80 return c - a * b;
81 }
82
83 PFLOAT(pfloat_nosimd) floor() const { return std::floor(_); }
84
85 PFLOAT(pfloat_nosimd) rsqrt() const
86 {
87 float_int num {_};
88
89 num.i = 0x5f3759df - (num.i >> 1);
90
91 float xhalf = 0.5f * _;
92 num.f = num.f*(1.5f - xhalf*num.f*num.f);
93 return num.f;
94 }
95
96 PFLOAT(pfloat_nosimd) abs() const { return std::abs(_); }
97
98 #undef PFLOAT
99 #undef PFLOAT2
100
101
102 PINT() packed() : _(0) {}
103 PINT() packed(const int32_t *p) : _(*p) {}
104
105 PINT(void) extract(int32_t *p) const { *p = _; }
106
107
108 PINT(pint_nosimd) operator + (const pint_nosimd &v) const { return _ + v._; }
109 PINT(pint_nosimd) operator - (const pint_nosimd &v) const { return _ - v._; }
110 PINT(pint_nosimd) operator * (const pint_nosimd &v) const { return _ * v._; }
111
112 PINT(pint_nosimd) operator ^ (const pint_nosimd &i) const { return _ ^ i._; }
113
114 PINT(pint_nosimd) operator | (const pint_nosimd &i) const { return _ | i._; }
115 PINT(pint_nosimd) operator & (const pint_nosimd &i) const { return _ & i._; }
116 PINT(pint_nosimd) and_not (const pint_nosimd &i) const { return ~_ & i._; }
117
118 static_assert (0xffffffff == uint32_t(-1));
119 PINT(pint_nosimd) operator ~ () const { return ~_; }
120
121 PINT(mask_nosimd) operator > (const pint_nosimd &i) const { return _ > i._ ? -1 : 0; }
122 PINT(mask_nosimd) operator < (const pint_nosimd &i) const { return _ < i._ ? -1 : 0; }
123 PINT(mask_nosimd) operator == (const pint_nosimd &i) const { return _ == i._ ? -1 : 0; }
124
125 PINT2(pint_nosimd) operator << (int count) const { return _ << count; }
126 PINT2(pint_nosimd) operator >> (int count) const { return _ >> count; }
127
128 PINT(pint_nosimd) abs() const { return std::abs(_); }
129
130 #undef PINT
131 #undef PINT2
132
133 #define MASK(return) template<> inline return mask_nosimd::
134 #define MASK2(return) template<> template<> inline return mask_nosimd::
135
136 MASK2(pfloat_nosimd) operator &(const pfloat_nosimd &f) const { return pfloat_nosimd::cast(pint_nosimd(_)) & f; }
137
138 MASK2(pfloat_nosimd) blendv (const pfloat_nosimd &a, const pfloat_nosimd &b) { return _ ? b : a; }
139
140 MASK(pint_nosimd) blendv (const pint_nosimd &a, const pint_nosimd &b) { return _ ? b : a; }
141
142 #undef MASK
143 #undef MASK2
144 }
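The scalar fallback above reinterprets `float` and `int32_t` bits through the `float_int` union, which GCC documents as supported but which ISO C++ leaves undefined. A minimal sketch, using hypothetical free functions (`bits_of`, `float_of`, `rsqrt_approx`, `mask_lt`, `blend`, `blend_bits` are illustrations, not part of simdcpp), of the same fast inverse square root and lane-mask convention (-1 for true, 0 for false) written with `std::memcpy` instead:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret the bits of a float as int32_t without union type punning.
inline std::int32_t bits_of(float f)
{
    std::int32_t i;
    std::memcpy(&i, &f, sizeof i);
    return i;
}

// Reinterpret the bits of an int32_t as a float.
inline float float_of(std::int32_t i)
{
    float f;
    std::memcpy(&f, &i, sizeof f);
    return f;
}

// Approximate 1/sqrt(x): magic-constant initial guess plus one Newton-Raphson
// step, the same scheme as pfloat_nosimd::rsqrt above.
inline float rsqrt_approx(float x)
{
    assert(x > 0.0f && "rsqrt_approx expects a positive, finite input");
    float y = float_of(0x5f3759df - (bits_of(x) >> 1));
    return y * (1.5f - 0.5f * x * y * y);
}

// Scalar lane mask: all bits set (-1) for true, 0 for false, like the SSE compares.
inline std::int32_t mask_lt(float a, float b) { return a < b ? -1 : 0; }

// Select b where the mask is set, a otherwise (what blendv does per lane).
inline float blend(std::int32_t mask, float a, float b) { return mask ? b : a; }

// Branchless variant: combine the operands' bit patterns with and/andnot/or.
// Only correct for masks that are all ones or all zeros.
inline float blend_bits(std::int32_t mask, float a, float b)
{
    return float_of((~mask & bits_of(a)) | (mask & bits_of(b)));
}
```

With one Newton-Raphson step the approximation stays within roughly 0.2% relative error for positive normal inputs, which is the same accuracy class as the hardware `_mm_rsqrt_ps` estimate used by the SSE backend.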
File libs/simd/simd_sse4.2.inl deleted (index 04f1b09..0000000)
1 #pragma once
2
3 #if defined __SSE4_1__ // _mm_floor_ps, _mm_blendv_*, _mm_mullo_epi32 require SSE4.1, not just SSE2
4
5 #include "def.h"
6
7 #include <smmintrin.h>
8
9 #include <cstring>
10
11 namespace simd
12 {
13 constexpr const int SSE4_ALIGNMENT = 16;
14 using mask_sse4 = mask <__m128i, SSE4_ALIGNMENT>;
15 using pfloat_sse4 = packed<__m128, float, SSE4_ALIGNMENT>;
16 using pint_sse4 = packed<__m128i, int32_t, SSE4_ALIGNMENT>;
17
18 template<> struct get_mask<__m128, SSE4_ALIGNMENT> { using T = mask_sse4; };
19 template<> struct get_mask<__m128i, SSE4_ALIGNMENT> { using T = mask_sse4; };
20
21 #define PFLOAT(return) template<> inline return pfloat_sse4::
22 #define PFLOAT2(return) template<> template<> [[nodiscard]] inline return pfloat_sse4::
23
24 #define PINT(return) template<> inline return pint_sse4::
25 #define PINT2(return) template<> template<> [[nodiscard]] inline return pint_sse4::
26
27 PFLOAT() packed() : _(_mm_setzero_ps()) {}
28 PFLOAT() packed(const float &f) : _(_mm_set1_ps(f)) {}
29 PFLOAT() packed(const float *p) : _(_mm_load_ps(p)) {}
30
31 PFLOAT(void) extract(float *p) const { _mm_store_ps(p, _); }
32 PFLOAT(void) extract_u(float *p) const { _mm_storeu_ps(p, _); }
33
34 PINT() packed(const int32_t &i) : _(_mm_set1_epi32(i)) {}
35
36 PFLOAT2(pfloat_sse4) convert(const pint_sse4 &v) { return _mm_cvtepi32_ps (v._); }
37 PFLOAT2(pfloat_sse4) cast (const pint_sse4 &v) { return _mm_castsi128_ps(v._); }
38
39 PINT2(pint_sse4) convert(const pfloat_sse4 &v) { return _mm_cvttps_epi32(v._); } // truncate toward zero, like int32_t(f) in the scalar backend
40 PINT2(pint_sse4) cast (const pfloat_sse4 &v) { return _mm_castps_si128(v._); }
41
42 PFLOAT(mask_sse4) operator < (const pfloat_sse4 &v) const
43 {
44 return pint_sse4::cast(pfloat_sse4(_mm_cmplt_ps(_, v._)));
45 }
46 PFLOAT(mask_sse4) operator > (const pfloat_sse4 &v) const
47 {
48 return pint_sse4::cast(pfloat_sse4(_mm_cmpgt_ps(_, v._)));
49 }
50 PFLOAT(mask_sse4) operator <= (const pfloat_sse4 &v) const
51 {
52 return pint_sse4::cast(pfloat_sse4(_mm_cmple_ps(_, v._)));
53 }
54 PFLOAT(mask_sse4) operator >= (const pfloat_sse4 &v) const
55 {
56 return pint_sse4::cast(pfloat_sse4(_mm_cmpge_ps(_, v._)));
57 }
58 PFLOAT(mask_sse4) operator == (const pfloat_sse4 &v) const
59 {
60 return pint_sse4::cast(pfloat_sse4(_mm_cmpeq_ps(_, v._)));
61 }
62
63 PFLOAT(pfloat_sse4) operator + (const pfloat_sse4 &f) const { return _mm_add_ps(_, f._); }
64 PFLOAT(pfloat_sse4) operator - (const pfloat_sse4 &f) const { return _mm_sub_ps(_, f._); }
65 PFLOAT(pfloat_sse4) operator * (const pfloat_sse4 &f) const { return _mm_mul_ps(_, f._); }
66 PFLOAT(pfloat_sse4) operator / (const pfloat_sse4 &f) const { return _mm_div_ps(_, f._); }
67
68 PFLOAT(pfloat_sse4) operator & (const pfloat_sse4 &f) const { return _mm_and_ps(_,f._); }
69 PFLOAT(pfloat_sse4) operator ^ (const pfloat_sse4 &f) const { return _mm_xor_ps(_,f._); }
70
71 PFLOAT(pfloat_sse4) and_not (const pfloat_sse4 &f) const { return _mm_andnot_ps(_,f._); }
72
73 PFLOAT(pfloat_sse4) mul_add (const pfloat_sse4 &a, const pfloat_sse4 &b, const pfloat_sse4 &c)
74 {
75 return c + a * b;
76 }
77 PFLOAT(pfloat_sse4) nmul_add (const pfloat_sse4 &a, const pfloat_sse4 &b, const pfloat_sse4 &c)
78 {
79 return c - a * b;
80 }
81 PFLOAT(pfloat_sse4) mul_sub (const pfloat_sse4 &a, const pfloat_sse4 &b, const pfloat_sse4 &c)
82 {
83 return a * b - c;
84 }
85
86 PFLOAT(pfloat_sse4) floor() const { return _mm_floor_ps(_); }
87
88 PFLOAT(pfloat_sse4) rsqrt() const { return _mm_rsqrt_ps(_); }
89
90 PFLOAT(pfloat_sse4) abs() const { return pfloat_sse4(-0.f).and_not(*this); }
91
92 #undef PFLOAT
93 #undef PFLOAT2
94
95
96 PINT() packed() : _(_mm_setzero_si128()) {}
97 PINT() packed(const int32_t *p) : _(_mm_load_si128(reinterpret_cast<const __m128i*>(p))) {}
98
99 PINT(void) extract(int32_t *p) const { memcpy(p, &_, sizeof (_)); }
100 PINT(void) extract_u(int32_t *p) const { memcpy(p, &_, sizeof (_)); }
101
102 PINT(pint_sse4) operator + (const pint_sse4 &v) const { return _mm_add_epi32(_, v._); }
103 PINT(pint_sse4) operator - (const pint_sse4 &v) const { return _mm_sub_epi32(_, v._); }
104 PINT(pint_sse4) operator * (const pint_sse4 &v) const { return _mm_mullo_epi32(_, v._); }
105
106 PINT(pint_sse4) operator ^ (const pint_sse4 &i) const { return _mm_xor_si128(_,i._); }
107
108 PINT(pint_sse4) operator | (const pint_sse4 &i) const { return _mm_or_si128 (_,i._); }
109 PINT(pint_sse4) operator & (const pint_sse4 &i) const { return _mm_and_si128(_,i._); }
110 PINT(pint_sse4) and_not (const pint_sse4 &i) const { return _mm_andnot_si128(_,i._); }
111
112 static_assert (0xffffffff == uint32_t(-1));
113 PINT(pint_sse4) operator ~ () const { return pint_sse4(_) ^ pint_sse4(-1); }
114
115 PINT(mask_sse4) operator < (const pint_sse4 &i) const { return _mm_cmplt_epi32(_, i._); }
116 PINT(mask_sse4) operator > (const pint_sse4 &i) const { return _mm_cmpgt_epi32(_, i._); }
117 PINT(mask_sse4) operator == (const pint_sse4 &i) const { return _mm_cmpeq_epi32(_, i._); }
118
119 PINT2(pint_sse4) operator << (int count) const { return _mm_slli_epi32(_, count); }
120 PINT2(pint_sse4) operator >> (int count) const { return _mm_srai_epi32(_, count); }
121
122 PINT(pint_sse4) abs() const { return _mm_abs_epi32(_); }
123 #undef PINT
124 #undef PINT2
125
126
127 #define MASK(return) template<> inline return mask_sse4::
128 #define MASK2(return) template<> template<> [[nodiscard]] inline return mask_sse4::
129
130 MASK2(pfloat_sse4) operator &(const pfloat_sse4 &f) const { return pfloat_sse4::cast(pint_sse4(_)) & f; }
131
132 MASK2(pfloat_sse4) blendv (const pfloat_sse4 &a, const pfloat_sse4 &b)
133 {
134 return _mm_blendv_ps(a._, b._, pfloat_sse4::cast(pint_sse4(_))._);
135 }
136
137 MASK(pint_sse4) blendv (const pint_sse4 &a, const pint_sse4 &b ) { return _mm_blendv_epi8(a._, b._, _); }
138 #undef MASK
139 #undef MASK2
140
141 }
142 #endif
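The SSE4 backend implements `blendv` with `_mm_blendv_ps` (SSE4.1) and float-to-int conversion with a packed conversion intrinsic. A sketch, assuming an x86-64 target where SSE2 is always available, of a blend written only with SSE2 `and`/`andnot`/`or`, and of the rounding difference between `_mm_cvtps_epi32` and `_mm_cvttps_epi32`. The helper names `blend_sse2`, `min4` and `convert4` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h> // SSE2 intrinsics, baseline on x86-64

// SSE2-only equivalent of _mm_blendv_ps for full-lane masks (as produced by
// the compare intrinsics): takes b where a mask lane is all ones, a where zero.
inline __m128 blend_sse2(__m128 mask, __m128 a, __m128 b)
{
    return _mm_or_ps(_mm_and_ps(mask, b), _mm_andnot_ps(mask, a));
}

// Per-lane minimum built from a compare mask and the blend above.
inline void min4(const float *x, const float *y, float *out)
{
    __m128 a = _mm_loadu_ps(x);
    __m128 b = _mm_loadu_ps(y);
    __m128 b_smaller = _mm_cmplt_ps(b, a); // all ones where y < x
    _mm_storeu_ps(out, blend_sse2(b_smaller, a, b));
}

// _mm_cvtps_epi32 rounds with the current MXCSR mode (round-to-nearest-even by
// default); _mm_cvttps_epi32 truncates toward zero like a C++ int32_t(f) cast.
inline void convert4(const float *x, std::int32_t *rounded, std::int32_t *truncated)
{
    __m128 v = _mm_loadu_ps(x);
    _mm_storeu_si128(reinterpret_cast<__m128i *>(rounded), _mm_cvtps_epi32(v));
    _mm_storeu_si128(reinterpret_cast<__m128i *>(truncated), _mm_cvttps_epi32(v));
}
```

`_mm_blendv_ps` only inspects the sign bit of each mask lane, while the and/andnot/or form needs the whole lane set or cleared; the two agree for masks coming from `_mm_cmp*_ps`.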
File libs/simdcpp added (mode: 160000) (index 0000000..4b5f4d1)
1 Subproject commit 4b5f4d1c1dd6f0b73239b162ef90210dd0d162a3
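Both deleted backend files define their operations through the `PFLOAT`/`PINT`/`MASK` macros, which expand to explicit specializations of members of the `packed` and `mask` class templates; this commit appears to replace them with the simdcpp submodule. A minimal sketch of that pattern, using a simplified, hypothetical `demo::packed` (not the library's real declaration) with a scalar instantiation:

```cpp
#include <cassert>
#include <cstdint>

namespace demo
{
// Simplified, hypothetical stand-in for the packed class template:
// Raw is the storage/register type, Elem the lane type, Align the alignment.
template<typename Raw, typename Elem, int Align>
struct packed
{
    Raw _;

    packed() = default;
    packed(Raw r) : _(r) {}

    // Declared once here, defined separately for each backend.
    packed operator + (const packed &v) const;

    // Member template: one definition per backend and per source type.
    template<typename From>
    [[nodiscard]] static packed convert(const From &v);
};

using pfloat_scalar = packed<float, float, 4>;
using pint_scalar   = packed<std::int32_t, std::int32_t, 4>;

// Roughly what a PFLOAT(pfloat_scalar) line expands to: an explicit
// specialization of one member of one class template instantiation.
// `inline` is needed because explicit specializations are not implicitly inline.
template<> inline pfloat_scalar pfloat_scalar::operator + (const pfloat_scalar &v) const
{
    return _ + v._;
}

// Roughly what a PFLOAT2(pfloat_scalar) line expands to: `template<> template<>`
// specializes the class template and then the member template, with From
// deduced from the parameter type.
template<> template<> inline pfloat_scalar pfloat_scalar::convert(const pint_scalar &v)
{
    return float(v._);
}
} // namespace demo
```

Because every backend supplies its own specializations of the same declarations, the calling code can stay backend-agnostic and only the type aliases change per instruction set.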
File src/CMakeLists.txt changed (mode: 100644) (index a78d248..72ea2fe)
... ... add_library(JEN STATIC
40 40 ) )
41 41
42 42 target_link_libraries(JEN target_link_libraries(JEN
43 SIMD
43 SIMDCPP
44 44 ATLAS ATLAS
45 45 VKW VKW
46 46 glfw glfw
Hints:
Before first commit, do not forget to setup your git environment:
git config --global user.name "your_name_here"
git config --global user.email "your@email_here"

Clone this repository using HTTP(S):
git clone https://rocketgit.com/user/Jackalope/jen

Clone this repository using ssh (do not forget to upload a key first):
git clone ssh://rocketgit@ssh.rocketgit.com/user/Jackalope/jen

Clone this repository using git:
git clone git://git.rocketgit.com/user/Jackalope/jen

You are allowed to anonymously push to this repository.
This means that your pushed commits will automatically be transformed into a merge request:
... clone the repository ...
... make some changes and some commits ...
git push origin main