Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempt to consolidate SSE and ARM NEON SIMD code for GCC/clang and Visual Studio #252

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

fanc999
Copy link
Contributor

@fanc999 fanc999 commented May 11, 2022

Hi,

This attempts to clean up the code a bit in graphene-simd4f.h and graphene-simd4x4f.h by trying to reduce the code duplication for SSE and ARM NEON SIMD implementation due to syntactical differences in Visual Studio and GCC/clang in regards to inlining, via:

  • Defining macros that deals with the inlining method (__extension__ and direct intrinsic call) supported by GCC/clang and Visual Studio, for calls that could be done as one-liners.
  • In a similar fashion, define initializing graphene_simd4f_t arrays from the 4 floats that we pass in, especially as we required C99 support for a while and the supported Visual Studio compilers have the needed support for this.
  • Remove unneeded repetitions in the code.
  • Prefix the MSVC-specific implementations with graphene_msvc_ instead of just _ to make things clearer to people[1].

[1]: Sadly, I was not able to do the cleanup for the SIMD code that are done in a function-like manner. I couldn't get the preprocessor happy in one shot for Visual Studio and clang, ugh :|, so I had to leave that alone, since preprocessors don't allow a working #define inside a macro and doesn't like splitting lines when set apart by #if/#ifdef's. So this is the best I could do for now. For instance:

(unrelated parts omitted for brevity, trying to remember things on top of my head, so there might be some mistakes below)

(graphene-macros.h)
#if defined (__GNUC__) || defined (__clang__)
...
#define GRAPHENE_FUNCCALL_2ARG_MACRO(ftype,fname,v0,v1) \
  (__extension({

#define GRAPHENE_FUNCCALL_2ARG_BEGIN(rtype,ftype,fname,t0,v0,t1,v1)
#define GRAPHENE_FUNCCALL_BODY(expr) expr;
#define GRAPHENE_FUNCCALL_RETURN(rtype,rvalue) (rtype) rvalue;
#define GRAPHENE_FUNCCALL_END \
  }))
#elif defined (_MSC_VER)
...
#define GRAPHENE_FUNCCALL_2ARG_MACRO(ftype,fname,v0,v1) \
  graphene_msvc_##ftype##_##fname## (v0, v1)

#define GRAPHENE_FUNCCALL_2ARG_BEGIN(rtype,ftype,fname,t0,v0,t1,v1) \
static inline rtype \
graphene_msvc_##ftype##_##fname## (t0 v0, t1 v1) \
{

#define GRAPHENE_FUNCCALL_BODY(expr) expr;
#define GRAPHENE_FUNCCALL_RETURN(rtype,rvalue) return rvalue;

#define GRAPHENE_FUNCCALL_END \
}
#else
...

(graphene-simd4f.h)
...
#  define graphene_simd4f_get(s,i) \
  GRAPHENE_FUNCCALL_2ARG_MACRO (simd4f, get,s ,i) \ /* for this line, it's either with the trailing backslash for GCC/clang or without it for MSVC :(, otherwise other lines here all work */
  GRAPHENE_FUNCCALL_2ARG_BEGIN (float, simd4f, get, graphene_simd4f_t, int, s, i) \
  GRAPHENE_FUNCCALL_BODY (graphene_simd4f_union_t __u = { (s) }) \
  GRAPHENE_FUNCCALL_RETURN (float, __u.f[(i)]) \
  GRAPHENE_FUNCCALL_END
...

I understand that this PR might well conflict with the changes in #251, so if one of this or #251 goes through, I will fix things up as needed as soon as possible.

With blessings, thank you!

This way, we can try to abstract uses of such calls between different compilers
that we support instead of repeating them in the headers due to differences in
compiler syntax/feature support.
Use the newly-added macros to abstract the one-liner intrinsic calls for
GCC/CLang and Visual Studio for building the SSE code, to reduce duplication.
It's not totally exhausive, but should cover quite a number of items.
Use the newly-added macros to abstract one-liner intrinsic calls for GCC/CLang
and Visual Studio for building the ARM NEON code, to reduce duplication.  It's
not totally exhausive, but should cover quite a number of items.
These macros can be used to abstract initializing SIMD data arrays for the
different compilers that we support, especially as we already require C99
support for building and using Graphene.
Use the macros that we just added to initialize the graphene_simd4f_t arrays
with the floats that we pass into graphene_simd4f_init*() as applicable.  This
especially simplifies the code for Visual Studio since we already require C99
support for building and using Graphene, and we can reduce some code
duplication.
We don't need identical typedefs separate for GCC/CLang and Visual Studio.  Put
them in one place for all cases.
We can define them instead to call the respective graphene_simd4f_get()
accordingly instead.
...with graphene_msvc_ instead of just _.  This attempts to make things clearer
to people.
...with graphene_msvc_ instead of just _.  This attempts to make things clearer
to people.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants