Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace __ROR4__ stdlib, make _ROR2__ C frindly #540

Merged
merged 4 commits into from
Jan 6, 2019
Merged

Replace __ROR4__ stdlib, make _ROR2__ C frindly #540

merged 4 commits into from
Jan 6, 2019

Conversation

AJenbo
Copy link
Member

@AJenbo AJenbo commented Jan 1, 2019

This PR replaces the __ROR*__ C++-only helpers with inline ASM of what they try to emulate.

This allows for control.cpp to compile as C.

engine.cpp makes use of the init_event as so still can't be compiled as C.

@AJenbo AJenbo mentioned this pull request Jan 1, 2019
38 tasks
@AJenbo AJenbo changed the title Replace __ROR*__ with ASM for MS compiler Replace __ROR*__ with ASM for MS compilers Jan 1, 2019
@OmniBlade
Copy link

Since there are already wrappers for the C++ macro that limit them to specific type sizes, why not just go the whole hog and make them into macros that are valid standard C so the file can be compiled as C everywhere?
nemequ/portable-snippets#14 has a good discussion of idioms that clang and gcc recognised and translate to the correct asm.

@AJenbo
Copy link
Member Author

AJenbo commented Jan 4, 2019

Looks interesting, however this isn't my strong side, if you are able to create a PR with the changes that would be much appreciated 🙂

@ghost
Copy link

ghost commented Jan 4, 2019

what effects would this have? I am worried this would break any compatibility with other Os

@OmniBlade
Copy link

I doubt that VC6 could optimize the construct correctly to get your binary exact so you might need to fall back on the asm for that version specifically, but I've tested on godbolt and at -O2 and above all recent compilers including msvc correctly generate a rol or ror asm instruction. At -O1 only clang fails, generating a call rather than inlining the function, though its still the correct asm within the function. At -O0 it just generates the shifts and such.

Does VC6 have the __rotl and __rotr intrinsics? If so you could avoid the inline asm altogether with them.

@AJenbo
Copy link
Member Author

AJenbo commented Jan 4, 2019

@ApertureSecurity this woudn't affect OS but can affect the supported CPU architecture, of-cause we want to go with a solution that doesn't strictly limit us to x86.

@OmniBlade Unfortunately VC6 doesn't have __rotl or __rotr. What you describe sounds perfect, the functions in question aren't bin perfect at the moment so what it compiles to isn't critical, as you say we probably will need to use ASM for VS6 to eventually get it bin perfect, but we should have something fairly generic for all others.

VS6 does have _lrotr and _rotr,, so that might be an option?

@AJenbo
Copy link
Member Author

AJenbo commented Jan 4, 2019

@OmniBlade _rotr was suitable as a replacement for __ROR4__, thanks for pointing this out. With __ROR__ only having to support one type I was able to rewrite it in a C friendly way so the whole thing should now be a hole lot cleaner with no special cases for compilers. Could you take a new look at this PR and see what you think of the current solution?

@AJenbo AJenbo changed the title Replace __ROR*__ with ASM for MS compilers Replace __ROR4__ stdlib, make _ROR2__ C frindly Jan 4, 2019
@AJenbo
Copy link
Member Author

AJenbo commented Jan 4, 2019

As a small bonus this PR now also makes qmemcpy C compatible so the compiler doesn't have to fall back on memcpy

@AJenbo
Copy link
Member Author

AJenbo commented Jan 4, 2019

Note travis build failed because Mac Debug server went MIA (services issues), all others passed.

@OmniBlade
Copy link

_rotr is a windows only intrinsic, so none windows platforms will have to implement a conditionally compiled replacement themselves, but that will be a problem for another time.

If you know you don't have to handle negative cases (which should really be rotl anyhow), you can simplify ROR2 a lot more by removing the conditional and you can cut down on the math calculating the bits since you know its going to be constant 16.

For MSVC 2008 and above, you can include intrin.h and use _rotr16 for ROR2 but that doesn't help you with VC6 or other platforms.

@AJenbo
Copy link
Member Author

AJenbo commented Jan 5, 2019

I agree that we should deal with platform issues at a later point.

I stripped __ROR2__ down to the minimum, as I understand it most compilers should now be able to optimize it in to a ror instruction (VC6 is not, not even __rotr get compiled as ror).

Thanks for taking the time to review the PR

This also makes qmemcpy avalible to the C compiler
@mewmew
Copy link
Contributor

mewmew commented Jan 5, 2019

VC6 is not, not even __rotr get compiled as ror

Haha, that started my day with a laugh :D

@OmniBlade
Copy link

OmniBlade commented Jan 5, 2019

I'm amazed that the _rotr doesn't get compiled to the correct instructions. Looks like you'll have to roll your own version that relies on inline asm to get the job done, though I would suggest still defining it in the defs header if you can get away with it rather than using ifdefs to put it in all the functions that need it. I did some testing with https://godbolt.org/z/cE5ZwY and it seems a handrolled 16bit version only gets compiled to the correct ror instruction with clang 6.0 and higher and GCC 4.9 and higher. GCC also has intrinsics in x86intrin.h rorw and rord that are for 16bit and 32bit respectively.

Intel and MSVC don't recognise it for what it is for 8bit and 16bit, only the 32bit one and the 64bit one when compiling as 64bit, I'd need to check what the _rotr16 intrinsic from MSVC 2008 and above gets compiled to as that might be needed to get optimum code for that compiler.

This is more just for future information though as currently only windows builds are supported and the current PR looks okay to me.

@AJenbo
Copy link
Member Author

AJenbo commented Jan 5, 2019

My current thinking is that DrawSpellCel() was mostly inline ASM as it also makes use of XLAT, again something that the compiler is unable to produce from the function that we make use of. But that's a concern for another time, Brevik was more used to writing ASM then C as I understand it :)

@mewmew
Copy link
Contributor

mewmew commented Jan 6, 2019

@AJenbo this PR looks good to me. Would you mind doing a rebase to fix the conflict with MakefileVC? After that, and if you feel satisfied feel free to merge.

@AJenbo AJenbo merged commit ca64011 into diasurgical:nightly Jan 6, 2019
@AJenbo AJenbo deleted the __ROR2__-__ROR4__ branch January 6, 2019 02:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants