[a64] Implement an ARM64 backend #2259
Draft
Wunkolo wants to merge 137 commits into xenia-project:master from Wunkolo:arm64-backend
Separates the `Windows` platform into `Windows-x86_64` and `Windows-ARM64`. Adds `--arch` argument to `build`. Removes x64 backend on non-x64 targets.
Marked as TODO for now
Adding the `a64` backend will be a different PR. For now it's stubbed to the null backend to allow the main executable to open without failing initialization.
This value is currently returning `0` on ARM machines and throws an exception.
Adds the new `xenia-cpu-backend-a64` build-target with linkage following the x64 backend.
Header-only library for emitting arm64v8 instructions. Enables C++20 only for the a64 backend for now
Mostly element-accessors
First pass framework that gets emitted ARM code executing. Based on the x64 backend, implements an ARM64 JIT backend.
This just reverses the bytes of each 32-bit value, rather than reversing the whole vector.
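To make the distinction concrete, here is a small host-side model (a sketch, not the backend's code; `Vec128` and both function names are hypothetical) of a per-lane 32-bit byte swap, which is what `REV32 Vd.16B, Vn.16B` computes, next to a reversal of the entire 16-byte vector:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec128 = std::array<std::uint8_t, 16>;

// Reverses the bytes inside each 32-bit lane; the lane order is preserved.
Vec128 ByteSwap32Lanes(const Vec128& v) {
  Vec128 out{};
  for (std::size_t lane = 0; lane < 4; ++lane) {
    for (std::size_t b = 0; b < 4; ++b) {
      out[lane * 4 + b] = v[lane * 4 + (3 - b)];
    }
  }
  return out;
}

// Reverses all 16 bytes, which also reorders the lanes (the wrong result
// for a 32-bit byte swap).
Vec128 ReverseWholeVector(const Vec128& v) {
  Vec128 out{};
  for (std::size_t i = 0; i < 16; ++i) out[i] = v[15 - i];
  return out;
}
```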
Wrong register index and vector-register size
These calls need to preserve and restore the `lr` register. Unit tests all run now!
These were stomping over X0 and Q0, which returned the input argument registers as return values. Fixes some guest-to-host calls.
Vector registers are passed as pointers rather than directly in the `Qn` registers. So these functions should be taking pointer-type arguments rather than vector-register types directly. Fixes `OPCODE_VECTOR_SHL` and passes unit tests.
We don't load it back, so there's no need to store it.
Passes all unit tests
Uses MOVI rather than EOR to materialize some constants. MOVI is a register-renaming idiom on many microarchitectures.
The LSL can be embedded into the ADD to remove an additional instruction. What was `cset`+`lsl`+`add` should now just be `cset`+`add ... LSL 12`
Use pair-stores rather than single stores to write 32 bytes of data at a time.
Uses the `CNTVCT_EL0`-register and applies frequency scaling
Passes cpu-ppc-tests
This is a very literal translation from the x64 code into ARM and may not be very optimized. Passes unit tests save for a couple of off-by-one errors.
Adds two new flags for allowing the use of LSE and FP16C
Narrow-saturation instructions cause off-by-one rounding errors. Using min+max+shuffle instead passes more unit tests.
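A scalar sketch of the min+max+shuffle idea for a signed 32-bit to 16-bit narrow, assuming that is the conversion involved (the function name is hypothetical, and which narrowing instruction was replaced is not stated here): each lane is clamped to the `int16` range first, after which keeping only the low halfword cannot change the value.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Signed-saturating narrow of 4 x int32 lanes to int16: clamp with max/min
// first, then keep the low halfword of each lane (the "shuffle" step).
std::array<std::int16_t, 4> NarrowSaturate32To16(
    const std::array<std::int32_t, 4>& v) {
  constexpr std::int32_t kMin = std::numeric_limits<std::int16_t>::min();
  constexpr std::int32_t kMax = std::numeric_limits<std::int16_t>::max();
  std::array<std::int16_t, 4> out{};
  for (std::size_t i = 0; i < v.size(); ++i) {
    const std::int32_t clamped = std::min(std::max(v[i], kMin), kMax);
    out[i] = static_cast<std::int16_t>(clamped);  // lossless after clamping
  }
  return out;
}
```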
Load the pointer to the VConst table once, and use offsets from this base address computed from the underlying enum value. Reduces the number of instructions for each VConst memory load.
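A sketch of the addressing scheme, assuming 16-byte table entries and a hypothetical three-entry enum (the real `VConst` enum is much larger): with the base in a register, each constant sits at `index * 16`, which fits the scaled 12-bit unsigned offset of `ldr qN, [xBase, #imm]` for up to 4096 entries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical subset of vector-constant identifiers, in table order.
enum VConst : std::uint32_t { VZero = 0, VOne, VSignMask32, kVConstCount };

// Every table entry is one 128-bit vector.
constexpr std::size_t kVConstEntrySize = 16;

// Byte offset of a constant from the table base. With the base loaded into a
// register once, each constant becomes a single `ldr qN, [xBase, #offset]`.
constexpr std::size_t VConstOffset(VConst c) {
  return static_cast<std::size_t>(c) * kVConstEntrySize;
}

// The scaled 12-bit unsigned offset of LDR (Q form) reaches 4095 * 16 bytes.
static_assert(VConstOffset(kVConstCount) <= 4096 * kVConstEntrySize,
              "table too large for a single immediate-offset load");
```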
Detect when all bytes are repeating and use `MOVI` when applicable
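The detection can be sketched like this (`RepeatedByte` and `Vec128` are hypothetical names): if all 16 bytes match, the constant can be materialized with `movi vN.16b, #imm8` instead of a memory load.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

using Vec128 = std::array<std::uint8_t, 16>;

// Returns the byte to feed `movi vN.16b, #imm8` when all 16 bytes of the
// constant are identical; std::nullopt means the constant needs a load.
std::optional<std::uint8_t> RepeatedByte(const Vec128& v) {
  for (std::uint8_t b : v) {
    if (b != v[0]) return std::nullopt;
  }
  return v[0];
}
```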
Indices and non-const tables were using the same scratch-register
Uses the `CNTFRQ` and `CNTVCT` system registers as a raw clock source. On my ThinkPad X13s, the raw clock source reports a tick frequency of 19,200,000 Hz while the platform clock source (QueryPerformanceFrequency) returns 10,000,000 Hz: almost double the resolution of the platform clock!
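`CNTVCT` counts at the `CNTFRQ` rate, so converting to another tick rate needs scaling. A minimal sketch of that conversion, assuming both frequencies are below 2^32 Hz (`ScaleTicks` is a hypothetical name, and reading the registers with `mrs` is not shown): splitting the count into whole seconds and a remainder keeps the intermediate product within 64 bits.

```cpp
#include <cassert>
#include <cstdint>

// Converts `ticks` counted at `src_freq` Hz into ticks at `dst_freq` Hz.
// Assumes both frequencies are below 2^32 so `rem * dst_freq` cannot
// overflow; `whole * dst_freq` only overflows for extremely long uptimes.
std::uint64_t ScaleTicks(std::uint64_t ticks, std::uint64_t src_freq,
                         std::uint64_t dst_freq) {
  assert(src_freq != 0);
  const std::uint64_t whole = ticks / src_freq;  // complete seconds
  const std::uint64_t rem = ticks % src_freq;    // leftover ticks
  return whole * dst_freq + (rem * dst_freq) / src_freq;
}
```

For example, `ScaleTicks(counter, 19'200'000, 10'000'000)` would express the X13s counter in the same 10 MHz units the platform clock reports.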
Missed some during the first pass. Now the config files will mention a64 differences.
Read directly from the ZR (zero register) in the case that we are just storing a 64-bit or 32-bit zero.
This directly maps to the QC bit in the FPSR. We just have to make sure that the saturating instruction is the very last instruction (which is currently the case for opcodes such as VECTOR_ADD).
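A host-side model of the sticky QC behavior (a sketch; `Fpsr` and `SaturatingAdd16` are hypothetical, modeled on `SQADD Vd.8H`): any lane that saturates sets `qc`, and nothing in the operation clears it. Our reading is that this is why the saturating instruction has to be the last one emitted before QC is read.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Models only the FPSR.QC bit: sticky, set by saturating operations.
struct Fpsr {
  bool qc = false;
};

// Per-lane signed saturating add on 8 x int16 lanes, modeled on SQADD .8H.
std::array<std::int16_t, 8> SaturatingAdd16(const std::array<std::int16_t, 8>& a,
                                            const std::array<std::int16_t, 8>& b,
                                            Fpsr& fpsr) {
  constexpr int kMax = std::numeric_limits<std::int16_t>::max();
  constexpr int kMin = std::numeric_limits<std::int16_t>::min();
  std::array<std::int16_t, 8> out{};
  for (std::size_t i = 0; i < out.size(); ++i) {
    int sum = static_cast<int>(a[i]) + static_cast<int>(b[i]);
    if (sum > kMax) {
      sum = kMax;
      fpsr.qc = true;  // sticky: never cleared here
    } else if (sum < kMin) {
      sum = kMin;
      fpsr.qc = true;
    }
    out[i] = static_cast<std::int16_t>(sum);
  }
  return out;
}
```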
Latest iteration running Beautiful Katamari and Geometry Wars. Still some minor issues, but gameplay works now.
The 64-bit case uses a particular replicated 8-bit immediate, so something else will have to handle that. This catches a lot of cases without having to touch memory. Does not catch cases such as `1.0` (0x3f800000).
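The two encodability checks can be sketched as follows (hypothetical helper names; the MSL-shifted and `MVNI` forms are not covered): the 64-bit MOVI form needs every byte to be `0x00` or `0xFF`, while the 32-bit shifted form allows one `imm8` at bit 0, 8, 16, or 24. `0x3f800000` has two non-zero bytes, so it fails the 32-bit check.

```cpp
#include <cassert>
#include <cstdint>

// 64-bit MOVI form: each byte of the immediate must be 0x00 or 0xFF.
bool IsMoviByteMask64(std::uint64_t v) {
  for (int i = 0; i < 8; ++i) {
    const auto byte = static_cast<std::uint8_t>(v >> (i * 8));
    if (byte != 0x00 && byte != 0xFF) return false;
  }
  return true;
}

// 32-bit shifted MOVI form: one imm8 at bit 0, 8, 16 or 24, all else zero.
bool IsMoviShifted32(std::uint32_t v) {
  for (int shift = 0; shift < 32; shift += 8) {
    const std::uint32_t window = std::uint32_t{0xFF} << shift;
    if ((v & ~window) == 0) return true;
  }
  return false;
}
```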
`FMOV` encodes an 8-bit floating-point immediate that can be used to accelerate loading certain constant floating-point values with magnitudes between 0.125 and 31.0. Many immediates such as -1.0, 1.0, and 0.5 fall within this range, and this code gets lots of hits in my testing. This is much more efficient than loading a 32/64-bit value into W0/X0 and moving it into an FP register.
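The FMOV immediate encodes ±(n/16)·2^r with n in 16..31 and r in -3..4, which is where the 0.125 to 31.0 magnitude range comes from. A brute-force check of that encoding, as a sketch (the helper name is hypothetical; an emitter would more likely test the exponent and mantissa bits directly):

```cpp
#include <cassert>
#include <cmath>

// True if `value` is representable as an FMOV 8-bit immediate, i.e.
// +/-(n / 16) * 2^r with n in [16, 31] and r in [-3, 4].
bool IsFmovImmediate(double value) {
  const double magnitude = std::fabs(value);
  for (int r = -3; r <= 4; ++r) {
    for (int n = 16; n <= 31; ++n) {
      if (magnitude == std::ldexp(n / 16.0, r)) return true;
    }
  }
  return false;  // includes 0.0, NaN and out-of-range magnitudes
}
```

A `float` constant can be checked by passing it as a `double`, since that conversion is exact.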
Uses LSE when available, but provides an armv8.0 baseline implementation.
No longer requires Armv8.1. Instructions are emitted with an Armv8.0-a baseline, and features such as FP16 and LSE are detected before they are used (and exposed in the feature-mask config, similar to x64).
Removes all comments relating to x64 implementation details
`dc civac` causes an illegal-instruction exception on Windows-ARM. This is likely a security measure against cache attacks. On Linux, this instruction is trapped into an EL1 kernel function. Windows does not seem to have any user-mode cache-maintenance facility for the data cache (only for the instruction cache, via `FlushInstructionCache`). The closest thing we can do for now is a full data synchronization barrier with `dsb ish`. Prefetches are implemented using `prfm pldl1keep, ...`.
Out-of-bounds shift values are handled modulo the element size.
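A scalar model of that behavior for 32-bit lanes (a sketch; `VectorShl32` is hypothetical): the shift amount is masked to its low 5 bits. The masking matters on AArch64 because `USHL` reads the shift as a signed byte and yields zero once a left shift reaches the element size, rather than wrapping.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Per-lane left shift of 4 x uint32 lanes with the shift taken modulo 32,
// so an out-of-range shift such as 33 behaves like a shift by 1.
std::array<std::uint32_t, 4> VectorShl32(
    const std::array<std::uint32_t, 4>& v,
    const std::array<std::uint32_t, 4>& shifts) {
  std::array<std::uint32_t, 4> out{};
  for (std::size_t i = 0; i < v.size(); ++i) {
    out[i] = v[i] << (shifts[i] & 31u);
  }
  return out;
}
```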
Implements a 64-bit ARM backend that emits a64 instructions using oaknut.
Depends on #2258 and xenia-project/FFmpeg#8
Addresses #2002
Tested on a ThinkPad X13s, also using the unit tests from #1348. There is currently an ARMv8.1-a requirement due to the use of some of the newer atomic instructions such as CASAL.