-
-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimization Issue in lv_memcpy Due to Macro Usage #7406
Comments
Are you able to explain which part of the current implementation is undefined behavior? |
The idea of using inline code above is good and it is even more expressive than the current builtin option, the problem here is the behavior change across different compilers, using inline keyword will serve as a hint, but will not enforces the function inline behavior from the compiler, some platforms have methods to force the function to be inline, but it is not C standard. On the same direction from @liamHowatt could you point which kind of issue did you find with the current approach? Something related to the address alignment? |
@liamHowatt @uLipe . Thanks a lot for your replies. So, the code switches between uint8_t* and uint32_t* pointers for optimization, but O3 might make aggressive assumptions about pointer aliasing that could break the alignment handling logic. The compiler might incorrectly assume these pointers can't alias to the same memory. Also, The manual loop unrolling (using that _REPEAT8 macro) could conflict with O3's auto-vectorization. The compiler might try to further optimize these already-unrolled loops, potentially causing cache-unfriendly access patterns or excessive register pressure. And last one is that O3 might reorder memory operations since there aren't explicit barriers. The _COPY macro (which does *d = *s; d++; s++;) could have its loads/stores reordered in ways we don't want. So, I'm guessing that what I'm having is that O3's aggressive optimizations might not properly respect my platform's alignment requirements. Given the strict aliasing, when we cast between uint8_t* and uint32_t*, we're technically violating this rule. The compiler sees you writing through a uint32_t*. Later, it sees you reading through a uint8_t*. Under strict aliasing, the compiler knows these pointers can't point to the same memory. So it might cache the uint32_t value in a register and not reload it when you read through the uint8_t pointer. About the solution, that's a fair assumption you mentioned. One thing we can do is adding volatile qualifiers or memory barriers at critical points. volatile uint8_t* d8 = dst;
volatile const uint8_t* s8 = src; Another alternative would be to use proper type punning: union ptr_cast {
uint8_t* p8;
uint32_t* p32;
};
// Then use it like:
union ptr_cast d;
d.p8 = d8;
uint32_t* d32 = d.p32; |
Your problem can be solved by adding the GCC compiler option "-fno-strict-aliasing"? |
The -fstrict-aliasing option is enabled at levels -O2, -O3, -Os. However, code that only works with GCC with -fno-strict-aliasing is not correct C. |
Regarding the pointer aliasing assumption issue, it only dereferences a |
@1saeed Thank you for the detailed reply. Have you really experienced a crash because of it or is it more like a theoretical issue? These functions might heavily affect the performance so I wonder if you have checked if there is no regression. |
Hello Gabor @kisvegabor. Sorry for the late reply! Had a super busy week. I can confirm that I’ve replicated the issue with unaligned pointers, external SDRAM memory , cache being enable and O3 optimization on my platform (STM23H7)
Using
|
LVGL version
v9.2,2
Platform
STM32H755 Platform
GCC (-O3)
-ffunction-sections
-fdata-sections
What happened?
The current implementation of lv_memcpy uses macros like _COPY and _REPEAT8, which can lead to undefined or unintended behavior under high compiler optimization levels (-O3). This results in hard faults in some cases.
Here’s the refactored version of the function using inline functions:
How to reproduce?
No response
The text was updated successfully, but these errors were encountered: