Colorkey blit optimization #2696
When I read the title, I assumed this was a SIMD implementation of [...]. What you've made is not for alpha blitting at all; it's another special-cased blit routine to take over from SDL. Therefore I don't think it should be in pygameBlit, because everything in pygameBlit is for alpha or blend flags? If you can determine what code path it's meant for in surface.blit, why not send it there directly? Secondly, could this be upstreamed to SDL? That way we don't need a special path at all, and everyone's code gets faster.
I believe that, with pygame having "game" in its name and much of the software made with it being pixel art games, this change is really important and worth implementing directly here rather than going through SDL. Besides, we won't need to change or expand it again, since it's a unique case.
LGTM 👍
I think a fair comparison is probably between the current SDL color key path using RLE acceleration and this method which doesn't use RLE. However, AVX2 comes out on top either way and it looks to me to be pretty much a dead heat between SDL RLE path and the SSE2 version of this function.
If this gets merged, it might be worth adding a documentation note to the set_colorkey function saying that on AVX2 platforms (Windows, non-ARM Linux) 32-bit colorkeys will be faster without RLE.
Or we just bite the bullet and deprecate RLE for 32 bit entirely in pygame. Though maybe we could AVX2 8bit colorkey stuff as well... hmm.
On SDL2: while it would be nice to upstream stuff to SDL2, I don't see why we can't slip this into pygame for now and leave upstreaming as future work. There is probably lots of our AVX2 blitting code that could be upstreamed to SDL, not just this.
I think you don't want to implement it in SDL because you're less familiar with the development process and foibles of contributing to SDL, you think it would be way harder than implementing it here, and you're probably right. It would be the most significant contribution to SDL by a pygame(-ce) contributor in my memory. I only suggested it because I think you have what it takes to actually do it.
I don't think this is the case. I think the reason we have blitting code at all is because the algorithm changed between SDL1 and SDL2. We can't upstream what isn't compatible with what they have.
I can't tell whether this is a strong/weak no or a strong/weak yes, @Starbuck5.
Most pygame games are pixel art games, which often rely on colorkey blitting to make effective use of pixel assets. This PR significantly optimizes that path with AVX2/SSE2. The strategy is limited to 32-bit surfaces that don't use RLE.
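For readers unfamiliar with the operation, here is a minimal pure-Python sketch of what a colorkey blit does per pixel, written in the branch-free compare-and-mask style that SIMD code typically uses. It is only an illustration under assumptions: the function name `colorkey_blit_row`, the packed 32-bit pixel values, and the key color are hypothetical, and this is not the code in this PR.

```python
def colorkey_blit_row(src, dst, colorkey):
    """Blit one row of packed 32-bit pixels, skipping pixels equal to colorkey.

    Hypothetical helper for illustration only: each loop iteration mimics
    what one SIMD lane would do (compare, build a mask, select).
    """
    out = []
    for s, d in zip(src, dst):
        # All-ones mask where the source pixel matches the colorkey, else zero.
        mask = 0xFFFFFFFF if s == colorkey else 0x00000000
        # Keep the destination where the mask is set, the source elsewhere.
        out.append((d & mask) | (s & ~mask & 0xFFFFFFFF))
    return out


KEY = 0xFFFF00FF  # assumed key color for this example
src = [0xFF112233, KEY, 0xFF445566, KEY]
dst = [0xFF000000] * 4
result = colorkey_blit_row(src, dst, KEY)
```

A vectorized version would apply the same compare and select to several pixels per instruction (four with 128-bit SSE2 registers, eight with 256-bit AVX2 registers), which is where a speedup of this kind would come from.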
The improvements vary widely with surface size so refer to these graphs for a better visualization:
![Figure_1](https://private-user-images.githubusercontent.com/103119829/300184570-b51e89d3-128f-4108-9d2e-e6577467af8c.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTkxOTMyNDcsIm5iZiI6MTcxOTE5Mjk0NywicGF0aCI6Ii8xMDMxMTk4MjkvMzAwMTg0NTcwLWI1MWU4OWQzLTEyOGYtNDEwOC05ZDJlLWU2NTc3NDY3YWY4Yy5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNjI0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDYyNFQwMTM1NDdaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0yODBjMTc1ZDk1NjQ1YmI2NjNlNDMyZGQzZWJkYzM4NGIyZjM4YzdlZTU0N2ZhMDNkOTgxYmI4YWNlOTExMzYyJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.ZqDkP4S1PV5xuFHANdkENFlX3xOFptrb4KeB43-z_vI)
![Figure_1](https://private-user-images.githubusercontent.com/103119829/300184618-ab16f29b-0f6a-43d1-a5d1-61f54113aeb5.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTkxOTMyNDcsIm5iZiI6MTcxOTE5Mjk0NywicGF0aCI6Ii8xMDMxMTk4MjkvMzAwMTg0NjE4LWFiMTZmMjliLTBmNmEtNDNkMS1hNWQxLTYxZjU0MTEzYWViNS5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNjI0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDYyNFQwMTM1NDdaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1jMmI0MzA4MGQ5MGUyMjQ0NzMyNTczNmUzNjAyNDkwZWM4OTZmOTNiMWYyMmU4YjUyM2Y4MzM1NWQ3NWZhNTFkJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.liEiGkuHG1FECleNHgv4jKMl8Yv3WboHMcjZYKUp77Q)
![Figure_1](https://private-user-images.githubusercontent.com/103119829/300184660-9100b8b6-8d27-4a58-9d43-25265d45c514.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTkxOTMyNDcsIm5iZiI6MTcxOTE5Mjk0NywicGF0aCI6Ii8xMDMxMTk4MjkvMzAwMTg0NjYwLTkxMDBiOGI2LThkMjctNGE1OC05ZDQzLTI1MjY1ZDQ1YzUxNC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNjI0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDYyNFQwMTM1NDdaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT00ZGYyM2ZhNDk4YzlkYjdiMzE0ZjdkZjNhYmFiOTA3OWM4YjE3MDU2YzNjZDBhOTc3OWI3NTQ3ZDk3N2VjZWFkJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.il6Kl8qJfxQ8amxPAVpi6LYYAWD_zy-12LH_a88P79o)
![Figure_1](https://private-user-images.githubusercontent.com/103119829/300185036-43ad1467-ae59-42a5-9672-a239fcc9e8a4.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTkxOTMyNDcsIm5iZiI6MTcxOTE5Mjk0NywicGF0aCI6Ii8xMDMxMTk4MjkvMzAwMTg1MDM2LTQzYWQxNDY3LWFlNTktNDJhNS05NjcyLWEyMzlmY2M5ZThhNC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNjI0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDYyNFQwMTM1NDdaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1iMDhlOGIzYmZmYjc3NmFkMWQ3ZWQ3ZjgzNzAyNWYzODc4YjVjMTI4OGE5OTgwMjRiMGNkZGE2Yzc2ZWM3ZjZjJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.n7xD9j6yMySMEPvTmejCn8k14HKdp1RqSxcWxYNr1NY)
Current VS AVX2
Current VS SSE2
SSE2 VS AVX2
My actual timings files:
blit_colorkey_AVX2.json
blit_colorkey_old.json
blit_colorkey_SSE2.json
All made with the following test program:
And this is the data_utils file if you're interested: