Rewrite shift-AND algorithm to enable compiler auto vectorization when GCC 14.1 becomes available. #171

Closed
rhpvorderman opened this issue Jun 4, 2024 · 5 comments · Fixed by #220

Comments

@rhpvorderman
Owner

See https://godbolt.org/#z:OYLghAFBqd5QCxAYwPYBMCmBRdBLAF1QCcAaPECAMzwBtMA7AQwFtMQByARg9KtQYEAysib0QXACx8BBAKoBnTAAUAHpwAMvAFYTStJg1AB9U8lJL6yAngGVG6AMKpaAVxYMQkgGykHAGTwGTAA5dwAjTGIQACZfAAdUBUJbBmc3Dy8EpJSBQOCwlkjouItMKxsBIQImYgJ0908fMorU6tqCfNCIqNjfBRq6hszmgY6uwuK%2BgEoLVFdiZHYOAFIYgGYg5DcsAGoV9ccB/EEAOgQD7BWNAEE1zYZt1z2Do4J0LCpzy%2Bu724A3VB4dD7W40BjoYwsJgEZAITAKCBoBgDXauIIEAAcxgIuwAVMQEQRiHhrLsBiSjKRyXgAF6YHHk4lBYDGehGAgIUi/Xa8mn0xkMHrEYyoKjGeIwghRFHc258tEY7ySRkEokksn8VwQqFMBQAawU1J5fPRgmVqsJFNJuKChF1BqNJt5ZoIFtxeLV1rJ4UI0MdcpuCuSAo9eOhsIQxgGJEw01BdwA7AAhZ27fjEXYQEMM20HAAiGgOyd2eFeQqKUVF4slBGlxBRxdLa1TMWT8ZWKbTCtd7vT8x1/v1Gn263z/e1kKHChWAFZk2XZ/mm93TUqVbitYO9fquKPx1upzuZ/O8Ps27suHPl%2BsS6uXevGYeHfqYvuJ9vHXOF%2BeSzFryu8oKmu5obh%2BR4Gus77PtO35ni2uzrABt4JveiqgYydoEC%2BI4FqWDD2rBp7IamQHAehbpgVhL57nh1FET%2BCFXkuxZob2VEEdhQ5vnRnEviejEXv%2BLG3mxj62nxQ5QbxhHHnBv6ISRvxiRhHq%2BgQQ64WOuzqQxi43qRQbkRRfZ4rpO60dp5lfqeCnMQZKmUaq1mvu%2BLkCfBQlKWRwHsc5fo7tJVkBTZgklkhImGY5fYAEpaeORaiT5PbibsMWWQlrHJSBTm4jFPHaYlhnGSZYExUFmVJX8RnkX5uJiISTDoAAnsYh7vkVaEZlmOaMtoBZFbs2ijo4TKUqy7LAJyTb9W2LYdl22XkXFI2vHhV5VSVCrpathzrVlNVbflu2OPtm1bby5UnWdUVLcBK2dqd2n0TunV3dttGJk944vQaG3FRdaU8V9MlcTu/7nRdV2PaDL5IZDW2utiElYKo77WkY379ZFaHbbhcR4S5RanhCmCqN5h0lTtazeITIXDnBpNo0xFOA0D5608FGkWYzqOs4D0ME1zQ7MQuTN2RTynvXyeBUFmD1xOBOH7CDX1pbRiswRZC0A2zuwRnC0ZEIScEAfO2PLnNY7/bjfKdsu0u8rL8s8ZrA4Qa5MNq4L3hK1JOu28BBtRjGJvESx5sAVb%2BY247KsO5TduJgnCr28pyccLMtCcLOvCeNwvCoJwo0KPMiyYOe6w8KQBCaJnszwk1vQQLM%2BogLOGj6Jwkh53XpBFxwvAKCAne1xwWizHAsBIGgLDxHQUTkJQs/z/Q0TIMAUhcHwdD1sPEDhH3vrMMQzWcNXx%2B1M1ADy4TaJg1jn7ws9sII18MLQZ/j7wWDhK4wCODELQYeBdSBYGhEYcQ38wF4EJNYPA/wER9zJg/Vw0on7kEEOUPutA8DhGIFfZwWA%2B7MhYBgxBxBwhJEwPmTAEDgC4KMHXWYVADDAAUAANTwJgAA7tfeIjAMH8EECIMQ7ApAyEEIoFQ6hoG6G3gYJhphjDmFweEYekBZioHiJUFEnAAC018oL6OOAWZAABOcxhcKEkiwBoluLQH66PsBCYYnht4BCCN0Ss0Rt6JGSLotxeh/G5AYBMYUehLBOLaIMeoLhGiRPKNEqosTwk%2BMibEoJ28xh1DSVMLgsxS4LCWBILOOde7QIHrsVQmJvD6OVLsYAyBkCXkkKcPcEBcCEBIJXApvAx4TwbpgJu0QHFtw7l3DgPdSD5y0P3TgQ8R412YWUjgMQKlzIH
v0lZpAKHJDsJIIAA%3D

@rhpvorderman
Owner Author

GCC 14 is available on the manylinux_2_34 images. Hurray!

https://github.com/pypa/manylinux?tab=readme-ov-file#manylinux_2_34-almalinux-9-based

@rhpvorderman
Owner Author

Bioconda still uses GCC 13.3, as per the latest cutadapt build.

@rhpvorderman
Owner Author

Whoops, I made a mistake. Auto-vectorization is not yet possible when the implementation is correct. Oh well, I still need to refactor it.

@rhpvorderman
Owner Author

Never mind. This is the correct way of rewriting it:

https://godbolt.org/#z:OYLghAFBqd5QCxAYwPYBMCmBRdBLAF1QCcAaPECAMzwBtMA7AQwFtMQByARg9KtQYEAysib0QXACx8BBAKoBnTAAUAHpwAMvAFYTStJg1AB9U8lJL6yAngGVG6AMKpaAVxYMQAZgAcpBwAyeAyYAHLuAEaYxCCSGqQADqgKhLYMzm4e3n5JKTYCQSHhLFExcRaYVvkMQgRMxAQZ7p6%2BFVVptfUEhWGR0bHxCnUNTVmtQ109xaUDAJQWqK7EyOwcAKQATF7ByG5YANRrXo5D%2BIIAdAhH2GsaAIKb2wy7rgdHJwToWFSX17cP9wAbqg8OhDvcaAx0MYWEwCMgEJgFBA0AwhvtXMECD5jAR9gAqYhIgjEPDWfZDUlGUgUvAAL0wuIpJOCwGM9CMBAQpH%2B%2Bz5tIZTIYfWIxlQVGMCThBGiaJ5935GKxADZJEzCcTSeT%2BK4oTCmAoANYKGm8/mYwSq9VEylkvHBQj6o0ms18i0EK14/Ea23kiKEWHO%2BV3RUpQVe/Gw%2BEIYxDEiYWbgh4AdgAQq79vxiPsIGHGfajgARDRHVP7PDvYUlaJiiVSggy4ho0vlzbpjapxNrNMZxXuz2ZxZ6wOGjSHLyFwe66EjhRrACsqYr88LLd75pVarxOuHBsNXHHk53M73c8XeEOHf2XAXq68ZfXbs3TOPTsNG0PU93zoXS8vZY2W81wVRUN0tLcvxPI0vE/V9Z1/C8232LwgPvJNHyVcCmQdAg3zHItywYR14PPVD0xA0DMI9CCcLfA8CNoki/yQm8V1LDD%2BxoojcJHD8GO4t8z2Yq9ALY%2B8OOfe0BJHGD%2BOI08EP/ZCyP%2BCSsLxAAlfCJ32EtxIo0DOKZDT6J0vTyJDSiqIHDS%2BLM9iDL7ST9g02T7P0gFLMooy8TEIkmHQABPYxj0/cyMKzHM8yZbQi3M/ZtHHRxmSpNkOWALkW1ijs2y7HtHKsrSkveAibw8qzQJM4rjlKhyvIqly7McEqdLKiyGv5Vzqua1q6owxUiu7HrJ0YvdwoKyiqqGuSeL3Nr%2Bs6vjk2Gwj5KNQDyo6lzZOWma3xQzaOvdHEpKwVRP1tIxf1isT2q2waNmVAj/QIJjgjOoDFxLW6Fr5KbHuegMFPPKFMFUT7U1Yu87o62zLyenSXre0HwbYxdROh37tvhwHXuBpcUYhlCfvuVSJtAvAqBzB7lUgvDDl25aXPox66ZHLg8phrb9ijBFYyIIkEIhm7Vxyid5vJxVu1XSX%2BUp6m%2BNZuC9yW5qma6zZaeV6DOax/leZjONBdItHUxFtsiwl%2BqKuljDbYM%2B2U0LDh5loTh514TwOC0UhUE4ZKFEWZZMEvLweFIAhNBd%2BZEQC/oIHmQ0QHneI3Y4SRPaj33OF4BQQHiSPvZd0g4FgJA0BYBI6GichKArqv6BiZBgCkLg%2BDoRs84gCIs/9ZhiECzhw77%2BpAoAeQibRMGsIfeArthBDHhhaEHovSCwCJXGARwxFoPPuF4LBYSMcQ1/wIlrDwQEkSzsHp9cGVZ/IQRKiz2g8AiYhR%2BcLAs5ZFgn7X2IBEZImBCyYGPsAd%2BRgo7zCoAYYACgABqeBMAAHcx4JEYE/fgggRBiHYFIGQghFAqHUGvXQbcDAwNMMYcw78Ih50gPMVACRqj7wALRjw/Bw04RZkAAE4BG8FQEA0kWAmEJzaNPao9goSjE8G3QIwRejVhiG3XIqQBAKL0Jo6oUwRR6EsDIjowxGguGaEYyoJiBCdAaAYtRRizE6LbhMexKjpj9A5gsJYKwJCu3dpnNefsOD7FUD4ZUHDVT7GAMgZA15JDnAPBAXAhASChw5rwQuWhZgx0wHHGIUik4p30JwDOpAvY%2BxCbnfOEdYEBI4BsIJVSc51KLrk0gQCUh2EkEAA%3D%3D%3D

@rhpvorderman
Owner Author

It works and the code is faster, but not nearly as fast as hand-vectorized code. Since the algorithm is not too hard, it is much better to hand-vectorize this code. It can easily be ported to ARM instructions when the need arises.
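
For readers unfamiliar with the algorithm under discussion, here is a minimal sketch of shift-AND matching in C. It is not the code from this repository or from the Compiler Explorer links above. The names `ShiftAndMatcher`, `matcher_init`, `matcher_search` and the `LANES` constant are hypothetical, chosen only for illustration. The sketch assumes several short patterns (at most 64 characters each), each tracked in its own 64-bit state word. The inner loop over lanes has no dependency between iterations, which is the kind of loop a compiler may be able to auto-vectorize; whether it actually does depends on the compiler version and flags.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LANES 4 /* hypothetical: number of independent patterns */

typedef struct {
    uint64_t char_masks[256][LANES]; /* bit i set if pattern[i] == c */
    uint64_t init_mask[LANES];       /* bit 0: a match may start at any position */
    uint64_t found_mask[LANES];      /* bit (len - 1): last character matched */
} ShiftAndMatcher;

/* Patterns must be 1..64 characters long. Returns 0 on success, -1 otherwise. */
static int matcher_init(ShiftAndMatcher *m, const char *patterns[LANES])
{
    memset(m, 0, sizeof(*m));
    for (size_t lane = 0; lane < LANES; lane++) {
        size_t len = strlen(patterns[lane]);
        if (len == 0 || len > 64) {
            return -1;
        }
        for (size_t i = 0; i < len; i++) {
            uint8_t c = (uint8_t)patterns[lane][i];
            m->char_masks[c][lane] |= (uint64_t)1 << i;
        }
        m->init_mask[lane] = 1;
        m->found_mask[lane] = (uint64_t)1 << (len - 1);
    }
    return 0;
}

/* Returns a bitmap: bit `lane` is set if patterns[lane] occurs in text. */
static unsigned matcher_search(const ShiftAndMatcher *m,
                               const uint8_t *text, size_t text_len)
{
    uint64_t state[LANES] = {0};
    uint64_t hits[LANES] = {0};
    for (size_t pos = 0; pos < text_len; pos++) {
        const uint64_t *masks = m->char_masks[text[pos]];
        /* Each lane is updated independently of the others. */
        for (size_t lane = 0; lane < LANES; lane++) {
            state[lane] = ((state[lane] << 1) | m->init_mask[lane]) & masks[lane];
            hits[lane] |= state[lane] & m->found_mask[lane];
        }
    }
    unsigned result = 0;
    for (size_t lane = 0; lane < LANES; lane++) {
        if (hits[lane]) {
            result |= 1u << lane;
        }
    }
    return result;
}
```

The state of a lane still depends on its own value from the previous text position, so only the lane dimension is parallel. A hand-vectorized version would apply the same shift, OR and AND steps with explicit SIMD intrinsics instead of relying on the compiler.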
