Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(off-topic?) inter-language 'drag race' competition #600

Open
pricerc opened this issue Jul 5, 2021 · 20 comments
Open

(off-topic?) inter-language 'drag race' competition #600

pricerc opened this issue Jul 5, 2021 · 20 comments

Comments

@pricerc
Copy link

pricerc commented Jul 5, 2021

Former Microsoft engineer turned YouTuber Dave Plummer (https://www.youtube.com/channel/UCNzszbnvQeFzObW0ghk0Ckw) has instigated a test of language performance in one of his videos (https://youtu.be/tQtFdsEcK_s for the latest installment)

I'll leave it to the interested to investigate his channel further, he's got some interesting stuff for long-time users (victims ?) of Microsoft.

The 'competition' involves calculating primes as quickly as possible, and there are now 45 languages 'competing'.

It's a community effort over at: https://github.com/PlummersSoftwareLLC/Primes

There is a VB implementation which is 'ok', but I was able to get about a 15% performance improvement with minimal effort.

And I'm sure there must be ways of improving it further.

@hartmair
Copy link

hartmair commented Jul 5, 2021

First idea:

Although I liked VB.NET; what is the point in writing a VB.NET version if it is a proper subset of C# nowadays; especially performance stuff like Span is missing, see #322.

Btw., I didn't find your changes at https://github.com/PlummersSoftwareLLC/Primes/tree/drag-race/PrimeBASIC/solution_2

@pricerc
Copy link
Author

pricerc commented Jul 5, 2021

Btw., I didn't find your changes at https://github.com/PlummersSoftwareLLC/Primes/tree/drag-race/PrimeBASIC/solution_2

I haven't loaded them (yet, not sure if I will have time to get all their requirements together soon enough).

The boolean array is one of the things I did. Although in the context of the 'competition' there's a question mark over that - if you were to try and calculate really large numbers, then memory may be more of a problem with an array of boolean.

Also switching the logic for the 'prime' flag, since ReDim clears them already.

@RevensofT
Copy link

RevensofT commented Jul 9, 2021

I'm optimize it with unfaithful and got result almost twice from current code, any suggestion before I pull merge ?
https://github.com/RevensofT/Primes/tree/drag-race/PrimeBASIC/solution_2

Edit: I add faithful version and testing, however both version result difference only 1% - 5% so I remove unfaithful version, only faithful left which also got result almost twice from rbergen's code.

@pricerc
Copy link
Author

pricerc commented Jul 9, 2021

any suggestion before I pull merge

From a structure point-of-view, I'd make all the method names match the original 'C++' version, and in the same order. I think it would be easier for people trying to compare the implementations.

I think this

     Dim Start_time = DateTime.UtcNow
...
    If (DateTime.UtcNow - Start_time).TotalSeconds > 5.0 Then
	    Call (DateTime.UtcNow - Start_time).TotalSeconds.
				    report(Sieve.count, Pass_count)
	    Return Sieve
    End If

is part of the 'faithful'. But you can save a few clock cycles by pre-calculating the 'expiry' time.

something like

        Dim Start_time = Date.UtcNow
        Dim Expiry_time = startTime.AddSeconds(5)
...
            If Date.UtcNow > Expiry_time Then
		Call (DateTime.UtcNow - Start_time).TotalSeconds.
					report(Sieve.count, Pass_count)
		Return Sieve
            End If

It's the weekend, so I might find time to take a look at what you've done.

I haven't really tried to do much optimizing yet. I just converted the original 'C' code without referring to the existing PrimeVB code, and on my computer the existing VB code gets ~1865 runs in 5 seconds, and I get ~2485. The biggest differences being 1) I 'flipped' the non-prime array, since ReDim inititializes to false, and 2) using a combination of a while loop and for loop (like the original C++ code does) instead of nested for loops.

@pricerc
Copy link
Author

pricerc commented Jul 10, 2021

One thing I was experimenting with, but breaks the 'faithful' algorithm, is instead of calling SQRT to find the square root at the beginning of a loop, was to calculate the square of the factors within the loop. The idea being that the square root calculation is an expensive operation at the silicon level, where calculating an integer square is trivial.

This certainly worked well for calculating primes up to 10,000,000.

@RevensofT
Copy link

@pricerc Your suggestion aren't work, loop won't stop, I hope to see your revision and if you have some time, could you test my code on your machine ? I would know how difference it could improvement on difference PC. :)

@pricerc
Copy link
Author

pricerc commented Jul 10, 2021

Just been doing some review of my experiments, using 'release' builds instead of 'debug' builds (a bit of a facepalm when I clicked that I was still 'debugging'). I have 5 minor variations that I'm experimenting with - two very close to Dave's original C++ version, with BitArrays, one with the true/false logic flipped. Then two using Boolean(), one using Math.Sqrt, the other calculating squares of the factors. And the 5th using BitArray and squaring the factors.

On my machine, a Ryzen 7 4800H running at ~4GHz, with 64GB of ram.

At 1,000,000, an array of Boolean outperforms a BitArray. And my calculating the square outperforms calculating the square root.

At 10,000,000, the BitArray outperforms a Boolean(), although Math.Sqrt is still outperformed by calculating the squares.

At 100,000,000, there is no longer a 'reference' result, but the Math.Sqrt seems to outperform calculating the squares, but only by a small margin.

@RevensofT
Copy link

Did you try on remove integer overflow check option ? it's in compile option, in most case it boot code performance.

@hartmair
Copy link

One thing I was experimenting with, but breaks the 'faithful' algorithm, is instead of calling SQRT to find the square root at the beginning of a loop, was to calculate the square of the factors within the loop. The idea being that the square root calculation is an expensive operation at the silicon level, where calculating an integer square is trivial.

I am not so sure whether the one call to Math.Sqrt per iteration is a performance issue. However, Math.Sqrt only works for Doubles, so there are a lot of (implicit) conversions from Double to Integer in the loop. I would suggest:

'see https://stackoverflow.com/questions/23672069/fast-integer-square-root
Function Sqrt(value As Integer) As Integer
  Return CInt(Math.Floor(Math.Sqrt(value)))
End Function

Btw., I personally prefer Option Strict On, which warns about such implicit conversions.

@RevensofT
Copy link

You maybe not believe it but change result sqrt to Integer make performance down by 1% instead on my code and performance of Dim Sieve_sqrt As Integer = CInt(Math.Floor(Math.Sqrt(value))) and Dim Sieve_sqrt As Integer = Math.Sqrt(Size) is identical, at lease I'm not notice it.

I think change type of value from result of function which can't do inline has some negate impact on inline method optimize instead.

@hartmair
Copy link

hartmair commented Jul 11, 2021

You maybe not believe it but change result sqrt to Integer make performance down by 1% instead

I didn't expect this^^ Maybe because CInt involves rounding, which may result in Sieve_sqrt one too big..

I tried for myself and came to the same conclusion: Comparing int with int or int with double just makes no real difference at all

@hartmair
Copy link

I could get around 18% improvement on my machine by doing the following:

  • Inline helper methods check_prim, find_prim
  • Replace divisions and multiplications times two by shifting left or right
  • Syntactic rewrite for my own pleasure

The complete method:

Private Sub process_prim(Size As Integer)
  Dim Sieve_sqrt = Math.Sqrt(Size)
  Dim Factor = 3
  Do While Factor <= Sieve_sqrt
    If primes(Factor >> 1) Then
      Dim Factor2 = Factor << 1
      Dim Multiple = Factor2 + Factor
      Do While Multiple <= Size
        primes(Multiple >> 1) = False
        Multiple += Factor2
      Loop
    End If
    Factor += 2
  Loop
End Sub

@RevensofT
Copy link

RevensofT commented Jul 12, 2021

Great upgrade algorithm, you merge loop from check_prim and find_prim together which improve performance a lot.

On my test, your algorithm got 10% faster then base algorithm and my rewrite version of your algorithm got 15% faster then base algorithm.

Test result 3 times in row.

C:\>dotnet run -c "release free" --project "repos\Prime"
rber_gen_vb;2001;5.0012098;1;algorithm=base,faithful=yes,bits=1
hartmair_gen_vb;2225;5.0004361;1;algorithm=hartmair,faithful=yes,bits=1
hartmair_rewrite_gen_vb;2406;5.0005775;1;algorithm=hartmair,faithful=yes,bits=1
rbergen_vb;2099;5.0014436;1;algorithm=base,faithful=no,bits=1
C:\>dotnet run -c "release free" --project "repos\Prime"
rber_gen_vb;2052;5.0017633;1;algorithm=base,faithful=yes,bits=1
hartmair_gen_vb;2243;5.0003168;1;algorithm=hartmair,faithful=yes,bits=1
hartmair_rewrite_gen_vb;2411;5.0019972;1;algorithm=hartmair,faithful=yes,bits=1
rbergen_vb;2102;5.0022839;1;algorithm=base,faithful=no,bits=1
C:\>dotnet run -c "release free" --project "repos\Prime"
rber_gen_vb;2042;5.0016609;1;algorithm=base,faithful=yes,bits=1
hartmair_gen_vb;2249;5.0015053;1;algorithm=hartmair,faithful=yes,bits=1
hartmair_rewrite_gen_vb;2403;5.0011379;1;algorithm=hartmair,faithful=yes,bits=1
rbergen_vb;2067;5.001148;1;algorithm=base,faithful=no,bits=1

Rewrite version code.

	Private Sub find_prim(Size As Integer,
						  Factor As Integer,
						  Factor2 As Integer)
		Dim Multiple = Factor2 + Factor
		Do
			If Multiple > Size Then Return
			primes(Multiple >> 1) = False
			Multiple = Multiple + Factor2
		Loop
	End Sub

	Private Sub process_prim(Size As Integer, Factor As Integer)
		Dim Sieve_sqrt = Math.Sqrt(Size)
		Do
			If Factor > Sieve_sqrt Then Return
			If primes(Factor >> 1) Then find_prim(Size, Factor, Factor << 1)
			Factor = Factor + 2
		Loop
	End Sub

@paul1956
Copy link

Your numbers depend on CPU microarchitecture, FPU units run in parallel with integer units and in integer only programs they are idle. They also depend on how efficient the runtime is at pipelining and are they optimized for the CPU you are running, conditional jumps are BAD on. You want speedup run the code on all the cores or call into hand tuned parallel math libraries. Also latter versions of DotNet have more tuning so just changing the .Net version could have a large impact,

@hartmair
Copy link

I could get another +25% improvement, though I am not convinced it is worth it:

  • Swap True and False for primes
  • Inlining BitArray from dotnet runtime
Imports System.Numerics
Imports System.Runtime.InteropServices

' see https://github.com/dotnet/runtime/blob/main/src/libraries/System.Collections/src/System/Collections/BitArray.cs
Public Class BitArray

    Private ReadOnly m_array As Integer()
    Private ReadOnly m_length As Integer

    Public Sub New(length As Integer)
        m_length = length
        m_array = New Integer(length >> 5) {}
    End Sub

    Default Public Property Item(index As Integer) As Boolean
        Get
            Return (m_array(index >> 5) And (1 << index)) <> 0
        End Get
        Set(value As Boolean)
            Dim bitMask = 1 << index
            If value Then
                SetMask(m_array(index >> 5), bitMask)
            Else
                ClearMask(m_array(index >> 5), bitMask)
            End If
        End Set
    End Property
    Private Shared Sub SetMask(ByRef values As Integer, mask As Integer)
        values = values Or mask
    End Sub
    Private Shared Sub ClearMask(ByRef values As Integer, mask As Integer)
        values = values And Not mask
    End Sub

    Public ReadOnly Property Length As Integer
        Get
            Return m_length
        End Get
    End Property

    Public ReadOnly Property Count As Integer
        Get
            Return m_array.Sum(Function(x) BitOperations.PopCount(ToUnsigned(x)))
        End Get
    End Property
    <StructLayout(LayoutKind.Explicit)>
    Private Structure ReinterpretCast32
        <FieldOffset(0)> Public SignedInteger As Integer
        <FieldOffset(0)> Public UnsignedInteger As UInteger
    End Structure
    Private Shared Function ToUnsigned(value As Integer) As UInteger
        Return New ReinterpretCast32() With {.SignedInteger = value}.UnsignedInteger
    End Function

End Class

@pricerc
Copy link
Author

pricerc commented Jul 13, 2021

  • Swap True and False for primes
    Makes a big difference if using an array of Boolean.
  • Inlining BitArray from dotnet runtime
    I think this is pretty much one of the techniques authors have used in some other languages.

To be fair, if you're comparing languages, it's quite reasonable, because then you're relying on the VB compiler for the whole algorithm, instead of relying on some other languages compilation.

I also adapted an integer square root algorithm I found, which was a bit faster than Math.Sqrt.

        Function SquareRoot(value As ULong) As ULong
            Dim remainder As ULong = 0, root As ULong = 0

            For i As Integer = (64 \ 2) To 1 Step -1
                root = root << 1
                remainder = (remainder << 2) Or (value >> (64 - 2))

                value = value << 2
                If (root < remainder) Then
                    remainder -= (root Or 1UL)
                    root += 2UL
                End If
            Next i
            Return root >> 1
        End Function

But I think it needs more testing to be sure.

@RevensofT
Copy link

@hartmair I test your new BitArray but always got incorrect result.

Test result:

C:\>dotnet run -c "release free" --project "repos\Prime"
rber_gen_vb;2038;5.0007981;1;algorithm=base,faithful=yes,bits=1
hartmair_gen_vb;2236;5.0008227;1;algorithm=base,faithful=yes,bits=1
hartmair_rewrite_gen_vb;2390;5.0002147;1;algorithm=base,faithful=yes,bits=1
WARNING: result is incorrect!
hartmair_rewrite_BA2_gen_vb;2598;5.0020134;1;algorithm=base,faithful=yes,bits=1
rbergen_vb;2058;5.0003475;1;algorithm=base,faithful=no,bits=1

I add default value option back and it should not cause incorrect result.

Public Class BitArray2

	Private ReadOnly container As Integer()
	Private Const bit32 = 5

	Public Sub New(Capital As Integer, Optional Default_value As Boolean = False)
		container = New Integer((Capital >> bit32) - 1) {}
		If Default_value Then
			Array.Fill(container, -1)

			Dim Extra = Capital And (32 - 1)
			If Extra > 0 Then
				container(container.Length - 1) = (1 << Extra) - 1
			End If
		End If
	End Sub

@Nukepayload2
Copy link

I've converted some code from one of the C# implementations. It uses ArrayPool to reduce garbage collection time.

https://github.com/Nukepayload2/Primes/tree/drag-race/PrimeBASIC/solution_3

@hartmair
Copy link

I've converted some code from one of the C# implementations. It uses ArrayPool to reduce garbage collection time.

I am not sure whether an ArrayPool is considered faithful regarding the rules:

See https://github.com/PlummersSoftwareLLC/Primes/blob/drag-race/CONTRIBUTING.md#faithfulness

It uses a class to encapsulate the sieve, or an equivalent feature in your language. This class must contain the full state of the sieve. Each iteration should re-create a new instance of this class.

@Nukepayload2
Copy link

I am not sure whether an ArrayPool is considered faithful regarding the rules:

@hartmair
This is a good question.
According to the rules of "faithfulness", ArrayPool(Of T).Shared is unfaithful. Because it brings recycled arrays of previous iterations to the current iteration. I'll mark the converted project as "unfaithful" and add a new "faithful" project based on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants