Skip to content

Consider ditching Montgomery form for Small Fields #718

@Tabaie

Description

@Tabaie
func (z *Element) mulNaive(a, b *Element) *Element {
	z[0] = uint32((uint64(a[0]) * uint64(b[0])) % q)
	return z
}

func BenchmarkMulNaive(b *testing.B) {
	var a, z Element
	a.MustSetRandom()
	z.MustSetRandom()
	b.ResetTimer()

	for range b.N {
		z.mulNaive(&a, &z)
	}
}

func BenchmarkMul(b *testing.B) {
	var a, z Element
	a.MustSetRandom()
	z.MustSetRandom()
	b.ResetTimer()

	for range b.N {
		z.Mul(&a, &z)
	}
}

On an M1 (ARM64) MacBook Pro I get the following:

BenchmarkMulNaive-8                                     203830368                5.818 ns/op
BenchmarkMul-8                                          195042295                6.128 ns/op

A 5% speedup in favor of the "naive" option, which apart from performance concerns is attractive due to its simplicity.

A major caveat here is that while AVX512 has shift instructions ("division by R" in Montgomery reduction), it doesn't seem to support generic integer division, which will have to be hacked around using floating point division. (i.e. divide as floats, truncate the quotient, multiply ad subtract). So it is possible that this change would speed up scalar ops but hurt vector ops. Barrett reduction on the other hand, which works on the naive representation, can be vectorized well.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions