```go
func (z *Element) mulNaive(a, b *Element) *Element {
	z[0] = uint32((uint64(a[0]) * uint64(b[0])) % q)
	return z
}

func BenchmarkMulNaive(b *testing.B) {
	var a, z Element
	a.MustSetRandom()
	z.MustSetRandom()
	b.ResetTimer()
	for range b.N {
		z.mulNaive(&a, &z)
	}
}

func BenchmarkMul(b *testing.B) {
	var a, z Element
	a.MustSetRandom()
	z.MustSetRandom()
	b.ResetTimer()
	for range b.N {
		z.Mul(&a, &z)
	}
}
```

On an M1 (ARM64) MacBook Pro I get the following:
```
BenchmarkMulNaive-8    203830368    5.818 ns/op
BenchmarkMul-8         195042295    6.128 ns/op
```
That is a ~5% speedup in favor of the "naive" option, which, performance aside, is also attractive for its simplicity.
A major caveat here is that while AVX512 has shift instructions ("division by R" in Montgomery reduction), it doesn't seem to support generic integer division, which will have to be hacked around using floating point division. (i.e. divide as floats, truncate the quotient, multiply ad subtract). So it is possible that this change would speed up scalar ops but hurt vector ops. Barrett reduction on the other hand, which works on the naive representation, can be vectorized well.