Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement shotgun4Intersect #239

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

Implement shotgun4Intersect #239

wants to merge 5 commits into from

Conversation

lemire
Copy link
Member

@lemire lemire commented Sep 25, 2019

This is a second attempt at fixing #215

It is based on a previous PR by @alldroll. The changes are as follows:

  • The benchmark, though still unrealistic, is a tiny bit more realistic. Important: which benchmarking data processing code with branches, you don't want to run the same benchmark over the same data in sequence because the processor might get the ability to predict the branches with an unrealistic ability that is not achievable in the real world.

  • It looks like the Go compiler is not smart enough to use conditional moves, as it should. So it does many, many branches when compiling the shotgun4 approach. We want to avoid that. A solution (which is kind of a hack) is to replace the branches ("if") by some arithmetic. The arithmetic is more expensive (many more instructions) than a simple conditional move, but saving on mispredicted branches and useless speculative loads is important.

Results should vary but here is what I get after making these changes...

$ go version
go version go1.13 darwin/amd64
lemire@iMac-2018 ~/CVS/github/roaring $ go test  -bench BenchmarkIntersectAlgorithms -run -
goos: darwin
goarch: amd64
pkg: github.com/RoaringBitmap/roaring
BenchmarkIntersectAlgorithms/onesidedgallopingintersect2by2-8         	  123374	      9254 ns/op
BenchmarkIntersectAlgorithms/shotgun4-8                               	  203089	      5521 ns/op
PASS
ok  	github.com/RoaringBitmap/roaring	2.434s

So shotgun4 is "almost" twice as fast. Not bad!

Of course, more work is needed...

  • We should use even richer, more realistic testing.
  • We need better testing. We pass some sanity tests, but I cannot be certain that the code is correct.
  • I cannot vouch that the code is optimal. At a minimum, we need to explore variations such as shotgun6, shotgun8, shotgun2. Note that the results will depend on the hardware, better/more recent hardware should benefit from wider algorithms.
  • We need to actually integrate the new approach in the roaring code and do realistic benchmarks there.

I propose that the work be done in the shotgun4 branch directly in the main repository.

@coveralls
Copy link

coveralls commented Sep 25, 2019

Coverage Status

Coverage increased (+0.2%) to 82.404% when pulling f476012 on shotgun4 into 4676818 on master.

Copy link
Member

@alldroll alldroll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lemire Amazing trick with branchlessComparator and if idx4+1 < nL { // common case optimization!

@lemire
Copy link
Member Author

lemire commented Sep 26, 2019

Thanks! Let us finish this!

@lemire
Copy link
Member Author

lemire commented Sep 30, 2019

I am leaving this PR intentionally open.

@puzpuzpuz
Copy link

Tried to follow-up on this branch here: puzpuzpuz@3ac5959

The changes are the following:

  • Improve tests for shotgun4Intersect to get 100% LoC coverage
  • Simplify shotgun4Intersect
  • Add real data benchmark for And

The real data benchmark I've added looks like the following:

func BenchmarkRealDataAnd(b *testing.B) {
	benchmarkRealDataAggregate(b, func(bitmaps []*Bitmap) uint64 {
		b := bitmaps[0]
		for i := 1; i < len(bitmaps); i++ {
			b = And(b, bitmaps[i])
		}
		return b.GetCardinality()
	})
}

I'm not sure if it's good enough though. Looking forward to hear your thoughts. The results on my machine are the following.

Environment: Ubuntu 20.04 x86-64, Go 1.16.5, i5-8300h CPU

onesidedgallopingintersect2by2:

$ BENCH_REAL_DATA=1 go test -bench BenchmarkRealDataAnd -run -
goos: linux
goarch: amd64
pkg: github.com/RoaringBitmap/roaring
cpu: Intel(R) Core(TM) i5-8300H CPU @ 2.30GHz
BenchmarkRealDataAnd/census-income_srt-8         	  112808	     10660 ns/op
BenchmarkRealDataAnd/census-income-8             	  116253	     10362 ns/op
BenchmarkRealDataAnd/census1881_srt-8            	  135080	      8795 ns/op
BenchmarkRealDataAnd/census1881-8                	  130599	      9168 ns/op
BenchmarkRealDataAnd/dimension_003-8             	    1376	    885094 ns/op
BenchmarkRealDataAnd/dimension_008-8             	    4058	    308777 ns/op
BenchmarkRealDataAnd/dimension_033-8             	   25941	     46678 ns/op
BenchmarkRealDataAnd/uscensus2000-8              	  121574	      9786 ns/op
BenchmarkRealDataAnd/weather_sept_85_srt-8       	   97990	     12082 ns/op
BenchmarkRealDataAnd/weather_sept_85-8           	   43671	     28879 ns/op
BenchmarkRealDataAnd/wikileaks-noquotes_srt-8    	  128178	     14734 ns/op
BenchmarkRealDataAnd/wikileaks-noquotes-8        	  133525	      9009 ns/op
PASS
ok  	github.com/RoaringBitmap/roaring	39.732s

shotgun4Intersect:

$ BENCH_REAL_DATA=1 go test -bench BenchmarkRealDataAnd -run -
goos: linux
goarch: amd64
pkg: github.com/RoaringBitmap/roaring
cpu: Intel(R) Core(TM) i5-8300H CPU @ 2.30GHz
BenchmarkRealDataAnd/census-income_srt-8         	  111625	     10605 ns/op
BenchmarkRealDataAnd/census-income-8             	  117244	     10248 ns/op
BenchmarkRealDataAnd/census1881_srt-8            	  133972	      8770 ns/op
BenchmarkRealDataAnd/census1881-8                	  131566	      9120 ns/op
BenchmarkRealDataAnd/dimension_003-8             	    1359	    871526 ns/op
BenchmarkRealDataAnd/dimension_008-8             	    4050	    292773 ns/op
BenchmarkRealDataAnd/dimension_033-8             	   26175	     44918 ns/op
BenchmarkRealDataAnd/uscensus2000-8              	  127453	      9230 ns/op
BenchmarkRealDataAnd/weather_sept_85_srt-8       	   97687	     12014 ns/op
BenchmarkRealDataAnd/weather_sept_85-8           	   44283	     26967 ns/op
BenchmarkRealDataAnd/wikileaks-noquotes_srt-8    	  135856	      8815 ns/op
BenchmarkRealDataAnd/wikileaks-noquotes-8        	  134486	      8959 ns/op
PASS
ok  	github.com/RoaringBitmap/roaring	38.144s

No significant difference except for the wikileaks-noquotes_srt data set.

@lemire
Copy link
Member Author

lemire commented Jul 10, 2021

Thanks. This is useful and seems to go toward my own bias: there may simply be not enough gain to make it worth it.

@puzpuzpuz
Copy link

Yes, so far I'm also under impression that shotgun intersection doesn't make the weather on operations with real data sets. I'm going to stop at this point and won't make further analysis. Going to check other GH issues and find the ones I could help with.

@lemire
Copy link
Member Author

lemire commented Jul 10, 2021

@puzpuzpuz That was helpful!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants