Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add a new compare func CompareValueONeil for bsi, which faster than CompareValue #461

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ChicChip
Copy link

According to Java's compare function , I code a new function CompareValueONeil for go version ,and it only support number which is >= 0 , it is faster than CompareValue. When bsi include 200 million columnIds, using same range query,CompareValueONeil is cost about 1s when CompareValue cost about 12s(https://github.com/RoaringBitmap/RoaringBitmap/blob/cca90c986d5c0096bbeabb5f968833bf12c28c0e/bsi/src/main/java/org/roaringbitmap/bsi/longlong/Roaring64BitmapSliceIndex.java#L456)

…oNeilCompare), only for number which is >= 0
@ChicChip
Copy link
Author

ChicChip commented Nov 26, 2024

Test code read my file "total.bin"
file, _ := os.OpenFile(path+"/total.bin", os.O_RDWR, 0755)
defer file.Close()
total := roaring64.NewDefaultBSI()
total.ReadFrom(file)
t := time.Now().UnixNano()
res := total.CompareValue(0, roaring64.RANGE, 25, 30, nil)
fmt.Printf("totalCount[%v], count[%v], CompareValue cost %v [ms]\n", total.GetCardinality(), res.GetCardinality(), (time.Now().UnixNano()-t)/1000000)
t1 := time.Now().UnixNano()
res = total.CompareValueONeil(roaring64.RANGE, 25, 30, nil)
fmt.Printf("totalCount[%v], count[%v], CompareValueONeil cost %v [ms]\n", total.GetCardinality(), res.GetCardinality(), (time.Now().UnixNano()-t1)/1000000)

result
totalCount[207701480], count[234], CompareValue cost 14051 [ms]
totalCount[207701480], count[234], CompareValueONeil cost 882 [ms]

@guymolinari
Copy link
Contributor

I will run some tests in my environment. In the meantime, could we come up with a better more descriptive name for this function?

@guymolinari
Copy link
Contributor

Also (and better yet), when constructing the BSI we could potentially set min/max value providing some context about the nature of the data being stored in the BSI. Then we could choose to apply the optimization without creating an entirely new API.

@ChicChip
Copy link
Author

I will run some tests in my environment. In the meantime, could we come up with a better more descriptive name for this function?

yes, more descriptive name is better

@ChicChip
Copy link
Author

Also (and better yet), when constructing the BSI we could potentially set min/max value providing some context about the nature of the data being stored in the BSI. Then we could choose to apply the optimization without creating an entirely new API.

Yes, util now, BSI's min/max doesn't be refreshed when setValues or deserialization. If min/max exist in BSI, new API can cancel.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants