As the title describes, I observe that creating an index for float16 vectors takes much longer than for float32. Here is a short code snippet to reproduce the problem.
```python
import time

import lancedb
import numpy as np
import pyarrow as pa

# modify the parameters below accordingly
uri = "./example-lancedb"
record_num = 1000  # feel free to increase the dataset size to see the performance diff
part_num = 10
sub_vec_num = 64
vector_dim = 512

# No need for modification below.
# Prepare the tables before building indices.
db = lancedb.connect(uri)

schema_32 = pa.schema([pa.field("vector", pa.list_(pa.float32(), vector_dim))])
tbl_32 = db.create_table("my_table_32", schema=schema_32, mode="overwrite")  # mode="overwrite" for demo purposes
tbl_32.add([{"vector": np.random.uniform(-1, 1, size=vector_dim)} for _ in range(record_num)])
print("size of float32 table is now: " + str(tbl_32.count_rows()))

schema_16 = pa.schema([pa.field("vector", pa.list_(pa.float16(), vector_dim))])
tbl_16 = db.create_table("my_table_16", schema=schema_16, mode="overwrite")
tbl_16.add([{"vector": np.random.uniform(-1, 1, size=vector_dim).astype(np.float16)} for _ in range(record_num)])
print("size of float16 table is now: " + str(tbl_16.count_rows()))

# create index for the float32 table
start_time = time.perf_counter()
tbl_32.create_index(metric="cosine", num_partitions=part_num, num_sub_vectors=sub_vec_num, vector_column_name="vector")
end_time = time.perf_counter()
print("float32 create index time used: %ss" % (end_time - start_time))

# create index for the float16 table
start_time = time.perf_counter()
tbl_16.create_index(metric="cosine", num_partitions=part_num, num_sub_vectors=sub_vec_num, vector_column_name="vector")
end_time = time.perf_counter()
print("float16 create index time used: %ss" % (end_time - start_time))
```
And the result is:
```
size of float32 table is now: 1000
size of float16 table is now: 1000
float32 create index time used: 1.0481734249997317s
float16 create index time used: 4.043548755000302s
```
When I grow the dataset to 1M records, the time difference is 10 min vs. 2 hours. Is this expected behaviour, or am I doing something wrong?
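As a side note, one way to check whether the slowdown is specific to LanceDB is to time the same kind of arithmetic in plain NumPy. On most machines NumPy's float16 matrix multiply is also much slower than float32, since it cannot go through BLAS; the sizes below are arbitrary:

```python
import time

import numpy as np

def time_matmul(dtype, n=256, reps=10):
    """Time `reps` multiplies of two n-by-n matrices in the given dtype."""
    a = np.random.uniform(-1, 1, size=(n, n)).astype(dtype)
    b = np.random.uniform(-1, 1, size=(n, n)).astype(dtype)
    start = time.perf_counter()
    for _ in range(reps):
        a @ b  # result discarded; we only care about the elapsed time
    return time.perf_counter() - start

t32 = time_matmul(np.float32)
t16 = time_matmul(np.float16)
print("float32: %.4fs  float16: %.4fs" % (t32, t16))
```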
Are there known steps to reproduce?
No response
Not sure if this is the reason, but we only compile optimized fp16 kernels for some platforms; on all other platforms it is expected to be slow. When you installed lancedb, it should have installed pylance too. Do you know what the wheel name was? You can run `pip install -U lancedb` again and you should see somewhere in the logs a string like `pylance-0.10.12-cp38-abi3-macosx_11_0_arm64.whl`.
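If re-running pip is inconvenient, a stdlib-only sketch (Python 3.8+) can report the installed pylance version and the machine architecture, which together identify the wheel that was used:

```python
import platform
from importlib import metadata

# Report the installed pylance version, if any, plus the CPU architecture.
try:
    print("pylance version:", metadata.version("pylance"))
except metadata.PackageNotFoundError:
    print("pylance is not installed in this environment")
print("machine:", platform.machine())  # e.g. 'x86_64' or 'aarch64'
```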
Yes, I am developing an app on an ARM development board. The wheel used to install the Python package is `pylance-0.10.18-cp39-abi3-manylinux_2_24_aarch64.whl`.
But the same thing happens when I run this script on my desktop machine with an AMD64 CPU, where pylance is installed from `pylance-0.10.12-cp38-abi3-manylinux_2_28_x86_64.whl`. So is it really related to the platform?
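For what it's worth, whether optimized fp16 kernels can run at all depends on CPU features. Below is a Linux-only sketch that scans /proc/cpuinfo for the flags commonly associated with hardware half-precision support (f16c/avx512fp16 on x86_64, fphp/asimdhp on aarch64). Whether pylance keys off these exact flags is an assumption on my part:

```python
def cpu_fp16_flags(path="/proc/cpuinfo"):
    """Return the fp16-related CPU flags found in /proc/cpuinfo (Linux only)."""
    # Flag names commonly tied to hardware fp16; this list is an assumption.
    wanted = {"f16c", "avx512fp16", "fphp", "asimdhp"}
    try:
        with open(path) as f:
            info = f.read().lower()
    except OSError:
        return set()  # non-Linux platforms have no /proc/cpuinfo
    present = set()
    for line in info.splitlines():
        # x86 kernels list these under "flags", ARM kernels under "Features".
        if line.startswith(("flags", "features")) and ":" in line:
            present.update(wanted & set(line.split(":", 1)[1].split()))
    return present

print("fp16-related CPU flags found:", cpu_fp16_flags() or "none")
```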
LanceDB version: v0.6.13