
bug(python): Creating an index for float16 vectors takes significantly longer than for float32 vectors #1312

Open
weitianhan opened this issue May 17, 2024 · 2 comments
Labels: bug (Something isn't working)

@weitianhan

weitianhan commented May 17, 2024

LanceDB version

v0.6.13

What happened?

As the title describes, I observe that creating an index on a float16 vector column takes much longer than on a float32 column. Here is a short code snippet to reproduce the problem.

```python
import numpy as np
import lancedb
import time
import pyarrow as pa

# modify the parameters below accordingly
uri = "./example-lancedb"
record_num = 1000  # feel free to increase the dataset size to see the performance diff
part_num = 10
sub_vec_num = 64
vector_dim = 512

# no need to modify anything below
# prepare the tables before building the indices
db = lancedb.connect(uri)
schema_32 = pa.schema([pa.field("vector", pa.list_(pa.float32(), vector_dim))])
tbl_32 = db.create_table("my_table_32", schema=schema_32, mode="overwrite")  # overwrite for demo purposes
tbl_32.add([{"vector": np.random.uniform(-1, 1, size=vector_dim)} for _ in range(record_num)])
print("size of float32 table is now: " + str(tbl_32.count_rows()))

schema_16 = pa.schema([pa.field("vector", pa.list_(pa.float16(), vector_dim))])
tbl_16 = db.create_table("my_table_16", schema=schema_16, mode="overwrite")
tbl_16.add([{"vector": np.random.uniform(-1, 1, size=vector_dim).astype(np.float16)} for _ in range(record_num)])
print("size of float16 table is now: " + str(tbl_16.count_rows()))

# create the index for the float32 table
start_time = time.perf_counter()
tbl_32.create_index(metric="cosine", num_partitions=part_num, num_sub_vectors=sub_vec_num, vector_column_name="vector")
end_time = time.perf_counter()
print("float32 create index time used: %ss" % (end_time - start_time))

# create the index for the float16 table
start_time = time.perf_counter()
tbl_16.create_index(metric="cosine", num_partitions=part_num, num_sub_vectors=sub_vec_num, vector_column_name="vector")
end_time = time.perf_counter()
print("float16 create index time used: %ss" % (end_time - start_time))
```

And the result is:

```
size of float32 table is now: 1000
size of float16 table is now: 1000
float32 create index time used: 1.0481734249997317s
float16 create index time used: 4.043548755000302s
```

When I grow the dataset to 1M rows, the difference is 10 minutes vs. 2 hours. Is this expected behaviour, or am I doing something wrong?

Are there known steps to reproduce?

No response

@weitianhan weitianhan added the bug Something isn't working label May 17, 2024
@wjones127
Contributor

Not sure if this is the reason, but we only compile optimized fp16 kernels for some platforms, and on all others it is expected to be slow. When you installed lancedb, it should have installed pylance too. Do you know what the wheel name was? You can run `pip install -U lancedb` again and you should see somewhere in the logs a string like `pylance-0.10.12-cp38-abi3-macosx_11_0_arm64.whl`.
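For anyone triaging a similar report, a stdlib-only sketch like this shows the two pieces of information without re-running pip: the machine architecture (which decides the platform tag pip selects) and the installed pylance version. The full wheel filename with its platform tag only appears in pip's logs, so this is an approximation:

```python
import platform
import importlib.metadata

# Architecture string, e.g. 'x86_64' or 'aarch64' — this determines
# which wheel (and so which compiled kernels) pip picked.
print(platform.machine())

# Version of the installed pylance package, if present.
try:
    print(importlib.metadata.version("pylance"))
except importlib.metadata.PackageNotFoundError:
    print("pylance is not installed")
```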

@weitianhan
Author

Yes, I am developing an app on an ARM development board. The wheel used to install the Python package is `pylance-0.10.18-cp39-abi3-manylinux_2_24_aarch64.whl`.

But the same thing happens when I run this script on my desktop machine with an AMD64 CPU, where pylance was installed from `pylance-0.10.12-cp38-abi3-manylinux_2_28_x86_64.whl`. So is it really related to the platform?
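Whether optimized fp16 kernels are available may depend on CPU features as well as the wheel's platform tag. The flag names below (`f16c` on x86, `fphp`/`asimdhp` on aarch64) are real half-precision-related CPU feature flags, but it is an assumption that the compiled kernels key on them; a Linux-only sketch for inspecting them:

```python
import re

def cpu_flags(cpuinfo_text):
    """Extract the feature-flag set from /proc/cpuinfo-style text.

    x86 kernels list features under 'flags'; ARM kernels use 'Features'.
    """
    m = re.search(r"^(?:flags|Features)\s*:\s*(.+)$", cpuinfo_text, re.M)
    return set(m.group(1).split()) if m else set()

# On a Linux box you would read the real file:
#   flags = cpu_flags(open("/proc/cpuinfo").read())
# Here, a sample x86 line for illustration:
sample = "flags\t\t: fpu sse2 avx f16c avx2\n"
print("f16c" in cpu_flags(sample))
```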
