Overhaul of metadata to move away from pandas #415

sjvenditto · 2025-02-06T22:18:21Z

Pandas DataFrames, while versatile, add significant overhead to the initialization of objects that carry metadata, even when the metadata is an empty DataFrame. Since slicing an existing object often returns a new object, this overhead is paid again every time an object is sliced.
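
To get a feel for the fixed per-object cost described above, a minimal sketch like the following times the construction of empty pandas DataFrames against the construction of empty Python dicts. It only illustrates the initialization cost. It is not the benchmark behind the numbers in this PR, and absolute timings will vary with machine and pandas version.

```python
import timeit

import pandas as pd


def empty_metadata_cost(n: int = 2000) -> tuple[float, float]:
    """Return total seconds to build n empty DataFrames and n empty dicts."""
    # Cost paid when every new (e.g. sliced) object gets an empty DataFrame as metadata
    t_df = timeit.timeit(lambda: pd.DataFrame(), number=n)
    # Cost paid when the metadata container is a plain dict instead
    t_dict = timeit.timeit(lambda: dict(), number=n)
    return t_df, t_dict


if __name__ == "__main__":
    t_df, t_dict = empty_metadata_cost()
    print(f"DataFrame: {t_df:.4f}s  dict: {t_dict:.4f}s  ratio: {t_df / t_dict:.0f}x")
```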

In this PR, I've changed the type of the private _metadata attribute from a pandas DataFrame to a custom dictionary. This custom dictionary implements only the minimal set of methods that _metadata used from its DataFrame counterpart (e.g. .loc, .iloc, .columns, .index), and is proving to be more lightweight. Rudimentary benchmarking suggests that, compared to DataFrame-backed metadata, slicing IntervalSet and TsdFrame objects with the dictionary metadata is 4-8X faster for objects with metadata and 2-3X faster for objects without metadata (i.e. empty metadata). (This speedup is not seen for TsGroup objects, whose initialization is much slower than that of the other objects, so metadata initialization is not the primary source of overhead there.)
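
One way such a dictionary could look is sketched below. The class name MetadataDict and all of its behavior are hypothetical illustrations, not the PR's actual implementation: columns are stored as 1-D numpy arrays keyed by name, .loc accepts a single label or a list of labels, and .iloc accepts an integer, a slice, or an integer array.

```python
import numpy as np


class MetadataDict(dict):
    """Hypothetical dict-backed metadata: column name -> 1-D numpy array.

    Mimics only a small subset of the DataFrame API (.columns, .index,
    .loc, .iloc) for illustration.
    """

    def __init__(self, index, data=None):
        super().__init__()
        self._index = np.asarray(index)
        for name, values in (data or {}).items():
            self[name] = values  # goes through the length check below

    def __setitem__(self, name, values):
        values = np.asarray(values)
        # Every column must hold one value per index entry
        # (note: dict.update() would bypass this check)
        if values.shape != self._index.shape:
            raise ValueError(f"column {name!r} must have length {len(self._index)}")
        super().__setitem__(name, values)

    @property
    def columns(self):
        return list(self.keys())

    @property
    def index(self):
        return self._index

    @property
    def loc(self):
        return _RowIndexer(self, positional=False)

    @property
    def iloc(self):
        return _RowIndexer(self, positional=True)


class _RowIndexer:
    """Row selection by label (.loc) or by position (.iloc)."""

    def __init__(self, parent, positional):
        self._parent = parent
        self._positional = positional

    def __getitem__(self, key):
        p = self._parent
        scalar = np.isscalar(key)
        if self._positional:
            pos = key  # int, slice, or integer array/list
        else:
            # Translate each label into the position of its first match in the index
            labels = [key] if scalar else list(key)
            pos = []
            for label in labels:
                hits = np.flatnonzero(p.index == label)
                if hits.size == 0:
                    raise KeyError(label)
                pos.append(hits[0])
            pos = pos[0] if scalar else np.array(pos, dtype=int)
        if scalar:
            # Single row: return {column: value}
            return {col: vals[pos] for col, vals in p.items()}
        # Several rows: return a new, equally lightweight container
        return MetadataDict(p.index[pos], {col: vals[pos] for col, vals in p.items()})
```

Because every column is a numpy array, selecting rows is just numpy indexing on each value, with no pandas object created along the way.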

On the user side, metadata behaves exactly as it did before: obj.metadata still returns a DataFrame.
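
A minimal sketch of how a public property can preserve this behavior, assuming a toy class (HypotheticalIntervals is invented for illustration and is not the actual IntervalSet): the private metadata stays a plain dict that slicing propagates cheaply, and a DataFrame is only constructed when obj.metadata is accessed. For simplicity the DataFrame index here is always 0..n-1.

```python
import numpy as np
import pandas as pd


class HypotheticalIntervals:
    """Toy stand-in for an object with metadata; not a real library class."""

    def __init__(self, start, end, metadata=None):
        self.start = np.atleast_1d(np.asarray(start, dtype=float))
        self.end = np.atleast_1d(np.asarray(end, dtype=float))
        # Private metadata stays a plain dict of column -> 1-D numpy array
        self._metadata = {k: np.atleast_1d(np.asarray(v)) for k, v in (metadata or {}).items()}

    @property
    def metadata(self) -> pd.DataFrame:
        # The DataFrame is built only when the user asks for it,
        # so slicing never pays the pandas construction cost
        return pd.DataFrame(self._metadata, index=np.arange(len(self.start)))

    def __getitem__(self, key):
        # Slicing indexes each numpy array into a new dict: no DataFrame involved
        return HypotheticalIntervals(
            self.start[key],
            self.end[key],
            {k: v[key] for k, v in self._metadata.items()},
        )
```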


sjvenditto added 21 commits December 2, 2024 14:59

add groupby and get_group, saving groups as a dictionary

cb578c2

groupby and groupby_apply functions with preliminary tests

c794f6d

Merge branch 'dev' into metadata

7404ad4

update docstrings

87a1505

grouping examples for intervalset, also let object index be of type pd.Index

5bac58d

some tsdframe grouping examples

070b771

tsgroup examples

8ba5abf

merge redundant tests, use dict comprehension

81b079d

metadata as dictionary

f977502

changes for metadata dictionary

b9714be

Merge branch 'dev' into faster_metadata

8d4fd5c

updating metadata dictionary

88ae508

fix tsgroup, update groupby to use dictionary

1248fd1

updating tests

c89c6a5

fixing tests

b87a013

fix tests

01aa1f6

fix tests

943798e

prevent rate from being dropped for TsGroups

c5eb1b4

run isort

2809164

adding some docstrings and fixing imports

143d542

isort

bd634a7
