Support for parquet format and/or dataframe access #193

Open
@effigies

Related to bids-standard/bids-specification#1792, which is on my mind again because of https://hbcd-docs.readthedocs.io/data_access/dataformats/tabulated/:

Note: Parquet Not Currently Supported by BIDS

Please note that Parquet files are not officially supported by the BIDS specification. For NBDC datasets, we decided to add Parquet as an alternative file format to the BIDS standard TSV to allow users to take advantage of the features of this modern and efficient open source format that is commonly used in the data science community.

A large project like HBCD adopting parquet in addition to BIDS seems like an indication that this is a recognized hole in BIDS, and so I think #1792 is likely to move forward. The validator could be the biggest sticking point, so I want to get out ahead of it.

Potentially relevant projects:

  • hyparquet: A pure-JavaScript library that may have good browser support and <10KiB increase in the payload.
  • parquet-wasm: wasm bindings to the Rust parquet and arrow implementations.
  • apache-arrow: A library for working with parquet's underlying memory model (arrow). May also be useful for loading TSVs to the same data structures, ensuring unified treatment if we do add parquet support.
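
To make the "unified treatment" idea concrete, here is a rough sketch of how TSV and parquet inputs could be normalized to one column-oriented structure before validation. Everything in it is an assumption for illustration: `ColumnTable`, `tsvToColumns`, and `rowsToColumns` are hypothetical names, the row-object input is only a guess at what a parquet reader would hand back, and no API from the libraries listed above is used.

```typescript
// Hypothetical shared representation: column name -> raw string values.
type ColumnTable = Map<string, string[]>

// Parse TSV text into columns.
// Assumes a header row, tab separators, unique headers, and no quoting.
function tsvToColumns(text: string): ColumnTable {
  const lines = text.split(/\r?\n/).filter((line) => line.length > 0)
  const table: ColumnTable = new Map()
  if (lines.length === 0) return table
  const headers = lines[0].split('\t')
  for (const header of headers) table.set(header, [])
  for (const line of lines.slice(1)) {
    const cells = line.split('\t')
    // Short rows are padded with 'n/a', the BIDS marker for missing values.
    headers.forEach((header, i) => table.get(header)!.push(cells[i] ?? 'n/a'))
  }
  return table
}

// Convert row objects (assumed output shape of some parquet reader)
// into the same structure, stringifying values and mapping nulls to 'n/a'.
function rowsToColumns(rows: Record<string, unknown>[]): ColumnTable {
  const table: ColumnTable = new Map()
  // First pass: collect every column name seen in any row.
  for (const row of rows) {
    for (const key of Object.keys(row)) {
      if (!table.has(key)) table.set(key, [])
    }
  }
  // Second pass: append one value per column per row.
  for (const row of rows) {
    table.forEach((values, key) => {
      const value = row[key]
      values.push(value === null || value === undefined ? 'n/a' : String(value))
    })
  }
  return table
}
```

With both loaders producing a `ColumnTable`, downstream checks would not need to know which file format the data came from. Stringifying parquet values throws away the type information parquet provides, so a real implementation might prefer typed columns (for example Arrow vectors) instead.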
