
interregna/JArrow

J language addon for Apache Arrow

Reads (and will eventually write) Apache Arrow and Parquet files from J, using the Arrow GLib C API.

Installation and Loading

  1. Ensure that you have installed the Arrow GLib (C) Packages for your OS. Instructions can be found at: arrow.apache.org/install.

  2. From your J session:

   install 'github:interregna/JArrow@main'
   load 'data/arrow'

Usage

   install 'github:interregna/JArrow@main'

   load 'data/arrow'
   readParquetTable '~addons/data/arrow/test/test1.parquet'
┌─┬───────────────┐
│a│0 1 2 3 4 5 6 7│
├─┼───────────────┤
│b│8 7 6 5 4 3 2 1│
└─┴───────────────┘
   readsParquetTable '~addons/data/arrow/test/test2.parquet'
┌────────┬──────────┬────────┬─────────┬───────┬────────┬───────┬───────┬────────┬────────┬────────┬──────────┬──────────┬───────────┬────────────┬─────────┬─────────┬───────┬───────────────┐
│Column 1│Column Two│shortCol│ushortCol│intcCol│uintcCol│int_Col│uintCol│int16Col│int32Col│int64Col│float32Col│float64Col│longlongCol│ulonglongCol│DoubleCol│StringCol│boolCol│datetime64Col  │
├────────┼──────────┼────────┼─────────┼───────┼────────┼───────┼───────┼────────┼────────┼────────┼──────────┼──────────┼───────────┼────────────┼─────────┼─────────┼───────┼───────────────┤
│0100000100100100300500100600700100100100    │This     │1946684800000000│
│188.7511188908826344388531.25613.75888888.75    │ is      │0946771200000000│
│277.522277807722738777462.5527.5777777.5    │all      │0946857600000000│
│366.2533366706619133166393.75441.25666666.25    │ valid   │0946944000000000│
│45544455605515527555325355555555    │text     │1947030400000000│
│543.7555543504311821843256.25268.75434343.75    │         │0947116800000000│
│632.56663240328216232187.5182.5323232.5    │data.    │0947203200000000│
│721.257772130214610621118.7596.25212121.25    │         │0947289600000000│
└────────┴──────────┴────────┴─────────┴───────┴────────┴───────┴───────┴────────┴────────┴────────┴──────────┴──────────┴───────────┴────────────┴─────────┴─────────┴───────┴───────────────┘
   readCSVTable '~addons/data/arrow/test/test1.csv'
┌──┬───────────────────────────...
│ID│1 2 3 4 5 8 10 11 12 14 15 ...
├──┼───────────────────────────...
│y │100.669 100.669 100.669 100...
└──┴───────────────────────────...
   NB. Note: this is JSON Lines format, not plain JSON. See: https://jsonlines.org
   readsJsonTable '~addons/data/arrow/test/test1.json'
┌───────┬──────────┐
│name   │date      │
├───────┼──────────┤
│Gilbert│12-13-2014│
│Alexa  │09-04-1983│
│May    │01-01-1924│
│Deloise│04-25-1894│
└───────┴──────────┘
   readsFeatherTable '~addons/data/arrow/test/test1.feather'
┌────┬───┬──────┐
│team│pos│points│
├────┼───┼──────┤
│A   │G  │17    │
│A   │F  │17    │
│B   │G  │15    │
│B   │F  │ 5    │
│C   │G  │11    │
│C   │F  │10    │
│D   │G  │ 5    │
│D   │F  │14    │
└────┴───┴──────┘

(6!:16) and (6!:17) can be used to convert Arrow datetime64 types to and from ISO 8601 format (e.g. 2000-01-11T22:58:04). fromdate32 can be used to convert Arrow date32 types to YYYY M D tuples.
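
As a minimal sketch of the conversion described above, the lines below pass two of the raw integers printed in datetime64Col to 6!:16. The name raw is hypothetical and exists only for this illustration. 6!:16 treats its argument as nanoseconds since 2000-01-01, so a column stored in a different unit or epoch would need rescaling first; this README does not state which unit the test column uses.

```j
NB. Sketch only: two raw integers as printed in datetime64Col above.
NB. The name raw is hypothetical, chosen for this illustration.
raw =: 946684800000000 946771200000000

NB. 6!:16 reads y as nanoseconds since 2000-01-01 and returns
NB. ISO 8601 text, one row per timestamp.
iso =: 6!:16 raw
```

6!:17 goes the other way, from ISO 8601 text back to integers.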

Notes

The reads variants (for example readsParquetTable) minimize display time in the UI but use more space.

The read variants (for example readParquetTable) minimize space but can take longer to display.

Development

  1. In Jqt, set the JPackageDev folder under File > Configure > Folders, for example JPackageDev /code/JPackageDev (or a path of your choice, in which case modify build.ijs to match).

  2. Clone the JArrow repo into JPackageDev.

  3. Restart Jqt and open the Arrow project: Project > Open > JPackageDev > arrow.

  4. Rebuild the addon: Ctrl + F9.

  5. Run the addon: F9 (rebuilds the addon scripts, reloads, and runs the tests).

Examples: see test/test1.ijs

TODO
  • Error catching for empty pointers, missing files, and general errors.
  • Dereference / cleanup gobjects and allocated memory
  • Additional data types
    • Dictionaries (need to store lookup tables)
    • Lists
    • Maps
  • Tensors
  • Documentation (see: ~/addons/gui/cobrowser/scriptdoc.ijs)
  • CSV reader
  • JSONL reader
  • Arrow Feather (IPC v1) reader
  • IPC files (".arrow" files)
  • IPC streams (".arrows" files when stored on disk)
  • Flight client
  • Flight server
  • Non-local filesystems (S3)
  • IPC streaming with event-driven calls
