homogeneous arrays #4

michaelficarra · 2016-12-17T05:35:14Z

I propose a new type for homogenous arrays which allows omitting the type tags of each of its elements. Most of the arrays I encode are homogenous and have more than one element, so this will save one byte per additional element.

LewisJEllis · 2016-12-20T01:27:51Z

By homogenous do you mean "exact same superpack type tag" or "same type family but possibly varying size classes"? The latter feels like it would have a very complex specification/implementation and I don't think it would help much since we'd still have to spend at least a byte per element specifying size, so I'll comment on the former.

If we look only for exact same superpack type tag, I think the cases where this would help are so narrow as to almost be contrived. The three most reasonable cases where I can see this helping significantly are:

an array of all ascii-only strings with some of length > 32 (so we encode all as cstring, save no bytes on strings of length < 32 and one byte per string of length > 32)
an array of all float32s/double64s (save 1 byte per element, but note that if some elements fit in float32 and some need double64, we probably won't benefit)
an array of uints all less than 65536 but with fewer elements below 64 than exceeding 16383 (so we encode all as uint16 and save a byte per n > 16383 but lose a byte per n < 64)

How common are any of these? The latter two are so specific that I'm a bit dubious as to the frequency of circumstances in which this might be worth the additional spec and implementation complexity.

At first glance I might be convinced by the string case, but note that it likely won't play well with the repeated string optimization. We only store strings in the lookup table if doing so will actually save space, so if we have an array containing a mix of one-off and repeated strings, we're out of luck. I think the only way we could still realize any benefit in this case would be to change the string lookup table to be of unbounded size, put every string (even one-offs) in the table, and change the strref type tag to be strref uint instead of strref one-byte-index. Then we could do homogenous array optimization over every element being a strref, but this would also effectively add 1-2 bytes to the total payload for each one-off string (the original motivation for limiting the string lookup table to only the 256 most-often-repeated strings), to say nothing of the compatibility concerns of major changes to the repeated string optimization.

I'd be interested to hear about the specific cases you have in mind where this would help significantly, or if you mean something other than what I've interpreted and commented on.

michaelficarra mentioned this issue May 20, 2017

add built-in optimisation for homogeneous arrays shapesecurity/superpack-js#49

Closed

michaelficarra changed the title ~~homogenous arrays~~ homogeneous arrays May 20, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

homogeneous arrays #4

homogeneous arrays #4

michaelficarra commented Dec 17, 2016

LewisJEllis commented Dec 20, 2016 •

edited

homogeneous arrays #4

homogeneous arrays #4

Comments

michaelficarra commented Dec 17, 2016

LewisJEllis commented Dec 20, 2016 • edited

LewisJEllis commented Dec 20, 2016 •

edited