Merge pull request #51 from daidoji/revised-format
Emph why MGPK has multiple codes in perfresync sec.
m00sey authored Feb 14, 2024
2 parents c5d02e8 + 834450d commit 8c98a83
Showing 1 changed file with 9 additions and 6 deletions.
`spec/spec.md`: 15 changes (9 additions, 6 deletions)
@@ -552,7 +552,10 @@ A CESR Stream parser supports three specific interleaved serializations, namely,

Furthermore, it may be highly beneficial to support in-stride switching between interleaved CESR text-domain Streams and CESR binary-domain Streams. In other words, the start bits for Count Codes in both the ‘T’ domain and the ‘B’ domain should be unique. This would provide the analogue of a UTF Byte Order Mark (BOM) [[ref: BOM]]. Recall that a BOM enables a parser of UTF-encoded documents to determine whether the UTF code units are big-endian or little-endian [[ref: BOM]]. In the CESR case, the analogous feature would enable a Stream parser to know whether a Count Code, along with its associated counted group of Primitives, is expressed in the ‘T’ or the ‘B’ domain. Together, these requirements impose the constraint that the boundary start bits for interleaved text CESR, binary CESR, JSON, CBOR, and MGPK be mutually distinct.

Among the codes for map objects in JSON, CBOR, and MGPK, only the first three bits are fixed and not dependent on mapping size.
* In JSON, a serialized mapping object always starts with `{`. This is encoded as `0x7b`, whose first three bits are `0b011`.
* In CBOR, a serialized mapping object has major type 5, which occupies the first three bits of its initial byte as `0b101`.
* In MGPK, there are three different mapping object codes. The FixMap code (`0x80` to `0x8f`) starts with `0b100`. Both the Map16 (`0xde`) and Map32 (`0xdf`) codes start with `0b110`.
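The bit arithmetic in the list above can be checked mechanically. The Python sketch below is illustrative only: the helper name `top_tritet` is hypothetical, and the byte ranges are taken from the JSON, CBOR (major type 5), and MessagePack map formats. It confirms that every possible map-start byte of a given type shares a single starting Tritet, even though the low bits vary with map size.

```python
def top_tritet(byte: int) -> int:
    """Return the three most significant bits of a single byte (0..255)."""
    return (byte >> 5) & 0b111


# Possible first bytes of a map (object) in each serialization.
MAP_START_BYTES = {
    "JSON '{'": [ord("{")],                          # 0x7b
    "CBOR map (major type 5)": list(range(0xA0, 0xC0)),  # all initial bytes 0xa0..0xbf
    "MGPK FixMap": list(range(0x80, 0x90)),          # 0x80..0x8f, size in low 4 bits
    "MGPK Map16": [0xDE],
    "MGPK Map32": [0xDF],
}

for name, starts in MAP_START_BYTES.items():
    tritets = {top_tritet(b) for b in starts}
    # The size bits vary, but all start bytes of one map type share one tritet.
    assert len(tritets) == 1
    print(f"{name:26} 0b{tritets.pop():03b}")
```

Only the top three bits survive the shift, which is why MGPK needs two rows (`0b100` and `0b110`) while JSON and CBOR each need one.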

Therefore, the JSON, CBOR, and MGPK encodings consume four starting Tritets (3-bit prefixes), which in numeric order are `0b011`, `0b100`, `0b101`, and `0b110`. This leaves four unused Tritets, namely `0b000`, `0b001`, `0b010`, and `0b111`. These latter are potential candidates for the CESR Count Code start bits. In the URL-safe Base64 alphabet, there are two characters that satisfy the constraints. The first is the dash character, `-`, encoded as `0x2d`, whose first three bits are `0b001`. The second is the underscore character, `_`, encoded as `0x5f`, whose first three bits are `0b010`. Both of these are distinct from the starting Tritets of the JSON, CBOR, and MGPK encodings above. Moreover, the starting Tritet of the corresponding binary encodings of `-` and `_` is `0b111`, which is also distinct from all the others. To elaborate, Base64 uses `-` in position 62 (`0x3E`) and `_` in position 63 (`0x3F`); their 6-bit binary values, `0b111110` and `0b111111`, both start with the Tritet `0b111`.
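The two-domain property of `-` and `_` can be demonstrated with Python's standard `base64` module. In this sketch (helper and constant names are hypothetical), each character is looked up in the RFC 4648 URL-safe alphabet, and a 4-character quadlet beginning with it (padded with `A`, which is sextet zero) is decoded to show the starting Tritet of the resulting binary bytes.

```python
import base64
import string

# RFC 4648 URL-safe Base64 alphabet: A-Z, a-z, 0-9, then '-' (62) and '_' (63).
URLSAFE_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_"


def top_tritet(byte: int) -> int:
    """Return the three most significant bits of a single byte (0..255)."""
    return (byte >> 5) & 0b111


for ch in "-_":
    text_byte = ord(ch)                  # the byte as it appears in the text ('T') domain
    sextet = URLSAFE_ALPHABET.index(ch)  # its 6-bit value in the binary ('B') domain
    # A 4-character quadlet decodes to 3 bytes; the leading character's sextet
    # becomes the top 6 bits of the first decoded byte.
    first_binary_byte = base64.urlsafe_b64decode(ch + "AAA")[0]
    print(
        f"{ch!r}: text byte 0x{text_byte:02x} -> tritet 0b{top_tritet(text_byte):03b}; "
        f"index {sextet} (0b{sextet:06b}) -> binary tritet 0b{top_tritet(first_binary_byte):03b}"
    )
```

Because both sextets are `0b11111x`, the same Count Code lands on `0b001`/`0b010` in the text domain and on `0b111` in the binary domain, so a parser can tell the two domains apart from the first byte.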

@@ -564,14 +567,14 @@ This is summarized in the following table:

| Starting Tritet | Serialization | Character |
|:------------:|:------------:|:------------:|
|0b000|Unused| |
|0b001|CESR ‘T’ domain Count Code|`-`|
|0b010|CESR ‘T’ domain Op Code|`_`|
|0b011|JSON|`{`|
|0b100|MGPK (FixMap)| |
|0b101|CBOR| |
|0b110|MGPK (Map16, Map32)| |
|0b111|CESR ‘B’ domain Count Code or Op Code| |
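The table implies that a Stream parser can pick the next sub-parser from a single byte. The Python sketch below assumes exactly that and nothing more: `sniff` and `TRITET_TO_SERIALIZATION` are hypothetical names, not part of any CESR library, and after dispatching a real parser would still need to parse and validate the whole item.

```python
from typing import Dict, Optional

# Hypothetical lookup mirroring the summary table:
# starting tritet of the first byte -> serialization (None marks the unused tritet).
TRITET_TO_SERIALIZATION: Dict[int, Optional[str]] = {
    0b000: None,
    0b001: "CESR T domain Count Code",
    0b010: "CESR T domain Op Code",
    0b011: "JSON",
    0b100: "MGPK",  # FixMap
    0b101: "CBOR",
    0b110: "MGPK",  # Map16, Map32
    0b111: "CESR B domain Count Code or Op Code",
}


def sniff(stream: bytes) -> str:
    """Classify the next item in a stream by the top three bits of its first byte."""
    if not stream:
        raise ValueError("empty stream")
    kind = TRITET_TO_SERIALIZATION[stream[0] >> 5]
    if kind is None:
        raise ValueError(f"unused starting tritet in byte 0x{stream[0]:02x}")
    return kind


samples = [
    b'{"a": 1}',                         # JSON object
    b"-AAB",                             # text-domain Count Code
    bytes([0xA1, 0x61, 0x61, 0x01]),     # CBOR {"a": 1}
    bytes([0x81, 0xA1, 0x61, 0x01]),     # MGPK {"a": 1}
    bytes([0xF8, 0x00, 0x01]),           # binary-domain Count Code
]
for sample in samples:
    print(sample.hex(), "->", sniff(sample))
```

Dispatching on `byte >> 5` costs one shift and one table lookup, and because the eight starting Tritets are mutually distinct, no lookahead beyond the first byte is needed to choose a sub-parser.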

#### Stream parsing rules

