From feeb0fabd5c1c8de86178707fbcb2682caba16d1 Mon Sep 17 00:00:00 2001 From: Samuel M Smith Date: Tue, 4 Jun 2024 07:18:03 -0600 Subject: [PATCH] edited MUST for the sad path appendix and updated codes to version 2.0 --- spec/spec.md | 34 ++++++++++++++++++---------------- 1 file changed, 18 insertions(+), 16 deletions(-) diff --git a/spec/spec.md b/spec/spec.md index de11d0e..c5a2fcf 100644 --- a/spec/spec.md +++ b/spec/spec.md @@ -392,13 +392,13 @@ There are many coding schemes that could satisfy the Composability constraint of Stable type coding makes it much easier to recognize Primitives of a given type when debugging source, reading Messages, or documents in the ‘T’ domain that include encoded Primitives. This is true even when those Primitives have different lengths or values. For Primitive types with fixed lengths, i.e., all Primitives of that type have the same length, Stable type coding aids visual type and visual size recognition. Stable type coding means that the leading characters that determine the type do not change when any other portion of the primitive changes. -The usability of Stable type coding is maximized when the type portion appears first in the Framing Code. Stability also requires that for a given type, the type coding portion must consume a fixed integral number of characters in the ‘T’ domain. To clarify, as used here, Stable type coding in the ‘T’ domain never shares information bits with either length or value coding in any given Framing Code character and appears first in the Framing Code. Stable type coding in the ‘T’ domain translates to Stable type coding in the ‘B’ domain, except that the type coding portion of the Framing Code MAY or MAY not respect byte boundaries. This is an acceptable tradeoff because binary-domain parsing tools easily accommodate bit fields and bit shifts while text-domain parsing tools do not. Generally, Text domain parsing tools only process whole characters. This is another reason to impose a stability constraint on the ‘T’ domain type coding instead of the ‘B’ domain. +The usability of Stable type coding is maximized when the type portion appears first in the Framing Code. Stability also requires that for a given type, the type coding portion MUST consume a fixed integral number of characters in the ‘T’ domain. To clarify, as used here, Stable type coding in the ‘T’ domain never shares information bits with either length or value coding in any given Framing Code character and appears first in the Framing Code. Stable type coding in the ‘T’ domain translates to Stable type coding in the ‘B’ domain, except that the type coding portion of the Framing Code MAY or MAY not respect byte boundaries. This is an acceptable tradeoff because binary-domain parsing tools easily accommodate bit fields and bit shifts while text-domain parsing tools do not. Generally, Text domain parsing tools only process whole characters. This is another reason to impose a stability constraint on the ‘T’ domain type coding instead of the ‘B’ domain. Therefore, the type portion MUST begin the Framing Code, and the type coding portion MUST consume a fixed integral number of characters in the 'T' domain. ##### Stable value encoding -A secondary usability constraint is recognizable or readable Stable value coding in the Text, ‘T’, domain. Stable value encoding means that the trailing Base64 characters that encode the primitive value are right aligned. This means one can manually confirm values are the same. Not all Primitives benefit from Stable value coding. Any representation of a value that is a long random string of characters is essentially unreadable or recognizable versus some other representation. Consequently, bit shifts of the value that result in leading or trailing zero pads, as long as they are static, do not change the readability. This is not true, however, of values that are small numbers. Base64 encodings of small numbers are readable. for example, the numerical sequence of decimal numbers, `0, 1, 2`, is recognizable as the sequence of Base64 characters, `A, B, C`. Thus, all else equal, readable Stable value encodings also contribute to usability, at least in some cases. The combination of Stable leading type encoding and Stable trailing value encoding means that any zero padding must appear in the middle of the Primitive, after the type code, but before the value. +A secondary usability constraint is recognizable or readable Stable value coding in the Text, ‘T’, domain. Stable value encoding means that the trailing Base64 characters that encode the primitive value are right aligned. This means one can manually confirm values are the same. Not all Primitives benefit from Stable value coding. Any representation of a value that is a long random string of characters is essentially unreadable or recognizable versus some other representation. Consequently, bit shifts of the value that result in leading or trailing zero pads, as long as they are static, do not change the readability. This is not true, however, of values that are small numbers. Base64 encodings of small numbers are readable. for example, the numerical sequence of decimal numbers, `0, 1, 2`, is recognizable as the sequence of Base64 characters, `A, B, C`. Thus, all else equal, readable Stable value encodings also contribute to usability, at least in some cases. The combination of Stable leading type encoding and Stable trailing value encoding means that any zero padding MUST appear in the middle of the Primitive, after the type code, but before the value. Therefore the value portion of any primitive MUST be right aligned. @@ -406,7 +406,7 @@ Therefore the value portion of any primitive MUST be right aligned. Two ways exist to provide the required alignment on 24-bit boundaries to satisfy the Composability property defined above. One is to post-pad (with trailing pad characters, `=`) the Text domain encoding to ensure that the ‘T’ domain Primitive has a total size (length) of an integer multiple of 4. This is what naive Base64 encoding does. The other way is to pre-pad leading bytes of zeros to the raw binary value portion before conversion to Base64 to ensure the total size of the raw binary value with pre-pad bytes is an integer multiple of 3 bytes. This ensures that the size in characters of the Base64 conversion of the pre-padded raw binary is an integer multiple of 4 characters. This means that, effectively, value padding shows up as a mid-pad relative to the full Primitive with a prepended type code. -Given pre-padded values, one of two options is available that depends on the specific code. In the first option, an appropriate number of text characters that result from the conversion of a porting of the leading pre-pad zero bytes are replaced with the appropriate number of code characters. In the second option, the code characters are pre-pended to the conversion with leading zeros intact. In the second option, the length of the pre-pended type code must also, thereby, be an integer multiple of 4 characters. In either option, the total length of the ‘T’ domain Primitive with code is an integer multiple of 4 characters. +Given pre-padded values, one of two options is available that depends on the specific code. In the first option, an appropriate number of text characters that result from the conversion of a porting of the leading pre-pad zero bytes are replaced with the appropriate number of code characters. In the second option, the code characters are pre-pended to the conversion with leading zeros intact. In the second option, the length of the pre-pended type code is also an integer multiple of 4 characters. In either option, the total length of the ‘T’ domain Primitive with code is an integer multiple of 4 characters. The first option may be more compact in some cases than the second. The second option may be easier to compute in some cases. The most significant advantage of the second option is that the value portion is Stable and more readable both in the Text, ‘T’, domain and in the Binary, ‘B’, domain because the value portion is not shifted by the Base64 conversion as it is with the first option. @@ -452,7 +452,7 @@ The number of required trailing Base64 post-pad characters or, equivalently the Recall that Composability is provided here by prepending text codes that are of the appropriate length to ensure 24-bit boundaries in both the ‘T’ and the corresponding ‘B’ domain. The advantage of this approach is that naive Base64 software tooling may be used to convert back and forth between the ‘T’ and ‘B’ domains, i.e., `T(B)` is naive Base64 encode, and `B(T)` is naive Base64 decode. In other words, CESR Primitives are compatible with existing Base64 (RFC-4648) tooling. Whereas new software tooling is needed for conversions between the ‘R’ and ‘T’ domains, e.g., `T(R)` and `R(T)` and the ‘R’ and ‘B’ domains, e.g., `B(R)` and `R(B)`. -The pad size computation is also useful for computing the size of the text codes. Because true Composability also requires that the ‘T’ domain value must be an integer multiple of 4 characters in length, the size of the text code also must be a function of the pad size, `ps`, and hence the length of the raw binary element, `N`. Thus, the size of the text code in Base64 characters is a function of the equivalent pad size determined by the length `N mod 3` of the raw binary value. +The pad size computation is also useful for computing the size of the text codes. Because true Composability also requires that the ‘T’ domain value MUST be an integer multiple of 4 characters in length, the size of the text code also must be a function of the pad size, `ps`, and hence the length of the raw binary element, `N`. Thus, the size of the text code in Base64 characters MUST be a function of the equivalent pad size determined by the length `N mod 3` of the raw binary value. #### Example of pad size computation @@ -616,7 +616,7 @@ Each Stream MUST start (restart) with one of eight cases: 7. A MGPK encoded mapping. 8. Annotated Text domain CESR. -A parser merely needs to examine the first Tritet (3 bits) of the first byte of the Stream start to determine which one of the eight it is. When the first Tritet is a Count Code, then the remainder of the Count Code itself will include the additional information needed to parse the attached group. When the first Tritet indicates, its JSON, CBOR, or MGPK, the mapping's first field must be a Version String that provides the additional information needed to parse the associated encoded field map serialization fully. See the Version String annex for the detailed syntax of the information that may be extracted with a regular expression search, given that the first Tritet indicates the following bytes in the stream belong to a JSON, CBOR, or MGPK field map serialization. +A parser merely needs to examine the first Tritet (3 bits) of the first byte of the Stream start to determine which one of the eight it is. When the first Tritet is a Count Code, then the remainder of the Count Code itself will include the additional information needed to parse the attached group. When the first Tritet indicates, its JSON, CBOR, or MGPK, the mapping's first field MUST be a Version String that provides the additional information needed to parse the associated encoded field map serialization fully. See the Version String annex for the detailed syntax of the information that may be extracted with a regular expression search, given that the first Tritet indicates the following bytes in the stream belong to a JSON, CBOR, or MGPK field map serialization. The Stream MUST resume with a frame starting byte that begins with one of the 8 Tritets, either another Count Code expressed in the ‘T’ or ‘B’ domain or a new JSON, CBOR, or MGPK encoded mapping or a new annotated encoding. @@ -784,7 +784,7 @@ Codes in the large Count Code table MUST be each 8 characters long. The first tw ### Protocol genus and version table -The protocol genus/version table is special because its codes modify the Count Code groups that appear at the top level of the stream or the Count code groups that appear inside special enclosing Count Code groups. A protocol genus and version code itself MUST not provide a count of the following Quadlets or triplets but MUST modify the protocol genus and Version of all the following Count Codes that either appear at the top level until another protocol and genus Count Code is provided or are inside a special enclosing Count Code group. There are three general-purpose special enclosing Count Codes that allow an embedded genus/version table code. These are defined below. These are special because they must be universal across all genera of Count Code tables. +The protocol genus/version table is special because its codes modify the Count Code groups that appear at the top level of the stream or the Count code groups that appear inside special enclosing Count Code groups. A protocol genus and version code itself MUST not provide a count of the following Quadlets or triplets but MUST modify the protocol genus and Version of all the following Count Codes that either appear at the top level until another protocol and genus Count Code is provided or are inside a special enclosing Count Code group. There are three general-purpose special enclosing Count Codes that allow an embedded genus/version table code. These are defined below. These are special because they MUST be universal across all genera of Count Code tables. The purpose of the protocol genus/version table is twofold. First, it allows CESR to be used for different protocols and protocol stacks, where each protocol may have its own dedicated set of code tables. The only table that all protocols MUST share (i.e. has identical values) is the protocol genus and version table (protocol table for short). All other entries in all other tables MAY vary by protocol. Secondly, for a given protocol genus, a protocol genus and version code MUST provide the Version of that given protocol's table set. This allows versioning of the CESR code tables for a given protocol. @@ -1651,7 +1651,7 @@ The additions to the Master Code Table of CESR is shown below: | 9AAA#### | String Base64 Only with 2 Lead Bytes | 4 | 4 | 8 | #### SAD Path Signature Attachments -CESR defines several Count Codes for attaching signatures to serialized CESR event Messages. For KERI event Messages, the signatures in the attachments apply to the entire serialized content of the KERI event Message. As all KERI event Messages are SADs, the same rules for signing a KERI event Message applies to signing SADs for SAD Path Signatures. A brief review of CESR signatures for transferable and non-transferable identifiers follows. In addition, signatures on nested content must be specified. +CESR defines several Count Codes for attaching signatures to serialized CESR event Messages. For KERI event Messages, the signatures in the attachments apply to the entire serialized content of the KERI event Message. As all KERI event Messages are SADs, the same rules for signing a KERI event Message applies to signing SADs for SAD Path Signatures. A brief review of CESR signatures for transferable and non-transferable identifiers follows. In addition, signatures on nested content will be specified. ##### Signing SAD Content @@ -1663,23 +1663,25 @@ Signatures on SAD content require signing the serialized encoding format of the } ``` -where KERI is the identifier of KERI events followed by the hexadecimal major and minor version code and then the serialized encoding format of the event, JSON in this case. KERI and ACDC support JSON, MessagePack and CBOR currently. Field ordering is important when apply cryptographic signatures and all serialized encoding formats must support static field ordering. Serializing a SAD starts with reading the Version String from the SAD field (`v` for KERI and ACDC events Message) to determine the serialized encoding format of the Message. The serialized encoding format is used to generate the SAID at creation and cannot be changed. The event map is serialized using a library that ensures the static field order preserved across serialization and deserialization and the private keys are used to generate the qualified cryptographic material that represents the signatures over the SAD content. +where KERI is the identifier of KERI events followed by the hexadecimal major and minor version code and then the serialized encoding format of the event, JSON in this case. KERI and ACDC support JSON, MessagePack and CBOR currently. Field ordering is important when apply cryptographic signatures and all serialized encoding formats MUST support static field ordering. Serializing a SAD starts with reading the Version String from the SAD field (`v` for KERI and ACDC events Message) to determine the serialized encoding format of the Message. The serialized encoding format is used to generate the SAID at creation and cannot be changed. The event map is serialized using a library that ensures the static field order preserved across serialization and deserialization and the private keys are used to generate the qualified cryptographic material that represents the signatures over the SAD content. -The same serialized encoding format must be used when nesting a SAD in another SAD. For example, an ACDC credential that was issued using JSON can be embedded and presented only in a KERI `exn` presentation event Message that uses JSON as its serialized encoding format. That same credential cannot be transmitted using CBOR or MessagePack. Controllers can rely on this restriction when verifying signatures of embedded SADs. When processing the signature attachments and resolving the data at a given SAD path, the serialization of the outer most SAD can be used at any depth of the traversal. New Version String processing does not need to occur at nested paths. However, if credential signature verification is pipelined and processed in parallel to the event Message such that the event Message is not available, the Version String of the nested SAD will still be valid and can be used if needed. +The same serialized encoding format MUST be used when nesting a SAD in another SAD. For example, an ACDC credential that was issued using JSON can be embedded and presented only in a KERI `exn` presentation event Message that uses JSON as its serialized encoding format. That same credential MUST NOT be transmitted using CBOR or MessagePack. Controllers can rely on this REQUIREMENT when verifying signatures of embedded SADs. When processing the signature attachments and resolving the data at a given SAD path, the serialization of the outer most SAD can be used at any depth of the traversal. New Version String processing does not need to occur at nested paths. However, if credential signature verification is pipelined and processed in parallel to the event Message such that the event Message is not available, the Version String of the nested SAD will still be valid and can be used if needed. -Each attached signature is accompanied by a SAD Path that indicates the content that is signed. The path must resolve within the enveloping SAD to either a nested SAD (map) or a SAID (string) of an externally provided SAD. This of course, includes a root path that resolves to the enveloping SAD itself. +Each attached signature is accompanied by a SAD Path that indicates the content that is signed. The path MUST resolve within the enveloping SAD to either a nested SAD (map) or a SAID (string) of an externally provided SAD. This of course, includes a root path that resolves to the enveloping SAD itself. ##### Signatures with Non-Transferable Identifiers -Non-transferable identifiers only ever have one public key. In addition, the identifier prefix is identical to the qualified cryptographic material of the public key and therefore no Key Event Log ( KEL) is required to validate the signature of a non-transferable identifier [[1]]. The attachment code for witness receipt couplets, used for SAD Path Signatures, takes this into account. The four-character Count Code `-C##` is used for non-transferable identifiers and contains the signing identifier prefix and the signature. Since the verification key can be extracted from the identifier prefix and the identifier cannot be rotated, all that is required to validate the signature is the identifier prefix, the data signed and the signature. + +Non-transferable identifiers only ever have one public key. In addition, the identifier prefix is identical to the qualified cryptographic material of the public key and therefore a Key Event Log ( KEL) is NOT REQUIRED (i.e. is OPTIONAL) to validate the signature of a non-transferable identifier [[1]]. The attachment code for witness receipt couplets, used for SAD Path Signatures, takes this into account. The four-character Count Code `-C##` is used for non-transferable identifiers and contains the signing identifier prefix and the signature. Since the verification key can be extracted from the identifier prefix and the identifier cannot be rotated, all that is required to validate the signature is the identifier prefix, the data signed and the signature. ##### Signatures with Transferable Identifiers -Transferable identifiers require full KEL resolution and verification to determine the correct public key used to sign some content [[1]]. In addition, the attachment code used for transferable identifiers, `-F##` must specify the location in the KEL at which point the signature was generated. To accomplish this, this Count Code includes the identifier prefix, the sequence number of the event in the KEL, the digest of the event in the KEL and the indexed signatures (transferable identifiers support multiple public/private keys and require index signatures). Using all the values, the signature(s) can be verified by retrieving the KEL of the identifier prefix and determine the key state at the sequence number along with validating the digest of the event against the actual event. Then using the key(s) at the determined key state, validate the signature(s). + +Transferable identifiers REQUIRE full KEL resolution and verification to determine the correct public key used to sign some content [[1]]. In addition, the attachment code(s) used for transferable identifiers, `-O` or `-0O` MUST specify the location in the KEL at which point the signature was generated. To accomplish this, this Count Code includes the identifier prefix, the sequence number of the event in the KEL, the digest of the event in the KEL and the indexed signatures (transferable identifiers support multiple public/private keys and require index signatures). Using all the values, the signature(s) can be verified by retrieving the KEL of the identifier prefix and determining the key state at the sequence number along with validating the digest of the event against the actual event. Then using the key(s) at the determined key state, validate the signature(s). #### Additional Count Codes -This specification adds two Counter Four Character Codes to the CESR Master Code Table for attaching and grouping transposable signatures on SAD and nested SAD content. The first code (`-J##`) is reserved for attaching a SAD path and the associated signatures on the content at the resolution of the SAD Path (either a SAD or its associated SAID). The second reserved code (`-K##`) is for grouping all SAD Path signature groups under a root path for a given SAD. The root path in the second grouping code provides signature attachment transposability for embedding SAD content in other Messages. +This specification adds two Counter Four Character Codes to the CESR Master Code Table for attaching and grouping transposable signatures on SAD and nested SAD content. The first code `-T` or `-0T` is reserved for attaching a SAD path and the associated signatures on the content at the resolution of the SAD Path (either a SAD or its associated SAID). The second reserved code `-U` or `-0U` is for grouping all SAD Path signature groups under a root path for a given SAD. The root path in the second grouping code provides signature attachment transposability for embedding SAD content in other Messages. ##### SAD Path Signature Group -The SAD Path Signature Group provides a four-character Count Code, `-J##`, for attaching an encoded Variable Length SAD Path along with either a transferable index signature group or non-transferable identifier receipt couplets. The SAD Path identifies the content that this attachment is signing. The path must resolve to either a nested SAD (map) or a SAID (string) of an externally provided SAD within the context of the SAD and root path against which this attachment is applied. Using the following ACDC SAD embedded in a KERI `exn` Message: +The SAD Path Signature Group provides a four-character Count Code, `-T` or `-0T`, for attaching an encoded Variable Length SAD Path along with either a transferable index signature group or non-transferable identifier receipt couplets. The SAD Path identifies the content that this attachment is signing. The path must resolve to either a nested SAD (map) or a SAID (string) of an externally provided SAD within the context of the SAD and root path against which this attachment is applied. Using the following ACDC SAD embedded in a KERI `exn` Message: ```json { @@ -1759,7 +1761,7 @@ BmMfUwIOywRkyc5GyQXfgDA4UOAMvjvnXcaK9G939ArM | 0BT7b5... aBg | sig | ##### SAD Path Groups -The SAD Path Group provides a four-character Count Code, `-K##`, for attaching encoded Variable Length root SAD Path along with 1 or more SAD Path Signature Groups. The root SAD Path identifies the root context against which the paths in all included SAD Path Signature Groups are resolved. When parsing a SAD Path Group, if the root path is the single `-` character, all SAD paths are treated as absolute paths. Otherwise, the root path is prepended to the SAD paths in each of the SAD Path Signature Groups. Given the following snippet of a SAD Path Group: +The SAD Path Group provides a four-character Count Code, `-U`or `-0U`, for attaching encoded Variable Length root SAD Path along with 1 or more SAD Path Signature Groups. The root SAD Path identifies the root context against which the paths in all included SAD Path Signature Groups are resolved. When parsing a SAD Path Group, if the root path is the single `-` character, all SAD paths are treated as absolute paths. Otherwise, the root path is prepended to the SAD paths in each of the SAD Path Signature Groups. Given the following snippet of a SAD Path Group: ``` -KAB6AABAAA--JAB5AABAA-a... @@ -1826,7 +1828,7 @@ The same signature gets transposed to the outer `exn` SAD by updating the root p Now the SAD Path of the first signed SAD content resolves to the `a` field of the `a` field of the streamed `exn` Message #### Small Variable Raw Size SAD Path Code -The small variable raw side code reserved for SAD Path encoding is `A` which results in the addition of 3 entries (`4A##`, `5A##` and `6A##`) in the Master Code Table for each lead byte configuration. These codes and their use are discussed in detail in [CESR Encoding for SAD Path Language](). +The small variable raw side code reserved for SAD Path encoding is a variable length string with code `A` which uses the 3 entries (`4A##`, `5A##` and `6A##`) in the Master Code Table for each lead byte configuration. These codes and their use are discussed in detail in [CESR Encoding for SAD Path Language](). ::: issue fix this citation