Merge KDL v2 #286

zkat · 2022-08-28T20:09:23Z

Here it is! The long-awaited KDL v2, which is where we go ahead and make a handful of technically-breaking changes to address some corner cases we've run into over the past year while KDL has been getting implemented in a bunch of languages by various people.

I'd love to get feedback on what we have slated, and whether there's anything else we should definitely include when this goes out.

zkat · 2022-08-28T20:26:42Z

/cc @CAD97

CAD97 · 2022-08-28T20:58:02Z

I have a slight preference for #241 over #204 personally, though only slight.

IceDragon200 · 2022-08-28T21:48:19Z

tests/test_cases/input/no_solidus_escape.kdl

@@ -0,0 +1 @@
+node "\\"


Shouldn't there also be an output case for this, and one for forwardslash (node "/") then?

Seeing as the escaped form was removed

EDIT: mixing up my slashes again

hkolbeck · 2022-08-29T15:34:19Z

I have a preference for #204, because the primary use case I can see for # in bare identifiers is hashtag-like which would be illegal under either, and it seems better to go with the simpler rule.

That preference is not terribly strong, though.

Edit: I misread, I'm fine with either

CAD97 · 2022-08-29T19:33:32Z

> the primary use case I can see for # in bare identifiers is hashtag-like

To clarify, #241 allows #ident as a bare ident, and both will of course still allow "r#ident" as a quoted ident.

Argument for allowing: transliterating CSS selectors, for e.g. CSS-in-KDL. Argument against allowing: using the syntax in KQL as a selector like CSS.

lilyball · 2022-08-31T06:44:22Z

> Argument for allowing: transliterating CSS selectors, for e.g. CSS-in-KDL. Argument against allowing: using the syntax in KQL as a selector like CSS.

#foo in CSS is special-casing the id attribute. KQL doesn't have an equivalent to HTML's id, and using #foo syntax in KQL to mean something else might be confusing given its meaning in CSS, so I don't find the argument against compelling.

My inclination is to prefer #241 as well, as I think being able to write hashtags is neat. It also allows for doing things like writing Nix flake references as bare words, e.g. nixpkgs#hello.

Lucretiel · 2022-08-31T17:05:15Z

Can we squeeze #213 into this? The specific proposal is the addition of escaped whitespace in string literals: that \, followed by literal (non-escaped) whitespace, should consume and discard all that whitespace. This is a slight simplification of the Rust rule, which specifically requires that \ be followed by \n.

hkolbeck · 2022-08-31T17:14:58Z

I'm also a fan of #213, though it seems like there's some ambiguity in the discussion. Namely, does

- "x\
    y\
    z"

Translate to "xyz", "x y z", or "x\ny\nz"?

zkat · 2022-08-31T17:23:38Z

> Can we squeeze #213 into this? The specific proposal is the addition of escaped whitespace in string literals: that \, followed by literal (non-escaped) whitespace, should consume and discard all that whitespace. This is a slight simplification of the Rust rule, which specifically requires that \ be followed by \n.

@Lucretiel do you have time to put together a PR with this grammar+prose change? I'm game.

Lucretiel · 2022-08-31T20:13:52Z

> > Can we squeeze #213 into this? The specific proposal is the addition of escaped whitespace in string literals: that \, followed by literal (non-escaped) whitespace, should consume and discard all that whitespace. This is a slight simplification of the Rust rule, which specifically requires that \ be followed by \n.
>
> @Lucretiel do you have time to put together a PR with this grammar+prose change? I'm game.

Yes, tonight I can put that together :) should it be in the form of an amendment to SPEC.md?

Lucretiel · 2022-08-31T20:16:40Z

> I'm also a fan of #213, though it seems like there's some ambiguity in the discussion. Namely, does
>
> - "x\
>     y\
>     z"
>
> Translate to "xyz", "x y z", or "x\ny\nz"?

I agree there's some ambiguity in the original. That example would translate to "xyz", because all literal whitespace after the \ is consumed and discarded. If you want to retain whitespace, it should either come before the \ or itself be escaped. I think my comment (#213 (comment)) succinctly describes this.
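
As a rough illustration of the rule described here (not the spec's wording, and not taken from any KDL implementation), the whitespace escape can be sketched in Python. The function name `strip_whitespace_escapes` is hypothetical, and Python's `str.isspace()` is only assumed to approximate KDL's definition of whitespace and newlines.

```python
def strip_whitespace_escapes(body: str) -> str:
    """Drop every backslash that is followed by literal whitespace, together
    with the whole run of whitespace after it.

    Assumption: str.isspace() stands in for KDL's whitespace/newline classes.
    All other escapes (for example \\n or \\\\) are copied through unchanged.
    """
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            if nxt.isspace():
                # Whitespace escape: skip the backslash and all whitespace after it.
                i += 1
                while i < len(body) and body[i].isspace():
                    i += 1
                continue
            # Some other escape: keep both characters so a later pass can resolve it.
            out.append(ch + nxt)
            i += 2
            continue
        out.append(ch)
        i += 1
    return "".join(out)
```

Under these assumptions the three-line `x`/`y`/`z` example collapses to `xyz`, while whitespace placed before the backslash is kept.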

zkat · 2022-08-31T21:41:58Z

> > > Can we squeeze #213 into this? The specific proposal is the addition of escaped whitespace in string literals: that \, followed by literal (non-escaped) whitespace, should consume and discard all that whitespace. This is a slight simplification of the Rust rule, which specifically requires that \ be followed by \n.
> >
> > @Lucretiel do you have time to put together a PR with this grammar+prose change? I'm game.
>
> Yes, tonight I can put that together :) should it be in the form of an amendment to SPEC.md?

yep!

> I agree there's some ambiguity in the original. That example would translate to "xyz", because all literal whitespace after the \ is consumed and discarded. If you want to retain whitespace, it should either come before the \ or itself be escaped. I think my comment (#213 (comment)) succinctly describes this.

Is this what Rust does? I would've expected that to at least preserve the first newline. Then again, this is consistent with KDL's existing escline rule where \<newline> is the same as <non-newline whitespace>

CAD97 · 2022-08-31T21:48:57Z

> Is this what Rust does?

[playground]

[src/main.rs:2] dbg!("\
    here\
    is\
    an\
    example\
    ") = "hereisanexample"

hkolbeck · 2022-08-31T22:01:28Z

It's worth noting that bash behaves similarly as far as just dropping the newline, though it doesn't consume space afterward:

❯ echo foo\
… ❯ bar\
… ❯ baz
foobarbaz

With that I think xyz is the right output, and am +1 on including it in v2

Edit: Scratch that, I'm a space cadet:

❯ echo foo\
      bar\
      baz
foo bar baz

I'm more prone to emulating bash over rust, but I'm curious how others feel

Lucretiel · 2022-08-31T23:08:36Z

Bash's behavior is concerned with syntactic whitespace (i.e., allowing commands to spread over multiple lines with line continuations). It doesn't meaningfully behave in terms of consuming or not consuming specific whitespace so much as it extends a line to the next line while retaining the separation of tokens for a command. In your echo example, all that's happened is that foo, bar, and baz have correctly been passed as different arguments to echo; it's no different than:

> echo foo             bar \
   baz
foo bar baz

Kaydle has basically the same behavior with its own line continuation syntax, where you can use a \ to continue a single node into the next line. All these nodes are the same:

node 1 2 3
node 1    2   3
node 1\
  2\
  3

#213 is instead concerned with treatment of escaped whitespace in strings, where I think the plain consumption of unescaped whitespace makes the most sense

> Is this what Rust does? I would've expected that to at least preserve the first newline. Then again, this is consistent with KDL's existing escline rule where \<newline> is the same as <non-newline whitespace>

Rust does just consume all whitespace, regardless of type. The canonical way to add newlines to a whitespace-escaped string is to escape them:

assert_eq!(
    "line 1\n\
    line 2\n\
    line 3\n",

"line 1
line 2
line 3
"
);

Though more commonly I use it to stretch out long sentences with simple spaces:

assert_eq!(
    "This is a sentence with a \
    lot of words in it.",
    "This is a sentence with a lot of words in it."
);

zkat · 2024-04-02T07:51:41Z

ahhh yes. I see your point. I'll change it back to be dedent-before-escape.

zkat · 2024-04-02T07:59:00Z

There, that's done :)

tjol · 2024-05-14T18:21:44Z

CHANGELOG.md

+  opening `"`, and a final newline plus whitespace preceding the closing `"`.
+* SMALL EQUALS SIGN (`U+FE66`), FULLWIDTH EQUALS SIGN (`U+FF1D`), and HEAVY
+  EQUALS SIGN (`U+1F7F0`) are now treated the same as `=` and can be used for
+  properties (e.g. `お名前＝☜(ﾟヮﾟ☜)`). They are also no longer valid in bare


This example doesn't look right: surely ( and ) aren't allowed here?

tjol · 2024-05-25T15:20:55Z

Implementing the multi-line string and whitespace escaping rules is proving quite subtle.

> When processing a Multi-line String, implementations MUST resolve all whitespace escapes after dedenting the string.

This sounds simple enough: if there are newlines in the string, I check that the indentation is consistent and remove it. Then, I handle the various backslash escapes.

That should take care of illegal strings like

  "
  foo \
bar
  baz
  "

(from the spec), and legal strings like

"
    Hello
    \
         World
    "

(which is equal to "Hello\nWorld", if I'm understanding this correctly)

This algorithm does not work for this example in the spec:

    "Hello\n\
    World"

Before considering the \ escaping the newline, this looks very much like a syntax error: there is a newline in the string, but there is no initial or final newline. I believe the formal grammar also prohibits this string.

The spec prose appears to have a solution to this conundrum: (emphasis mine)

> When a Quoted or Raw String spans multiple lines with literal, **non-escaped** Newlines, it follows a special multi-line syntax ...

So if all newlines in the string are escaped, it is not a multi-line string? To my mind that should imply that escaped newlines are not newlines for the purposes of dedenting, and contradicts the rule that dedenting comes before backslash escapes. Overall, not very satisfying.

I would suggest:

* removing “, non-escaped” from the definition of multi-line strings, requiring that every string that contains ANY literal newlines follows the multi-line-string rules. This matches what the formal grammar already says.
* replacing the offending example with

"
Hello\n\
    World
"

Edit: I've added a PR - #391
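
To make the dedent-before-escape reading from this comment concrete, here is a minimal Python sketch. It is an illustration only: the helper names are hypothetical, the dedent rules are a simplified assumption (body starts with a newline, the last line holds only the whitespace prefix, every non-blank line must start with that prefix), and Python's notion of whitespace stands in for KDL's.

```python
import re

# A backslash followed either by a run of whitespace or by one other character.
_ESCAPE = re.compile(r"\\(\s+|.)", re.DOTALL)


def resolve_whitespace_escapes(text: str) -> str:
    """Remove backslash + whitespace runs; leave every other escape untouched."""
    return _ESCAPE.sub(
        lambda m: "" if m.group(1)[0].isspace() else m.group(0), text
    )


def dedent_multiline(body: str) -> str:
    """Dedent the raw text found between the quotes (simplified, assumed rules)."""
    lines = body.split("\n")
    if lines[0] != "":
        raise ValueError("multi-line string must start with a newline")
    prefix = lines[-1]
    if prefix.strip() != "":
        raise ValueError("last line must contain only whitespace")
    out = []
    for line in lines[1:-1]:
        if line.strip() == "":
            out.append("")  # whitespace-only lines become empty lines
        elif line.startswith(prefix):
            out.append(line[len(prefix):])
        else:
            raise ValueError("line does not start with the closing-line prefix")
    return "\n".join(out)


def dedent_then_escape(body: str) -> str:
    """Dedent first (if there is any literal newline), then drop whitespace escapes."""
    if "\n" in body:
        body = dedent_multiline(body)
    return resolve_whitespace_escapes(body)
```

With this ordering, the `Hello` / `\` / `World` example dedents cleanly and the whitespace escape then joins the lines, while both the `foo \` / `bar` example and the spec's `"Hello\n\` example are rejected during the dedent step, which is the conflict described above.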

tjol · 2024-05-27T07:01:21Z

I wonder if the more intuitive way for strings to work would be:

1. remove escaped whitespace
2. dedent multi-line string
3. resolve other backslash escapes

This should have the same result as the #391 rule for all strings valid under that rule, but also accept more cases with escaped newlines.
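
The three steps listed in this comment could be sketched in Python as follows. This is a hedged illustration, not the spec's algorithm: all function names are hypothetical, the dedent rules are simplified assumptions, and only a handful of backslash escapes are modelled.

```python
import re

_ESCAPE = re.compile(r"\\(\s+|.)", re.DOTALL)
# Assumption: only these simple escapes are modelled in this sketch.
_SIMPLE = {"n": "\n", "t": "\t", "\\": "\\", '"': '"'}


def remove_whitespace_escapes(text: str) -> str:
    """Step 1: drop backslash + whitespace runs, keep other escapes as written."""
    return _ESCAPE.sub(
        lambda m: "" if m.group(1)[0].isspace() else m.group(0), text
    )


def dedent(body: str) -> str:
    """Step 2: strip the closing line's whitespace prefix from every line."""
    lines = body.split("\n")
    prefix = lines[-1]
    if lines[0] != "" or prefix.strip() != "":
        raise ValueError("not a well-formed multi-line string")
    out = []
    for line in lines[1:-1]:
        if line.strip() == "":
            out.append("")
        elif line.startswith(prefix):
            out.append(line[len(prefix):])
        else:
            raise ValueError("inconsistent indentation")
    return "\n".join(out)


def resolve_other_escapes(text: str) -> str:
    """Step 3: resolve the remaining escapes; unknown ones are left as written."""
    return re.sub(
        r"\\(.)",
        lambda m: _SIMPLE.get(m.group(1), m.group(0)),
        text,
        flags=re.DOTALL,
    )


def parse_string_body(body: str) -> str:
    text = remove_whitespace_escapes(body)
    if "\n" in text:  # only literal newlines that survive step 1 trigger dedenting
        text = dedent(text)
    return resolve_other_escapes(text)
```

In this sketch the spec's `"Hello\n\` example is accepted, because its only literal newline is removed in step 1 before any dedent check runs. A string whose escaped line break is followed by an under-indented line is also accepted, which is one of the "more cases" this ordering admits.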

zkat · 2024-06-11T16:26:45Z

@tjol sorry for the delay in responding:

I'm confused, what you're describing is definitely the intended behavior. You should still be able to write "multiline" strings by using whitespace escapes, they're just not going to be beholden to the multiline string rules, and can be easily detected by looking for the character sequence \<NL>.

But maybe allowing that is indeed too confusing and too painful to implement? So your suggestion means that whitespace escapes essentially no longer work unless you're in multiline string mode?

tjol · 2024-06-11T16:46:10Z

@zkat I think the way the spec is currently written – certainly the way I understood the 'letter of the law' while implementing – whitespace escapes basically can't escape newlines in single-line strings, yeah. But I agree that's probably not the way it should work. (Either way should be easy enough to implement)

I'll have another look at the wording and how to maybe clarify it (probably in a few days' time)

zkat · 2024-06-11T17:02:41Z

@tjol yeah it could definitely be improved. Thanks for being willing to take a look!!

tjol · 2024-06-13T18:42:59Z

Ok - alternative suggestion PR: #392

These rules are a bit more liberal than what was described previously, but I think they're clearer and more consistent:

* This way, strings have the (I think intuitive) property that, when you 'blindly' remove the whitespace escapes, the meaning is unchanged.
* If you take any valid single-line string and add a newline character and some indentation both at the start and the end, the string will still be valid (and unchanged). Previously, this was not necessarily the case if there were whitespace escapes.

tjol · 2024-06-13T20:55:23Z

SPEC.md

+
+-----------
+
+Empty lines can contain any whitespace, or none at all, and will be reflected as empty in the value:


I don't think this matches the description above or the discussion at https://github.com/kdl-org/kdl/pull/286/files#r1483699764

Should this say something like "Empty lines may omit the whitespace prefix:" instead?

j-tai · 2024-09-22T16:00:55Z

SPEC.md

+Strings in KDL represent textual UTF-8 [Values](#value). A String is either an
+[Identifier String](#identifier-string) (like `foo`), a [Quoted String](#quoted-string) (like `"foo"`) or
+a [Raw String](#raw-string) (like `#"foo"#`). Identifier Strings let you write short, "single-word" strings with a minimum of syntax; Quoted Strings let you write strings with whitespace (including newlines!) or escapes; Raw Strings let you write strings with whitespace *but without escapes*, allowing you to not worry about the string's content containing anything that might look like an escape.


Hey, I like the new barewords! I think it would be useful to document whether the different types of strings MUST be treated identically or not. In other words, we should specify whether the following MUST behave identically:

foo arg
foo "arg"

If they MAY be treated differently, then this could be useful for CLI-like syntaxes, such as

node1 --option    // treated as an "option"
node1 "--option"  // treated as a "positional argument"

tabatkins · 2024-09-23

The intention is definitely that they are identical, just different syntaxes for a "string". All the existing kdl1 locations that allow idents and strings (type annotations, node names, property names) don't allow you to distinguish between the two syntaxes, any more than they let you distinguish between the various quoted string syntaxes. Extending this to attribute values and property values shouldn't change things.

If you do have a node where you want to have both string positional args and boolean flags, the recommended way is to make the flags into boolean properties. `node1 --option=#true` and `node1 "--option"` are completely distinguishable. (If you're not mixing these two things, tho, then just having the presence of an ident string indicate a flag is perfectly fine, like `node1 --option`.)

j-tai · 2024-09-23

Fair enough. In that case, I think it would be worth adding this to the spec. Something like:

"Implementations MUST treat the different string forms identically."

zkat · 2024-09-24

That's not technically correct. A document-oriented parser might represent those differently, because they're represented differently in text and they want to maintain formatting and allow control of formatting. I'm not sure the spec needs any additional language beyond specifying "these two syntaxes represent the same data type," which is already expressed.
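
One way to picture the distinction drawn in this thread (same data type, possibly different formatting) is a value type that remembers which syntax was used but ignores it when comparing. The sketch below is purely illustrative; `KdlString` and `StringSyntax` are hypothetical names and do not come from any KDL library.

```python
from dataclasses import dataclass, field
from enum import Enum


class StringSyntax(Enum):
    IDENTIFIER = "identifier"  # foo
    QUOTED = "quoted"          # "foo"
    RAW = "raw"                # #"foo"#


@dataclass(frozen=True)
class KdlString:
    value: str
    # Formatting metadata only: excluded from equality and hashing, so a
    # document-oriented tool can preserve it without it changing the data.
    syntax: StringSyntax = field(default=StringSyntax.QUOTED, compare=False)


bare = KdlString("arg", StringSyntax.IDENTIFIER)
quoted = KdlString("arg", StringSyntax.QUOTED)
```

Here `bare == quoted` holds even though `bare.syntax` and `quoted.syntax` differ, so a formatter can still round-trip the original spelling.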

j-tai · 2024-09-22T16:04:35Z

SPEC.md

+* idents that are the language keywords (`inf`, `-inf`, `nan`, `true`,
+  `false`, and `null`) without their leading `#`.


I wonder if it would be a good idea to make this a bit more conservative? For example, we could also disallow NaN, -nan, +inf, Infinity, etc.
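
For illustration, the difference between the rule in the diff and the more conservative variant suggested here might look like the following Python sketch. The function name and the `conservative` flag are hypothetical, and the exact conservative set (case-insensitive matching, optional sign, `infinity`) is an assumption made for this example rather than anything the spec states.

```python
# Keywords from the quoted diff, written without their leading '#'.
KEYWORD_LIKE = {"inf", "-inf", "nan", "true", "false", "null"}

# Hypothetical stricter set: compared case-insensitively, after stripping signs.
CONSERVATIVE_LIKE = {"inf", "infinity", "nan", "true", "false", "null"}


def is_reserved_bare_ident(ident: str, conservative: bool = False) -> bool:
    """Return True if `ident` should be rejected as a bare identifier."""
    if ident in KEYWORD_LIKE:
        return True
    if conservative:
        # Assumption: ignore case and any leading '+'/'-' before comparing.
        return ident.lower().lstrip("+-") in CONSERVATIVE_LIKE
    return False
```

With the default rule, `NaN`, `-nan`, `+inf`, and `Infinity` are all accepted as bare identifiers; with `conservative=True` they are rejected, while something like `--option` is still allowed.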

zkat · 2024-11-29T08:04:41Z

This PR has gotten really unwieldy, so I'm going to go ahead and merge it to main. We can continue iterating on v2 off that branch from now on. I've updated the readme to clarify this in case anyone gets confused when first visiting.

Selonian83 · 2024-12-06T13:35:04Z

nice work pretty !!!

danini-the-panini and others added 4 commits August 28, 2022 12:59

Do not escape / (Solidus, Forwardslash) (#197)

910f6e9

KQL: require operator and change operator grammar a bit (#221)

69ac280

KQL: remove map operator and accessors (#222)

2d5e543

Honestly, they're just too implementation-specific

Allow "empty" single line comments in the spec (#234)

1bf4d74

As I read the grammar in the spec, `"//"` wouldn't parse as a single-line-comment as it requires at least one non-newline character after the slashes.

zkat requested review from hkolbeck, tabatkins, danini-the-panini, Lucretiel and larsgw August 28, 2022 20:09

Draft changelog

78a2d5f

zkat requested review from IceDragon200 and lilyball August 28, 2022 20:26

add failing test for removed solidus escape

f38edc7

IceDragon200 reviewed Aug 28, 2022

bgotink added 2 commits August 30, 2022 08:11

Use forward slash in solidus-escape test (#288)

ffeea8e

Update expected output of test with changed input (#289)

337bd1b

change escape resolution order again

c9134e3

zkat and others added 2 commits April 2, 2024 21:36

unicode was not defined in grammar

fa204ce

kql: only allow top() at start of selector (#388)

6a77436

tjol reviewed May 14, 2024

tjol mentioned this pull request May 26, 2024

[v2] Clarify how whitespace escapes work in multi-line strings #391

Closed

tjol reviewed Jun 13, 2024

tokyo4j mentioned this pull request Aug 23, 2024

Support rc.yaml (and experimentally menu.yaml) labwc/labwc#2106

Draft

j-tai reviewed Sep 22, 2024

zkat added 7 commits October 3, 2024 20:53

clarifications around multiline prefixes

1e924bc

clarify that numbers don't need to be IEEE 754 floats

93c4400

add 128-bit ints

fa3050c

get rid of syntactically significant unicode equals signs (#400)

1588b1f

Fixes: #399

[v2] more predictable slashdash (#407)

90e22bc

Fixes: #401

Release 2.0.0 draft 5

76a1de5

prep readme for merging to main

8aa4c15

zkat merged commit c8632b7 into main Nov 29, 2024

zkat deleted the kdl-v2 branch November 29, 2024 08:05

zkat restored the kdl-v2 branch November 29, 2024 08:06

zkat deleted the kdl-v2 branch November 29, 2024 08:12

