Proposal: Alternative "Array of Tables" and arrays syntax #1052

Open
svew opened this issue Jan 21, 2025 · 0 comments

This is a proposal for an alternative syntax for representing an array of tables. It has much in common with the conversation had in #309.

Problem

My issue lies with this syntax:

[[list]]
a = 0
b = 1
[[list]]
a = 2
c = 3

The main issue is that it's unclear what the second set of brackets is meant to symbolize, which keeps this from being "obvious" as an array syntax. What do the square brackets here mean? On one hand, the purpose of the section header brackets in INI files is to stand out visually from keys, so do the double brackets mean... an emphasized section header? Why would an array element be emphasized over other tables? On the other hand, one could argue they're meant to visually represent an array's brackets, but then why would the array's name appear inside the brackets? Shouldn't it be more like [list_element][]?

Additionally, the syntax isn't composable. The nice thing about dotted keys is that dotted keys as members of sections and dotted keys as section definitions mean very similar and complementary things. The double square brackets for an array of tables barely relate to the other concepts of tables, arrays, keys, etc. in TOML, aside from the fact that they all use square brackets somewhere.

The problem is NOT that this syntax is unable to represent nested arrays. Nesting complex structures is not a design goal of INI or TOML, and trying to accommodate it is an attempt to turn TOML into YAML. This proposal is not interested in finding a syntax that makes nesting easier.

Proposal

I'd propose this (unoriginal) alternative syntax (same data as above):

[list.+]
a = 0
b = 1
[list.+]
a = 2
c = 3

In this new syntax, + is a special key called an "incrementing numeric key". You can also think of it as meaning "some numeric key", where "some" means that you are not describing which numeric subkey this table represents, and allowing that assignment to be done by the parser (which will assign the first numeric key it finds as '0', and increment by one for each + it encounters). Or, you can think of it as just simply meaning "append". Maybe "appending key" would be easier to remember...

This means that in addition to using + for sections, you can also use them for writing dotted keys in general, to append to an array:

list.+ = "Hello, world!"
list.+ = "Goodbye, world."

This allows the author to append to an array in multiple places across the configuration, giving the same freedom that tables have to set fields after their initial declaration:

# Make an exception for port 98102 in response to customer ask
server.exceptions.+ = 98102
client.exceptions.+ = 98102

Another benefit of this is that you can write arrays in a very similar manner to regular tables, instead of using an inline array. The incrementing numeric key can be used within a defined section to enumerate elements of the array like so:

[list]
+ = "Hello, world!"
+ = "Goodbye, world."

(This syntax could possibly create an issue for parsers, more about that in the "Cons" section at the bottom.)

+ is a special key because it is not an actual key, and must be used only as a placeholder for an array key. You cannot, for example, assign to + in the global section, because that would imply that the root of the TOML file is an array (which is explicitly disallowed):

# NOT ALLOWED
+ = "Hello, world!"
+ = "Goodbye, world."

+ is also special because each array element created by a + is unique, and cannot be referred to again. A mistake a TOML author might make with this change is thinking that, as with normal dotted keys, you can refer later in the config to tables that were defined earlier. For this reason, you're not allowed to define an array element and a subkey inline. In other words, + must be the last key in the dotted key path:

# NOT ALLOWED
list.+.first_name = "George"
list.+.last_name = "Foreman"

# NOT ALLOWED
[list.+.item]

Because + refers to one unique array element and must be the last key in the dotted key, you also cannot use + multiple times in a dotted key. Therefore, you cannot nest arrays by using multiple +, which makes sense because it would be ambiguous which array new elements should be added to.

# NOT ALLOWED
nested_list.+.+ = 0
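The rules so far for + in dotted keys (append when it is last, reject it anywhere else, reject it at the document root) can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not a reference implementation: the function name assign is hypothetical, keys are split naively on "." (no quoted keys), and the in-section form (+ = ... under a header) is not covered here.

```python
PLUS = "+"  # the proposed "appending key"


def assign(root: dict, dotted_key: str, value) -> None:
    """Assign value at dotted_key, treating a trailing '+' as an append."""
    parts = dotted_key.split(".")  # simplification: quoted keys are not handled
    if PLUS in parts[:-1]:
        # Covers both 'list.+.first_name' and 'nested_list.+.+'.
        raise ValueError("'+' must be the last key in a dotted key")
    if parts == [PLUS]:
        raise ValueError("'+' cannot be used at the root of the document")

    *parents, last = parts
    if last == PLUS:
        # The key just before '+' names the array being appended to.
        *parents, array_name = parents

    node = root
    for name in parents:
        node = node.setdefault(name, {})
        if not isinstance(node, dict):
            raise ValueError(f"{name!r} is not a table")

    if last == PLUS:
        array = node.setdefault(array_name, [])
        if not isinstance(array, list):
            raise ValueError(f"{array_name!r} is not an array")
        array.append(value)  # the index is implied by position in the list
    else:
        if last in node:
            raise ValueError(f"duplicate key {last!r}")
        node[last] = value


doc: dict = {}
assign(doc, "list.+", "Hello, world!")
assign(doc, "list.+", "Goodbye, world.")
assign(doc, "server.exceptions.+", 98102)
print(doc)
```

Because the parser only ever appends, the "incrementing numeric key" never has to be stored: the element's index is simply its position in the resulting list.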

You can however still make nested arrays by combining + with inline arrays & tables:

list.+ = [ 1, 2, 3 ]

# Or...

[matrix]
+ = [ 1, 2, 3 ]
+ = [ 4, 5, 6 ]
+ = [ 7, 8, 9 ]

Finally, you can use + as part of both section and key:

[nested_array.+]
+ = 0
+ = 1
[nested_array.+]
+ = 2
+ = 3
# In JSON, would be: { "nested_array": [ [ 0, 1 ], [ 2, 3 ] ] }

[player.+]
name = "Sam Squiggly"
background = "Gambler"
inventory.+ = {
    name = "Sword",
    type = "Weapon",
    damage = "2d6",
}
inventory.+ = {
    name = "Tiamut's Ring",
    type = "Clothing",
    desc = "Forged from the fires of hell",
}
equipped.+ = "Sword"
equipped.+ = "Tiamut's Ring"
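To make the combination of section-level and key-level + concrete, here is a small line-oriented interpreter, written as a sketch under heavy assumptions. The names parse, open_section, store, and descend are hypothetical; values are read with ast.literal_eval, so only values that happen to be valid Python literals (integers, double-quoted strings, simple inline arrays) are supported; multi-line inline tables, quoted keys, and most validation are left out. A section whose keys are all + becomes an array, and anything else (including an empty section) becomes a table.

```python
import ast


def descend(node: dict, names: list[str]) -> dict:
    """Walk (and create) nested tables along names."""
    for name in names:
        node = node.setdefault(name, {})
    return node


def open_section(root: dict, path: list[str], pairs: list):
    """Create the container a header refers to and return it."""
    # Decide table vs. array from the section body (empty body: table).
    kind = list if pairs and all(key == "+" for key, _ in pairs) else dict
    *parents, last = path
    if last == "+":
        *parents, array_name = parents
        element = kind()
        descend(root, parents).setdefault(array_name, []).append(element)
        return element
    return descend(root, parents).setdefault(last, kind())


def store(target, parts: list[str], value) -> None:
    """Store one key/value pair inside the current section."""
    *parents, last = parts
    if last != "+":
        descend(target, parents)[last] = value
    elif parents:  # e.g. equipped.+ = "Sword"
        *parents, array_name = parents
        descend(target, parents).setdefault(array_name, []).append(value)
    elif isinstance(target, list):  # bare '+' inside an array section
        target.append(value)
    else:
        raise ValueError("'+' needs an enclosing array")


def parse(text: str) -> dict:
    root: dict = {}
    sections = [(None, [])]  # (header or None for the root, key/value pairs)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1].strip(), []))
        else:
            key, _, value = line.partition("=")
            sections[-1][1].append((key.strip(), ast.literal_eval(value.strip())))
    for header, pairs in sections:
        target = root if header is None else open_section(root, header.split("."), pairs)
        for key, value in pairs:
            store(target, key.split("."), value)
    return root


EXAMPLE = """
[nested_array.+]
+ = 0
+ = 1
[nested_array.+]
+ = 2
+ = 3
[player.+]
name = "Sam Squiggly"
equipped.+ = "Sword"
equipped.+ = "Tiamut's Ring"
"""

result = parse(EXAMPLE)
print(result)
```

Note that this sketch groups lines into sections before building anything, which sidesteps the question of when a streaming parser can know whether a section is a table or an array.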

Pros

1. Easy to grasp

+ is a key that appends to an array. Simple as that.

2. More coherent mental model

By reusing concepts defined in the TOML spec, we reinforce these concepts. table.item means we define a table "table", and reference/create a subelement of this table using a textual key "item". list.+ means we define a table "list", and reference/create a subelement of this table using a numeric key, aliased as +.

This is the first time mentioning this particular concept, but in the Lua programming language, "arrays" and "lists" do not really exist. Arrays are instead represented using tables with numbered indices. The same concept can be spiritually mapped here. TOML does make a distinction between tables and arrays (as it should), but indexing tables and arrays in any language is not so different, and that idea is represented well by this proposal.

3. Backwards compatible

Currently, the + character is not allowed as part of a bare (unquoted) key, so there are no issues with collisions here. Additionally, it is currently a syntax error to use a + in any of the positions I've proposed above.

4. Enables future syntax extensions

I think this format lends itself much better to possible future additions to the language, or to custom TOML parser extensions. This proposal isn't necessarily arguing in favor of any of these, but this syntax opens new doors that weren't previously available, or didn't relate at all to the current syntax:

  • Allowing relative subheaders to make sections appear more "listlike":

    [list]
      [.+]
        a = 0
        b = 1
      [.+]
        a = 2
        b = 3
    

    Could also be [+], but I think the dotted syntax makes it clearer that it's a subelement of the last absolute path. This referential key syntax has been thoroughly discussed in Proposal: Reference shortcut for nesting tables #744 and ultimately rejected. My two cents are that relative subheaders would be handy if only a single level of depth were allowed, and it always referred to the last concrete section header. Any further nesting is unnecessary and complicated.

  • Write specific array indices:

    [list.0] # This table is always the first element of the array
    [list.-1] # This table is always the last element of the array
    
    list.0.enabled = true # Always enables the first item of the list
    

    This extension may not be backwards compatible, as TOML interprets keys made up only of digits as valid string keys. This could be sidestepped by noting that it's impossible to have a string key for an array, so the difference would be that an array always parses keys as digits, and a table always parses keys as strings.

  • A future feature that allows key paths to numerically subindex would relate to this concept much better

    let value = myConfig.get("list.3.a")

    Note, + should not be reused for querying, as it essentially represents an append operation. Fetching a list of all elements under an array would be better suited for a typical glob symbol such as list.*, not list.+. More on this in the "Cons" section.

Cons

1. More language complexity

It is always worth asking whether the added syntax's conceptual or functional improvement justifies the added complexity of the language, given that we can't (shouldn't) go back and decommission the old syntax. I, of course, believe it does, as I believe it grants greater clarity than the current model.

2. Possible parsing ambiguities

I see a possible issue in the case of a regular section header with incrementing numeric keys:

[abc]
+ = 1
+ = 2

One strength of the existing syntax in this specific situation is that [[abc]] would be unambiguously a list to a parser. To support this syntax, the parser would have to wait till the next line to determine whether [abc] is a table or an array. This isn't a breaking change, but it makes this single line ambiguous until the next line is parsed. This means that an empty section cannot be resolved, because it could mean either a table or an array.

Possible Fixes

  1. The issue could be sidestepped by declaring that sections without keys are treated as tables, since an empty array can be represented by list = []. This would be backwards compatible, but would be a quirk in the language.

  2. A new syntax could be used and enforced for sections that represent a named array as opposed to an element of an array:

    [abc][]
    + = 1
    
    # Or maybe:
    [abc.[]] # Dot needed so that in case you want to nest, you aren't writing '.+[]'
    + = 1
    

    This removes all ambiguity, but adds a new way of representing arrays, increasing complexity of the language. I personally don't care for it much, because while the proposed syntax is ambiguous to a parser, it's not ambiguous to a reader (aside from an empty list), and doesn't require additional knowledge/concepts. However, it's completely acceptable as an alternative, and does read like it's defining an array.

  3. Another possibility is that because this use of + is more of a perk than it is necessary for it to work or necessary for good ergonomics/mental model, we can simply disallow this syntax.
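Possible Fix 1 amounts to deciding a section's kind from its body, with a default for the empty case. A minimal sketch follows, assuming the parser has already collected the keys that appear under a header. The function name resolve_section_kind is hypothetical, and rejecting a mix of + and named keys in one section is an assumption of this sketch, since the proposal does not address that case.

```python
def resolve_section_kind(keys: list[str]) -> str:
    """Classify a section as 'table' or 'array' from the keys in its body."""
    if not keys:
        # Possible Fix 1: an empty section is a table; write list = [] for an empty array.
        return "table"
    is_plus = [key == "+" for key in keys]
    if all(is_plus):
        return "array"
    if any(is_plus):
        # Assumption of this sketch: a section cannot be both a table and an array.
        raise ValueError("cannot mix '+' with named keys in one section")
    return "table"


print(resolve_section_kind([]))                          # table
print(resolve_section_kind(["+", "+"]))                  # array
print(resolve_section_kind(["name", "inventory.+"]))     # table
```

A streaming parser would only need the first key of the section to make this call, which is the one-line lookahead described above.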

Design Notes

Alternatively, using #

When I first wrote this proposal, I was going to suggest using the # character, since I believed it better communicated a numeric key and therefore a table. Obviously, this is also the character for a comment, which is not ideal. Most of the uses I cared most about still worked even in this case, because # is unambiguously not a comment when used like [list.#] or list.# = ...: if it were interpreted as a comment there, the file would be invalid. However, it meant that we could not use the in-section form (written + = ... in this proposal), because a line starting with # = ... would have been interpreted as a comment. If it turns out the parsing problem I mentioned earlier is indeed an issue, # could be used instead.

Conclusion

This new syntax does a much better job representing subelements of an array, by connecting the concepts of keys of tables with keys of arrays. It allows representations and organizations that weren't possible before, and language extensions that weren't possible before (regardless of whether official TOML decides to implement them), all without losing the spirit of TOML: Flat, "obvious", configuration files.
