-
Notifications
You must be signed in to change notification settings - Fork 858
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clarify that comments never affect the tables produced by parsers #950
Conversation
I'm not convinced that this is needed. Could you give a specific example of how a comment could trigger changes to tables without violating the current spec? |
My read of this is to explicitly codify that vanilla TOML does not allow for things like comment-based Or, put differently, a conforming parser is always expected to ignore those constructs entirely, and never change the way the document is interpreted. |
@ChristianSi Basically, what @marzer said, though it's not about the documents as much as the parsers that read them. You could write this test: if you walk through a generated table, a comment's text should not appear as a key, a table name, or a string value, assuming it wasn't introduced elsewhere via legitimate means. If a parser can preserve comments, then those comments could not be identified via a simple walk through the resulting table. They would need to be stored in a way that could not be conflated with keys, values, or table names. End-users could read comments and tweak the tables, but that would be outside the scope of parsers. And any form of extension syntax that an otherwise conformant parser would recognize, obviously, falls outside this restriction. Here are two simple examples, which aren't all that clever, that could be read by consumers (i.e. parsers and their end-user programs) and manipulated in excessively clever ways. I talk a bit about how those clever ways could be implemented legitimately later on. First is, how could we assure that a float is interpreted as a decimal, so that exact representation is preserved? The bad way is to comment that it's a precise decimal, and have the parser read that comment and emit a different value type than expected for floats, based on its representation in the TOML file. # Not Recommended
one_quarter = 0.25 ##decimal
nearly_a_third = 0.3333333333 ##decimal Second is something that I came across at work one day. By default, keys and subtables are not ordered in TOML tables. But what if you wanted to put subtables in a sequence defined by the order in which they're described? # Not Recommended
[report_tabs] ##sequenced
[report_tabs.one]
[report_tabs.two]
[report_tabs.skipafew]
[report_tabs.ahundred] The explicit There doesn't seem to be a common way across languages or parsers to assure that decimal values are preserved. I am aware that the parsed_toml = tomllib.load(toml_file, parse_float=Decimal) Sequenced subtables look like they'd work, but the only thing in TOML that actually guarantees a proper ordering is arrays. Something like the following isn't as pleasant-looking, but it gets the job done right. The names of the subtables from before are now values to [[report_tabs]]
name = "one"
[[report_tabs]]
name = "two"
[[report_tabs]]
name = "skipafew"
[[report_tabs]]
name = "ahundred" |
Hmm, TOML as doesn't guarantee any specific ordering of key/value pairs in a table, but neither does it prohibit it. For example, in Python (since v3.7) Now, for "pragmas" such as |
@ChristianSi I never said that ordered subtables were in violation of the spec, just that their order is not guaranteed. Most of the examples that we discuss are Python-centric, but even if Python guarantees insertion order on dicts, there's no guarantee that the parser will insert subtables in that order that they are encountered in the TOML doc. Nobody can assume that text order will match insertion order, and anyone coding with that assumption is relying on implementation details alone, and that can be a dangerous practice. And in those languages that provide iteration over hash tables but have no such order guarantee, there may be future demand for such a feature. Which their parsers could provide, certainly. So a suggestion to ensure such an ordering at the level of the TOML spec might never be encountered. I just chose sequence order and decimal values as simple examples of potential comment abuse.
Yes, they harm the TOML standard. They shouldn't be ignored. What I'm trying to prevent is the widespread use of TOML comments as pragmas. Comments are meant to be ignored by parsers, and they shouldn't carry any more weight than that. In the long run, we must have that assurance. If TOML parsers are going to implement extension features, then I'd much rather see them do so by introducing changes to the TOML syntax in their implementations. This would lead to two useful states, depending on the viability of the changes. If they're unpopular changes, they'll be limited to the single parser that implemented them, and no other parser will be able to read the documents written with the nonconformant syntax. End-user programs reliant on such features will be bound to their oddball parsers, such that variations of narrowly scoped deviations from TOML will be isolated, and rightly so. And if they're popular changes, then the extended usage of the new syntax in the wild would guide the development of future iterations of the TOML standard, which would put the widely adopted syntax into a larger perspective and prevent fragmentation of TOML's user base. Parsers may ignore the new syntax and fall out of conformance, and rightly so. But they'd see enough reasons to implement these features for themselves, and not ignore them like how they would ignore comments. Do you see why I am proposing this change now? |
I get the purpose of this PR, I but I still don't see the need for it. Are there any pragmas in actual use anywhere in the wild, with some parsers actually supporting and honoring them? If so, I'd like to see and study them, to see whether they are actually harmful or rather harmless and maybe even suggestive about which additional features TOML should come to support. If there aren't any, I don't see any cause for action at all. Lets not act prematurely. If there are no pragmas in real life, this whole thing is a non-issue. If there are any, they might teach us something. Personally, I think the best attitude is "from the viewpoint TOML spec, comments have no meaning", but without forcing every TOML parser to necessarily adapt the same attitude. |
I don't know if there are any parsers that include comment-based pragmas and I agree it's not needed, but I think a single short line is helpful as it clarifies that comments are always "pure comments". I'm not too worried about some implementation using wild pragma statements, it's just a clarity thing. We could maybe also tweak the opening line a bit from:
to something like:
Or something along those lines (my suggestion could be improved). That would serve the same purpose. |
Co-authored-by: Pradyun Gedam <[email protected]>
I'm guessing, kadir, that your GitHub tool has bugs. @pradyunsg Can we mark the last three messages (including this one) as off-topic? |
@pradyunsg It's been 19 months since this PR was touched in earnest. I know it's a minor issue, but I'd love for us to get TOML v1.1.0rc1 out the door, and if reviewing this PR helps to get you a bit more involved in coordinating a release, it'll be worth reminding you of it. Will we merge it, close it, or defer it until after v1.1.0? |
Yea, I need to set some time aside to do a proper handoffs here to share the workload around this repo/project. It's been on my TODO list for a while now. 😅 |
In the discussion on #603, it became clear that no restrictions are placed on parsers that could further modify the hash tables that they generate based on included comments. The purpose of this PR is to assure that parsers will never modify a TOML document's resulting hash table due to the presence or contents of the comments contained within the document.
The change to
toml.md
will prevent a certain class of abuses of the TOML syntax, without prohibiting comment-preserving parsers from doing their thing (as those comments are not part of the resulting table but merely reference it from locations in the source document).(This PR replaces #948.)