A case of missed delimiter in string literals and comments #18
Comments
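I've always thought that the ( ...) closing delimiter was intended for multiline comments (and implemented it as such), much like C's /* ... */. In a similar manner, I would think it's more useful to allow S" ...", S\" ...", C" ...", ." ..." and .( ...) to be multiline, to display runtime text (instructions, help, usage, etc.), to parse with EVALUATE, or to submit to other words (maybe templated strings, printf-like strings, etc.).
Plus why would you not support multiline comments when you have \ for single-line comments?
64 ASCII octet lines in a block seem rather short; likewise 80 octet lines (min. input buffer length & transient buffer) seem rather short, especially if UTF-8 multibyte characters become more of a norm (1 to 4 bytes in length (up to 6 in theory)). I mention this because short lines for comments seem limiting.
Side-buffet: Another issue of concern is the use of character for buffer sizes; is this a hidden assumption of ASCII single-byte characters, or do they actually account for UTF-8 or wide characters? In the case of UTF-8 a character can occupy up to 4 bytes (6 if they ever extend that far), so an 80 byte input buffer could only hold 20 UTF-8 characters. Maybe character should be highlighted or rephrased as wide character when they mean characters, instead of bytes.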
Yes, but with one exception (namely, when the input source is the user input device).
The word 11.6.1.0080 ( [...] The word 6.1.0080 ( [...]
I dislike this, because it works incorrectly when a program with multi-line comments from a file is redirected to stdin. An ambiguous condition would allow fixing this, at least in Forth systems (and would allow supporting multi-line string literals by [...]).
I thought this ambiguous condition was stated in Forth-2012, but I was wrong.
It is counted in "primitive characters", not in UTF-8 multi-byte characters. In the Forth-2019 draft, the primitive character size is always 1 address unit.
Couldn't resist.
up to 4 bytes (6 if they ever extend that far),
Encodings of 5 octets or more aren’t valid UTF-8. The Unicode Standard is the authoritative specification, but this RFC is an easier read: https://datatracker.ietf.org/doc/html/rfc3629; “In UTF-8, characters from the U+0000..U+10FFFF range (the UTF-16 accessible range) are encoded using sequences of 1 to 4 octets.”
Of course, it’s possible that some mad bonkers techie could persuade the rest of the world that 5 and 6 octet encoding had some use and get them standardised, but I doubt that will happen. There’s a whole chunk of code out there that needs UTF-8 to fit in a maximum of 32 bits.
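The octet counts quoted from RFC 3629 can be checked with a short Python sketch. The helper name utf8_len and the sample code points are chosen only for this illustration; the sketch relies on Python's built-in UTF-8 codec, not on any Forth system.

```python
def utf8_len(code_point: int) -> int:
    """Return how many octets Python's UTF-8 encoder emits for one code point."""
    return len(chr(code_point).encode("utf-8"))

# One sample per length class, plus the last valid code point U+10FFFF.
samples = [0x41, 0xE9, 0x20AC, 0x1F600, 0x10FFFF]
lengths = [utf8_len(cp) for cp in samples]
print(lengths)  # [1, 2, 3, 4, 4]

# A lead byte of 0xF8 would start a 5-octet sequence; the decoder rejects it.
try:
    b"\xf8\x88\x80\x80\x80".decode("utf-8")
    five_octets_accepted = True
except UnicodeDecodeError:
    five_octets_accepted = False
print(five_octets_accepted)  # False
```

No code point in the valid range needs more than 4 octets, which is consistent with the point about code that expects UTF-8 to fit in 32 bits.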
Regards
Alex
From: Anthony Howe ***@***.***>
Sent: Friday, November 22, 2024 12:07 PM
To: ForthHub/standard-evolution ***@***.***>
Cc: Subscribed ***@***.***>
Subject: Re: [ForthHub/standard-evolution] A case of missed delimiter in string literals and comments (Issue #18)
I've always thought that the ( ...) closing delimiter was intended for multiline comments (and implemented it as such), much like C's /* ... */. In a similar manner, I would think it's more useful to allow S" ...", S\" ...", C" ...", ." ..." and .( ...) to be multiline, to display runtime text (instructions, help, usage, etc.), to parse with EVALUATE, or to submit to other words (maybe templated strings, printf-like strings, etc.).
Plus why would you not support multiline comments when you have \ for single-line comments?
64 ASCII octet lines in a block seem rather short; likewise 80 octet lines (min. input buffer length & transient buffer) seem rather short, especially if UTF-8 multibyte characters become more of a norm (1 to 4 bytes in length (up to 6 in theory)).
Side-buffet: Another issue of concern is the use of character for buffer sizes; is this a hidden assumption of ASCII single-byte characters, or do they actually account for UTF-8 or wide characters? In the case of UTF-8 a character can occupy up to 4 bytes (6 if they ever extend that far), so an 80 byte input buffer could only hold 20 UTF-8 characters. Maybe character should be highlighted or rephrased as wide character when they mean characters, instead of bytes.
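The buffer arithmetic above can be made concrete with a small Python sketch. It assumes the 80 in question is a count of octets; BUFFER_OCTETS and chars_that_fit are hypothetical names used only here.

```python
BUFFER_OCTETS = 80  # assumed: the minimum input buffer size, taken as octets

def chars_that_fit(text: str, limit: int = BUFFER_OCTETS) -> int:
    """Count how many leading characters of text fit in limit octets of UTF-8."""
    used = 0
    count = 0
    for ch in text:
        size = len(ch.encode("utf-8"))
        if used + size > limit:
            break
        used += size
        count += 1
    return count

ascii_fit = chars_that_fit("a" * 100)          # 1 octet per character
euro_fit = chars_that_fit("\u20ac" * 100)      # 3 octets per character
emoji_fit = chars_that_fit("\U0001F600" * 100) # 4 octets per character
print(ascii_fit, euro_fit, emoji_fit)  # 80 26 20
```

The worst case of 20 characters only arises when every character needs 4 octets; mixed text lands somewhere between 20 and 80.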
When parsing (see 3.4.1 Parsing), if a delimiter character is not present in the parse area, the selected string continues up to and including the last character in the parse area.
This meaning of parsing is used in the words (, s", and c". Therefore, if there is no delimiter in the parse area, the system is not allowed to throw an exception or even display a warning.
In practice, programs do not rely on this: if the ) or " delimiter is absent, it is most likely a typo.
Perhaps an ambiguous condition should be declared for these words when the corresponding delimiter is absent from the parse area.
This ambiguous condition would allow a system to throw an exception, or to accept a multi-line string literal, when the delimiter is missing from the parse area.
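The three behaviours discussed here can be modelled in a few lines of Python. This is only a sketch of the idea, not Forth code and not any system's implementation: the names parse, parse_multiline, and MissingDelimiter are hypothetical, and joining lines with a newline in the multi-line variant is an assumption.

```python
class MissingDelimiter(Exception):
    """Hypothetical error a system could raise under the proposed ambiguous condition."""

def parse(area, delim, strict=False):
    """Model of parsing: return (selected string, rest of the parse area)."""
    end = area.find(delim)
    if end == -1:
        if strict:
            # Behaviour the ambiguous condition would permit: report the typo.
            raise MissingDelimiter("no %r in parse area" % delim)
        # Current rule: the string runs up to and including the last character.
        return area, ""
    return area[:end], area[end + 1:]

def parse_multiline(lines, delim):
    """Other permitted behaviour: keep taking lines until delim is found."""
    collected = []
    for line in lines:
        end = line.find(delim)
        if end != -1:
            collected.append(line[:end])
            return "\n".join(collected)
        collected.append(line)
    raise MissingDelimiter("no %r before end of input" % delim)

found = parse(' hello" type', '"')
missing = parse(" hello", '"')
joined = parse_multiline(["first", 'second" rest'], '"')
print(found)    # (' hello', ' type')
print(missing)  # (' hello', '')
```

With strict=False a missing delimiter is silently accepted, which is the case the issue describes as most likely a typo.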