A case of missed delimiter in string literals and comments #18

ruv · 2024-11-22T07:47:40Z

When parsing (see 3.4.1 Parsing), if a delimiter character is not present in the parse area, the selected string continues up to and including the last character in the parse area.

These parsing semantics are used by the words (, s", and c". Therefore, if there is no delimiter in the parse area, the system is not allowed to throw an exception or even display a warning.

In practice, programs do not rely on this: if the ) or " delimiter is absent, it is most likely a typo.

Perhaps an ambiguous condition should be declared for these words when the corresponding delimiter is absent from the parse area.

This ambiguous condition would allow a system to throw an exception, or to accept a multi-line string literal, when the delimiter is missing from the parse area.
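
To make the rule concrete, here is a minimal Python model of delimiter parsing over a single parse area. It is only a sketch, not Forth and not taken from any Forth system: the names `parse`, `strict`, and `MissingDelimiter` are hypothetical. With `strict=False` it follows the behavior described above (the string runs to the end of the parse area); with `strict=True` it shows one thing a system could do if the ambiguous condition were declared.

```python
class MissingDelimiter(Exception):
    """Hypothetical error for a missing delimiter (not defined by any standard)."""


def parse(source: str, to_in: int, delim: str, strict: bool = False):
    """Model delimiter parsing over one parse area.

    `source` stands for the input buffer and `to_in` for the offset
    where the parse area starts. Returns (parsed_string, new_to_in).
    """
    end = source.find(delim, to_in)
    if end == -1:
        if strict:
            # What a system might do under the proposed ambiguous condition.
            raise MissingDelimiter(f"no {delim!r} in the parse area")
        # Behavior as currently specified: the string continues up to and
        # including the last character, and the parse area becomes empty.
        return source[to_in:], len(source)
    # Delimiter found: it is excluded from the string and skipped.
    return source[to_in:end], end + 1
```

Calling `parse('hello', 0, '"')` silently returns the whole remaining text, which is the case the issue is about; `parse('hello', 0, '"', strict=True)` raises instead.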


SirWumpus · 2024-11-22T12:06:14Z

I've always thought that the ( ...) closing delimiter was intended for multiline comments (and implemented it as such), much like C's /* ... */. In a similar manner, I would think it's more useful to allow S" ...", S\" ...", C" ...", ." ...", and .( ...) to be multiline, to display runtime text (instructions, help, usage, etc.), to parse with EVALUATE, or to submit to other words (maybe templated strings, printf-like strings, etc.).

Plus, why would you not support multiline comments when you have \ for single-line comments?

Lines of 64 ASCII octets in a block seem rather short; likewise, 80-octet lines (min. input buffer length & transient buffer) seem rather short, especially if UTF-8 multibyte characters become more of a norm (1 to 4 bytes in length (up to 6 in theory)). I mention this because short lines for comments seem limiting.

Side-buffet: Another issue of concern is the use of "character" for buffer sizes: is this a hidden assumption of ASCII single-byte characters, or do they actually account for UTF-8 or wide characters? In the case of UTF-8, a character can occupy up to 4 bytes (6 if they ever extend that far), so an 80-byte input buffer could hold only 20 such characters. Maybe "character" should be highlighted or rephrased as "wide character" when they mean characters instead of bytes.

ruv · 2024-11-22T16:09:41Z

> I've always thought that the ( ...) closing delimiter was intended for multiline comments

Yes, but with one exception (namely, when the input source is the user input device).

The word 11.6.1.0080 ( (from the FILE word set) is a multi-line comment.

The word 6.1.0080 ( (from the CORE word set) is limited by the input buffer. This means that when the input source is the user input device, parsing is limited to one line.

I dislike this because it works incorrectly when a program with multi-line comments is redirected from a file to stdin.

An ambiguous condition would allow this to be fixed, at least in Forth systems (and would allow s" to support multi-line string literals). Then the corresponding changes could be standardized.

I thought this ambiguous condition was stated in Forth-2012, but I was wrong.
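
The difference between the two versions of ( can be modeled with a small Python sketch. This is an illustration under assumptions, not an implementation from any Forth system: the function name `skip_paren_comment` and the `refill` flag are hypothetical. `refill=True` stands for the FILE behavior (continue on the next line when no ) is found), and `refill=False` stands for the CORE behavior (the comment ends with the parse area).

```python
def skip_paren_comment(lines, refill):
    """Return (line_index, offset) where interpretation resumes.

    `lines` is a non-empty list of input lines; the comment text is
    assumed to start at the beginning of lines[0], just after "( ".
    """
    for i, line in enumerate(lines):
        end = line.find(")")
        if end != -1:
            # Closing parenthesis found: resume right after it.
            return i, end + 1
        if not refill:
            # CORE-style: the comment ends at the end of this parse area,
            # so the next line will be interpreted as code.
            return i, len(line)
    # Input exhausted without a closing parenthesis.
    return len(lines) - 1, len(lines[-1])
```

For the input `["first line of comment", "second ) 1 2 +"]`, the FILE-style call resumes after the ) on the second line, while the CORE-style call stops at the end of the first line and leaves `second ) 1 2 +` to be interpreted as code. That is the failure described above for a file redirected to stdin.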

> Side-buffet: Another issue of concern is the use of "character" for buffer sizes: is this a hidden assumption of ASCII single-byte characters, or do they actually account for UTF-8 or wide characters?

It is counted in "primitive characters", not in UTF-8 multi-byte characters. In the Forth-2019 draft, the primitive character size is always 1 address unit.
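
As a rough illustration of what counting in primitive characters means for UTF-8 text, the following Python sketch computes how many code points fit in a buffer of a given size. It assumes that one primitive character is one byte; the function name `utf8_capacity` is hypothetical.

```python
def utf8_capacity(buffer_bytes: int, text: str) -> int:
    """Count the leading code points of `text` that fit in `buffer_bytes`
    bytes when encoded as UTF-8 (assuming 1 primitive character = 1 byte)."""
    used = 0
    count = 0
    for ch in text:
        size = len(ch.encode("utf-8"))  # 1 to 4 bytes per code point
        if used + size > buffer_bytes:
            break
        used += size
        count += 1
    return count
```

With an 80-byte buffer, this gives 80 for pure ASCII text and 20 for text made only of 4-byte code points, which matches the figure mentioned in the question; text made of 3-byte code points falls in between.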

SirWumpus · 2024-11-22T17:49:11Z

> It is counted in "primitive characters", not in UTF-8 multi-byte characters. In Forth-2019 draft, the primitive character size is always 1 address unit.

Ugh I'm neolithic ASCII
I'm evolved UTF8 このソースコードには何も問題はありません ("There is nothing wrong with this source code")

Couldn't resist.

alextangent · 2024-11-22T17:58:27Z

> up to 4 bytes (6 if they ever extend that far),

Encodings of 5 octets or more aren’t valid UTF-8. The Unicode Standard is the authoritative specification, but this RFC is an easier read: https://datatracker.ietf.org/doc/html/rfc3629

“In UTF-8, characters from the U+0000..U+10FFFF range (the UTF-16 accessible range) are encoded using sequences of 1 to 4 octets.”

Of course, it’s possible that some mad bonkers techie could persuade the rest of the world that 5 and 6 octet encodings had some use and get them standardised, but I doubt that will happen. There’s a whole chunk of code out there that needs UTF-8 to fit in a maximum of 32 bits.

Regards
Alex

ruv added the enhancement New feature or request label Nov 22, 2024
