A case of missed delimiter in string literals and comments #18

Open
ruv opened this issue Nov 22, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@ruv
Collaborator

ruv commented Nov 22, 2024

When parsing (see 3.4.1 Parsing), if a delimiter character is not present in the parse area, the selected string continues up to and including the last character in the parse area.

These parsing semantics are used by the words (, s", and c". Therefore, if there is no delimiter in the parse area, the system is not allowed to throw an exception or even display a warning.
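A minimal Python sketch of the parsing rule quoted above, assuming the parse area is modelled as a Python string `buffer[to_in:]` and `to_in` stands in for >IN. The function name `parse` is hypothetical and only illustrates the semantics; it is not a real Forth API.

```python
def parse(buffer: str, to_in: int, delim: str) -> tuple[str, int]:
    """Select a string from the parse area buffer[to_in:], delimited by delim.

    Returns (selected_string, new_to_in), where new_to_in models >IN.
    """
    end = buffer.find(delim, to_in)
    if end == -1:
        # No delimiter: the selected string runs up to and including the
        # last character of the parse area, and the parse area is consumed.
        return buffer[to_in:], len(buffer)
    # Delimiter found: it is excluded from the string, and >IN moves past it.
    return buffer[to_in:end], end + 1


# The text interpreter has already parsed "(" and the space after it, so >IN is 2.
print(parse("( a comment ) 1 2 +", 2, ")"))  # ('a comment ', 13)
print(parse("( a comment 1 2 +", 2, ")"))    # ('a comment 1 2 +', 17)
```

In the second call the closing ) is missing, so "1 2 +" silently becomes part of the comment, with no error or warning allowed.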

In practice, programs do not rely on this: if the ) or " delimiter is absent, it is most likely a typo.

Perhaps an ambiguous condition should be declared for these words when the corresponding delimiter is absent from the parse area.

This ambiguous condition would allow a system to throw an exception, or to accept a multi-line string literal, when a delimiter is missing from the parse area.
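As a rough illustration of the first option, here is a Python sketch of a stricter parse that a system could adopt if the proposed ambiguous condition existed. `MissingDelimiterError` and `parse_strict` are hypothetical names; the parse area is modelled as `buffer[to_in:]`, with `to_in` standing in for >IN.

```python
class MissingDelimiterError(Exception):
    """Hypothetical error a system could raise under the proposed ambiguous condition."""


def parse_strict(buffer: str, to_in: int, delim: str) -> tuple[str, int]:
    """Like the standard parse, but refuse to run off the end of the parse area."""
    end = buffer.find(delim, to_in)
    if end == -1:
        # Delimiter absent: report it instead of silently taking the rest of the line.
        raise MissingDelimiterError(
            f"missing {delim!r} in parse area {buffer[to_in:]!r}"
        )
    return buffer[to_in:end], end + 1


# After the interpreter parses the name s" and its trailing space, >IN is 3.
print(parse_strict('s" abc" type', 3, '"'))  # ('abc', 7)
try:
    parse_strict('s" abc type', 3, '"')
except MissingDelimiterError as err:
    print("error:", err)
```

The other option, accepting a multi-line literal, would instead refill the input buffer and keep searching for the delimiter.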

ruv added the enhancement New feature or request label Nov 22, 2024
@SirWumpus

SirWumpus commented Nov 22, 2024

I've always thought that the ( ...) closing delimiter was intended for multiline comments (and implemented it as such), much like C's /* ... */. In a similar manner, I think it's more useful to allow S" ...", S\" ...", C" ..", ." ..." and .( ...) to be multiline, to display runtime text (instructions, help, usage, etc.), to parse with EVALUATE, or to pass to other words (maybe templated strings, printf-like strings, etc.).

Plus, why would you not support multiline comments when you have \ for single-line comments?

Lines of 64 ASCII octets in a block seem rather short; likewise, 80-octet lines (the minimum input buffer length and transient buffer size) seem rather short, especially if UTF-8 multibyte characters become more of the norm (1 to 4 bytes in length, up to 6 in theory). I mention this because short lines seem limiting for comments.

Side-buffet: Another issue of concern is the use of "character" for buffer sizes. Is this a hidden assumption of single-byte ASCII characters, or does it actually account for UTF-8 or wide characters? In the case of UTF-8, a character can occupy up to 4 bytes (6 if they ever extend that far), so an 80-byte input buffer could hold as few as 20 UTF-8 characters. Maybe "character" should be highlighted, or rephrased as "wide character" where characters rather than bytes are meant.

@ruv
Collaborator Author

ruv commented Nov 22, 2024

I've always thought that ( ...) closing delimiter was intended for multiline comments

Yes, but with one exception (namely, when the input source is the user input device).

The word 11.6.1.0080 ( (from the FILE word set) is a multi-line comment.

The word 6.1.0080 ( (from the CORE word set) is limited by the input buffer. This means that when the input source is the user input device, parsing is limited to one line.

I dislike this because a program with multi-line comments will work incorrectly when it is redirected from a file to stdin.
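A rough Python sketch of the difference, modelling the input source as a list of lines. The function names are hypothetical, and the FILE-set behavior (refill from the next line and keep searching until ) or end of file) is paraphrased from 11.6.1.0080 rather than quoted; `next(lines, None)` stands in for REFILL.

```python
from typing import Iterator


def skip_comment_core(line: str, to_in: int) -> int:
    """CORE-style ( : search only the current line; return the new >IN."""
    end = line.find(")", to_in)
    return len(line) if end == -1 else end + 1


def skip_comment_file(lines: Iterator[str], line: str, to_in: int) -> tuple[str, int]:
    """FILE-style ( : refill from the next line until ")" or end of file.

    Returns (current_line, new_to_in); ("", 0) means end of file was reached.
    """
    while True:
        end = line.find(")", to_in)
        if end != -1:
            return line, end + 1
        nxt = next(lines, None)  # model of REFILL
        if nxt is None:
            return "", 0
        line, to_in = nxt, 0


source = ["( start of comment", "  more text ) 1 2 +"]

# Read from a file: the comment spans both lines, leaving " 1 2 +" to interpret.
rest = iter(source[1:])
line, pos = skip_comment_file(rest, source[0], 2)
print(repr(line[pos:]))  # ' 1 2 +'

# Read line by line from the user input device: the comment ends with the
# first line, so "more text ) 1 2 +" would be interpreted as ordinary words.
pos = skip_comment_core(source[0], 2)
print(pos == len(source[0]))  # True
```

The same file therefore behaves differently depending on whether it is INCLUDEd or piped into stdin, which is the breakage described above.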

An ambiguous condition would allow Forth systems, at least, to fix this (and to support multi-line string literals in s"). Then the corresponding changes could be standardized.

I thought this ambiguous condition was stated in Forth-2012, but I was wrong.


Side-buffet: Another issue of concern is the use of character for buffer sizes; is this a hidden assumption of ASCII single byte characters or do they actually account for UTF8 or wide characters.

It is counted in "primitive characters", not in UTF-8 multi-byte characters. In Forth-2019 draft, the primitive character size is always 1 address unit.
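To make the arithmetic concrete, a small Python sketch, assuming 8-bit address units (typical on current hosts) so that one primitive character is one octet. `chars_that_fit` is a hypothetical helper; it counts how many copies of a single character fit in an 80-octet buffer when encoded as UTF-8.

```python
def chars_that_fit(ch: str, buffer_bytes: int = 80) -> int:
    """How many copies of ch fit in a buffer of buffer_bytes octets as UTF-8."""
    return buffer_bytes // len(ch.encode("utf-8"))


samples = [
    ("ASCII", "a"),
    ("accented Latin", "\u00e9"),  # e with acute accent
    ("Japanese", "\u3053"),        # hiragana "ko"
    ("emoji", "\U0001F600"),       # grinning face, outside the BMP
]
for label, ch in samples:
    print(f"{label}: {len(ch.encode('utf-8'))} byte(s) each, {chars_that_fit(ch)} fit")
```

This prints 80, 40, 26, and 20 characters respectively, so the worst case of 20 characters in an 80-octet buffer holds for 4-byte UTF-8 sequences.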

@SirWumpus

It is counted in "primitive characters", not in UTF-8 multi-byte characters. In Forth-2019 draft, the primitive character size is always 1 address unit.

Ugh I'm neolithic ASCII
I'm evolved UTF8 このソースコードには何も問題はありません ("There is nothing wrong with this source code.")

Couldn't resist.

@alextangent

alextangent commented Nov 22, 2024 via email

3 participants