Native/abstracted sub-parsers #340

Open
ckiee opened this issue Feb 21, 2022 · 6 comments

ckiee commented Feb 21, 2022

I'm in a similar situation to #199:

many1(block_expr_node()).parse("*hi*")

fn block_expr_node<Input>() -> FnOpaque<Input, BlockExprNode>
where
    Input: Stream<Token = char>,
    Input::Error: ParseError<Input::Token, Input::Range, Input::Position>,
{
    opaque!(no_partial(
        choice!(bold(), char()).message("while parsing block_expr_node")
    ))
}


fn char<Input>() -> impl Parser<Input, Output = BlockExprNode>
where
    Input: Stream<Token = char>,
    Input::Error: ParseError<Input::Token, Input::Range, Input::Position>,
{
    satisfy(|c: char| !c.is_control())
        .map(|c| BlockExprNode::Char(c))
        .message("while parsing char")
}

fn bold<Input>() -> impl Parser<Input, Output = BlockExprNode>
where
    Input: Stream<Token = char>,
    Input::Error: ParseError<Input::Token, Input::Range, Input::Position>,
{
    (token('*'), many1(block_expr_node()), token('*'))
        .map(|(_, v, _)| BlockExprNode::Bold(v))
        .message("while parsing bold")
}

I think this does not work because once it starts to parse in bold, the many1(block_expr_node()) picks char and the input is consumed until EOF:

Error: Parse error at line: 1, column: 4
Unexpected end of input
Expected `*`
while parsing bold
while parsing char
while parsing block_expr_node
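
The failure can be reproduced without combine. The following is a minimal std-only sketch, assuming a simplified model in which the inner step accepts any non-control character, as `char()` does above; `greedy_bold` is a hypothetical name, not part of any library. Because `*` is not a control character, the greedy loop swallows the closing `*` and nothing is left for the final delimiter:

```rust
/// Toy model of the failing `bold`: `*`, then one or more characters, then `*`.
/// Like `char()` above, the inner step accepts any non-control character,
/// and that includes `*` itself.
fn greedy_bold(input: &str) -> Result<&str, String> {
    let rest = input
        .strip_prefix('*')
        .ok_or_else(|| "expected opening `*`".to_string())?;

    // Greedy "many1": take non-control characters for as long as possible.
    let end = rest
        .char_indices()
        .find(|(_, c)| c.is_control())
        .map(|(i, _)| i)
        .unwrap_or(rest.len());
    if end == 0 {
        return Err("expected at least one character".to_string());
    }
    let (body, tail) = rest.split_at(end);

    // By now the closing `*` is already part of `body`.
    if tail.starts_with('*') {
        Ok(body)
    } else {
        Err(format!("expected closing `*` (body consumed: {:?})", body))
    }
}
```

Calling `greedy_bold("*hi*")` returns an error whose message shows that the body consumed was `hi*`, closing delimiter included.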

Replacing the bold implementation with:

fn bold<Input>() -> impl Parser<Input, Output = BlockExprNode>
where
    Input: Stream<Token = char>,
    Input::Error: ParseError<Input::Token, Input::Range, Input::Position>,
{
    (
        token('*'),
        take_until::<String, _, _>(token('*')).map(|s| {
            // HACK ouch ouch ouch
            many1(block_expr_node())
                .easy_parse(position::Stream::new(&s[..]))
                // this is the expect on Result
                .expect("In bold subparser")
                .0
        }),
        token('*'),
    )
        .map(|(_, v, _)| BlockExprNode::Bold(v))
        .message("while parsing bold")
}
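
For comparison, here is a std-only sketch of the same "capture up to the delimiter, then re-parse the captured slice" idea, with the inner error propagated instead of hitting `expect`. It does not use combine; `Node`, `parse_nodes` and `parse_bold` are hypothetical stand-ins for `BlockExprNode`, `many1(block_expr_node())` and `bold()`:

```rust
#[derive(Debug, PartialEq)]
enum Node {
    Char(char),
    Bold(Vec<Node>),
}

/// Stand-in for `many1(block_expr_node())`: parses a whole slice into nodes.
fn parse_nodes(input: &str) -> Result<Vec<Node>, String> {
    let mut nodes = Vec::new();
    let mut rest = input;
    while let Some(c) = rest.chars().next() {
        if c == '*' {
            let (node, tail) = parse_bold(rest)?;
            nodes.push(node);
            rest = tail;
        } else if !c.is_control() {
            nodes.push(Node::Char(c));
            rest = &rest[c.len_utf8()..];
        } else {
            return Err(format!("unexpected control character {:?}", c));
        }
    }
    if nodes.is_empty() {
        return Err("expected at least one node".to_string());
    }
    Ok(nodes)
}

/// `*`, everything up to the next `*`, then `*`.
/// The captured slice is handed to a fresh run of `parse_nodes`.
fn parse_bold(input: &str) -> Result<(Node, &str), String> {
    let rest = input.strip_prefix('*').ok_or("expected opening `*`")?;
    let end = rest.find('*').ok_or("expected closing `*`")?;
    // Re-parse only the captured slice; an inner failure becomes an
    // ordinary error of the outer parse instead of a panic.
    let inner = parse_nodes(&rest[..end])
        .map_err(|e| format!("in bold sub-parser: {}", e))?;
    Ok((Node::Bold(inner), &rest[end + 1..]))
}
```

With this sketch, `parse_nodes("*hi*")` yields a single `Bold` node containing `Char('h')` and `Char('i')`, and `parse_nodes("*hi")` returns an error rather than panicking.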

..parses correctly but is obviously messy and handling errors correctly as in #199 (comment) only adds more boilerplate. Do you think it could be possible to add an abstraction above flat_map so this could be done like:

fn bold<Input>() -> impl Parser<Input, Output = BlockExprNode>
where
    Input: Stream<Token = char>,
    Input::Error: ParseError<Input::Token, Input::Range, Input::Position>,
{
    (
        token('*'),
        take_until::<String, _, _>(token('*')).and_reparse_with(many1(block_expr_node())),
        token('*'),
    )
        .map(|(_, v, _)| BlockExprNode::Bold(v))
        .message("while parsing bold")
}

It'd still create the sub-parser in a flat_map but would hide the scary types from me :P

Owner

Marwes commented Feb 22, 2022

Could you post what kind of input you are trying to parse and what the output is expected to look like? I am not sure I understand it entirely; as best I can tell, the syntax seems rather ambiguous.

**hi** // Could be parsed with the 1st and 4th `*` and the 2nd and 3rd as the start/end of a block or it could be parsed with the 1st and 2nd and the 3rd and 4th in the same block

With combine being LL it will consume eagerly so the first case (1+4 and 2+3) is the one that is parsed.

Author

ckiee commented Feb 22, 2022

Overall it's a mix of Org-mode and Markdown so 1+4 and 2+3 seem right, although they don't really have any special meaning and will just get optimized out later:

*hi*                         ; BlockExprNode::Bold(BlockExprNode::Text("hi"))
*/italics & bold/*           ; BlockExprNode::Bold(BlockExprNode::Italics(BlockExprNode::Text("italics & bold")))
**this is kinda useless**    ; BlockExprNode::Bold(BlockExprNode::Bold(BlockExprNode::Text("this is kinda useless")))

Owner

Marwes commented Feb 22, 2022

If that is the case, why is bold calling itself recursively? Why not just (token('*'), many1(char()), token('*')) ?

Author

ckiee commented Feb 22, 2022

why is bold calling itself recursively?

Because eventually block_expr_node's choice! will have more options (like italics, etc..) so I'm just preparing it for that.

Ideally I could make block_expr_node's choice! skip bold in the second, nested call from bold but this isn't the reason I opened the issue. What do you think about the flat_map abstraction?

Owner

Marwes commented Feb 22, 2022

It wouldn't be an unreasonable addition. However, **this is kinda useless** would not be parsed as you expect with the and_reparse_with parser you showed: it would be parsed as Bold(""), Text("this is kinda useless"), Bold(""), so I am not sure it is the solution you are looking for.

To parse it as BlockExprNode::Bold(BlockExprNode::Bold(BlockExprNode::Text("this is kinda useless"))) you would effectively need infinite lookahead to figure out which * are opening and closing a block which doesn't seem right.
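
To make the point about pairing concrete, here is a small std-only sketch (not combine code; `Span` and `split_spans` are hypothetical names) that pairs every `*` with the very next `*`, which is roughly what `token('*')`, `take_until(token('*'))`, `token('*')` does:

```rust
#[derive(Debug, PartialEq)]
enum Span {
    Text(String),
    Bold(String),
}

/// Splits the input into spans, pairing each `*` with the very next `*`.
fn split_spans(input: &str) -> Vec<Span> {
    let mut spans = Vec::new();
    let mut rest = input;
    while !rest.is_empty() {
        if let Some(after_open) = rest.strip_prefix('*') {
            match after_open.find('*') {
                Some(end) => {
                    // The nearest `*` closes the block, even if it is adjacent.
                    spans.push(Span::Bold(after_open[..end].to_string()));
                    rest = &after_open[end + 1..];
                }
                None => {
                    // Unmatched `*`: this sketch keeps the remainder as text.
                    spans.push(Span::Text(rest.to_string()));
                    rest = "";
                }
            }
        } else {
            // Plain text runs up to the next `*` or the end of input.
            let end = rest.find('*').unwrap_or(rest.len());
            spans.push(Span::Text(rest[..end].to_string()));
            rest = &rest[end..];
        }
    }
    spans
}
```

On `**this is kinda useless**` this produces an empty `Bold`, then `Text("this is kinda useless")`, then another empty `Bold`, rather than a nested `Bold(Bold(..))`.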

Another solution may be to do something like many1(choice(bold_char(), italics_char(), etc)).then(|prefix| text().skip(string(prefix.reverse()))), which would consume a "block prefix", then parse the text, and finally check that the prefix appears in reverse order after it.
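
A rough std-only sketch of that prefix idea follows. It is not combine code, and several things are assumptions made for illustration: the names `Node`, `is_marker` and `parse_block`, the choice of `*` for bold and `/` for italics, and the rule that the whole input is a single block:

```rust
#[derive(Debug, PartialEq)]
enum Node {
    Text(String),
    Bold(Box<Node>),
    Italics(Box<Node>),
}

/// Assumed markers: `*` opens/closes bold, `/` opens/closes italics.
fn is_marker(c: char) -> bool {
    c == '*' || c == '/'
}

/// Parses one block: a marker prefix, plain text, then the prefix reversed.
fn parse_block(input: &str) -> Result<Node, String> {
    // 1. Consume the "block prefix". Markers are ASCII, so the number of
    //    marker chars equals the byte length of the prefix.
    let prefix: Vec<char> = input.chars().take_while(|c| is_marker(*c)).collect();
    let rest = &input[prefix.len()..];

    // 2. Parse the text, up to the next marker or the end of input.
    let text_len = rest.find(is_marker).unwrap_or(rest.len());
    let (text, tail) = rest.split_at(text_len);

    // 3. Check that the prefix appears in reverse order after the text.
    let expected: String = prefix.iter().rev().collect();
    if tail != expected {
        return Err(format!("expected closing {:?}, found {:?}", expected, tail));
    }

    // Wrap from the innermost marker outwards.
    let mut node = Node::Text(text.to_string());
    for marker in prefix.iter().rev() {
        node = match *marker {
            '*' => Node::Bold(Box::new(node)),
            _ => Node::Italics(Box::new(node)),
        };
    }
    Ok(node)
}
```

Under these assumptions `parse_block("*/italics & bold/*")` gives `Bold(Italics(Text("italics & bold")))` and `parse_block("**this is kinda useless**")` gives `Bold(Bold(Text(..)))`. Input with an unmatched marker is rejected by this sketch, not treated as plain text.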

Author

ckiee commented Feb 22, 2022

Another solution may be to do something like [..]

That would probably work, but you still need to treat hello* world as normal text (no matching bold_char) so it might be a bit more tricky.. For now I think I will leave this edge case alone since I want to get the whole pipeline kinda-working instead of parsing perfectly right away :P

It wouldn't be an unreasonable addition

Should I have a go at making a PR then? It seems tricky and I am still scared of all the types so I would need some mentoring probably
