Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to store parsers in a struct ? #289

Open
zearen opened this issue May 29, 2020 · 5 comments
Open

How to store parsers in a struct ? #289

zearen opened this issue May 29, 2020 · 5 comments

Comments

@zearen
Copy link

zearen commented May 29, 2020

Some parsers are really expensive to build, so I'd like to build them once at the start of the app then re-use them throughout the lifetime of the app. When I say expensive, I created a parser that takes a mapping and converts it into a trie, then returns the mapped value with the longest matching prefix. (I do intend to open source it once I get permission from my employer). The issue is that as far as I can tell, I have to type out the full type of the parser state to store it because of E0191. I have to specify the type of PartialState which gets really complicated really quickly, and I don't really care what it is.

An example will hopefully clarify the issue:

struct TranscriptionService {
    parser: Box<for<'a> Parser<&'a str, Output=String>,
    ...
}

impl TranscriptionService {
    fn new() -> Self {
        TranscriptionService {
            parser: many(choice((very(complicated()), large(parser())))),
            ...,
        }
    }

    fn transcribe(&mut self, s: &str) {
        match self.parser1.parse(s) {
            Ok(res) => res,
            Err(e) => // Log it or something.
    }
}

The users of TranscriptionService do not need to know the details of the parser it's using, so I don't want to promote the parser type to a generic parameter of TranscriptionService, but after all those parsers are put together, the PartialState type is going to become very complicated very fast. It'll also be fragile and change often as the parser's implementation changes.

In C++, I'd solve this by writing a function to build the parser with the auto type, i.e. -> impl Trait in Rust, then using decltype of the function in the struct, but I don't know how to do this in Rust. AFAICT, you just can't. Higher Rank Trait Bounds at the type level (for<'a, IDGAF> Parser<&'a str, Output=String, PartialState=IDGAF>) would also work, but are unimplemented.

Am I missing something, or am I just going to have to write out the type by hand for all of the parsers ?

@Marwes
Copy link
Owner

Marwes commented May 29, 2020

There are some combinators which boxes and hides the type of PartialState https://docs.rs/combine/4.2.1/combine/parser/combinator/fn.any_send_sync_partial_state.html https://docs.rs/combine/4.2.1/combine/parser/combinator/fn.any_partial_state.html .

There are also https://docs.rs/combine/4.2.1/combine/parser/combinator/fn.no_partial.html which makes PartialState = () which effectively disables partial parsing inside the wrapped parser. If you don't need to parse partial input that would be the best choice.

@zearen
Copy link
Author

zearen commented May 29, 2020

Oh yes, that's exactly what I needed, thanks ^.^

@zearen zearen closed this as completed May 29, 2020
@zearen
Copy link
Author

zearen commented May 31, 2020

Sorry, but this isn't quite finished yet. I can't figure out how to store parsers of &str, or rather, I can't figure out how to create one. As a simple example, consider:

use combine::{Parser, many, any};
use combine::parser::combinator::no_partial;

struct Service {
    parser: Box<dyn for<'s> Parser<&'s str, Output=String, PartialState=()>>
}

impl Service {
    fn new() -> Self {
        Service {
            parser: Box::new(no_partial(many(any()))),
        }
    }

    fn transcribe(&mut self, input: &str) -> String {
        match self.parser.parse(input) {
            Ok((out, rest)) => {
                assert!(rest.is_empty());
                out
            },
            Err(err) => panic!("Parse error: {}", err)
        }
    }
}

#[test]
fn store_in_struct() {
    let mut service = Service::new();
    assert_eq!(service.transcribe("blah"), String::from("blah"));
}

This fails to compile with:

error[E0271]: type mismatch resolving `for<'s> <combine::parser::combinator::NoPartial<combine::parser::repeat::Many<_, combine::parser::token::Any<&str>>> as combine::parser::Parser<&'s str>>::Output == std::string::String`
  --> tests/store_in_struct.rs:11:21
   |
11 |             parser: Box::new(no_partial(many(any()))),
   |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected bound lifetime parameter 's, found concrete lifetime
   |
   = note: required for the cast to the object type `dyn for<'s> combine::parser::Parser<&'s str, Output = std::string::String, PartialState = ()>`

It does not compile because any wants a concrete lifetime for &str. My hypothesis is because this lifetime needs to be known when any() is constructed because it's in the type parameters of the struct. It's not actually used to store a reference, but Rust can't know that. I tried reducing the type requirements on the individual parsers and parser constructors, but was unable to remove all reference to the input type.

Adding the lifetime as a generic parameter to the Service struct doesn't work because the lifetime has to be arbitrarily small, i.e. we don't know it until we call .parse(). I'm slowly coming to the opinion it's just not possible, so I'm detaching the expensive state from the parser so the parse itself can be cheap to build. I thought a report might help out, though :)

@zearen zearen reopened this May 31, 2020
@stefnotch
Copy link

For the record, chumsky had a very similar issue. Through the magic of unsafe, chumsky actually has a workaround, albeit a rather clunky one.

zesterer/chumsky#501

@Marwes
Copy link
Owner

Marwes commented Jan 3, 2024

Seems like that solution boxes the parsers, thus causing dynamic dispatch. Depending on where this boxing/caching happens this could greatly regress performance (for instace, if you cache/box a "json" parser you end up doing dynamic dispatch when parsing every number/string/array/object etc which is going to be slow.

In general the parsers in combine are meant to be cheap/zero-cost to create so re-creating them whenever they are needed is not an issue. Only cases such as described here where you have a large amount of (dynamic?) data do we start seeing issues.

I don't know if chumsky actually solves the lifetime issue that lies at the end of this though?. You need to existentially quantify the lifetime to make the typecheck work (as in Box<dyn for<'s> Parser<&'s str, Output=String, PartialState=()>> but you can't do that with a concrete type, but maybe it works because chumsky boxes it 🤷

I guess I missed the last message but to sort of answer it now, I would look at caching and reusing the data used to generate the parsers, instead of caching and reusing the parsers themselves. I assume that this data do not refer to the input lifetime so storing that data in a struct would not cause that lifetime issue and the parsers can then be cheaply created from that data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants