Add python/cython benchmark for the lark parser #1
Conversation
Nice graph! It could be a real improvement over what https://github.com/goodmami/python-parsing-benchmarks has right now.
I think the same. If I had time to work on it, I would probably focus on one of two areas:
There is this solution: https://hypothesis.readthedocs.io/en/latest/extras.html?highlight=lark#hypothesis-lark. I haven't tried it myself yet.
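From the docs, usage looks roughly like this (an untested sketch; the toy grammar is mine, and only `from_lark` is the documented entry point):

```python
# Untested sketch of hypothesis's lark extra: generate strings from a
# lark grammar, then check that they parse under that same grammar.
import lark
from hypothesis import given, settings
from hypothesis.extra.lark import from_lark

toy_grammar = lark.Lark(r"""
    start: pair ("," pair)*
    pair: WORD ":" NUMBER
    WORD: /[a-z]+/
    NUMBER: /[0-9]+/
""")

@given(text=from_lark(toy_grammar))
@settings(max_examples=20)
def test_generated_strings_parse(text):
    toy_grammar.parse(text)  # should never raise for generated input

test_generated_strings_parse()
```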
P.S. I think the "gold standard" might be to benchmark parsing Python. It's pretty complicated, the Lark grammar for it already exists, and there's plenty of "test data" for it.
I attempted to use hypothesis to generate random strings for the grammar. I got it to "somewhat" work, but it was unwieldy, and it was unclear how to generate random strings of different specified sizes (or how to request that a string be longer or shorter; I haven't generated from a CFG before, so I'm not familiar with the exact process). I saved the patch that adds this, but didn't include it here, as I think it makes the PR worse overall. It's at least a proof of concept. The graphs it generates are much uglier, though, and I can't figure out how to get hypothesis to generate long strings. I'm stuck with the set of 23 examples it generates for me.
Generating strings from a grammar is not very difficult conceptually. You can generate every variation by recursing into rules and generating every variation of the terminals. (It does require generating text for regexps, but I think there are libraries that do that already.) It's a long process, but if you do it with BFS, and maybe add a little randomness, you can probably generate reasonable samples. Re the benchmarks, it's a little suspicious that short inputs take much longer than longer ones, no?
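Something like this toy sketch, say (random descent rather than true BFS, with literal strings standing in for regexp terminals; the grammar format here is ad hoc, not lark's):

```python
import random

# Ad hoc grammar format: rule name -> list of alternatives,
# each alternative a list of symbols (rule names or literal terminals).
GRAMMAR = {
    "start": [["pair"], ["pair", ",", "start"]],
    "pair": [["WORD", ":", "NUMBER"]],
    "WORD": [["foo"], ["bar"], ["baz"]],
    "NUMBER": [["1"], ["42"], ["999"]],
}

def generate(symbol="start", depth=0, max_depth=8):
    alternatives = GRAMMAR.get(symbol)
    if alternatives is None:
        return symbol  # literal terminal, emit as-is
    if depth >= max_depth:
        # Past the depth budget, take the shortest alternative so the
        # recursion terminates; this biases samples toward small strings.
        alternatives = [min(alternatives, key=len)]
    expansion = random.choice(alternatives)
    return "".join(generate(s, depth + 1, max_depth) for s in expansion)

for _ in range(3):
    print(generate())
```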
The x-axis is slightly different in this plot. Previously it was the number of iterations I expanded the input for; in this one it is just the number of characters hypothesis generated. The inputs it generated were:
So I imagine that overhead and differences in which token types are being generated dominate for the smallest inputs, and the differences don't become apparent until you get to the larger inputs (which are still reasonably small). Increasing the number of timerit iterations and plotting on a log scale yields this:

So overall, it's not that suspicious given the input data it's working with. And the most structured example in this PR does cleanly show the expected increasing behavior. I might take a shot at writing a better generator for a grammar, as it would be useful in my own testing, but it's going to be lower on my priority list.
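For reference, the measure-and-plot loop above has roughly this shape (a toy grammar and input generator stand in for the real ones; timerit and matplotlib are used as documented):

```python
import lark
import timerit
import matplotlib.pyplot as plt

parser = lark.Lark(r"""
    start: WORD ("," WORD)*
    WORD: /[a-z]+/
""")

def make_input(num_words):
    # Stand-in input generator: a comma-separated word list of a given size.
    return ",".join(["word"] * num_words)

sizes = [10, 100, 1000, 10000]
times = []
for n in sizes:
    text = make_input(n)
    ti = timerit.Timerit(num=10, bestof=3, verbose=0)
    for timer in ti:
        with timer:
            parser.parse(text)
    times.append(ti.min())  # best observed time per parse

# Log-log axes make the asymptotic scaling visible despite per-call overhead.
plt.plot(sizes, times, marker="o")
plt.xscale("log")
plt.yscale("log")
plt.xlabel("input size (words)")
plt.ylabel("best time per parse (s)")
plt.show()
```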
If you don't mind me bringing it up again, why did you choose not to benchmark parsing Python?
Benchmarking Python was what I originally tried, but I encountered an error. Taking a quick look again, I see it was because I specified the wrong rule as the start; at the time I just moved to something simpler to get the initial test working. My original goal was to write a benchmark where the grammar can be swapped in and out interchangeably. The benchmark file derives from a benchmark template I've been working on, so most of the code in the PR is actually boilerplate, with just a little bit specifying how to generate inputs, what the parameter grid is, and how to run the code being benchmarked. Also, I did try to pass the parser for the Python grammar into hypothesis, but got:
So work might need to be done to get that up to speed. I'm not a big fan of the hypothesis API so far, so I wonder how much of that is my unfamiliarity with the topic versus the API design itself. Like you said, I would have thought generating from a grammar would be more straightforward, but either there is a complexity I'm unaware of, or there could be a nicer API to handle it (hypothesis seems like it is built to handle much more than just generating from grammars).
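For concreteness, the interchangeable-grammar shape I'm aiming for is roughly this (everything below is a placeholder sketch, not the PR's actual code; note the explicit `start=` rule, which is what I got wrong for the Python grammar):

```python
import lark
import timerit

# Placeholder registry: name -> (grammar_text, start_rule, input_generator).
# In the benchmark these would be swapped in and out per run.
BENCHMARKS = {
    "csv-ish": (
        r"""
        start: row+
        row: WORD ("," WORD)* NEWLINE
        WORD: /[a-z]+/
        NEWLINE: /\n/
        """,
        "start",
        lambda n: ("word," * (n - 1) + "word\n") * n,
    ),
}

for name, (grammar_text, start, make_input) in BENCHMARKS.items():
    parser = lark.Lark(grammar_text, start=start)  # explicit start rule
    text = make_input(10)
    ti = timerit.Timerit(num=50, bestof=5, verbose=0)
    for timer in ti:
        with timer:
            parser.parse(text)
    print(f"{name}: {ti.min():.6f} s per parse")
```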
My sense is that it would be very easy to find a big bunch of Python snippets (there are probably existing collections), and then just sort them by size. I don't think you need to generate anything yourself. Anyway, I admit I haven't had a chance to use Hypothesis yet. And as for the Lark plugin, even its author admits it is not as good as it could be.
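For example, the running interpreter's own stdlib is already a size-sortable corpus, along these lines:

```python
# One ready-made corpus: the stdlib's .py files, sorted by byte size
# to get a natural small-to-large progression of inputs.
import sysconfig
from pathlib import Path

stdlib = Path(sysconfig.get_paths()["stdlib"])
snippets = sorted(stdlib.rglob("*.py"), key=lambda p: p.stat().st_size)

for path in snippets[:3] + snippets[-3:]:
    print(f"{path.stat().st_size:>8} bytes  {path.name}")
```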
Did you make any progress on this? Just curious
Probably won't have time to do any more work on this anytime soon.
@erezsh I appreciate you pointing me to this repo earlier.
I was able to demonstrate about a 2x speedup for my specific DSL, but I was curious what that speedup was more generally, and I was also looking for a nice way to quantify whether any changes to the Cython code would provide further speedups.
I took a peek at the Cython code and I think it is ripe for optimization. Python types are used everywhere, and if we were able to refactor those into pure C types, I think there could be a large speed gain. Unfortunately, I'm not the greatest Cython coder, so I wasn't able to find any simple changes that made a difference.
However, I did write a reproducible benchmark to quantify the speedup between CPython Lark and Cython Lark over a range of input sizes.
The timing module I use is timerit (which I wrote). It works similarly to timeit, but can work inline in existing code. I also use pandas, matplotlib, and seaborn to generate a nice figure showing the speedup over different input sizes.

The stdout of the script is:
This is also benchmarked against the lark.lark grammar that ships with lark. I have a helper function to generate a "random" (not really random, but simple) lark file to pass to the parser; the idea is sketched below. One question I had: is there a way to use lark to generate a string from a grammar? If so, that would make benchmarking more complex grammars much easier.
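The real helper is in the PR; roughly, the idea is this (a hypothetical reconstruction, not the actual code):

```python
# Hypothetical sketch (not the PR's actual helper): emit a syntactically
# valid .lark grammar whose size scales with n, to feed to the lark.lark
# parser as input text.
def make_lark_text(n):
    lines = ["start: " + " ".join(f"rule{i}" for i in range(n))]
    for i in range(n):
        lines.append(f'rule{i}: "tok{i}"')
    return "\n".join(lines) + "\n"

print(make_lark_text(3))
```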
Anyways, I hope this is helpful. Thanks again for this library!