Created parsing dev doc

carymrobbins · Oct 2, 2017 · ef05f53
1 parent 7b63749
commit ef05f53
Showing 1 changed file with 214 additions and 0 deletions.
diff --git a/docs/dev/parsing.md b/docs/dev/parsing.md
@@ -0,0 +1,214 @@
+# Parsing
+
+HaskForce implements lexers and parsers compatible with IntelliJ's API to provide
+various syntax support features. We'll start with a light introduction to some of these
+concepts, but this is nowhere near an exhaustive resource for learning about parsing.
+
+Be sure to also read the [official IntelliJ documentation on implementing parsers](http://www.jetbrains.org/intellij/sdk/docs/reference_guide/custom_language_support/implementing_parser_and_psi.html?search=pars).
+
+## Introduction
+
+Let's start with the basics. Parsing source code takes two main stages: lexing and parsing,
+performed by the lexer and the parser, respectively.
+
+## Lexers
+
+Lexers break up the source code into a sequence of tokens, which is then analyzed by some consumer.
+In most cases the consumer is a parser, which builds an abstract syntax tree from the tokens to
+later be used for analyses; however, a basic syntax highlighter doesn't actually require a parse tree
+and can simply highlight source code based on the tokens alone.
+
+To reiterate, IntelliJ uses lexers in two cases -
+* [Syntax highlighting](#syntax-highlighting)
+* [Parsers](#parsers)
+
+You _can_ use the same lexer for both syntax highlighting and parsing; however, the rules in your parsing lexer
+may be more complicated than the syntax highlighter requires, so instead it is often advantageous and more
+performant to have a simpler lexer for syntax highlighting and a more complex one for parsing.
+
+The most common way to build a lexer is to use [JFlex](http://jflex.de/). Here are our lexer implementations -
+
+* Syntax highlighting lexers -
+    * [_HaskellSyntaxHighlightingLexer.flex](/src/com/haskforce/highlighting/_HaskellSyntaxHighlightingLexer.flex)
+    * [_CabalSyntaxHighlightingLexer.flex](/src/com/haskforce/cabal/highlighting/_CabalSyntaxHighlightingLexer.flex)
+    * [_HamletSyntaxHighlightingLexer.flex](/src/com/haskforce/yesod/shakespeare/hamlet/highlighting/_HamletSyntaxHighlightingLexer.flex)
+* Parsing lexers -
+    * [_HaskellParsingLexer.flex](/src/com/haskforce/parsing/_HaskellParsingLexer.flex)
+    * [_CabalParsingLexer.flex](/src/com/haskforce/cabal/lang/lexer/_CabalParsingLexer.flex)
+
+This repo currently contains a patched version of the JFlex jar (which comes with the IntelliJ JFlex Support plugin)
+in the project root to simplify lexer generation and build reproducibility.
+
+We have a script located at [tools/run-jflex](/tools/run-jflex)
+which generates the lexers from the `.flex` files, producing
+Java sources (same file name, but with a `.java` extension instead of `.flex`). You can also use
+`tools/run-jflex clean` to remove the generated Java files. Our `build.gradle` leverages
+this script, so `./gradlew clean` and `./gradlew assemble` both work as expected, cleaning and generating
+the lexer sources, respectively.
+
+JFlex generated lexers -
+* will implement `com.intellij.lexer.FlexLexer`
+* will be passed to a `com.intellij.lexer.FlexAdapter`
+
+The `FlexAdapter` implements `com.intellij.lexer.Lexer` so that a JFlex lexer can be used as an IntelliJ
+lexer.
+
+### Lexer tokens
+
+Lexer tokens must be of type `com.intellij.psi.tree.IElementType`. In general, it's a good idea to keep
+all of the related tokens for a language in the same file. The strategy employed in HaskForce is to
+declare them as fields of a plain Java interface. Interface fields are implicitly `public static final`,
+so the tokens are accessible statically. This is the same approach [Grammar Kit](#grammar-kit) uses.
+
+Here are our token types -
+
+* [HaskellTypes.java](/gen/com/haskforce/psi/HaskellTypes.java) (note that this is
+    generated by [Grammar Kit](#grammar-kit)).
+* [CabalTypes.java](/src/com/haskforce/cabal/lang/psi/CabalTypes.java)
+* [HamletTypes.java](/src/com/haskforce/yesod/shakespeare/hamlet/psi/HamletTypes.java)
+
+## Syntax Highlighting
+
+In general, implementing a syntax highlighter requires -
+
+* Creating a subclass of `com.intellij.openapi.fileTypes.SyntaxHighlighterBase` which returns
+    a `Lexer` from `getHighlightingLexer`
+* Having that implementation returned from the `getSyntaxHighlighter` method of a
+    `com.intellij.openapi.fileTypes.SyntaxHighlighterFactory`
+* Registering the factory in `plugin.xml` using the `lang.syntaxHighlighterFactory` extension point
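+
+As a rough illustration of the registration step, the entry in `plugin.xml` might look like the sketch below.
+The class name is derived from the factory file path listed in this document (assuming `src/` is the source root).
+The language ID `Haskell` is an assumption, and older plugins may use a `key` attribute where `language` is shown here.
+
+```xml
+<!-- Sketch only: the language ID "Haskell" is assumed, not copied from the real plugin.xml -->
+<extensions defaultExtensionNs="com.intellij">
+  <lang.syntaxHighlighterFactory
+      language="Haskell"
+      implementationClass="com.haskforce.highlighting.HaskellSyntaxHighlighterFactory"/>
+</extensions>
+```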
+
+Here are our syntax highlighter factories -
+
+* [HaskellSyntaxHighlighterFactory.java](/src/com/haskforce/highlighting/HaskellSyntaxHighlighterFactory.java)
+* [CabalSyntaxHighlighterFactory.java](/src/com/haskforce/cabal/highlighting/CabalSyntaxHighlighterFactory.java)
+* [HamletSyntaxHighlighterFactory.java](/src/com/haskforce/yesod/shakespeare/hamlet/highlighting/HamletSyntaxHighlighterFactory.java)
+
+And their respective syntax highlighters -
+
+* [HaskellSyntaxHighlighter.java](/src/com/haskforce/highlighting/HaskellSyntaxHighlighter.java)
+* [CabalSyntaxHighlighter.java](/src/com/haskforce/cabal/highlighting/CabalSyntaxHighlighter.java)
+* [HamletSyntaxHighlighter.java](/src/com/haskforce/yesod/shakespeare/hamlet/highlighting/HamletSyntaxHighlighter.java)
+
+So the general hierarchy required to build a functional syntax highlighter looks something like -
+
+* `lang.syntaxHighlighterFactory` extension point in `plugin.xml`
+    * `SyntaxHighlighterFactory`
+        * `SyntaxHighlighter`
+            * `Lexer`
+
+From there, if you require more customization of syntax highlighting (which you probably will),
+see [Annotators](#annotators).
+
+### Annotators
+
+Annotators provide more complex syntax highlighting and annotations (e.g. intentions) by implementing the
+`com.intellij.lang.annotation.Annotator` interface. You will need to register your implementation
+with the `annotator` extension point in `plugin.xml`.
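+
+A minimal sketch of that registration, assuming the language ID is `Haskell` (an assumption, not taken from
+the real `plugin.xml`) and that `src/` is the source root, so that `HaskellAnnotator` lives in the
+`com.haskforce.highlighting` package -
+
+```xml
+<!-- Sketch only: the language ID "Haskell" is assumed -->
+<extensions defaultExtensionNs="com.intellij">
+  <annotator
+      language="Haskell"
+      implementationClass="com.haskforce.highlighting.HaskellAnnotator"/>
+</extensions>
+```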
+
+There is a big warning in the Javadoc for `Annotator`, so keep this in mind when implementing one -
+
+```
+ * DO NOT STORE any state inside annotator.
+ * If you absolutely must, clear the state upon exit from the {@link #annotate(PsiElement, AnnotationHolder)} method.
+```
+
+Annotators receive elements from the actual parse tree, so they run downstream of the [parser](#parsers),
+not the syntax highlighter. This is how we're able to leverage them to make decisions about more intelligent
+highlighting or to provide quick fixes via source analysis.
+
+Here are our annotators -
+
+* [HaskellAnnotator.java](/src/com/haskforce/highlighting/HaskellAnnotator.java)
+* [CabalAnnotator.scala](/src/com/haskforce/cabal/highlighting/CabalAnnotator.scala)
+
+See the `setHighlighting` method of those implementations for how to provide highlighting from an annotator.
+
+## Parsers
+
+**NOTE:** When debugging problems with a parser, be sure to first check the lexer. Many parsing bugs
+are the fault of the lexer not providing the appropriate layout to the parser. This is especially the
+case with whitespace-sensitive layout languages, like Haskell and Cabal. The indentation rules for Haskell
+are particularly tricky, so if you are debugging a problem with parsing the layout of a source file, start
+at the lexer and only move on to the parser once you've confirmed the lexer is working properly.
+
+In order to implement an IntelliJ parser for a language, you will need to -
+
+* Implement a `com.intellij.lang.PsiParser`
+* Implement a `com.intellij.lang.ParserDefinition`, returning your `Lexer` and `PsiParser` from
+    the `createLexer` and `createParser` methods, respectively
+* Register the `ParserDefinition` in `plugin.xml` with the `lang.parserDefinition` extension point
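+
+The registration step could be sketched as follows. Both the language ID `Haskell` and the class name
+`com.haskforce.HaskellParserDefinition` are hypothetical placeholders here, since this document does not name
+the actual `ParserDefinition` implementation; substitute the real class when wiring this up.
+
+```xml
+<!-- Sketch only: language ID and implementationClass are hypothetical placeholders -->
+<extensions defaultExtensionNs="com.intellij">
+  <lang.parserDefinition
+      language="Haskell"
+      implementationClass="com.haskforce.HaskellParserDefinition"/>
+</extensions>
+```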
+
+So the general hierarchy required to build a functional parser looks something like -
+
+* `lang.parserDefinition` extension point in `plugin.xml`
+    * `ParserDefinition`
+        * `Lexer` (see [Lexers](#lexers))
+        * `PsiParser`
+
+We currently have the following parsers -
+
+* [HaskellParser.java](/gen/com/haskforce/parser/HaskellParser.java) (generated by [Grammar Kit](#grammar-kit))
+    * [HaskellParserWrapper.java](/src/com/haskforce/psi/HaskellParserWrapper.java) - This extends the generated
+        `HaskellParser` with some remapping and hacks of the tokens before they get fed to the `HaskellParser`.
+* [CabalParser.scala](/src/com/haskforce/cabal/lang/parser/CabalParser.scala) - A hand-written `PsiParser`.
+* [HamletParser.java](/src/com/haskforce/yesod/shakespeare/hamlet/HamletParser.java) - Extends a dummy
+    `SimplePsiParser` which simply returns a flat tree from the lexer tokens; needed for proper Hamlet language
+    support.
+
+Hand-writing the parser seems to yield code that is much easier to read, reason about, and debug.
+We used [Grammar Kit](#grammar-kit) for implementing `HaskellParser`; however, we are looking to re-implement
+the parser to avoid bugs that prevent other features from working well.
+See [issue #233](https://github.com/carymrobbins/intellij-haskforce/issues/233).
+
+### Grammar Kit
+
+We generate the `HaskellParser` from a BNF grammar, [Haskell.bnf](/src/com/haskforce/Haskell.bnf). To make changes to the
+parser, you _must_ have the Grammar Kit plugin installed. See the
+[Developing](https://github.com/carymrobbins/intellij-haskforce#developing)
+section of our main README for the appropriate Grammar Kit plugin version to use.
+
+Once you've installed the Grammar Kit plugin, you can update the parser via the following process -
+
+* Edit the `Haskell.bnf` file with your changes
+* Delete the `gen/` directory, which contains the generated parser code
+* Use `Tools > Generate Parser Code` (or its configured shortcut) to generate the new parser in `gen/`
+
+## Testing
+
+When adding or fixing functionality in a lexer or parser, test cases should be added to
+demonstrate the new behavior as well as help to prevent future regressions.
+
+Tests should be written to include the following -
+* Syntax highlighting lexer
+* Parsing lexer
+* Parser
+
+For instance, we use the following test classes for testing the Haskell lexers and parser -
+* `HaskellLexerTest` - Tests for the syntax highlighting lexer
+* `HaskellParsingLexerTest` - Tests for the parsing lexer
+* `HaskellParserTest` - Tests for the parser
+
+The strategy is usually to -
+1. Create a fixture source file to be consumed by the lexer and parser
+2. Create a method in the test classes referencing that file
+
+For instance, for testing Haskell arrow syntax, we have -
+* A fixture source file at `tests/gold/parser/Arrow00001.hs`
+* A `testArrow00001()` method defined in all 3 Haskell lexer/parser test classes.
+
+The first time the test is run it will fail with something like the following -
+
+```
+junit.framework.AssertionFailedError: No output text found. File tests/gold/parser/expected/Arrow00001.txt created.
+```
+
+Inspect the newly created `.txt` file. For lexers it will be a sequence of tokens, one per line.
+For parsers it will be a representation of the tree produced by the parser. Errors
+may be present in the resulting `.txt` file; it is up to you to confirm whether you are expecting
+errors or not (in many cases, you want your parser to produce an error and report it to the user
+for improper syntax).
+
+Note that you will need to repeat this process for each lexer and parser test.
+
+Once you have confirmed you are getting the desired output, you can add and commit the
+resulting `.txt` files, fixture source file, and changes to test classes.