Skip to content
This repository has been archived by the owner on May 11, 2020. It is now read-only.

WIP: add a wast format scanner+parser #50

Merged
merged 7 commits into from
Feb 27, 2020

Conversation

Spriithy
Copy link
Contributor

@Spriithy Spriithy commented Dec 21, 2017

Open to reviews and discussion. Expect bugs and unexpected behavior. This branch is not ready for merge (yet !).

As discussed in #34 with @sbinet here is a first take at a file scanner for .wast files. The implementation lacks proper unit tests and comments but I am open to suggestions and reviews.

The end goal of this feature is to be able to input .wast files to wagon and directly run wasm code. In a more usual context, wast files are mostly used for unit testing implementations of a wasm virtual machine and standard compliance.

This means that we could benchmark wagon against the wasm test suite.

I will continue working on this in my spare time. Again, feel free to comment and contribute. All help is welcome.

EDIT: to test the actual scanner, simply run go test in the /wast directory.


This change is Reviewable

@sbinet
Copy link
Contributor

sbinet commented Dec 22, 2017

Review status: 0 of 4 files reviewed at latest revision, 10 unresolved discussions.


wast/scanner.go, line 1 at r1 (raw file):

package wast

please add a license header.


wast/scanner.go, line 53 at r1 (raw file):

const (
	EofRune = -1

do these need to be exported?


wast/scanner.go, line 70 at r1 (raw file):

	}

	s.inBuf.UnreadRune() // rewind

shouldn't we always rewind? (ie: use a defer just after the call to s.inBuf.ReadRune)


wast/scanner.go, line 119 at r1 (raw file):

}

func (s *Scanner) NextToken() (token *Token) {

I would probably just name it Next()


wast/scanner.go, line 324 at r1 (raw file):

}

// *** // *** // *** // *** // *** // **** // *** // *** // *** // *** // *** //

please remove


wast/scanner.go, line 328 at r1 (raw file):

const (
	scanErrPrefix  = "\x1b[31merror:\x1b[0m "
	scanWarnPrefix = "\x1b[34mwarining:\x1b[0m "

s/warining/warning/

not sold on the fancy ANSI colors either :)


wast/scanner.go, line 344 at r1 (raw file):

}

// *** // *** // *** // *** // *** // **** // *** // *** // *** // *** // *** //

please remove.


wast/scanner.go, line 443 at r1 (raw file):

func isNat(s string) bool {
	if strings.HasPrefix(s, "0x") {

what about 0X ?


wast/scanner_test.go, line 1 at r1 (raw file):

package wast

please add a license header.


wast/token.go, line 1 at r1 (raw file):

package wast

please add a license header.


Comments from Reviewable

@sbinet
Copy link
Contributor

sbinet commented Dec 22, 2017

(just a first pass, I'll look at it in more details in a bit)


Review status: 0 of 4 files reviewed at latest revision, 10 unresolved discussions.


Comments from Reviewable

@Spriithy
Copy link
Contributor Author

Review status: 0 of 4 files reviewed at latest revision, 10 unresolved discussions.


wast/scanner.go, line 70 at r1 (raw file):

Previously, sbinet (Sebastien Binet) wrote…

shouldn't we always rewind? (ie: use a defer just after the call to s.inBuf.ReadRune)

This actually fixes some EOF corner cases. Thanks


wast/scanner.go, line 443 at r1 (raw file):

Previously, sbinet (Sebastien Binet) wrote…

what about 0X ?

The wasm standard actually doesn't specify any syntax for 0X. I just thought to stick to pure standard before adding extensions to the language.

Reference: spec interpreter


Comments from Reviewable

}

r, _, err := s.inBuf.ReadRune()
defer s.inBuf.UnreadRune() // rewind
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually... we may not even need a defer. do we?
we're just interested in r, not the state of s.inBuf after we've read the rune.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally agreeing on this.

wast/scanner.go Outdated
}

r, n, err := s.inBuf.ReadRune()
if err == io.EOF {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would personally use a switch here:

switch err {
case io.EOF:
case nil:
default:
}

but YMMV :)

case '\'':
return "\\'"
case '"':
return "\\\""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would probably read more easily with backticks:

return `\"`

@@ -0,0 +1,37 @@
package wast
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing license header :)


var wastFile = flag.String("wast-file", "", "the wast file to test")

func TestScanner(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

travis is failing because no test file is given to this TestScanner test.
couldn't we give it a few test files?
we have some under exec/testdata/spec.

start simple :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's more suited to the situation / test case to provide a .wast directly in this package to modify and update as the scanner advances.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

@sbinet
Copy link
Contributor

sbinet commented Dec 31, 2017

It's starting to look good. perhaps time to start documenting a bit the exported API?

@Spriithy
Copy link
Contributor Author

There is one pretty caveat here. It might look good, but this only showcases a very limited part of the accepted wast syntax. For instance, integers and floating point numbers syntax are pretty complex (pretty high priority in TODO list). Some instructions such as f64.reinterpret/i64 need some context.

A question I can't seem to be able to chose any reasonable solution is: do we want to generate specific tokens for each instructions independently or do we want to have higher level token kinds that wrap a specific set of instructions ?

Take this (surely malformed) snippet:

(func "foo" (param $x i32) (result f32)
     i32.reinterpret/f32
    ....)

Whould the tokens be: LPAR, FUNC, STRING "foo", ..., RPAR, REINTERPRET "i32" "f32", ... or LPAR, FUNC, STRING "foo", ..., RPAR, I32_REINTERPRET_F32, .... That is, in the first case, the tokens of kind REINTERPRET carry additional information about their specificity (source type, ...) which has the benefice of providing a more modular scanner and parsing API but adds (unnecessary ?) complexity to type design and internal processing.

I will probably be able to work on this later in January.
What are your thoughts ?

@sbinet
Copy link
Contributor

sbinet commented Jan 6, 2018

I think I would go with the (IMHO) simple solution, ie: I32_REINTERPRET_F32

@sbinet
Copy link
Contributor

sbinet commented Aug 7, 2018

we now have a way to marshal wasm.Module to a wasm binary file.
it would be really great if someone could pick up the parsing of wa(s)t files, so we could synthesize a wasm.Module from that :)

@Spriithy?

@Spriithy
Copy link
Contributor Author

Spriithy commented Aug 7, 2018

Hi there, I am currently in holidays for the rest of the summer and starting a new position in early september. I can give a quick look maybe between the 21st-27th of August. It would be great if someone could pick up where I left off, unfortunately

@sbinet
Copy link
Contributor

sbinet commented Feb 26, 2020

needs go-interpreter/license#4

any status update?

@Spriithy
Copy link
Contributor Author

I'm afraid I won't have the time to work on this.

I liked your work on GoHEP, I hope the project will live on even after your recent annoucement.

@sbinet
Copy link
Contributor

sbinet commented Feb 27, 2020

No worries.
If you don't mind, I'll adopt that PR after having added you to the list of authors.

(I didn't realize you had a HEP background (or an eye towards it)... :) Thanks for your kind words about GoHEP)

@Spriithy
Copy link
Contributor Author

Sure, I might give a look back at this if I find the time some day.
Physics rocks ;)

sbinet and others added 6 commits February 27, 2020 19:30
…nto tokens

 - `tokens.go`: Defines the Token type generated by the scanner and several utility functions around it
 - `test.wast`: A .wast test file extracted from [wasm's spec tests](https://github.com/WebAssembly/spec/blob/master/test/core/names.wast)
 - `scanner.go`: Implements the actual scanner and utilty functions regarding runes
 - `scanner_test.go`: A dead simple test case against the `test.wast` file. It doesn't verify correctness yet.
Added some straightforward changes submitted by @sbinet, namely:

 - License headers
 - Renaming of exported function `NextToken` to `Next`
 - Defer rewind-ing in `Scanner.peek()` to work as expected in eof / error cases
 - Ascii color codes removal
 - Added several new token kind constants to list different typed instructions such as `i64.reinterpret/f64` or `i32.add`
 - Implemented the `(*Scanner).scanTypedReserved()` method to identify well and ill-formed typed instructions
 - Better Scanner test interface, the test suite now takes a `-wast-file` flag that must point to the `.wast` file to Scan and test against.

In order to test:
 - Create (or use a [`spec/test/core`](https://github.com/WebAssembly/spec/tree/master/test/core) ) wast file
 - Then run in the wast directory `go test -wast-file=<file-name>.wast`
@sbinet
Copy link
Contributor

sbinet commented Feb 27, 2020

Codecov Report

Merging #50 into master will decrease coverage by 0.90%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #50      +/-   ##
==========================================
- Coverage   70.24%   69.34%   -0.91%     
==========================================
  Files          49       51       +2     
  Lines        5499     5871     +372     
==========================================
+ Hits         3863     4071     +208     
- Misses       1283     1439     +156     
- Partials      353      361       +8     
Impacted Files Coverage Δ
wast/token.go 65.00% <0.00%> (ø)
wast/scanner.go 55.39% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4ca9570...b91d2df. Read the comment docs.

@codecov-io
Copy link

codecov-io commented Feb 27, 2020

Codecov Report

Merging #50 into master will decrease coverage by 0.89%.
The diff coverage is 56.14%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master      #50     +/-   ##
=========================================
- Coverage   70.24%   69.35%   -0.9%     
=========================================
  Files          49       51      +2     
  Lines        5499     5873    +374     
=========================================
+ Hits         3863     4073    +210     
- Misses       1283     1441    +158     
- Partials      353      359      +6
Impacted Files Coverage Δ
wast/write.go 70.15% <ø> (ø) ⬆️
wast/scanner.go 55.64% <55.64%> (ø)
wast/token.go 65% <65%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4ca9570...3f81fd9. Read the comment docs.

@sbinet sbinet merged commit 6803234 into go-interpreter:master Feb 27, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants