
Why is the input to the NN structured differently from original alphaGo zero? #20

Open
barrybecker4 opened this issue Apr 12, 2019 · 0 comments

@barrybecker4 (Collaborator) commented:

I noticed that the way that the game state is sent to the input of the NN (see ZeroEncoder) is different from what is described here.
In the cheat sheet, 14 input planes encode the positions of the black and white stones over the last seven moves (7 for black + 7 for white). In ScalphaGoZero, 8 planes encode the stones by their number of liberties (4 planes for black stones with 1, 2, 3, and 4+ liberties respectively, and the same for white). ScalphaGoZero also has a plane marking illegal ko moves, which is not indicated in the cheat sheet.

I can see how both approaches might be useful. Which of these approaches gives better results? Which of these approaches matches the original Deepmind implementation more closely?
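To make the liberty-based scheme concrete, here is a rough Python sketch of an encoder with the plane layout described above (the function name, arguments, and plane ordering are my own illustration, not ScalphaGoZero's actual `ZeroEncoder` API):

```python
import numpy as np

def encode(board_size, stones, illegal_ko_points):
    """stones: dict mapping (row, col) -> ('black'|'white', liberty_count).
    Returns a (9, size, size) tensor: planes 0-3 hold black stones with
    1, 2, 3, and 4+ liberties, planes 4-7 the same for white, and plane 8
    marks points that are illegal because of ko."""
    planes = np.zeros((9, board_size, board_size), dtype=np.float32)
    for (r, c), (color, libs) in stones.items():
        base = 0 if color == 'black' else 4
        planes[base + min(libs, 4) - 1, r, c] = 1.0
    for (r, c) in illegal_ko_points:
        planes[8, r, c] = 1.0
    return planes
```

The move-history scheme from the cheat sheet would instead stack one black plane and one white plane per past position, trading liberty information for temporal information.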

@barrybecker4 barrybecker4 changed the title Why is the input to the to the NN structured differently from original alphaGo zero? Why is the input to the NN structured differently from original alphaGo zero? Apr 12, 2019
maxpumperla pushed a commit that referenced this issue May 27, 2019
* initial commit.

* Support board sizes other than 19x19 (#5)

Now just pass boardSize to GoBoard and other places since the board is always square.
Allow passing the size to the DL builder so that any size board can be supported.

* Print board while playing (#4)(#1)

Added board serializer and now print the board while running the trials to give feedback about what is happening.
Avoid NoSuchElementException in ZeroTreeNode when the branch does not contain the move (a pass or resign).
In ZeroAgent.selectBranch return pass if there are no valid moves.
Use "diagonals" instead of "corners" to refer to the diagonal placements from a point on the grid.
Add GoString tests.

* Print board while playing, and don't use hardcoded Zobrist hashes (#3)(#2)

Simplified Zobrist hash.
Fixed board serialization.
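Zobrist hashing maintains an incremental board hash by XOR-ing a per-(point, color) random bitstring whenever a stone is placed or removed; seeding the generator replaces the hardcoded tables mentioned above. A minimal Python illustration (not the repo's Scala code):

```python
import random

BOARD_SIZE = 9
rng = random.Random(42)  # a fixed seed stands in for hardcoded hash tables

# One random 64-bit key per (point, color) combination.
ZOBRIST = {(point, color): rng.getrandbits(64)
           for point in range(BOARD_SIZE * BOARD_SIZE)
           for color in ('black', 'white')}

def apply_move(hash_value, point, color):
    # XOR toggles the stone in or out, so the same call also undoes a move.
    return hash_value ^ ZOBRIST[(point, color)]
```

Because XOR is its own inverse and order-independent, captures and undos update the hash in O(1) without rehashing the whole board.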

* Refactor out NeighborTables to a separate class.

* Make GoBoard immutable and use immutable collections internally. This prevents calling selectMove in the agent from having the side effect of adding additional moves to the board.

* Add a human agent so that a human can play against the trained AI (#6)

Refactor TerritoryCalculator out to separate class.

Fix bug/typo in GameResult that was causing the wrong final score to be presented.
Use immutable collections in ZeroAgent and ZeroTreeNode. Show more info about the final game result.

Refactor out class for retrieving user input.

* Fix bug in removeString where liberties were not being added to adjacent opponent strings corresponding to the now-removed stones.
Add some tests for this case.
Avoid redundant use of "this".

* Use Point instead of Tuple2 everywhere to help simplify the code and speak the language of the domain.

* Upgrade to dl4j-beta3 (from beta-2)

* Fix bug in ko detection and add test.

Fix the scoring to account for captures and add tests.
Have GoBoard retain the number of captures for each player.

Remove the equals method from GoBoard.
It is an immutable case class, so does not need a custom equals method.

Refactor the grid out of GoBoard into a separate class.
Prompt for board size.
Enable tests; they pass now that we have upgraded to dl4j-beta3.

* Add tests for ZeroAgent encoding and decoding.
Make GameState an immutable case class. Removed unused method to generate legalMoves.

* Request the board size, number of layers, and number of episodes when running the main program for testing.

Make axis coordinate labeling consistent with the ML for Go book.

* Allow saving and restoring the saved model.

When selecting the next move, do it with a skewed probability distribution
- otherwise the same sequence of moves is selected every time!

* Fix bug in ZeroEncoder, and added tests (#12)

Make ZeroTreeNode a case class.

* In GameState, make allPreviousState part of the immutable state and remove the equals method (#13)

Small change to willCapture to make it easier to step into in the debugger.

* Make ZeroTreeNode a regular class again because it has mutable state.

Split out method for recordVisitCounts. Add some debug info.

* Fix ZeroEncoder test so that it correctly recognizes the ko when encoded.

* Fix bug where ancestor counts were not correctly updated (#15)

Add toString impl for Branch.
Add flag to turn on debug prints.
In ZeroAgent, select randomly from among all moves with most visits since there is often more than one.
Add a constructor variant for Play that accepts two Ints. This simplifies construction in many places.
Add tests for ZeroTreeNode, ZeroSimulation, and ZeroAgent.
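Selecting uniformly at random among the moves that share the maximal visit count, as the ZeroAgent change above describes, can be sketched like this (a Python illustration; the helper name is my own):

```python
import random

def select_move(visit_counts):
    """visit_counts: dict mapping move -> visit count.
    Ties for the most-visited move are broken at random, which keeps
    self-play games from repeating the same deterministic sequence."""
    best = max(visit_counts.values())
    candidates = [m for m, v in visit_counts.items() if v == best]
    return random.choice(candidates)
```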

* Don't allow filling in own eyes during training (#17)
This simple optimization should save a lot of training effort.

* Do training in mini-batches to avoid running out of memory if the number of episodes is large (#18).
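Chunking a large set of self-play episodes into fixed-size mini-batches, as described above, amounts to something like the following sketch (illustrative only):

```python
def mini_batches(episodes, batch_size):
    """Yield successive slices of at most batch_size episodes, so each
    training step only needs to hold one slice in memory."""
    for i in range(0, len(episodes), batch_size):
        yield episodes[i:i + batch_size]
```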

* Simplify ZeroEncoder tests.

* Rename KerasModel to KerasModelImporter so that the name of the class matches what it does.

* Implement Monte Carlo playouts (#20)

Instead of just using the value from the NN, do MC playouts at the leaves during exploration.
Fixed bug in GameState where the winner returned was the reverse of what it should have been.
Fix tests.
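A random playout at a leaf, which the commit above mixes in instead of relying solely on the network's value head, might look like this sketch (the `legal_moves`, `apply`, and `winner` methods are hypothetical stand-ins for the repo's game-state API):

```python
import random

def playout_value(state, current_player, max_moves=200):
    """Play random legal moves until the game ends (or max_moves is hit)
    and score the result from current_player's point of view:
    +1 for a win, -1 for a loss."""
    player = current_player
    for _ in range(max_moves):
        moves = state.legal_moves()
        if not moves:  # no legal moves left: the game is over
            break
        state = state.apply(random.choice(moves), player)
        player = 'white' if player == 'black' else 'black'
    return 1.0 if state.winner() == current_player else -1.0
```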

* Merge with upstream.

Pin the model_size_5_layers_2_test.model file at a specific point
so that the tests do not start failing as that model continues to evolve with more training.

* Fix failing test when running from command line.

* Comment tests that fail on build server, but not locally.
barrybecker4 added a commit that referenced this issue Mar 8, 2020
Instead of just using the value from the NN, do MC playouts at the leaves during exploration.
Fixed bug in GameState where the winner returned was the reverse of what it should have been.
Fix tests.