
LOLA breaks when changing number of actions and/or states #6

Open
jazzbob opened this issue Oct 11, 2018 · 5 comments

Comments

@jazzbob

jazzbob commented Oct 11, 2018

I am trying to modify IPD to a setup with four actions. This yields a 4x4 payoff matrix and a 17-dimensional input, which breaks both the LOLA and LOLA-DiCE implementations.

  • train_exact.py assumes that NUM_ACTIONS = 4 and NUM_STATES = 5 in the environment.
  • Doesn't the number of states also depend on the number of actions, i.e. NUM_STATES = NUM_ACTIONS ** 2 + 1 (see the sketch below)?
  • Since the payoff matrix for agent 2 is simply the transpose of agent 1's: does the game have to be symmetric, or are different payoffs per agent possible in the current implementation?
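For concreteness, here is a minimal sketch of the relationship I have in mind for my four-action variant (the constant names are illustrative and the payoff values are placeholders, not the actual contents of train_exact.py):

import numpy as np

NUM_ACTIONS = 4                        # actions per agent in my modified IPD
NUM_STATES = NUM_ACTIONS ** 2 + 1      # all joint last-action pairs plus the initial state

# Placeholder 4x4 payoff matrix for agent 1.
payoff_1 = np.arange(NUM_ACTIONS ** 2, dtype=np.float32).reshape(NUM_ACTIONS, NUM_ACTIONS)
# Symmetric game as in the current implementation: agent 2 gets the transpose.
payoff_2 = payoff_1.T

assert NUM_STATES == 17                # matches the 17-dimensional input mentioned above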
@jakobnicolaus
Collaborator

jakobnicolaus commented Oct 11, 2018

Thanks for the comment! If you send a pull request with a fix, we can include the generalisation to an arbitrary number of actions.

Also, the payoff matrix of the second agent is currently the transpose for convenience reasons. Feel free to change it to account for non-symmetric payoffs.
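If it helps, one possible way to allow non-symmetric payoffs would be to accept two independent matrices and only fall back to the transpose when the second one is not given. This is just a sketch with illustrative names, not code from the repo:

import numpy as np

def make_payoffs(payoff_1, payoff_2=None):
    """Return per-agent payoff matrices; default to the symmetric (transposed) case."""
    payoff_1 = np.asarray(payoff_1, dtype=np.float32)
    # Current behaviour: agent 2's payoffs are the transpose of agent 1's.
    payoff_2 = payoff_1.T if payoff_2 is None else np.asarray(payoff_2, dtype=np.float32)
    assert payoff_1.shape == payoff_2.shape, "both agents need payoffs of the same shape"
    return payoff_1, payoff_2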

@jazzbob
Author

jazzbob commented Oct 12, 2018

Thanks for your prompt answer. I'll send a PR when I get there, though I may lack the time and/or ability. :)

@jakobnicolaus
Collaborator

Great - let us know if you get stuck.

@jazzbob
Author

jazzbob commented Oct 12, 2018

For DiCE, there seems to be an issue in policy.py, lines 105 to 115. The logits created by Sonnet's BatchApply (line 106) use the default n_dims=2; changing that raises an exception inside the Sonnet code, which I have not dug into yet.

For now, I worked around this by changing the logits in policy.py. While this seems to work syntactically (no exceptions at runtime, and at least simple baselines are learned), I wonder whether the policy semantics are preserved. Specifically, I am unsure about line 109. What is happening here?

logits = tf.concat([logits, tf.zeros_like(logits)], -1)

Do you see an option to get the logits in a shape that represents the number of actions in the environment without setting n_dims for Sonnet's BatchApply?

On a side note, I removed the masking (lines 113 to 115), as in my case all actions are available to the agents at all times. Would this harm DiCE's semantics?

@jakobnicolaus
Collaborator

Sorry for the delay on our end.

It would be good to keep the masking in the code, since our environments in principle support this.
You only need one parameter to parameterise a binary random variable, but the sampling expects two logits, one per option. That's what the line you are referring to is for.
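For concreteness, a small NumPy check of that equivalence (illustrative, not the repo's code): a softmax over the pair [logit, 0] assigns the same probability as a sigmoid of the single logit, so the zero-padding just re-expresses the one-parameter Bernoulli in the two-logit form the sampler expects.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

logit = np.array([[0.7]])                                   # one logit per binary action
two_logits = np.concatenate([logit, np.zeros_like(logit)], axis=-1)

p_from_softmax = softmax(two_logits)[..., 0]                # probability of the first option
p_from_sigmoid = 1.0 / (1.0 + np.exp(-logit[..., 0]))       # sigmoid of the single logit
assert np.allclose(p_from_softmax, p_from_sigmoid)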

I think the easiest way to go about all of this is to make the weights of size (number of actions) x (number of states). Does that make sense?
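A rough sketch of what I mean, in plain NumPy rather than the actual policy.py/Sonnet code: with a weight matrix of shape (number of states, number of actions), each state already yields one logit per action, so no zero-padding of the logits is needed.

import numpy as np

num_actions = 4
num_states = num_actions ** 2 + 1

# One row of logits per state; a softmax turns each row into an action distribution.
theta = np.random.randn(num_states, num_actions).astype(np.float32)

def policy(state_one_hot):
    logits = state_one_hot @ theta                     # shape: (num_actions,)
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

state = np.eye(num_states, dtype=np.float32)[0]        # one-hot encoding of state 0
assert np.isclose(policy(state).sum(), 1.0)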
