Coin Game #7

Open
LUKELIEM opened this issue Nov 5, 2018 · 7 comments

Comments

@LUKELIEM

LUKELIEM commented Nov 5, 2018

Can you suggest a sample command line to run Coin Game?

I tried running just:

python scripts/run_lola.py --exp_name=CoinGame --no-exact

It appears to be updating parameters and using all the CPU cores, but it gives no indication of progress.

Logging to logs/CoinGame/seed-0
values (600000, 240)
main0/input_proc/Conv/weights:0 (3, 3, 3, 20)
main0/input_proc/Conv/BatchNorm/beta:0 (20,)
main0/input_proc/Conv_1/weights:0 (3, 3, 20, 20)
main0/input_proc/Conv_1/BatchNorm/beta:0 (20,)
main0/input_proc/fully_connected/weights:0 (240, 1)
main0/input_proc/fully_connected/biases:0 (1,)
main0/rnn/wx:0 (240, 128)
main0/rnn/wh:0 (32, 128)
main0/rnn/b:0 (128,)
main0/fully_connected/weights:0 (32, 4)
main0/fully_connected/biases:0 (4,)
values (4000, 240)
main0/input_proc/Conv/weights:0 (3, 3, 3, 20)
main0/input_proc/Conv/BatchNorm/beta:0 (20,)
main0/input_proc/Conv_1/weights:0 (3, 3, 20, 20)
main0/input_proc/Conv_1/BatchNorm/beta:0 (20,)
main0/input_proc/fully_connected/weights:0 (240, 1)
main0/input_proc/fully_connected/biases:0 (1,)
main0/rnn/wx:0 (240, 128)
main0/rnn/wh:0 (32, 128)
main0/rnn/b:0 (128,)
main0/fully_connected/weights:0 (32, 4)
main0/fully_connected/biases:0 (4,)
values (600000, 240)
main1/input_proc/Conv/weights:0 (3, 3, 3, 20)
main1/input_proc/Conv/BatchNorm/beta:0 (20,)
main1/input_proc/Conv_1/weights:0 (3, 3, 20, 20)
main1/input_proc/Conv_1/BatchNorm/beta:0 (20,)
main1/input_proc/fully_connected/weights:0 (240, 1)
main1/input_proc/fully_connected/biases:0 (1,)
main1/rnn/wx:0 (240, 128)
main1/rnn/wh:0 (32, 128)
main1/rnn/b:0 (128,)
main1/fully_connected/weights:0 (32, 4)
main1/fully_connected/biases:0 (4,)
values (4000, 240)
main1/input_proc/Conv/weights:0 (3, 3, 3, 20)
main1/input_proc/Conv/BatchNorm/beta:0 (20,)
main1/input_proc/Conv_1/weights:0 (3, 3, 20, 20)
main1/input_proc/Conv_1/BatchNorm/beta:0 (20,)
main1/input_proc/fully_connected/weights:0 (240, 1)
main1/input_proc/fully_connected/biases:0 (1,)
main1/rnn/wx:0 (240, 128)
main1/rnn/wh:0 (32, 128)
main1/rnn/b:0 (128,)
main1/fully_connected/weights:0 (32, 4)
main1/fully_connected/biases:0 (4,)
2018-11-04 16:36:10.603357: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
update params
update params
update params
update params
^C
Aborted!

@alshedivat
Owner

That's expected behavior. CoinGame takes a while to run. The code logs stats every 20 updates.
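
To illustrate why a run can look silent at first, here is a minimal sketch of a loop that reports stats only every 20 updates. The function `logged_updates` and the parameter `log_every` are hypothetical names, not taken from the repository, and whether the first report falls on update 20 (as here) or earlier is an assumption.

```python
def logged_updates(num_updates, log_every=20):
    """Return the update indices at which stats would be reported.

    Hypothetical helper for illustration only; it is not part of the
    LOLA codebase.
    """
    logged = []
    for update in range(1, num_updates + 1):
        # One (slow) parameter update would happen here.
        if update % log_every == 0:
            # Stats are reported only on every `log_every`-th update.
            logged.append(update)
    return logged
```

Under these assumptions, a run interrupted after four updates, like the one in the log above, would not yet have reported any stats.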

@LUKELIEM
Author

LUKELIEM commented Nov 5, 2018

Does the code make use of a GPU? Since the policy network is an RNN, I suppose it would not help much. How long does a run typically take?

Intel® Core™ i7-7700K CPU @ 4.20GHz × 8

@LUKELIEM
Author

LUKELIEM commented Nov 5, 2018

There also seems to be a discrepancy between the reward structure of the Coin Game in your code and the one described in the paper:

  • Specifically, if the coin is red and the red agent grabs it, the red agent gets 1 point regardless of what the blue agent does.
  • Conversely, if the coin is blue and the red agent grabs it, the red agent gets 1 point but the blue agent gets -2 points, even if the blue agent also grabs the coin in the same move.

Am I reading the code correctly, or am I missing something?

    # Compute rewards
    reward_red, reward_blue = [], []
    for i in range(self.batch_size):
        generate = False
        if self.red_coin[i]:
            # If the coin is red,
            if self._same_pos(self.red_pos[i], self.coin_pos[i]):
                # If red agent grabs the coin (regardless what blue agent does):
                #    red gets +1, blue gets 0
                generate = True
                reward_red.append(1)
                reward_blue.append(0)
            elif self._same_pos(self.blue_pos[i], self.coin_pos[i]):
                # If blue agent grabs the coin, but red agent does not:
                #    blue gets +1, red gets -2
                generate = True
                reward_red.append(-2)
                reward_blue.append(1)
            else:
                # In all other cases
                #    both blue and red get 0
                reward_red.append(0)
                reward_blue.append(0)

        else:
            # If the coin is blue,
            if self._same_pos(self.red_pos[i], self.coin_pos[i]):
                # If red agent grabs the coin (regardless what blue agent does):
                #    red gets +1, blue gets -2
                generate = True
                reward_red.append(1)
                reward_blue.append(-2)
            elif self._same_pos(self.blue_pos[i], self.coin_pos[i]):
                # If blue agent grabs the coin, but red agent does not:
                #    blue gets +1, red gets 0
                generate = True
                reward_red.append(0)
                reward_blue.append(1)
            else:
                # In all other cases
                #    both blue and red get 0
                reward_red.append(0)
                reward_blue.append(0)

        if generate:
            # Regenerate a coin if an agent has grabbed the coin
            self._generate_coin(i)

@alshedivat
Owner

alshedivat commented Nov 5, 2018

@LUKELIEM, our original experiments took a few days (someone independently reproduced our results using this codebase during the summer).

Re: your comment about potential bias in rewards, I believe #5 must have fixed it.

@LUKELIEM
Author

LUKELIEM commented Nov 5, 2018 via email

@alshedivat
Owner

I see. I believe the fixed version of coin game is in lola_dice/envs/coin_game.py.

We should've reconciled environments in lola_dice/envs and lola/envs from the beginning, but never got around to it. A contribution would be very much welcome!

@LUKELIEM
Author

LUKELIEM commented Nov 5, 2018

Thanks, you are right. It has been fixed in lola_dice/envs.
