Coin Game #7

LUKELIEM · 2018-11-05T00:44:04Z

Can you suggest a sample command line to run Coin Game?

I tried running just:

    python scripts/run_lola.py --exp_name=CoinGame --no-exact

It seems to be updating parameters and using all the CPUs, but it gives no indication of progress:

    Logging to logs/CoinGame/seed-0
    values (600000, 240)
    main0/input_proc/Conv/weights:0 (3, 3, 3, 20)
    main0/input_proc/Conv/BatchNorm/beta:0 (20,)
    main0/input_proc/Conv_1/weights:0 (3, 3, 20, 20)
    main0/input_proc/Conv_1/BatchNorm/beta:0 (20,)
    main0/input_proc/fully_connected/weights:0 (240, 1)
    main0/input_proc/fully_connected/biases:0 (1,)
    main0/rnn/wx:0 (240, 128)
    main0/rnn/wh:0 (32, 128)
    main0/rnn/b:0 (128,)
    main0/fully_connected/weights:0 (32, 4)
    main0/fully_connected/biases:0 (4,)
    values (4000, 240)
    main0/input_proc/Conv/weights:0 (3, 3, 3, 20)
    main0/input_proc/Conv/BatchNorm/beta:0 (20,)
    main0/input_proc/Conv_1/weights:0 (3, 3, 20, 20)
    main0/input_proc/Conv_1/BatchNorm/beta:0 (20,)
    main0/input_proc/fully_connected/weights:0 (240, 1)
    main0/input_proc/fully_connected/biases:0 (1,)
    main0/rnn/wx:0 (240, 128)
    main0/rnn/wh:0 (32, 128)
    main0/rnn/b:0 (128,)
    main0/fully_connected/weights:0 (32, 4)
    main0/fully_connected/biases:0 (4,)
    values (600000, 240)
    main1/input_proc/Conv/weights:0 (3, 3, 3, 20)
    main1/input_proc/Conv/BatchNorm/beta:0 (20,)
    main1/input_proc/Conv_1/weights:0 (3, 3, 20, 20)
    main1/input_proc/Conv_1/BatchNorm/beta:0 (20,)
    main1/input_proc/fully_connected/weights:0 (240, 1)
    main1/input_proc/fully_connected/biases:0 (1,)
    main1/rnn/wx:0 (240, 128)
    main1/rnn/wh:0 (32, 128)
    main1/rnn/b:0 (128,)
    main1/fully_connected/weights:0 (32, 4)
    main1/fully_connected/biases:0 (4,)
    values (4000, 240)
    main1/input_proc/Conv/weights:0 (3, 3, 3, 20)
    main1/input_proc/Conv/BatchNorm/beta:0 (20,)
    main1/input_proc/Conv_1/weights:0 (3, 3, 20, 20)
    main1/input_proc/Conv_1/BatchNorm/beta:0 (20,)
    main1/input_proc/fully_connected/weights:0 (240, 1)
    main1/input_proc/fully_connected/biases:0 (1,)
    main1/rnn/wx:0 (240, 128)
    main1/rnn/wh:0 (32, 128)
    main1/rnn/b:0 (128,)
    main1/fully_connected/weights:0 (32, 4)
    main1/fully_connected/biases:0 (4,)
    2018-11-04 16:36:10.603357: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
    update params
    update params
    update params
    update params
    ^C
    Aborted!

alshedivat · 2018-11-05T00:56:06Z

That's expected behavior. CoinGame takes a while to run. The code logs stats every 20 updates.

LUKELIEM · 2018-11-05T01:20:02Z

Does the code make use of the GPU? Since the policy network is an RNN, I suppose a GPU would not help much. How long does a run typically take?

My machine: Intel® Core™ i7-7700K CPU @ 4.20GHz × 8

LUKELIEM · 2018-11-05T01:44:14Z

There also seems to be a discrepancy between the reward structure of the Coin Game in your code and the one described in the paper:

Specifically, if the coin is red and the red agent grabs it, red gets 1 point regardless of what the blue agent does.
Conversely, if the coin is blue and the red agent grabs it, red gets 1 point and blue gets -2 points, even if blue also grabs the coin in the same move.

Am I reading the code correctly, or am I missing something?

    # Compute rewards
    reward_red, reward_blue = [], []
    for i in range(self.batch_size):
        generate = False
        if self.red_coin[i]:
            # If the coin is red,
            if self._same_pos(self.red_pos[i], self.coin_pos[i]):
                # If red agent grabs the coin (regardless of what blue agent does):
                #    red gets +1, blue gets 0
                generate = True
                reward_red.append(1)
                reward_blue.append(0)
            elif self._same_pos(self.blue_pos[i], self.coin_pos[i]):
                # If blue agent grabs the coin, but red agent does not:
                #    blue gets +1, red gets -2
                generate = True
                reward_red.append(-2)
                reward_blue.append(1)
            else:
                # In all other cases
                #    both blue and red get 0
                reward_red.append(0)
                reward_blue.append(0)

        else:
            # If the coin is blue,
            if self._same_pos(self.red_pos[i], self.coin_pos[i]):
                # If red agent grabs the coin (regardless of what blue agent does):
                #    red gets +1, blue gets -2
                generate = True
                reward_red.append(1)
                reward_blue.append(-2)
            elif self._same_pos(self.blue_pos[i], self.coin_pos[i]):
                # If blue agent grabs the coin, but red agent does not:
                #    blue gets +1, red gets 0
                generate = True
                reward_red.append(0)
                reward_blue.append(1)
            else:
                # In all other cases
                #    both blue and red get 0
                reward_red.append(0)
                reward_blue.append(0)

        if generate:
            # Regenerate a coin if an agent has grabbed the coin
            self._generate_coin(i)
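
To make the asymmetry concrete, here is a minimal sketch that re-expresses the branch logic quoted above as a pure function of three booleans for a single batch element. The function name `original_rewards` and its arguments are hypothetical stand-ins for `self.red_coin[i]` and the two `self._same_pos(...)` checks; this is not code from the repository.

```python
def original_rewards(red_coin, red_on_coin, blue_on_coin):
    """Hypothetical mirror of the quoted branch logic for one batch element.

    red_coin:     True if the coin is red (stand-in for self.red_coin[i]).
    red_on_coin:  True if red is on the coin (stand-in for _same_pos(red_pos, coin_pos)).
    blue_on_coin: True if blue is on the coin.
    Returns (reward_red, reward_blue, regenerate_coin).
    """
    if red_coin:
        if red_on_coin:          # red's check always runs first
            return 1, 0, True
        if blue_on_coin:         # only reached when red is NOT on the coin
            return -2, 1, True
        return 0, 0, False
    if red_on_coin:              # blue coin, red on it (blue ignored)
        return 1, -2, True
    if blue_on_coin:             # blue coin, only blue on it
        return 0, 1, True
    return 0, 0, False


# Both agents step onto the coin in the same move:
print(original_rewards(red_coin=True, red_on_coin=True, blue_on_coin=True))   # (1, 0, True)
print(original_rewards(red_coin=False, red_on_coin=True, blue_on_coin=True))  # (1, -2, True)
```

Because red's position is checked first, a simultaneous grab is always scored as if only red took the coin: blue's move onto its own coin earns nothing, while red's move onto blue's coin still costs blue 2 points. When only one agent reaches the coin, the two colors are scored as mirror images.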

alshedivat · 2018-11-05T02:27:38Z

@LUKELIEM, our original experiments took a few days (someone independently reproduced our results using this codebase during the summer).

Re: your comment about potential bias in rewards, I believe #5 must have fixed it.

LUKELIEM · 2018-11-05T02:54:15Z

The coin_game.py on GitHub still contains the same code, with the same issue.

alshedivat · 2018-11-05T03:01:23Z

I see. I believe the fixed version of the Coin Game is in lola_dice/envs/coin_game.py.

We should've reconciled environments in lola_dice/envs and lola/envs from the beginning, but never got around to it. A contribution would be very much welcome!
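For comparison, here is a minimal sketch of one symmetric reward rule, assuming the rule as described in the LOLA paper: an agent earns +1 for any coin it picks up, and the other agent gets -2 when the coin has that other agent's color. It has not been checked against lola_dice/envs/coin_game.py, the name `symmetric_rewards` is hypothetical, and scoring both agents when they reach the coin in the same move is an assumption.

```python
def symmetric_rewards(red_coin, red_on_coin, blue_on_coin):
    """Hypothetical symmetric Coin Game reward for one batch element.

    Each agent on the coin gets +1; if the coin belongs to the other
    agent, that other agent gets -2. The two checks are applied
    independently (assumption), so neither agent takes precedence.
    Returns (reward_red, reward_blue, regenerate_coin).
    """
    reward_red, reward_blue = 0, 0
    if red_on_coin:
        reward_red += 1
        if not red_coin:      # red picked up blue's coin
            reward_blue -= 2
    if blue_on_coin:
        reward_blue += 1
        if red_coin:          # blue picked up red's coin
            reward_red -= 2
    # Regenerate the coin if either agent reached it.
    return reward_red, reward_blue, red_on_coin or blue_on_coin


# Both agents reach a red coin in the same move: red +1 - 2, blue +1.
print(symmetric_rewards(red_coin=True, red_on_coin=True, blue_on_coin=True))  # (-1, 1, True)
```

Swapping the coin's color together with the two agents' positions swaps the two rewards, which is the symmetry the quoted code lacks on simultaneous grabs.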

LUKELIEM · 2018-11-05T20:34:01Z

Thanks, you are right. It has been fixed in lola_dice/envs.

alshedivat mentioned this issue on Apr 4, 2020:

Player blue and red are not currently symmetrical #9 (Open)
