ReinforcementLearning.jl

TicTacToeEnv allows illegal moves

Open colintbowers opened this issue 2 years ago • 0 comments

Apologies in advance if this is deliberate behavior, but it seemed odd to me that TicTacToeEnv allows illegal moves. For example:

env = TicTacToeEnv()
env(1)
env(1)

allows o and x to go in the same square (top-left). Note that a call to:

is_terminated(env)

will now error with the message:

ERROR: KeyError: key TicTacToeEnv([0 1 1; 1 1 1; 1 1 1;;; 1 0 0; 0 0 0; 0 0 0;;; 1 0 0; 0 0 0; 0 0 0], ReinforcementLearningEnvironments.Cross()) not found

An implication of this behavior is:

env = TicTacToeEnv()
env(1)
env(2)
env(1)

will leave the board with one o and one x, yet it is o's turn again, because the third call changes nothing on the board but still switches the player.

I'm brand new to this package, but it seems to me that the fix should just be to add a check to the top of the function:

function (env::TicTacToeEnv)(action::CartesianIndex{2})
    env.board[action, 1] = false
    env.board[action, Base.to_index(env, env.player)] = true
    env.player = !env.player
end

maybe something like !env.board[action, 1] && error("some message"), since if that entry is already false, it means a move has already been played in that square. Or perhaps some other return value is desired when an illegal move is played. I'm not very familiar with this package (or topic) yet.
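To make the suggestion concrete, here is a minimal, self-contained sketch of the guard. It does not use the package: ToyBoard, play! and x_to_move are hypothetical names invented for illustration. The board layout (layer 1 marks empty squares, layers 2 and 3 hold x and o) and the integer-to-square mapping are assumptions based on the error output above, not confirmed details of the library.

```julia
# Hypothetical stand-in for the package's environment. The names and the
# layer layout are assumptions for illustration, not the library's definitions.
mutable struct ToyBoard
    board::BitArray{3}   # layer 1: empty squares, layer 2: x, layer 3: o
    x_to_move::Bool      # true when it is x's turn
end

function ToyBoard()
    board = falses(3, 3, 3)
    board[:, :, 1] .= true   # every square starts empty
    return ToyBoard(board, true)
end

# Play a move, rejecting squares that are already occupied.
function play!(env::ToyBoard, action::CartesianIndex{2})
    # The proposed guard: a false entry in layer 1 means the square is taken.
    env.board[action, 1] || error("illegal move: square $(Tuple(action)) is already occupied")
    env.board[action, 1] = false
    env.board[action, env.x_to_move ? 2 : 3] = true
    env.x_to_move = !env.x_to_move
    return env
end

# Assumed mapping: integer actions index the 3x3 grid in column-major order.
play!(env::ToyBoard, action::Int) = play!(env, CartesianIndices((3, 3))[action])

env = ToyBoard()
play!(env, 1)

# Replaying the same square now throws instead of silently corrupting the state.
second_attempt = try
    play!(env, 1)
    :accepted
catch err
    err isa ErrorException ? :rejected : rethrow()
end
```

Because the guard runs before any mutation, a rejected move leaves both the board and the player to move unchanged. Whether throwing is the right response, as opposed to some other return value for an illegal move, is still the open question raised above.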

Cheers,

Colin

colintbowers, Nov 14 '23 10:11