
Parcheesi

Jazeem opened this issue 4 years ago • 6 comments

Adding a new game, Parcheesi, a variant of Ludo played with two dice. The game is a good example of a perfect-information game with a chance element, played by 4 players. Players rely on skill as well as luck, similar to Backgammon.

P.S. Wasn't sure how much of spiel.h needs to be implemented for a game; UndoAction and ObservationTensor are yet to be implemented. Otherwise I have analysed multiple playthroughs of the game using example.py and verified that it works well.

Jazeem avatar Jul 18 '21 20:07 Jazeem

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

:memo: Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.



google-cla[bot] avatar Jul 18 '21 20:07 google-cla[bot]

@googlebot I signed it!

Jazeem avatar Jul 19 '21 16:07 Jazeem

Thanks for the PR and interest in OpenSpiel!

We note that your request relates to an implementation of an existing board game. That existing game may be subject to IP rights (e.g., copyrights, trademarks) owned by another party. Accordingly, we need to consult with our internal legal team to determine whether we have the rights to merge this request into the base code. We will reply after we make that determination.

Thanks again.

lanctot avatar Jul 21 '21 00:07 lanctot

Hi @lanctot

I understand the legal situation and thank you for the consideration! Meanwhile, would you be able to suggest one of the open_spiel algorithms that is suited to building a strong player for this kind of game?

Jazeem avatar Jul 21 '21 07:07 Jazeem

Hmm, I assume you want to use RL? Because you can always go the classic route with *-minimax and a heuristic eval function.

I was going to say AlphaZero.. but it seems from the code that the number of distinct actions is 3 × 73 × 73 × 5 × 2 = 159870. That number of outputs for the policy might be very (too?) big. Is that as compact as it can be?
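For context, if the 159870 figure comes from a mixed-radix product such as 3 × 73 × 73 × 5 × 2 (an assumption reconstructed from the garbled number in the comment, since that product equals exactly 159870), a flat action ID is typically packed and unpacked like this. The field names and radices below are illustrative guesses, not taken from the actual Parcheesi PR:

```python
import math

# Hypothetical mixed-radix action encoding. The radices are illustrative
# assumptions (e.g. die choice, from-square, to-square, piece, flag), not
# the actual fields used by the Parcheesi implementation.
RADICES = [3, 73, 73, 5, 2]  # product = 159870 distinct actions


def encode(fields):
    """Pack per-field values into one flat action ID."""
    action = 0
    for value, radix in zip(fields, RADICES):
        assert 0 <= value < radix
        action = action * radix + value
    return action


def decode(action):
    """Unpack a flat action ID back into its per-field values."""
    fields = []
    for radix in reversed(RADICES):
        fields.append(action % radix)
        action //= radix
    return list(reversed(fields))


# Round trip:
fields = [2, 10, 45, 3, 1]
assert decode(encode(fields)) == fields
# Total number of distinct actions is the product of the radices:
assert math.prod(RADICES) == 159870
```

Shrinking the action space usually means dropping redundant fields from this product (e.g. encoding only the piece and die choice, and deriving the destination square from the state), at the cost of a slightly more involved legal-action computation.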

For games like this it would be great to have a version of AlphaZero (or even e.g. TreeStrap, or DQN) that trains a value net only (no policy, and no state-action values), which you could then use with search. If the branching factor is small enough, you could just do search using the value function.

I'd say if you can get the action space to within a few thousand then you can probably just use AlphaZero. Otherwise, the best thing to do is add the ability for one of those algorithms to use only the value net (if you choose DQN you'll have to do a search to choose the action rather than taking the max over Q-values). If you do the latter, it would make a fantastic addition to the library and I'd encourage you to contribute it.

lanctot avatar Jul 21 '21 08:07 lanctot

I guess it would be fun to try both: build a heuristic eval function based on strategies I've learned playing the game, and then see if an RL method can do better than that.

The distinct-action encoding is not as compact as it could be. I'm new to RL, and working on this is part of a learning exercise too. I was focused on keeping the code simple and didn't really consider the ramifications of a large action space.

I will read up on DQN and see if I can modify it to work only with a value net. It would be fantastic to make a contribution to this library.

Jazeem avatar Jul 21 '21 09:07 Jazeem