Connect 4 & State Representation problem

Gusty · 18-02-2003 06:30PM #1

Hi all,
This is my first thread so hello to everybody.
Ok here is my long winded question that I need somebody to help me with.
I am developing a Connect 4 game with an AI player. This player will be built through Reinforcement Learning, more specifically Temporal Difference.
My problem is with the representation of the board and all its states. I need to represent and store all states, their values and possible successor states.
The real problem is with the large number of possible states. I should be able to lessen this problem by taking advantage of the symmetries that occur.
To explain this properly here is a solution that does not use symmetries. As each state is encounter it is turned into a string 42 chars long (6*7 board with 1 char per square), y for a yellow piece, r for red and _ if its empty. The hashcode function will then give this state a unique id number. I would then store all this info in a linked list or some type of dynamic memory store. By info I mean hashcode, state value and a list of successor states (also hashcodes).
I’m not sure if I have explained this clearly so if you need clarification on anything I will try to explain further. Any help you can possibly give would be very much appreciated.

Thanks,
Gus.

Hobbes · 18-02-2003 06:50PM

It's an intresting dilemma, I had some ideas but when I did a search on some of your keywords I notice it's actually a collage project.

So good luck!

Here's some hints.

1) Post some of your stuff that you have done and explain what you have problems with.

2) If your going to post homework/collage projects, try not to be verbatim as it's easier to spot.

Talliesin · 18-02-2003 06:53PM

Okay. This one got taken locked during the period yesterday when there were a good few "do my homework" threads to take care of.
It is indeed a fair question, having dealing with part of the problem and putting forward an initial suggestion.

So I'm going to re-open it.

Talliesin · 19-02-2003 07:24PM

Is this primarily a problem as far as storage goes, or as far as search time, or both?

Storage could be reduced by using 1 char to represent 4 squares:

0x00 - blank blank blank blank
0x01 - blank blank blank red
0x02 - blank blank blank yellow
0x04 - blank blank red blank

... and so on. This reduces the size of each board state to 11 chars (10.5 plus 4 unused bits - assuming an 8bit char).

Indeed one could go further in compressing the representation, using a variety of different mechanisms. Huffman and token-based compression would be possibilities. If you use the same tokens for the entire system then you won't have the overheads that often makes them unsuitable for compression of small strings.

One possibility would be to take advantage of the fact that the above system has some impossible values (which we can therefore use for special purposes), and that Connect4 works from the bottom up, so there would often be a point at which there is nothing but blanks left (assuming the string represents the board from the bottom up). Hence we could use a value of 3 for any of the bit-pairs above to mean "everything after this is blank".
Another possibility would be to use a special value to mean "mirror image of ->".
If we have the address of a board state's representation (pBStat) and we are storing another that is it's mirror image at pBMirror then we could make the first char of the representation 0xFF, and copy pBStat to reinterpret_cast<BoardState*>(pBMirror + 1).
Then if we see 0xFF as the first char we obtain the pointer to its mirror from the next sizeof(BoardState*) chars.

Also, are you only storing these states as they arise, or pre-emptively. If you are being pre-emptive then be careful not to store invalid states (5 or more pieces the same colour in a line).

Wrapper classes could handle all of the above, so the code doing the processing could see the representations as the char[6][7] you started with if that is more convenient.

Zab · 19-02-2003 10:10PM

It's important for you to decide how much computuation you are willing to put into encoding/unencoding these states.

Talliesin mentions that you could get four blocks into a byte. You could actually get five ( as 3^5 < 2^8 ) and perhaps use a lookup table to unencode them. This would give you nine bytes for a state rather then 11.... although you may end up packing them later anyway.

You could also get an entire column into a single byte, which would mean an entire state would only be seven bytes. The low six bits would just mean yellow or red, and the seventh ( I'm taking the lowest bit to be bit 1 ) would be your 'blank' marker. All bits contiguous to the blank marker, and equal to it, would be blank. This would leave the bit eight free, which would fit in with something along the lines of Talleisin's mirror concept. Also, this would make it quite easy to detect mirror and inverse situations, as each column would have a comparable value, without having to decode the state. Of course, this situation doesn't allow for floating pieces, but then neither does Connect 4!

Zab.

Talliesin · 20-02-2003 01:27PM

Ah yeah!
The lack of floating bits is used in my suggestion in the "blank from here on" marker, but I like the column-per-byte approach.

Some proper testing is going to have to be done on the encoding/decoding overheads...

Gusty · 20-02-2003 05:20PM

Thanks for the help lads,
I won't even pretend to understand half the stuff you told me there. Way over my head. Not to worry, I know I have my work cut out for me and your help will point me in the right direction.
I hopefully will get a bit done on this project tonight and I will post a progress report for those who may be interested.
You will have to give me a chance to work on these ideas.
Many thanks again,
Gus.

Hobbes · 20-02-2003 06:36PM

Seeing as everyone is commenting.

I would take a guess and say that you would only have to memorise a smaller grid.

Think about it, only 4x4 combinations are valid so everything outside that range can be ignored, however as filling a space or leaving a space open is also a tactic you would need to memorise 5x4 grid like so.

OOOOO
OOOOO
OOOOO
OOOOO

You could then move that grid around anywhere on the board lining up the bottom with the lowest part of the open space and comparing then lining up with highest piece and comparing.

Talliesin · 20-02-2003 07:06PM

Originally posted by Hobbes
Think about it, only 4x4 combinations are valid

Not true. Consider a case where there are two places that could be threatening in a few moves time on opposite sides of the board. "Thinking" about both to determine which to deal with would require a view of the entire board.

It did give me one idea though. At various points calculating the moves to guarantee a win becomes trivial. If you can identify one of those points where you can say "red will win in two moves" there is no need to store the moves after that, just record it as a winning position for red.

Hobbes · 20-02-2003 08:21PM

Originally posted by Talliesin
Not true. Consider a case where there are two places that could be threatening in a few moves time on opposite sides of the board. "Thinking" about both to determine which to deal with would require a view of the entire board.

Not really, 4x4 is the combinations to win (nice idea about the red, would cut it down more). However you would never need to see past 5 columns to determine what is a good move.

So with a 5x4 grid, you would scan the columns lining up the grid at the lowest point, then the highest point.
1-5,2-6,3-7,4-8

It shouldn't matter if there are threatening moves across the board as the grid would find them and the 5x4 combinations would all you ever need to know.

Another thing to do prehaps is if a possible win spot exists based on the other person dropping a piece into a blank area you could have the UI go on defensive to force other player to run out of options.

... Actually if 4x4 is the valid win situations you could mirror it on the X and Y axis to cut the number of combinations to store.

Zab · 21-02-2003 12:44AM

We seem to have left whose turn it is out of the board state.... although there always seems to be plenty of free bits around.

Gusty, most of what is here isn't too complicated, but you do have to worry about the entire application, while this is just a small part of it! Keep in mind that all this just lessens the amount of memory needed to store states. You're still not going to be able to store every possible state in memory, so you'll have to decide how many states you are going to want to store, or if it is a case of the more the better. What language are you using anyway? And is this a class project ( ie a lot of poeple being assigned the same project ) or are you the only one doing connect 4?

Hobbes, I'm not convinced that there is no strategy that would depend on pieces that wouldn't fit in a 5x4 grid. Also, you seem to be straying into classical AI, whereas Gusty has stated he is writing an application based on Reinforcement Learning. And I'm not too sure what you mean by mirroring it on the X and Y axis. If you mean rotating the grid, surely you would end up with a different situation ( pieces being dropped from a different direction and floating pieces )?

Zab.

Gusty · 21-02-2003 02:26AM

This project is my final year project and I am doing it alone. 2 others are doing similar projects (draughts) but are different to mine in respect to the symmetries that C4 presents. Plus they are using Neural Networks and differnt forms of Reinforcement Learning.
I'll be doing this project in Java.

I don't need to worry too much about winning combinations in regards the AI player. Basically all I need to solve at the moment is an efficent way of storing states. Whats happens is the Reinforcement Learning agent learns (calculates an optimal policy) as it plays every game. It stores all states as it comes across them and assigns a value to a state based on the outcome of the game. Based on the outcome it receives a reward eg 1 for win, 0 draw, -1 loss. It then backs this up to all states that were encountered in this game. (value of states are changed accordingly).
So I will need to be able to store a large amount of states. With each state I need to associate a value and also a list of possible next states. That is I'm in a current state & I would have a max of 7 moves, I look up all these next states and find its value then choose the max. (or a random one if the policy is exploritive)

The reason I would like to use symmetries is to cut down the amount of states I need to store. I mean there would be a lot of replication, but the more I think about it the more impossible it seems to use symmetries with proper effectiveness. Maybe Talliesin orignal solution would work best.?

You're still not going to be able to store every possible state in memory, so you'll have to decide how many states you are going to want to store, or if it is a case of the more the better

Well I could write all states to file and load what was needed...or something like that, I'm too tired to think anymore tonight.

Hobbes · 21-02-2003 10:26AM

Originally posted by Zab

Hobbes, I'm not convinced that there is no strategy that would depend on pieces that wouldn't fit in a 5x4 grid.

There is no win stragedy that doesn't fall outside of a 5x4 grid. Everything outside that grid is just a matter of timing as to who goes next to force a win. Of course you have to move that grid around to determine all places on the board but the objective was to store possible wins in the least amount of data. (actually now that I think of it, it may be possible to get the grid smaller again).

The AI would only need to learn those patterns to determine a possible win solution.

And I'm not too sure what you mean by mirroring it on the X and Y axis. If you mean rotating the grid, surely you would end up with a different situation ( pieces being dropped from a different direction and floating pieces )?

I thnk you have to stop thinking about playing a game of connect 4.

Your not, your storing data. There would be certain combinations where flipping on the Y-Axis would help. If you can keep your number of states down you could then assign weights to the patterns so the AI can check the most possible used solutions which would eventually throw away useless patterns.

Another option to drop the number of states to store is to flip the colours. As a possible win condition is automatically a possible loose condition with the colours reversed. So you could assign the inverse weight to the reverse.

Zab · 21-02-2003 06:10PM

Hobbes, you're not doing much to convince me! Take the following example:

3|       |
2|OO   OO|
1|XX   XX|
----------
  abcdefg

X to play. The winning move is to drop the X in the d column, but there no way you can see this if you are only looking at a 5x4 grid. There are plenty of other combinatorial situations that would range for more than 5 columns.

As we want to store this data as compact as possible, we certainly don't want to stop thinking about playing connect four. If we do, we give up all the optimisations due to the nature of connect four, such as the fact that pieces can't float, and we cannot have five of the same color in a row ( indeed, it may never be necessary to store a state that has four in a row ).

It's not that I don't think you couldn't build a reasonable AI based on a 5x4 grid, it's just that I don't think it could be optimal ( at least without lots of other complex datastructures ). Indeed, Gusty may have to do something along these lines!

Gusty, I'm not too sure that you are grasping the scope of the data involved. If we presume that, say, six bytes is the optimal storage space for a complete connect 4 board. If you were to store all possible six byte states, you would end up with 1,688,849,860,263,936 bytes. Which is a tad more than a modern workstation can handle. If we could knock it down to three or four bytes, then we might start being realisitic ( but it would still be a lot ). I would imagine that this is a common problem with the type of application you are writing, so you may want to look into how it is usually solved. One possibility might be to just store the moves in some sort of tree structure. Each move would just be a number between 1 and 7, which fits nicely into 3 bits. Although I'm not too sure that this will fit in with reinforcement learning ( don't know that much about it ). Also, the AI would have to be playing from the start ( or at least know what all the previous moves were ).

Also, Java makes this all the more difficult, as you can't do as much "fancy" work with memory locations. You may have to make massive arrays of bytes and use a class that can work with indexes into these arrays as a sort of pointer. This would allow you to use arrays ( the most efficient way of storing lots of data in java ) rather then the object collection classes.

Zab

Zab · 21-02-2003 08:34PM

Oops. Forget that bit about the moves, as it doesn't make any sense at all!

Zab.

silverside · 22-02-2003 10:05PM

I think it would be better to store the board status at time t as the sequence of moves Y0,R0,Y1,R1, up to Yt-1, Rt-1 where Yt,Rt respectively denote the column selected by yellow and red players at move t. Now the number of feasible sequences in this representation is no more than in any other representation.
It will be something much less than 7^42 as most games will terminate WELL BEFORE move 42 and columns will fill up. One does not store each cell's status explicitly but obviously it can be regenerated trivially as required.
Possibly there are identities which can be identified among these representations also. e.g Y1,R2,Y3,R4 == Y1,R4,Y3,R2 i.e. it doesnt matter what order the tokens are dropped into the columns as long as the columns are different. Or one could explicitly search among all previous paths for matches, this would seem feasible at early stages of the game.

Also on can take advantage of obvious winning strategies at each move (is this 1-deep minimax?) to decide the value of each state. Obviously an open 3 is a winner, as is 2<-3 open->2.

You can evaluate the value of any state as the best of each of its sub-blocks where the sub-blocks are limited by a horizontal YRY or a horizontal AxB where x is a cell that is unreachable at the current time. Then possibly you could define all 5X5 blocks as winning/losing/unknown/needs non-local information.

Obviously there is the global symmetry column n == column 6-n. However I don't think symmetries will help much, I think it would be better to allocate a value to each state using e.g. an open 2 worth 0.25 points, etc etc.

How do you plan to train your algorithm? by brute force? By competing against other algorithms or human opponents?

Anyway I may be totally off track but would be interested to see what you come up with.

silverside · 22-02-2003 10:15PM

Also I wouldnt worry too much about the exact memory representation of each state. i.e. I expect you will be using half- or quarter byte variable, but this will all be handled by a wrapper class (as previously mentioned) which you can optimise later once you know your AI algorithm is working. a factor of 10 or so in memory usage is not IMO worth worrying about, better to have easily maintainable code.

silverside · 22-02-2003 10:21PM

Also I wouldnt worry too much about the exact memory representation of each state. i.e. I expect you will be using half- or quarter byte variable, but this will all be handled by a wrapper class (as previously mentioned) which you can optimise later once you know your AI algorithm is working. a factor of 10 or so in memory usage is not IMO worth worrying about, better to have easily maintainable code.

str8_away · 24-02-2003 04:46PM

Just a thought

How many permutation could there be?
We can think a game is just a selection of these permutations in certen order.

Each square on the board can be one of three state and there are 42 squares so the max amount of permutation would be 3^42.

2^64 < 3^42 < 2^72

So you could present ALL permutations in 72bits or 9 bytes.
But you don't need ALL permutations because there are no floating pieces, also once the winer is determend the game is finished.

So there are LOTS of permutations you can cut off.
After cutting off those permutation you might able to present ALL POSSIBLE state with just 8bytes.

you could store A game with an array of 42 items (42 squares = 42 moves) with each item being 8 byte, So each game would take 336 byte max.

you could easily determend who's turn by the index being odd/even number.

Connect 4 & State Representation problem

Comments