Upper Confidence Stump
I’ve been pursuing a pointer I got to UCT, or upper confidence bounds applied to trees. The first pass doesn’t really have trees, though. It it just comes up with composite win/loss rates for each action. I discovered I had to uncompress my history a little to make full use of this. I had been compressing actions to a single state point, but this lost information on sub-decisions, such as whether a loan was taken.
This comes to some right conclusions (don’t take loans) and some wrong conclusions (discarding cards is better than playing them?) The sad thing is that the way it’s playing – digging itself into a resource-exhausted hole, those conclusions are actually correct. In short the stump is stumped, and I’m going to have introduce some context to make sense of it.
The tricky part is that it is collecting data on states that may never be visited again. With a deck of cards in the game, the chances of running across an identical situation are slim. The chances of running across a similar situation are high, however, so I’m searching for a clustering method to hash down the large state space into something more manageable that I can build my trees on.
While searching for comparable work in Ruby, I ran across Igvita, a blog with a number of articles on clustering and filtering algorithms, among many other interesting things Ruby.
Strange as it may seem, this is about the first time I’ve needed real computer science. Thus far I’ve been focused on games and other practical applications concerned mostly with input/output formatting. We spent all that time on algorithms in school, learning about all different varieties of binary trees, and yet, I can’t recall ever needing to implement even a basic binary tree. Still, I’m starting to see why I sometimes see algorithms as a hiring point; they are what allows you to solve the hard problem, rather than simple grunt work problems.
Mostly, I spent some time tracking down some breakage in the card location tracking, and it’s tests. I’m starting to wonder if I need to build some dedicated debugging support. Essentially make the domain language a little more real by being able to debug things at a higher level. This could be a large nasty problem, however.
State of peril
The debugging work also made me wonder if perhaps I need to do something about the use of mutable state. Usually the stuff I’m debugging has changeable variables; sometimes they are the game state in yggdrasil, where I could look at a little bit of history if I built the mechanisms, or the odd variables that aren’t tracked. One case from the weekend was the world matching for components, which broke down when I started to introduce multiple runs.
The world, the central yggdrasil store, doesn’t exist when the game components get created. Yet in order to track their location, they need to have versioned state. I was lazily assigning the world the first time it was relevant; another citizen object was available to query at that point. Unfortunately, each trial executes in it’s own world, and the state variable that tracks the home deck, having been set once the first time around, never got reset, so cards ended up split between the two worlds.
In the more mundane code, I found a few places where I was doing things that made no sense at all – my early win-rates were solidly 0.33 because I was entering data for every player at every choice point – even though only one player made the choice.
Legend of the lost iterator
Reviewing the changes for writing up this review, I discovered that the last edit from my previous cycle was setting up a state iterator, which I have since failed to use anywhere. Something to address when I resume, I suppose.