AI Fighting Game: now with “Reinforcement Learning”

As usual, you can find our latest work at
http://www.cs.ucla.edu/~billyh/AIFighter.zip

The entire project can be found at
http://www.cs.ucla.edu/~billyh/AIFightingGame.zip

In response to gameplay testing feedback [that it was difficult to train the AI and that the non-active player got bored], I’ve added a teaching mechanism for the inactive player. The inactive player controls a small “teacher” version of their character in the lower corner of the screen. The teacher can either cheer their AI on or admonish it and tell it what it should be doing. Cheering raises the AI’s morale meter [which currently does nothing, but will soon have some flashy effect when full]. Indiscriminate cheering raises morale slightly, but cheering right after the AI accomplishes something (like hitting the other player or dodging an arrow) raises it significantly.

The inactive player can spend morale to teach the AI. If the inactive player presses down on the D-pad, their next action is tracked and the AI will perform and learn that action. This costs morale, but can get the AI out of a situation where it is stuck doing something stupid.

How this works:
The normal AI algorithm records data points from the active player’s actions, then builds a decision tree between rounds to fit that data. That decision tree drives the AI in the next round. In the “reinforcement learning” model, the inactive player (who was the active player in the previous round, and whose data points from that round trained the current AI) orders the AI to perform a new action. Given the inactive player’s next action, we do two things. First, we insert a new data point with a very high weight into our training set. This ensures that in subsequent rounds the AI will conform to the teaching. Second, we patch the AI’s existing decision tree by replacing the leaf representing the current conditions (actually, the conditions that existed when the inactive player performed the training) with the taught action. This forces the AI to take the prescribed action in the current round.
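The two teaching steps can be sketched roughly as follows. This is a minimal Python illustration, not the game's actual code: the `Leaf`/`Split` tree layout, the boolean condition names, the `teach` helper, and the `TEACH_WEIGHT` value are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

Conditions = Dict[str, bool]                  # hypothetical: condition name -> value
DataPoint = Tuple[Conditions, str, float]     # (conditions, action, weight)

TEACH_WEIGHT = 100.0  # assumed value; the post only says "very high"


@dataclass
class Leaf:
    action: str


@dataclass
class Split:
    condition: str        # name of the boolean condition tested at this node
    if_true: "Node"
    if_false: "Node"


Node = Union[Leaf, Split]


def find_leaf(tree: Node, conditions: Conditions) -> Leaf:
    """Walk the tree until we reach the leaf that covers these conditions."""
    node = tree
    while isinstance(node, Split):
        node = node.if_true if conditions[node.condition] else node.if_false
    return node


def teach(tree: Node, training_set: List[DataPoint],
          conditions: Conditions, action: str) -> None:
    # Step 1: a heavily weighted data point, so the tree rebuilt between
    # rounds keeps following the teaching.
    training_set.append((dict(conditions), action, TEACH_WEIGHT))
    # Step 2: overwrite the matching leaf so the AI obeys in the current round.
    find_leaf(tree, conditions).action = action
```

Note that in this sketch overwriting the leaf changes the action for every set of conditions that reaches that leaf, not only the exact conditions seen at teaching time.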

A word on weights on data points in decision trees:
The normal decision tree algorithm (say, from Russell and Norvig) doesn’t associate weights with data points. Currently, we use the weights to resolve ambiguities. For example, if from the starting position you put down your controller for 10 seconds and then charge the other player, there will be several “Stand Still” data points along with one “Move towards Player” data point for the exact same set of starting conditions. Before building the decision tree, we resolve all such ambiguities by summing, per action, the weights of the conflicting data points [that is, points that have different actions for the exact same set of conditions]. The action with the highest total weight is chosen as the action for that set of conditions. Each round we multiply the weight of all existing data points by 0.5, which lowers the effect of previous rounds but preserves training in situations that we’ve only encountered once.
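As a rough sketch of the conflict resolution and per-round decay described above (Python for illustration only; the tuple encoding of conditions and the function names are assumptions, while the 0.5 factor comes from the text):

```python
from collections import defaultdict

DECAY = 0.5  # per-round multiplier from the text


def resolve_conflicts(points):
    """points: list of (conditions, action, weight); conditions must be hashable.

    Returns one action per distinct set of conditions: the action whose
    data points have the largest summed weight.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for conditions, action, weight in points:
        totals[conditions][action] += weight
    # On an exact tie, max() keeps the action that was seen first.
    return {cond: max(actions, key=actions.get)
            for cond, actions in totals.items()}


def decay(points):
    """Halve every existing weight at the start of a new round."""
    return [(cond, action, weight * DECAY) for cond, action, weight in points]
```

With three “Stand Still” points and one “Move towards Player” point of equal weight for the same conditions, “Stand Still” wins. A point that is the only one for its conditions still wins no matter how far its weight has decayed, which is how once-seen situations keep their training.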

I’m considering another change to the decision tree/weighting algorithm: using the weights in the information gain calculation that chooses which condition to split on. The hope is that this would better generalize learning from (high-weight) teaching actions, because the algorithm would first separate the high-weight data points from the rest of the dataset.
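One possible way to do this, sketched in Python under my own assumptions (boolean conditions stored in a dict, and each data point counting in proportion to its weight rather than as one sample), is to replace counts with summed weights in the usual entropy and gain formulas. This is not implemented in the game.

```python
import math
from collections import defaultdict


def weighted_entropy(points):
    """Entropy of the action distribution, with each point counted by its weight.

    points: list of (conditions, action, weight).
    """
    totals = defaultdict(float)
    for _, action, weight in points:
        totals[action] += weight
    total = sum(totals.values())
    if total == 0:
        return 0.0
    return -sum((w / total) * math.log2(w / total)
                for w in totals.values() if w > 0)


def weighted_gain(points, condition):
    """Information gain from splitting on one boolean condition.

    Each side of the split is weighted by its share of the total weight
    instead of its share of the point count.
    """
    total = sum(weight for _, _, weight in points)
    if total == 0:
        return 0.0
    true_side = [p for p in points if p[0][condition]]
    false_side = [p for p in points if not p[0][condition]]
    remainder = 0.0
    for side in (true_side, false_side):
        side_weight = sum(weight for _, _, weight in side)
        remainder += (side_weight / total) * weighted_entropy(side)
    return weighted_entropy(points) - remainder
```

When all weights are equal this reduces to the ordinary information gain, so the behavior only changes once taught (high-weight) points are in the dataset.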

A note on getting AIFighter running from home:
You need two Xbox 360 controllers plugged into your PC. The game runs with one, but many features (like the teachers) are disabled.
Download the XNA framework from Microsoft. No XNA game will run without it.

Please send feedback via comments. I realize that some things (cough, save and load) don’t work yet, but I’m eager for playtesting.
