The 2009 Mario AI Competition
Julian Togelius, Sergey Karakovskiy and Robin Baumgarten

Abstract— This paper describes the 2009 Mario AI Competition, which was run in association with the IEEE Games Innovation Conference and the IEEE Symposium on Computational Intelligence and Games. The focus of the competition was on developing controllers that could play a version of Super Mario Bros as well as possible. We describe the motivations for holding this competition, the challenges associated with developing artificial intelligence for platform games, the software and API developed for the competition, the competition rules and organization, the submitted controllers and the results. We conclude the paper by discussing what the outcomes of the competition can teach us both about developing platform game AI and about organizing game AI competitions. The first two authors are the organizers of the competition, while the third author is the winner of the competition.

Keywords: Mario, platform games, competitions, A*, evolutionary algorithms

I. INTRODUCTION

For the past several years, a number of game-related competitions have been organized in conjunction with major international conferences on computational intelligence (CI) and artificial intelligence for games (game AI). In these competitions, competitors are invited to submit their best controllers or strategies for a particular game; the controllers are then ranked based on how well they play the game, alone or in competition with other controllers. The competitions are based on popular board games (such as Go and Othello) or video games (such as Pac-Man, Unreal Tournament and various car racing games [1], [2]).

In most of these competitions, competitors submit controllers that interface to an API built by the organizers of the competition. The winner of the competition is the person or team that submitted the controller that played the game best, either on its own (for single-player games such as Pac-Man) or against others (in adversarial games such as Go). Usually, prizes of a few hundred US dollars are associated with each competition, and a certificate is always awarded. There is no requirement that the submitted controllers be based on any particular type of algorithm, but in many cases the winners turn out to include computational intelligence (typically neural networks and/or evolutionary computation) in one form or another. The submitting teams tend to comprise students, faculty members and persons not currently in academia (e.g. working as software developers).

There are several reasons for holding such competitions as part of the regular events organized by the computational intelligence community. A main motivation is to improve benchmarking of learning and other AI algorithms. Benchmarking is frequently done using very simple testbed problems, which might or might not capture the complexity of real-world problems. When researchers report results on more complex problems, the technical complexities of accessing, running and interfacing to the benchmarking software might prevent independent validation of, and comparison with, the published results. Here, competitions have the role of providing software, interfaces and scoring procedures to fairly and independently evaluate competing algorithms and development methodologies.

Another strong incentive for running these competitions is that they motivate researchers. Existing algorithms get applied to new areas, and the effort needed to participate in a competition is (or at least should be) less than it takes to write new experimental software, do experiments and write a completely new paper. Competitions might even bring new researchers into the computational intelligence fields, both academics and non-academics. One of the reasons for this is that game-based competitions simply look cool.

The particular competition described in this paper is similar in aims and organization to some other game-related competitions (in particular the simulated car racing competitions) but differs in that it is built on a platform game, and thus relates to the particular challenges an agent faces while playing such games.

A. AI for platform games

Platform games can be defined as games where the player controls a character/avatar, usually of humanoid form, in an environment characterized by differences in altitude between surfaces ("platforms") interspersed by holes/gaps. The character can typically move horizontally (walk) and jump, and sometimes perform other actions as well; the game world features gravity, meaning that it is seldom straightforward to negotiate large gaps or altitude differences.

To our best knowledge, there have not been any previous competitions focusing on platform game AI. The only published paper on AI for platform games we know of is a recent paper by the first two authors of the current paper, where we described experiments in evolving neural network controllers for the same game as was used in the competition, using an earlier version of the API [3]. Some other papers have described uses of AI techniques for automatic generation of levels for platform games [4], [5], [6].

Most commercial platform games incorporate little or no AI. The main reason for this is probably that most platform games are not adversarial; a single player controls a single character who makes his way through a sequence of levels, with success dependent only on the player's skill. The obstacles that have to be overcome typically revolve around the environment (gaps to be jumped over, items to be found etc.) and NPC enemies; however, in most platform games these enemies move according to preset patterns or simple homing behaviours.

Though apparently an under-studied topic, artificial intelligence for controlling the player character in platform games is certainly not without interest. From a game development perspective, it would be valuable to be able to automatically create controllers that play in the style of particular human players. This could be used both to guide players when they get stuck (cf. Nintendo's recent "Demo Play" feature, introduced to cope with the increasingly diverse demographic distribution of players) and to automatically test new game levels and features as part of an algorithm to automatically tune or create content for a platform game.

From an AI and reinforcement learning perspective, platform games represent interesting challenges, as they have high-dimensional state and observation spaces and relatively high-dimensional action spaces, and require the execution of different skills in sequence. Further, they can be made into good testbeds, as they can typically be executed much faster than real time and tuned to different difficulty levels. We will go into more detail on this in the next section, where we describe the specific platform game used in this competition.

(JT is with the IT University of Copenhagen, Rued Langgaards Vej 7, 2300 Copenhagen S, Denmark. SK is with IDSIA, Galleria 2, 6928 Manno-Lugano, Switzerland. RB is with Imperial College of Science and Technology, London, United Kingdom. Emails: julian@togelius.com, sergey@idsia.ch, robin.baumgarten06@doc.ic.ac.uk)

II. INFINITE MARIO BROS

The competition uses a modified version of Markus Persson's Infinite Mario Bros, a public domain clone of Nintendo's classic platform game Super Mario Bros. The original Infinite Mario Bros is playable on the web, where Java source code is also available (http://www.mojang.com/notch/mario/).

The gameplay in Super Mario Bros consists of moving the player-controlled character, Mario, through two-dimensional levels, which are viewed sideways. Mario can walk and run to the right and left, jump, and (depending on which state he is in) shoot fireballs. Gravity acts on Mario, making it necessary to jump over holes to get past them. Mario can be in one of three states: Small, Big (can crush some objects by jumping into them from below), and Fire (can shoot fireballs).

The main goal of each level is to get to the end of the level, which means traversing it from left to right. Auxiliary goals include collecting as many as possible of the coins that are scattered around the level, finishing the level as fast as possible, and collecting the highest score, which in part depends on the number of collected coins and killed enemies.

Complicating matters is the presence of holes and moving enemies. If Mario falls down a hole, he loses a life. If he touches an enemy, he gets hurt; this means losing a life if he is currently in the Small state. Otherwise, his state degrades from Fire to Big or from Big to Small. However, if he jumps and lands on an enemy, different things could happen. Most enemies (e.g. goombas, cannon balls) die from this treatment; others (e.g. piranha plants) are not vulnerable to this and proceed to hurt Mario; finally, turtles withdraw into their shells if jumped on, and these shells can then be picked up by Mario and thrown at other enemies to kill them.

Certain items are scattered around the levels, either out in the open or hidden inside blocks of brick, only appearing when Mario jumps at these blocks from below so that he smashes his head into them. Available items include coins, mushrooms which make Mario grow Big, and flowers which make Mario turn into the Fire state if he is already Big.

A. Automatic level generation

While implementing most features of Super Mario Bros, the standout feature of Infinite Mario Bros is the automatic generation of levels. Every time a new game is started, levels are randomly generated by traversing a fixed width and adding features (such as blocks, gaps and opponents) according to certain heuristics. The level generation can be parameterized, including the desired difficulty of the level, which affects the number and placement of holes, enemies and obstacles. In our modified version of Infinite Mario Bros, we can specify the random seed of the level generator, making sure that any particular level can be recreated by simply using the same seed. Unfortunately, the current level generation algorithm is somewhat limited; for example, it cannot produce levels that include dead ends, which would require back-tracking to get out of.

B. Challenges for AIs (and humans)

Several features make Super Mario Bros particularly interesting from an AI perspective. The most important of these is the potentially very rich and high-dimensional environment representation. When a human player plays the game, he views a small part of the current level from the side, with the screen centered on Mario. Still, this view often includes dozens of objects such as brick blocks, enemies and collectable items. The static environment (grass, pipes, brick blocks etc.) and the coins are laid out in a grid (of which the standard screen covers approximately 22 × 22 cells), whereas moving items (most enemies, as well as the mushroom power-ups) move continuously at pixel resolution.

The action space, while discrete, is also rather large. In the original Nintendo game, the player controls Mario with a D-pad (up, down, right, left) and two buttons (A, B). The A button initiates a jump (the height of the jump is determined partly by how long it is pressed); the B button initiates running mode and, if Mario is in the Fire state, shoots a fireball. Disregarding the unused up direction, this means that the information to be supplied by the controller at each time step is five bits, yielding 2^5 = 32 possible actions, though some of these are nonsensical (e.g. left together with right).

Another interesting feature is that there is a smooth learning curve between levels, both in terms of which behaviours are necessary and their required degree of refinement. For example, to complete a very simple Mario level (with no enemies and only small and few holes and obstacles), it might be enough to keep walking right and jumping whenever there is something (a hole or obstacle) immediately in front of Mario. A controller that does this should be easy to learn.
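The five-bit action count given above can be verified with a few lines of code. The sketch below is our own illustration; the competition API actually passes actions as a boolean array, and the class and method names here are not the official identifiers.

```java
// Sketch of the five-button action space described in Section II-B.
// Button indices (our convention): 0 = left, 1 = right, 2 = down,
// 3 = jump, 4 = speed/fire. The real API uses a boolean[5] action array.
public class ActionSpace {
    public static final int NUM_BUTTONS = 5;

    // All raw button combinations: 2^5 = 32.
    public static int numRawActions() {
        return 1 << NUM_BUTTONS;
    }

    // Combinations that do not press left and right simultaneously.
    public static int numSensibleActions() {
        int count = 0;
        for (int a = 0; a < numRawActions(); a++) {
            boolean left = (a & 1) != 0;
            boolean right = (a & 2) != 0;
            if (!(left && right)) count++;
        }
        return count;
    }
}
```

Of the 32 raw combinations, the 8 that hold left and right together are nonsensical, leaving 24 distinct sensible actions.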
// always the same dimensionality 22x22
// always centered on the agent
public byte[][] getCompleteObservation();
public byte[][] getEnemiesObservation();
public byte[][] getLevelSceneObservation();
public float[] getMarioFloatPos();
public float[] getEnemiesFloatPos();
public boolean isMarioOnGround();
public boolean mayMarioJump();

Fig. 1. The Environment Java interface, which contains the observation, i.e. the information the controller can use to decide which action to take.

Fig. 2. Visualization of the granularity of the sensor grid. The top six environment sensors would output 0 and the lower three would output 1. All of the enemy sensors would output 0, as even if all 49 enemy sensors were consulted, none of them would reach all the way to the body of the turtle, which is four blocks below Mario. None of the simplified observations register the coins, but the complete observation would. Note that larger sensor grids can be used (up to 22 × 22), if the controller can handle the information.

To complete the same level while collecting as many as possible of the coins present on that level likely demands some planning skills, such as smashing a power-up block to retrieve a mushroom that makes Mario Big so that he can retrieve the coins hidden behind a brick block, and jumping up on a platform to collect the coins there and then going back to collect the coins hidden under it. More advanced levels, including most of those in the original Super Mario Bros game, require a varied behaviour repertoire just to complete. These levels might include concentrations of enemies of different kinds which can only be passed by timing Mario's passage precisely; arrangements of holes and platforms that require complicated sequences of jumps to pass; dead ends that require backtracking; and so on.

III. COMPETITION API

In order to be able to run the competition (and in order to use the game for other experiments), the organizers modified Infinite Mario Bros rather heavily and constructed an API that would enable it to be easily interfaced to learning algorithms and competitors' controllers. The modifications included removing the real-time element of the game so that it can be "stepped" forward by the learning algorithm, removing the dependency on graphical output, and substantial refactoring (the developer of the game did not anticipate that the game would be turned into an RL benchmark). Each time step, which corresponds to 40 milliseconds of simulated time (an update frequency of 25 fps), the controller receives a description of the environment and outputs an action. The resulting software is a single-threaded Java application that can easily be run on any major hardware architecture and operating system, with the key methods that a controller needs to implement specified in a single Java interface file (see figure 1). On an iMac from 2007, 5–20 full levels can be played per second (several thousand times faster than real time). A TCP interface for controllers is also provided.

Figure 2 shows the size and layout of the sensor grid relative to Mario. For each grid cell, the controller can choose to read the complete observation (using the getCompleteObservation() method), which returns an integer value representing exactly what occupies that part of the environment (e.g. a section of a pipe), or one of the simplified observations (getEnemiesObservation() and getLevelSceneObservation()), which simply return a zero or one signifying the presence of an enemy or an impassable part of the environment.

IV. COMPETITION ORGANIZATION AND RULES

The organization and rules of the competition sought to fulfill the following objectives:

1) Ease of participation. We wanted researchers of different kinds and of different levels of experience to be able to participate, students as well as withered professors who haven't written much code in a decade.
2) Transparency. We wanted as much as possible to be publicly known about the competing entries, the benchmark software, the organization of the competition etc. This can be seen as an end in itself, but a more transparent competition also makes it easier to detect cheaters, to exploit the scientific results of the competition and to reuse the software developed for it.
3) Ease of finding a winner. We wanted it to be unambiguously clear how to rank all submitted entries and who won the competition.
4) Depth of challenge. We wanted there to be a real score difference between controllers of different quality, both at the top and the bottom of the high-score table.

The competition web page hosts the rules, the downloadable software and the final results of the competition (http://julian.togelius.com/mariocompetition2009). Additionally, a Google Group was set up to which all technical and rules questions were to be posted, so that they came to the attention of, and could be answered by, both the organizers and other competitors, and where the organizers posted news about the competition (http://groups.google.com/group/marioai). The searchable archive of the discussion group functions as a repository of technical information about the competition.
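As an aside, the interface in figure 1 is already enough to express the "keep walking right and jump at obstacles" baseline behaviour mentioned in section II. The sketch below is ours, not competition code; in particular, the assumption that Mario occupies cell (11, 11) of the 22 × 22 grid and the five-element action array layout are illustrative and may differ from the actual API.

```java
// Reactive baseline sketch against the grid returned by
// getLevelSceneObservation() (nonzero = impassable environment).
// ASSUMPTIONS (ours): Mario sits at grid cell (11, 11); the action
// array is {left, right, down, jump, speed}.
public class ForwardJumpingSketch {
    static final int MARIO_ROW = 11, MARIO_COL = 11;

    public boolean[] getAction(byte[][] levelScene,
                               boolean mayMarioJump,
                               boolean isMarioOnGround) {
        boolean[] action = new boolean[5];
        action[1] = true; // always move right
        // Obstacle directly ahead, or no ground below the cell ahead (a gap).
        boolean obstacleAhead = levelScene[MARIO_ROW][MARIO_COL + 1] != 0;
        boolean gapAhead = levelScene[MARIO_ROW + 1][MARIO_COL + 1] == 0;
        if ((obstacleAhead || gapAhead) && (mayMarioJump || !isMarioOnGround)) {
            action[3] = true; // jump (held across steps to jump higher)
        }
        return action;
    }
}
```

A controller of roughly this shape is what the supplied ForwardJumpingAgent baseline amounts to: no planning, no learning, one grid lookahead.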
Competitors were free to submit controllers written in any programming language and using any development methodology, as long as the controllers could interface to an unmodified version of the competition software and control Mario in real time on a standard desktop PC running Mac OS X or Windows XP. For competitors using only Java, there was a standardized submission format. Any submission that didn't follow this format needed to be accompanied by detailed instructions for how to run it. Additionally, the submission needed to be accompanied by the score the competitors had recorded themselves using the CompetitionScore class that was part of the competition software, so that the organizers could verify that the submission ran as intended on their systems. We also urged each competitor to submit a text file describing how the controller works.

Competitors were urged to submit their controllers early, and then re-submit improved versions. This was so that any problems that would have disqualified the controllers could be detected and rectified, and the controllers resubmitted. No submissions or resubmissions at all were accepted after the deadline (about a week before each competition event). The most common problems that mandated resubmission were:

• The controller used a nonstandard submission format, and no running instructions were provided.
• The controller was submitted together with, and entangled in, a complete version of the competition software.
• The controller timed out, i.e. took more than 40 ms per time step on average on some level.

A. Scoring

All entries were scored before the conference by running them on 10 levels of increasing difficulty, using the total distance travelled on these levels as the score. The scoring procedure was deterministic, as the same random seed was used for all controllers, except in the few cases where the controller was nondeterministic. The CompetitionScore method uses a supplied random number seed to generate the levels. Competitors were asked to score their own submissions with seed 0 so that this score could be verified by the organizers, but the seed used for the competition scoring was not generated until after the submission deadline, so that competitors could not optimize their controllers for beating a particular sequence of levels.

For the second phase of the competition (the CIG phase), we discovered some time before the submission deadline that two of the submitted controllers were able to clear all levels for some random seeds. We therefore modified the CompetitionScore class so as to make it possible to differentiate better between high-scoring controllers. First of all, we increased the number of levels to 40, and varied the length of the levels stochastically, so that controllers could not be optimized for a fixed level length. In case two controllers still cleared all 40 levels, we defined three tie-breakers: game-time (not clock-time) left at the end of all 40 levels, number of total kills, and mode sum (the sum of all Mario modes at the end of levels, where 0=small, 1=big and 2=fire; a high mode sum indicates that the player has taken little damage). So if two controllers both cleared all levels, the one that took the least game-time to do so would win; if both took the same time, the most efficient killer would win, and so on.

V. MEDIA STRATEGY AND COVERAGE

The competition got off to a slow start. The organizers did not put a final version of the competition software online until relatively late, and neglected advertising the competition until about a month and a half before the deadline of the first phase. Once it occurred to the organizers to advertise their competition, they first tried the usual channels, mainly mailing lists for academics working within computational intelligence. One of the organizers then posted a link to the competition on his Facebook profile, which led one of his friends to submit the link to the social media site Digg. This link was instantly voted up onto the front page of the site, and a vigorous discussion ensued on the link's comment page. This also meant 20000 visitors to the competition website in the first day, some of whom joined the discussion group and announced their intention to participate. Seeing the success of this, the organizers submitted a story about the competition to the technology news site Slashdot. Slashdot is regularly read by mainstream and technology news media, which meant that within days there were stories about the competition in (among others) New Scientist (http://www.newscientist.com/article/dn17560-race-is-on-to-evolve-the-ultimate-mario.html), MSNBC (http://www.msnbc.msn.com/id/32451009/ns/technology_and_science-science/) and Le Monde (http://www.lemonde.fr/technologies/article/2009/08/07/quand-c-est-l-ordinateur-qui-joue-a-mario_1226413_651865.html).

At this point, Robin Baumgarten had already finished a first version of his controller, and posted a video on YouTube of his controller guiding Mario through a level, complete with a visualization of the alternate paths the A* algorithm was evaluating at each time step (http://www.youtube.com/watch?v=DlkMs4ZHHr8). The video was apparently so enjoyable to watch that it quickly reached the top lists, and has at the time of writing been viewed over 600000 times. The success of this video was a bit of a double-edged sword; it drew hordes of people to the competition website, and led to several announcements of intention to participate (though far from everybody who said they would eventually submitted a controller), but it also caused some competitors to drop out, as Robin's controller was so impressive that they considered themselves not to stand a chance of winning.

VI. SUBMITTED CONTROLLERS

This section describes the controllers that were submitted to the competition. We describe the versions of the controllers that were submitted to the CIG phase of the competition; in several cases, preliminary versions were submitted to the ICE-GIC phase. The winning controller is described in some detail, whereas the other controllers are given shorter descriptions due to space considerations. In some cases, a controller is either very similar to an existing controller or very little information was provided by the competitor; those controllers are described only very briefly here.
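The CIG-phase ranking just described is a lexicographic order: distance first, then the three tie-breakers in sequence. It can be stated compactly as a comparator; the Result holder and its field names below are our own illustration, not classes from the competition software.

```java
import java.util.Comparator;

// Lexicographic CIG-phase ranking sketch: total distance, then game-time
// left, then kills, then mode sum. Higher is better on every criterion.
public class Ranking {
    public static class Result {
        final int distance, timeLeft, kills, modeSum;
        public Result(int distance, int timeLeft, int kills, int modeSum) {
            this.distance = distance; this.timeLeft = timeLeft;
            this.kills = kills; this.modeSum = modeSum;
        }
    }

    // Sorts better results first: each criterion only matters
    // when all earlier criteria are tied.
    public static final Comparator<Result> BETTER_FIRST =
        Comparator.<Result>comparingInt(r -> r.distance)
                  .thenComparingInt(r -> r.timeLeft)
                  .thenComparingInt(r -> r.kills)
                  .thenComparingInt(r -> r.modeSum)
                  .reversed();
}
```

Expressing the ranking this way makes the intended semantics explicit: kills and mode sum can never overturn a distance or time advantage.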
A. Robin Baumgarten

As the goal of the competition was to complete as many levels as possible, as fast as possible, without regard to points or other bonuses, the problem can be seen as a path optimisation problem: what is the quickest route through the environment that doesn't get Mario killed? A* search is a well-known, simple and fast algorithm for finding shortest paths, which seemed perfect for the Mario competition. As it turned out, A* search is fast and flexible enough to find routes through the generated levels in real time. The implementation process can be separated into three phases: (a) building a physics simulation including world states and object movement, (b) using this simulation in an A* planning algorithm, and (c) optimising the search engine to fulfill real-time requirements with partial information.

a) Simulating the game physics: Because Infinite Mario Bros is open source, the entire physics engine is directly accessible and can be used to simulate future world states that correlate very closely with the actual future world state. This was achieved by copying the entire physics engine to the controller and removing all unnecessary parts, such as rendering and some navigation code. Associating simulated states of enemies with the coordinates received from the API provided an implementation challenge. This association is needed because the API provides only the location and not the current speed, which has to be derived from consecutive recordings. Without accurate enemy behaviour simulation, this can be difficult, especially if there are several enemies in a close area.

Fig. 3. Illustration of node generation in A* search. Each neighbour node in the search is given by a combination of possible input commands.

Fig. 4. Illustration of node placement in A* search in Infinite Mario. The best node in the previous simulation step (right, speed) is inaccessible because it contains an obstacle. Thus the next best node is chosen, which results in a jump over the obstacle. The search process continues by generating all possible neighbours. The speed of Mario at the current node distorts the positions of all following nodes.

b) A* for Infinite Mario: The A* search algorithm is a widely used best-first graph search algorithm that finds a path with the lowest cost between a pre-defined start node and one out of possibly several goal nodes [7]. A* uses a heuristic that estimates the remaining distance to the goal nodes in order to determine the sequence of nodes that are searched by the algorithm. It does so by adding the estimated remaining distance to the previous path cost from the start node to the current node. This heuristic should be admissible (i.e. not overestimate this distance) for optimality to be guaranteed. However, if only near-optimality is required, this constraint can be relaxed to speed up path-finding. With a heuristic that overestimates by x, the path will be at most x too long [8].

To implement an A* search for Infinite Mario, we have to define what a node and a neighbour of a node are, which nodes are goal nodes, and how the heuristic estimates the remaining distance.

A node is defined by the entire world state at a given moment in time, mainly characterised by Mario's position, speed and state. Furthermore, (re)movable objects in the environment have to be taken into account, such as enemies, power-ups, coins and boxes. The only influence we have on this environment is interaction through actions that Mario can perform. These Mario actions consist of combinations of movements such as left, right, duck, jump and fire. Neighbours of a node are given by the world state after one (further) simulation step, applying an action that Mario executes during this simulation step. See figures 3 and 4 for an illustration of a typical scenario, and figure 5 for an in-game visualization. Note how backtracking takes place when an obstacle is discovered, and how the location of the subsequent neighbours is influenced by the current speed of Mario in figure 4.

As only a small window of the world around Mario is visible in the competition API, the level structure outside of this window is unknown. Therefore it does not make sense to plan further ahead than the edge of the current window. Thus, a goal node for our planner is defined as any node where Mario is outside (or just before) the right border of the window of the known environment.

In the standard implementation of A*, the heuristic estimates the remaining distance to the target. In Infinite Mario, one could simply translate this as the horizontal distance to the right border of the window. However, this does not lead to a very accurate heuristic, as it ignores the current speed of Mario. To make sure the bot has an admissible (i.e., underestimating) heuristic, it would have to assume Mario runs with maximum speed towards the goal, which is often not the case.
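Baumgarten's controller instead estimates the time Mario would need to reach the right window border when accelerating as hard as possible. A minimal sketch of that idea follows; the damping and acceleration constants are placeholders of ours, not the actual Infinite Mario Bros physics values.

```java
// Time-based heuristic sketch: count simulated frames until Mario,
// accelerating maximally, has covered the remaining horizontal distance.
// MAX_ACCEL and DAMPING are ASSUMED constants, not the game's real ones.
public class TimeHeuristic {
    static final float MAX_ACCEL = 0.6f;  // per-frame acceleration (assumed)
    static final float DAMPING   = 0.89f; // per-frame speed damping (assumed)

    // Frames needed to cover `distance` starting at horizontal speed v.
    public static int framesToGoal(float distance, float v) {
        int frames = 0;
        float x = 0f;
        while (x < distance) {
            v = v * DAMPING + MAX_ACCEL; // accelerate toward terminal speed
            x += v;
            frames++;
        }
        return frames;
    }
}
```

Because a faster-moving Mario needs fewer frames, this estimate respects the current speed, which a pure horizontal-distance heuristic ignores.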
not the case. Instead, a more accurate representation of the distance to the goal can be given by simulating the time required to reach the right border of the window. Here, the current speed of Mario is taken into account. The quickest solution to get to the goal would then be to accelerate as fast as possible, taking the required time as a heuristic. Similarly, the previous distance of a node to the starting node is simply the time it took to reach the current node.

Fig. 5. Visualization of the future paths considered by the A* controller. Each red line shows a possible future trajectory for Mario, taking the dynamic nature of the world into account.

c) Variable Optimisation: While the A* search algorithm is quite solid and guarantees optimality, certain restrictions need to be put on its execution time to stay within the allowed 40 ms for each game update. These restrictions will likely lead to a non-optimal solution, so careful testing has to be undertaken to ensure that the search terminates.

The first restriction used was to stop looking for solutions when the time limit had been reached, and to use the node with the best heuristic value so far as a goal node. The linear level design created by the level generator in Infinite Mario favours this approach, as it does not feature dead ends that would require back-tracking of suboptimal paths.

The requirement that the heuristic has to be admissible has also been relaxed. A slightly overestimating heuristic increases the speed of the algorithm at the expense of accuracy [8]. In detail, the estimate of the remaining distance can be multiplied by a factor w >= 1. Experimentation with different values for w led to an optimal factor of w = 1.11.

Another factor that has an effect on the processing time required by A* is the frequency of plan recalculation. As mentioned above, limited information requires a recalculation of the plan once new information becomes available, i.e., when obstacles or enemies enter the window of visible information. Experimentation indicated that the best balance between planning ahead and restarting the planning process to incorporate new information is given when the plan is re-created every two game updates. Planning further ahead occasionally led to reacting too late to previously unknown threats, resulting in a lost life or a slowdown of Mario.

B. Other A*-based controllers

Two other controllers were submitted that were based on the A* algorithm. One was submitted by Peter Lawford and the other by a team consisting of Andy Sloane, Caleb Anderson and Peter Burns (we will refer to this team by Andy's name in the score tables). These submissions were inspired by Robin's controller and used a similar overall approach, but differed significantly in implementation.

C. Other hand-coded controllers

The majority of submitted controllers were hand-coded and did (to the best of our knowledge) not use any learning algorithms, nor much internal simulation of the environment. Most of them continuously run rightwards but use heuristics to decide when and how to jump. Some were built on one of the standard heuristic controllers supplied with the competition software. The following are very brief (due to space limitations) characterizations of each:

1) Trond Ellingsen: Rule-based controller. Determines a "danger level" for each gap or enemy and acts based on this.

2) Sergio Lopez: Rule-based. Answers the questions "should I jump?" and "which type of jump?" heuristically, by evaluating danger values and possible landing points.

3) Spencer Schumann: A standard reactive heuristic controller from the example software, augmented with a calculation of desired jump length based on internal simulation of Mario's movement. Incomplete tracking of enemy positions.

4) Mario Perez: Subsumption controller, inspired by controllers used in behaviour-based robotics.

5) Michal Tulacek: Finite state machine with four states: walk-forward, walk-backward, jump and jump-hole.

6) Rafael Oliveira: Reactive controller; did not submit any documentation.

7) Glenn Hartmann: Based on an example controller. Shoots continuously, jumps whenever needed.

D. Learning-based controllers

A number of controllers were submitted that were developed using learning algorithms in one way or another. These controllers exhibit a fascinating diversity; still, due to space considerations and the paucity of submitted documentation, these controllers too will only be described summarily.

1) Matthew Erickson: Controller represented as an expression tree, evolved with fairly standard crossover-heavy Genetic Programming, using CompetitionScore as fitness. The tree evaluates to four boolean values: left/right, jump, run and duck. Nonterminal nodes were chosen among standard arithmetic and conditional functions. The terminals were simple hard-coded feature detectors.

2) Douglas Hawkins: Based on a stack-based virtual machine, evolved with a genetic algorithm.

3) Alexandru Paler: An intriguing combination of imitation learning, based on data acquired from human playing, and path-finding. A* is used to find the route to the end of the screen; the intermediate positions form inputs to a neural network (trained on human playing) which returns the number and type of key presses necessary to get there.
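The time-based heuristic, the w = 1.11 inflation factor and the best-node-so-far cutoff described above for the A*-based controllers combine into a fairly generic time-budgeted weighted A* loop. The following is a minimal, self-contained sketch of that loop, not the actual competition code: the 1-D physics, the constants and the function names are illustrative stand-ins, and a fixed node budget replaces the real 40 ms wall-clock limit.

```python
import heapq
import itertools

# Toy stand-in for the game's forward model: Mario moves in 1-D, velocity
# capped at MAX_SPEED, one action per 40 ms game update. The real
# controllers instead simulate the full engine physics, enemies included.
MAX_SPEED = 4
GOAL_X = 22            # right border of the observation window


def simulate(state, accel):
    """Advance (x, v) by one game update under acceleration accel."""
    x, v = state
    v = max(-MAX_SPEED, min(MAX_SPEED, v + accel))
    return (x + v, v)


def heuristic(state, w=1.11):
    # Optimistic time-to-goal: remaining distance at top speed, inflated
    # by w >= 1 (the winning entry reported w = 1.11 as the best value).
    x, _ = state
    return 0.0 if x >= GOAL_X else w * (GOAL_X - x) / MAX_SPEED


def plan(start, actions=(-1, 0, 1), max_expansions=10_000):
    # In the competition the cutoff was the 40 ms budget; a node budget
    # makes this sketch deterministic. Cost g counts game updates.
    counter = itertools.count()          # heap tie-breaker
    frontier = [(heuristic(start), 0, next(counter), start, [])]
    best_h, best_path = heuristic(start), []
    for _ in range(max_expansions):
        if not frontier:
            break
        f, g, _, state, path = heapq.heappop(frontier)
        h = f - g
        if h < best_h:                   # remember best node so far
            best_h, best_path = h, path
        if state[0] >= GOAL_X:           # reached the right border
            return path
        for a in actions:
            nxt = simulate(state, a)
            heapq.heappush(frontier, (g + 1 + heuristic(nxt), g + 1,
                                      next(counter), nxt, path + [a]))
    return best_path                     # budget exhausted: best-so-far
```

With w > 1 the search trades strict optimality for speed; since path costs here are whole game updates, the returned plan is still within a factor w of optimal, and the best-so-far fallback mirrors the behaviour of the time-limited controllers when no goal node is reached in time.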
4) Sergey Polikarpov: Based on the "Cyberneuron" architecture (http://arxiv.org/abs/0907.0229) and trained with a form of reinforcement learning. A number of action sequences are generated, and each is associated with a neuron; this neuron is penalized or rewarded depending on Mario's performance while the action sequence is being executed.

5) Erek Speed: Rule-based controller, evolved with a GA. Maps the whole observation space (22 × 22) onto the action space, resulting in a genome of more than 100 Mb.

VII. RESULTS

The first phase of the competition was associated with the ICE-GIC conference in London, and the results of the competition were presented at the conference. The results are presented in Table I, and show that Robin Baumgarten's controller performed best, very closely followed by Peter Lawford's controller and closely followed by Andy Sloane et al.'s controller. We also include a simple evolved neural network controller and a very simple hard-coded heuristic controller (ForwardJumpingAgent, which was included with the competition software and served as inspiration for some of the competitors) for comparison; none of the agents that were not based on A* outperformed the heuristic controller.

TABLE I
Results of the ICE-GIC phase of the competition.

Competitor            progress   ms/step
Robin Baumgarten      17264      5.62
Peter Lawford         17261      6.99
Andy Sloane           16219      15.19
Sergio Lopez          12439      0.04
Mario Perez           8952       0.03
Rafael Oliveira       8251       ?
Michal Tulacek        6668       0.03
Erek Speed            2896       0.03
Glenn Hartmann        1170       0.06
Evolved neural net    7805       0.04
ForwardJumpingAgent   9361       0.0007

The second phase of the competition was associated with the CIG conference in Milan, Italy, and the results were presented there. For this phase, we had changed the scoring procedure as detailed in section IV-A. This was a wise move, as both Robin Baumgarten's and Peter Lawford's agents managed to finish all of the levels, and Andy Sloane et al.'s came very close. In compliance with our own rules, Robin rather than Peter was declared the winner because his agent was faster (having more in-game time left at the end of all levels). It should be noted that Peter's controller was better at killing enemies, though.

The best controller that was not based on A*, that of Trond Ellingsen, scored less than half of what the A* agents scored. The best agent developed using some form of learning or optimization, that of Matthew Erickson, was even further down the list. This suggests a massive victory of classic AI approaches over CI techniques. (At least as long as one does not care much about computation time; if score is divided by average time taken per time step, the extremely simple heuristic ForwardJumpingAgent wins the competition...)

TABLE II
Results of the CIG phase of the competition. Explanation of the acronyms in the "approach" column: RB: rule-based, GP: genetic programming, NN: neural network, SM: state machine, Lrs: layered controller, GA: genetic algorithm.

Competitor          approach   progress   levels   time left   kills   mode
Robin Baumgarten    A*         46564.8    40       4878        373     76
Peter Lawford       A*         46564.8    40       4841        421     69
Andy Sloane         A*         44735.5    38       4822        294     67
Trond Ellingsen     RB         20599.2    11       5510        201     22
Sergio Lopez        RB         18240.3    11       5119        83      17
Spencer Schumann    RB         17010.5    8        6493        99      24
Matthew Erickson    GP         12676.3    7        6017        80      37
Douglas Hawkins     GP         12407.0    8        6190        90      32
Sergey Polikarpov   NN         12203.3    3        6303        67      38
Mario Perez         SM, Lrs    12060.2    4        4497        170     23
Alexandru Paler     NN, A*     7358.9     3        4401        69      43
Michal Tulacek      SM         6571.8     3        5965        52      14
Rafael Oliveira     RB         6314.2     1        6692        36      9
Glenn Hartmann      RB         1060.0     0        1134        8       71
Erek Speed          GA         out of memory

VIII. DISCUSSION AND ANALYSIS

While the objective for the competitors was to design or learn a controller that played Infinite Mario Bros as well as possible, the objective for the organizers was to organize a competition that accurately tested the efficacy of various controller representations and learning algorithms for controlling an agent in a platform game. Here we remark on what we have learned in each of these respects, on using the software in your own teaching, and on the future of the competition.

A. AI for platform games

By far the most surprising outcome of the competition was how well the A*-based controllers performed compared to all other controller architectures, including controllers based on learning algorithms. As A* is a commonly used algorithm for path-finding in many types of commercial computer games (e.g. RTS and MMORPG games), one could see this as a victory of "classical" AI over fancier CI techniques, which are rarely used in the games industry. However, one could also observe that the levels generated by the level generator are much less demanding and deceptive than those found in the real Super Mario Bros games and other similar games. All the levels could be cleared by constantly running right and jumping at the right moments, there were no hidden objects or passages, and in particular there were no dead ends that would require backtracking. All the A*-based agents consume considerably more processing time when in front of vertical walls, where most action sequences would not lead to the right end of the screen, suggesting that A* would break down when faced with a dead end.

While the sort of search in game-state space that the A* algorithm provides is likely to be an important component in any agent capable of playing arbitrary Mario levels, it will likely need to be complemented by some other mechanism for higher-level planning, and the architecture will probably benefit from tuning by e.g. evolutionary algorithms. Further, the playing style exhibited by the A*-based agents is nothing like that exhibited by a human player (the creepy exactness and absence of deliberation seem to be part of what made the YouTube video of Robin's agent so popular). How to create a controller that can (learn to) play a platform game in a human-like style is an industrially relevant (cf. Demo Play) problem which has not been addressed by this competition.

B. Competition organization

Looking at the objectives enumerated in section IV, we consider that the first two objectives (ease of participation and transparency) have been fulfilled, that the third (ease of finding a winner) could have been met better, and that we largely failed at meeting the fourth (depth of challenge).

Ease of participation was mainly achieved through having a simple web page, simple interfaces and letting all competition software be open source. Participation was greatly increased through the very successful media campaign, built on social media. Transparency was achieved through forcing all submissions to be open source and publishing them on the web site after the end of the competition. However, many competitors did not describe their agents in detail. In future competitions, the structure of such descriptions should be specified, and submissions that are not followed by satisfactory descriptions should be disqualified.

C. Using the Mario AI Competition in your own teaching

The Mario AI Competition web page, complete with the competition software, rules and all submitted controllers, will remain in place for the foreseeable future. We actively encourage use of the rules and software for your own events, and have noted that at least one local Mario AI Competition has already launched (at UC San Diego). Additionally, the software is used for class projects in a number of AI courses around the world. When organizing such events, it is worth remembering that the existing Google Group and its archive can serve as a useful technical resource, and that the result tables in this paper provide a useful point of reference. We would appreciate it if any such events link back to the original Mario AI Competition web page, and students are encouraged to submit their agents to the next iteration of the competition.

D. Future competitions

The 2010 Mario AI Championship, like the 2009 competition, uses a single website (http://www.marioai.org) but is divided into three tracks:

1) The gameplay track: Similarly to the 2009 competition, this track is aimed at producing the controller that gets furthest on a sequence of levels. However, the competition software is modified so that some of the levels are substantially harder than the hardest levels in last year's competition.

2) The learning track: This track is similar to the gameplay track, but favours controllers that perform online learning. Each controller will be tested a large number (e.g. 1000) of times on a single level, but only the score on the last attempt will count. The level will contain e.g. hidden blocks, shortcuts and dead ends. Thus, the scoring will reward controllers that learn the ins and outs of a particular level.

3) The level generation track: The level generation track differs substantially from the other tracks in that what is tested is not controllers for the agent, but level generators that create new Mario levels that should be fun for particular players. The generated levels will be evaluated by letting a set of human game testers play them live at the competition event.

REFERENCES

[1] J. Togelius, S. M. Lucas, H. Duc Thang, J. M. Garibaldi, T. Nakashima, C. H. Tan, I. Elhanany, S. Berant, P. Hingston, R. M. MacCallum, T. Haferlach, A. Gowrisankar, and P. Burrow, "The 2007 IEEE CEC simulated car racing competition," Genetic Programming and Evolvable Machines, 2008. [Online]. Available: http://dx.doi.org/10.1007/s10710-008-9063-0
[2] D. Loiacono, J. Togelius, P. L. Lanzi, L. Kinnaird-Heether, S. M. Lucas, M. Simmerson, D. Perez, R. G. Reynolds, and Y. Saez, "The WCCI 2008 simulated car racing competition," in Proceedings of the IEEE Symposium on Computational Intelligence and Games, 2008.
[3] J. Togelius, S. Karakovskiy, J. Koutnik, and J. Schmidhuber, "Super Mario evolution," in Proceedings of the IEEE Symposium on Computational Intelligence and Games (CIG), 2009.
[4] K. Compton and M. Mateas, "Procedural level design for platform games," in Proceedings of the Artificial Intelligence and Interactive Digital Entertainment International Conference (AIIDE), 2006.
[5] G. Smith, M. Treanor, J. Whitehead, and M. Mateas, "Rhythm-based level generation for 2D platformers," in Proceedings of the International Conference on Foundations of Digital Games, 2009.
[6] C. Pedersen, J. Togelius, and G. Yannakakis, "Modeling player experience in Super Mario Bros," in Proceedings of the IEEE Symposium on Computational Intelligence and Games (CIG), 2009.
[7] P. Hart, N. Nilsson, and B. Raphael, "A formal basis for the heuristic determination of minimum cost paths," IEEE Transactions on Systems Science and Cybernetics, vol. 4, no. 2, pp. 100–107, 1968.
[8] I. Millington and J. Funge, Artificial Intelligence for Games. Morgan Kaufmann, 2009.
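For readers who want to use the benchmark in a course, the reactive baseline controllers discussed in this paper (such as ForwardJumpingAgent) reduce to a few lines of rules. The sketch below is a hypothetical illustration, not the competition API: the 22 × 22 grid of 0/1 cells and the (right, jump) action pair are simplified stand-ins for the real observation and action interfaces.

```python
# A minimal reactive controller in the spirit of the supplied
# ForwardJumpingAgent: always run right, and jump when either blocked
# by a wall ahead or facing a gap in the ground. The grid encoding
# (0 = empty, 1 = solid) is an assumption made for this sketch.

def act(grid, mario_row=11, mario_col=11):
    """Return (right, jump) button presses for one game update."""
    ahead = grid[mario_row][mario_col + 1]            # cell in front of Mario
    below_ahead = grid[mario_row + 1][mario_col + 1]  # ground in front
    blocked = ahead == 1       # wall directly ahead -> jump over it
    gap = below_ahead == 0     # no ground ahead     -> jump the hole
    return (True, blocked or gap)
```

The hand-coded entries described in this paper essentially refine these two rules, adding danger levels, jump-length estimation and enemy tracking.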