The 2009 Mario AI Competition
Julian Togelius, Sergey Karakovskiy and Robin Baumgarten

Abstract— This paper describes the 2009 Mario AI Competition, which was run in association with the IEEE Games Innovation Conference and the IEEE Symposium on Computational Intelligence and Games. The focus of the competition was on developing controllers that could play a version of Super Mario Bros as well as possible. We describe the motivations for holding this competition, the challenges associated with developing artificial intelligence for platform games, the software and API developed for the competition, the competition rules and organization, the submitted controllers and the results. We conclude the paper by discussing what the outcomes of the competition can teach us both about developing platform game AI and about organizing game AI competitions. The first two authors are the organizers of the competition, while the third author is the winner of the competition.

Keywords: Mario, platform games, competitions, A*, evolutionary algorithms

I. INTRODUCTION

For the past several years, a number of game-related competitions have been organized in conjunction with major international conferences on computational intelligence (CI) and artificial intelligence for games (game AI). In these competitions, competitors are invited to submit their best controllers or strategies for a particular game; the controllers are then ranked based on how well they play the game, alone or in competition with other controllers. The competitions are based on popular board games (such as Go and Othello) or video games (such as Pac-Man, Unreal Tournament and various car racing games [1], [2]).

In most of these competitions, competitors submit controllers that interface to an API built by the organizers of the competition. The winner of the competition is the person or team that submitted the controller that played the game best, either on its own (for single-player games such as Pac-Man) or against others (in adversarial games such as Go). Usually, prizes of a few hundred US dollars are associated with each competition, and a certificate is always awarded. There is no requirement that the submitted controllers be based on any particular type of algorithm, but in many cases the winners turn out to include computational intelligence (typically neural networks and/or evolutionary computation) in one form or another. The submitting teams tend to comprise students, faculty members and persons not currently in academia (e.g. working as software developers).

There are several reasons for holding such competitions as part of the regular events organized by the computational intelligence community. A main motivation is to improve benchmarking of learning and other AI algorithms. Benchmarking is frequently done using very simple testbed problems, which might or might not capture the complexity of real-world problems. When researchers report results on more complex problems, the technical complexities of accessing, running and interfacing to the benchmarking software might prevent independent validation of, and comparison with, the published results. Here, competitions have the role of providing software, interfaces and scoring procedures to fairly and independently evaluate competing algorithms and development methodologies.

Another strong incentive for running these competitions is that they motivate researchers. Existing algorithms get applied to new areas, and the effort needed to participate in a competition is (or at least should be) less than it takes to write new experimental software, do experiments and write a completely new paper. Competitions might even bring new researchers into the computational intelligence fields, both academics and non-academics. One of the reasons for this is that game-based competitions simply look cool.

The particular competition described in this paper is similar in aims and organization to some other game-related competitions (in particular the simulated car racing competitions) but differs in that it is built on a platform game, and thus relates to the particular challenges an agent faces while playing such games.

A. AI for platform games

Platform games can be defined as games where the player controls a character/avatar, usually of humanoid form, in an environment characterized by differences in altitude between surfaces ("platforms") interspersed by holes/gaps. The character can typically move horizontally (walk) and jump, and sometimes perform other actions as well; the game world features gravity, meaning that it is seldom straightforward to negotiate large gaps or altitude differences.

To our best knowledge, there have not been any previous competitions focusing on platform game AI. The only published paper on AI for platform games we know of is a recent paper by the first two authors of the current paper, where we described experiments in evolving neural network controllers for the same game as was used in the competition, using an earlier version of the API [3]. Some other papers have described uses of AI techniques for automatic generation of levels for platform games [4], [5], [6].

Most commercial platform games incorporate little or no AI. The main reason for this is probably that most platform games are not adversarial; a single player controls a single character who makes his way through a sequence of levels, with success dependent only on the player's skill. The obstacles that have to be overcome typically revolve around the environment (gaps to be jumped over, items to be found etc.) and NPC enemies; however, in most platform games these enemies move according to preset patterns or simple homing behaviours.

Though apparently an under-studied topic, artificial intelligence for controlling the player character in platform games is certainly not without interest. From a game development perspective, it would be valuable to be able to automatically create controllers that play in the style of particular human players. This could be used both to guide players when they get stuck (cf. Nintendo's recent "Demo Play" feature, introduced to cope with the increasingly diverse demographic distribution of players) and to automatically test new game levels and features as part of an algorithm to automatically tune or create content for a platform game.

From an AI and reinforcement learning perspective, platform games represent interesting challenges, as they have high-dimensional state and observation spaces and relatively high-dimensional action spaces, and require the execution of different skills in sequence. Further, they can be made into good testbeds, as they can typically be executed much faster than real time and tuned to different difficulty levels. We will go into more detail on this in the next section, where we describe the specific platform game used in this competition.

(JT is with the IT University of Copenhagen, Rued Langgaards Vej 7, 2300 Copenhagen S, Denmark. SK is with IDSIA, Galleria 2, 6928 Manno-Lugano, Switzerland. RB is with Imperial College of Science and Technology, London, United Kingdom. Emails: julian@togelius.com, sergey@idsia.ch, robin.baumgarten06@doc.ic.ac.uk)

II. INFINITE MARIO BROS

The competition uses a modified version of Markus Persson's Infinite Mario Bros, a public domain clone of Nintendo's classic platform game Super Mario Bros. The original Infinite Mario Bros is playable on the web, where Java source code is also available (http://www.mojang.com/notch/mario/).

The gameplay in Super Mario Bros consists of moving the player-controlled character, Mario, through two-dimensional levels, which are viewed sideways. Mario can walk and run to the right and left, jump, and (depending on which state he is in) shoot fireballs. Gravity acts on Mario, making it necessary to jump over holes to get past them. Mario can be in one of three states: Small, Big (can crush some objects by jumping into them from below), and Fire (can shoot fireballs).

The main goal of each level is to get to the end of the level, which means traversing it from left to right. Auxiliary goals include collecting as many as possible of the coins that are scattered around the level, finishing the level as fast as possible, and collecting the highest score, which in part depends on the number of collected coins and killed enemies.

Complicating matters is the presence of holes and moving enemies. If Mario falls down a hole, he loses a life. If he touches an enemy, he gets hurt; this means losing a life if he is currently in the Small state. Otherwise, his state degrades from Fire to Big or from Big to Small. However, if he jumps and lands on an enemy, different things could happen. Most enemies (e.g. goombas, cannon balls) die from this treatment; others (e.g. piranha plants) are not vulnerable to this and proceed to hurt Mario; finally, turtles withdraw into their shells if jumped on, and these shells can then be picked up by Mario and thrown at other enemies to kill them.

Certain items are scattered around the levels, either out in the open or hidden inside blocks of brick, only appearing when Mario jumps at these blocks from below so that he smashes his head into them. Available items include coins, mushrooms which make Mario grow Big, and flowers which make Mario turn into the Fire state if he is already Big.

A. Automatic level generation

While implementing most features of Super Mario Bros, the standout feature of Infinite Mario Bros is the automatic generation of levels. Every time a new game is started, levels are randomly generated by traversing a fixed width and adding features (such as blocks, gaps and opponents) according to certain heuristics. The level generation can be parameterized, including the desired difficulty of the level, which affects the number and placement of holes, enemies and obstacles. In our modified version of Infinite Mario Bros, we can specify the random seed of the level generator, making sure that any particular level can be recreated by simply using the same seed. Unfortunately, the current level generation algorithm is somewhat limited; for example, it cannot produce levels that include dead ends, which would require back-tracking to get out of.

B. Challenges for AIs (and humans)

Several features make Super Mario Bros particularly interesting from an AI perspective. The most important of these is the potentially very rich and high-dimensional environment representation. When a human player plays the game, he views a small part of the current level from the side, with the screen centered on Mario. Still, this view often includes dozens of objects such as brick blocks, enemies and collectable items. The static environment (grass, pipes, brick blocks etc.) and the coins are laid out in a grid (of which the standard screen covers approximately 22 × 22 cells), whereas moving items (most enemies, as well as the mushroom power-ups) move continuously at pixel resolution.

The action space, while discrete, is also rather large. In the original Nintendo game, the player controls Mario with a D-pad (up, down, right, left) and two buttons (A, B). The A button initiates a jump (the height of the jump is determined partly by how long it is pressed); the B button initiates running mode and, if Mario is in the Fire state, shoots a fireball. Disregarding the unused up direction, this means that the information to be supplied by the controller at each time step is five bits, yielding 2^5 = 32 possible actions, though some of these are nonsensical (e.g. left together with right).

Another interesting feature is that there is a smooth learning curve between levels, both in terms of which behaviours are necessary and their required degree of refinement. For example, to complete a very simple Mario level (with no enemies and only small and few holes and obstacles), it might be enough to keep walking right and jumping whenever there is something (a hole or obstacle) immediately in front of Mario. A controller that does this should be easy to learn.
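The five-bit action count given above can be verified with a few lines of code. The sketch below is our own illustration; the competition API actually passes actions as a boolean array, and the class and method names here are not the official identifiers.

```java
// Sketch of the five-button action space described in Section II-B.
// Button indices (our convention): 0 = left, 1 = right, 2 = down,
// 3 = jump, 4 = speed/fire. The real API uses a boolean[5] action array.
public class ActionSpace {
    public static final int NUM_BUTTONS = 5;

    // All raw button combinations: 2^5 = 32.
    public static int numRawActions() {
        return 1 << NUM_BUTTONS;
    }

    // Combinations that do not press left and right simultaneously.
    public static int numSensibleActions() {
        int count = 0;
        for (int a = 0; a < numRawActions(); a++) {
            boolean left = (a & 1) != 0;
            boolean right = (a & 2) != 0;
            if (!(left && right)) count++;
        }
        return count;
    }
}
```

Of the 32 raw combinations, the 8 that hold left and right together are nonsensical, leaving 24 distinct sensible actions.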
// always the same dimensionality 22x22
// always centered on the agent
public byte[][] getCompleteObservation();
public byte[][] getEnemiesObservation();
public byte[][] getLevelSceneObservation();
public float[] getMarioFloatPos();
public float[] getEnemiesFloatPos();
public boolean isMarioOnGround();
public boolean mayMarioJump();

Fig. 1. The Environment Java interface, which contains the observation, i.e. the information the controller can use to decide which action to take.

Fig. 2. Visualization of the granularity of the sensor grid. The top six environment sensors would output 0 and the lower three would output 1. All of the enemy sensors would output 0, as even if all 49 enemy sensors were consulted, none of them would reach all the way to the body of the turtle, which is four blocks below Mario. None of the simplified observations register the coins, but the complete observation would. Note that larger sensor grids can be used (up to 22 × 22), if the controller can handle the information.

To complete the same level while collecting as many as possible of the coins present on that level likely demands some planning skills, such as smashing a power-up block to retrieve a mushroom that makes Mario Big so that he can retrieve the coins hidden behind a brick block, and jumping up on a platform to collect the coins there and then going back to collect the coins hidden under it. More advanced levels, including most of those in the original Super Mario Bros game, require a varied behaviour repertoire just to complete. These levels might include concentrations of enemies of different kinds which can only be passed by timing Mario's passage precisely; arrangements of holes and platforms that require complicated sequences of jumps to pass; dead ends that require backtracking; and so on.

III. COMPETITION API

In order to be able to run the competition (and in order to use the game for other experiments), the organizers modified Infinite Mario Bros rather heavily and constructed an API that would enable it to be easily interfaced to learning algorithms and competitors' controllers. The modifications included removing the real-time element of the game so that it can be "stepped" forward by the learning algorithm, removing the dependency on graphical output, and substantial refactoring (the developer of the game did not anticipate that the game would be turned into an RL benchmark). Each time step, which corresponds to 40 milliseconds of simulated time (an update frequency of 25 fps), the controller receives a description of the environment and outputs an action. The resulting software is a single-threaded Java application that can easily be run on any major hardware architecture and operating system, with the key methods that a controller needs to implement specified in a single Java interface file (see figure 1). On an iMac from 2007, 5–20 full levels can be played per second (several thousand times faster than real time). A TCP interface for controllers is also provided.

Figure 2 shows the size and layout of the sensor grid relative to Mario. For each grid cell, the controller can choose to read the complete observation (using the getCompleteObservation() method), which returns an integer value representing exactly what occupies that part of the environment (e.g. a section of a pipe), or one of the simplified observations (getEnemiesObservation() and getLevelSceneObservation()), which simply return a zero or one signifying the presence of an enemy or an impassable part of the environment.

IV. COMPETITION ORGANIZATION AND RULES

The organization and rules of the competition sought to fulfill the following objectives:

1) Ease of participation. We wanted researchers of different kinds and of different levels of experience to be able to participate, students as well as withered professors who haven't written much code in a decade.
2) Transparency. We wanted as much as possible to be publicly known about the competing entries, the benchmark software, the organization of the competition etc. This can be seen as an end in itself, but a more transparent competition also makes it easier to detect cheaters, to exploit the scientific results of the competition and to reuse the software developed for it.
3) Ease of finding a winner. We wanted it to be unambiguously clear how to rank all submitted entries and who won the competition.
4) Depth of challenge. We wanted there to be a real score difference between controllers of different quality, both at the top and the bottom of the high-score table.

The competition web page hosts the rules, the downloadable software and the final results of the competition (http://julian.togelius.com/mariocompetition2009). Additionally, a Google Group was set up to which all technical and rules questions were to be posted, so that they came to the attention of, and could be answered by, both the organizers and other competitors, and where the organizers posted news about the competition (http://groups.google.com/group/marioai). The searchable archive of the discussion group functions as a repository of technical information about the competition.
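As an aside, the interface in figure 1 is already enough to express the "keep walking right and jump at obstacles" baseline behaviour mentioned in section II. The sketch below is ours, not competition code; in particular, the assumption that Mario occupies cell (11, 11) of the 22 × 22 grid and the five-element action array layout are illustrative and may differ from the actual API.

```java
// Reactive baseline sketch against the grid returned by
// getLevelSceneObservation() (nonzero = impassable environment).
// ASSUMPTIONS (ours): Mario sits at grid cell (11, 11); the action
// array is {left, right, down, jump, speed}.
public class ForwardJumpingSketch {
    static final int MARIO_ROW = 11, MARIO_COL = 11;

    public boolean[] getAction(byte[][] levelScene,
                               boolean mayMarioJump,
                               boolean isMarioOnGround) {
        boolean[] action = new boolean[5];
        action[1] = true; // always move right
        // Obstacle directly ahead, or no ground below the cell ahead (a gap).
        boolean obstacleAhead = levelScene[MARIO_ROW][MARIO_COL + 1] != 0;
        boolean gapAhead = levelScene[MARIO_ROW + 1][MARIO_COL + 1] == 0;
        if ((obstacleAhead || gapAhead) && (mayMarioJump || !isMarioOnGround)) {
            action[3] = true; // jump (held across steps to jump higher)
        }
        return action;
    }
}
```

A controller of roughly this shape is what the supplied ForwardJumpingAgent baseline amounts to: no planning, no learning, one grid lookahead.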
Competitors were free to submit controllers written in any programming language and using any development methodology, as long as the controllers could interface to an unmodified version of the competition software and control Mario in real time on a standard desktop PC running Mac OS X or Windows XP. For competitors using only Java, there was a standardized submission format. Any submission that didn't follow this format needed to be accompanied by detailed instructions for how to run it. Additionally, the submission needed to be accompanied by the score the competitors had recorded themselves using the CompetitionScore class that was part of the competition software, so that the organizers could verify that the submission ran as intended on their systems. We also urged each competitor to submit a text file describing how the controller works.

Competitors were urged to submit their controllers early, and then re-submit improved versions. This was so that any problems that would have disqualified the controllers could be detected and rectified, and the controllers resubmitted. No submissions or resubmissions at all were accepted after the deadline (about a week before each competition event). The most common problems that mandated resubmission were:

• The controller used a nonstandard submission format, and no running instructions were provided.
• The controller was submitted together with, and entangled in, a complete version of the competition software.
• The controller timed out, i.e. took more than 40 ms per time step on average on some level.

A. Scoring

All entries were scored before the conference by running them on 10 levels of increasing difficulty, using the total distance travelled on these levels as the score. The scoring procedure was deterministic, as the same random seed was used for all controllers, except in the few cases where the controller was nondeterministic. The CompetitionScore method uses a supplied random number seed to generate the levels. Competitors were asked to score their own submissions with seed 0 so that this score could be verified by the organizers, but the seed used for the competition scoring was not generated until after the submission deadline, so that competitors could not optimize their controllers for beating a particular sequence of levels.

For the second phase of the competition (the CIG phase), we discovered some time before the submission deadline that two of the submitted controllers were able to clear all levels for some random seeds. We therefore modified the CompetitionScore class so as to make it possible to differentiate better between high-scoring controllers. First of all, we increased the number of levels to 40, and varied the length of the levels stochastically, so that controllers could not be optimized for a fixed level length. In case two controllers still cleared all 40 levels, we defined three tie-breakers: game-time (not clock-time) left at the end of all 40 levels, number of total kills, and mode sum (the sum of all Mario modes at the end of levels, where 0=small, 1=big and 2=fire; a high mode sum indicates that the player has taken little damage). So if two controllers both cleared all levels, the one that took the least game-time to do so would win; if both took the same time, the most efficient killer would win, and so on.

V. MEDIA STRATEGY AND COVERAGE

The competition got off to a slow start. The organizers did not put a final version of the competition software online until relatively late, and neglected advertising the competition until about a month and a half before the deadline of the first phase. Once it occurred to the organizers to advertise their competition, they first tried the usual channels, mainly mailing lists for academics working within computational intelligence. One of the organizers then posted a link to the competition on his Facebook profile, which led one of his friends to submit the link to the social media site Digg. This link was instantly voted up onto the front page of the site, and a vigorous discussion ensued on the link's comment page. This also meant 20000 visitors to the competition website in the first day, some of whom joined the discussion group and announced their intention to participate. Seeing the success of this, the organizers submitted a story about the competition to the technology news site Slashdot. Slashdot is regularly read by mainstream and technology news media, which meant that within days there were stories about the competition in (among others) New Scientist (http://www.newscientist.com/article/dn17560-race-is-on-to-evolve-the-ultimate-mario.html), MSNBC (http://www.msnbc.msn.com/id/32451009/ns/technology_and_science-science/) and Le Monde (http://www.lemonde.fr/technologies/article/2009/08/07/quand-c-est-l-ordinateur-qui-joue-a-mario_1226413_651865.html).

At this point, Robin Baumgarten had already finished a first version of his controller, and posted a video on YouTube of his controller guiding Mario through a level, complete with a visualization of the alternate paths the A* algorithm was evaluating at each time step (http://www.youtube.com/watch?v=DlkMs4ZHHr8). The video was apparently so enjoyable to watch that it quickly reached the top lists, and has at the time of writing been viewed over 600000 times. The success of this video was a bit of a double-edged sword; it drew hordes of people to the competition website, and led to several announcements of intention to participate (though far from everybody who said they would eventually submitted a controller), but it also caused some competitors to drop out, as Robin's controller was so impressive that they considered themselves not to stand a chance of winning.

VI. SUBMITTED CONTROLLERS

This section describes the controllers that were submitted to the competition. We describe the versions of the controllers that were submitted to the CIG phase of the competition; in several cases, preliminary versions were submitted to the ICE-GIC phase. The winning controller is described in some detail, whereas the other controllers are given shorter descriptions due to space considerations. In some cases, a controller is either very similar to an existing controller or very little information was provided by the competitor; those controllers are described only very briefly here.
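The CIG-phase ranking just described is a lexicographic order: distance first, then the three tie-breakers in sequence. It can be stated compactly as a comparator; the Result holder and its field names below are our own illustration, not classes from the competition software.

```java
import java.util.Comparator;

// Lexicographic CIG-phase ranking sketch: total distance, then game-time
// left, then kills, then mode sum. Higher is better on every criterion.
public class Ranking {
    public static class Result {
        final int distance, timeLeft, kills, modeSum;
        public Result(int distance, int timeLeft, int kills, int modeSum) {
            this.distance = distance; this.timeLeft = timeLeft;
            this.kills = kills; this.modeSum = modeSum;
        }
    }

    // Sorts better results first: each criterion only matters
    // when all earlier criteria are tied.
    public static final Comparator<Result> BETTER_FIRST =
        Comparator.<Result>comparingInt(r -> r.distance)
                  .thenComparingInt(r -> r.timeLeft)
                  .thenComparingInt(r -> r.kills)
                  .thenComparingInt(r -> r.modeSum)
                  .reversed();
}
```

Expressing the ranking this way makes the intended semantics explicit: kills and mode sum can never overturn a distance or time advantage.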
A. Robin Baumgarten

As the goal of the competition was to complete as many levels as possible, as fast as possible, without regard to points or other bonuses, the problem can be seen as a path optimisation problem: what is the quickest route through the environment that doesn't get Mario killed? A* search is a well-known, simple and fast algorithm for finding shortest paths, which seemed perfect for the Mario competition. As it turned out, A* search is fast and flexible enough to find routes through the generated levels in real time. The implementation process can be separated into three phases: (a) building a physics simulation including world states and object movement, (b) using this simulation in an A* planning algorithm, and (c) optimising the search engine to fulfill real-time requirements with partial information.

a) Simulating the game physics: Because Infinite Mario Bros is open source, the entire physics engine is directly accessible and can be used to simulate future world states that correlate very closely with the actual future world state. This was achieved by copying the entire physics engine to the controller and removing all unnecessary parts, such as rendering and some navigation code. Associating simulated states of enemies with the coordinates received from the API provided an implementation challenge. This association is needed because the API provides only the location and not the current speed, which has to be derived from consecutive recordings. Without accurate enemy behaviour simulation, this can be difficult, especially if there are several enemies in a close area.

Fig. 3. Illustration of node generation in A* search. Each neighbour node in the search is given by a combination of possible input commands.

Fig. 4. Illustration of node placement in A* search in Infinite Mario. The best node in the previous simulation step (right, speed) is inaccessible because it contains an obstacle. Thus the next best node is chosen, which results in a jump over the obstacle. The search process continues by generating all possible neighbours. The speed of Mario at the current node distorts the positions of all following nodes.

b) A* for Infinite Mario: The A* search algorithm is a widely used best-first graph search algorithm that finds a path with the lowest cost between a pre-defined start node and one out of possibly several goal nodes [7]. A* uses a heuristic that estimates the remaining distance to the goal nodes in order to determine the sequence of nodes that are searched by the algorithm. It does so by adding the estimated remaining distance to the previous path cost from the start node to the current node. This heuristic should be admissible (i.e. not overestimate this distance) for optimality to be guaranteed. However, if only near-optimality is required, this constraint can be relaxed to speed up path-finding. With a heuristic that overestimates by x, the path will be at most x too long [8].

To implement an A* search for Infinite Mario, we have to define what a node and a neighbour of a node are, which nodes are goal nodes, and how the heuristic estimates the remaining distance.

A node is defined by the entire world state at a given moment in time, mainly characterised by Mario's position, speed and state. Furthermore, (re)movable objects in the environment have to be taken into account, such as enemies, power-ups, coins and boxes. The only influence we have on this environment is interaction through actions that Mario can perform. These Mario actions consist of combinations of movements such as left, right, duck, jump and fire. Neighbours of a node are given by the world state after one (further) simulation step, applying an action that Mario executes during this simulation step. See figures 3 and 4 for an illustration of a typical scenario, and figure 5 for an in-game visualization. Note how backtracking takes place when an obstacle is discovered, and how the location of the subsequent neighbours is influenced by the current speed of Mario in figure 4.

As only a small window of the world around Mario is visible in the competition API, the level structure outside of this window is unknown. Therefore it does not make sense to plan further ahead than the edge of the current window. Thus, a goal node for our planner is defined as any node where Mario is outside (or just before) the right border of the window of the known environment.

In the standard implementation of A*, the heuristic estimates the remaining distance to the target. In Infinite Mario, one could simply translate this as the horizontal distance to the right border of the window. However, this does not lead to a very accurate heuristic, as it ignores the current speed of Mario. To make sure the bot has an admissible (i.e., underestimating) heuristic, it would have to assume Mario runs with maximum speed towards the goal, which is often not the case.
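Baumgarten's controller instead estimates the time Mario would need to reach the right window border when accelerating as hard as possible. A minimal sketch of that idea follows; the damping and acceleration constants are placeholders of ours, not the actual Infinite Mario Bros physics values.

```java
// Time-based heuristic sketch: count simulated frames until Mario,
// accelerating maximally, has covered the remaining horizontal distance.
// MAX_ACCEL and DAMPING are ASSUMED constants, not the game's real ones.
public class TimeHeuristic {
    static final float MAX_ACCEL = 0.6f;  // per-frame acceleration (assumed)
    static final float DAMPING   = 0.89f; // per-frame speed damping (assumed)

    // Frames needed to cover `distance` starting at horizontal speed v.
    public static int framesToGoal(float distance, float v) {
        int frames = 0;
        float x = 0f;
        while (x < distance) {
            v = v * DAMPING + MAX_ACCEL; // accelerate toward terminal speed
            x += v;
            frames++;
        }
        return frames;
    }
}
```

Because a faster-moving Mario needs fewer frames, this estimate respects the current speed, which a pure horizontal-distance heuristic ignores.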
not the case. Instead, a more accurate representation of the distance to the goal can be given by simulating the time required to reach the right border of the window. Here, the current speed of Mario is taken into account. The quickest solution to get to the goal would then be to accelerate as fast as possible, taking the required time as a heuristic. Similarly, the previous distance of a node to the starting node is simply the time it took to reach the current node.

Fig. 5. Visualization of the future paths considered by the A* controller. Each red line shows a possible future trajectory for Mario, taking the dynamic nature of the world into account.

c) Variable Optimisation: While the A* search algorithm is quite solid and guarantees optimality, certain restrictions need to be put on its execution time to stay within the allowed 40 ms for each game update. These restrictions will likely lead to a non-optimal solution, so careful testing has to be undertaken to ensure that the search terminates.

The first restriction used was to stop looking for solutions when the time limit had been reached, and to use the node with the best heuristic value so far as a goal node. The linear level design created by the level generator in Infinite Mario favours this approach, as it does not feature dead ends that would require back-tracking of suboptimal paths.

The requirement that the heuristic has to be admissible has also been relaxed. A slightly overestimating heuristic increases the speed of the algorithm at the expense of accuracy [8]. In detail, the estimate of the remaining distance can be multiplied by a factor w >= 1. Experimentation with different values for w led to an optimal factor of w = 1.11.

Another factor that has an effect on the processing time required by A* is the frequency of plan recalculation. As mentioned above, limited information requires a recalculation of the plan once new information becomes available, i.e., when obstacles or enemies enter the window of visible information. Experimentation indicated that the best balance between planning ahead and restarting the planning process to incorporate new information is given when the plan is re-created every two game updates. Planning further ahead occasionally led to reacting too late to previously unknown threats, resulting in a lost life or a slowdown of Mario.

B. Other A*-based controllers

Two other controllers were submitted that were based on the A* algorithm. One was submitted by Peter Lawford and the other by a team consisting of Andy Sloane, Caleb Anderson and Peter Burns (we will refer to this team by Andy's name in the score tables). These submissions were inspired by Robin's controller and used a similar overall approach, but differed significantly in implementation.

C. Other hand-coded controllers

The majority of submitted controllers were hand-coded and did (to the best of our knowledge) not use any learning algorithms, nor much internal simulation of the environment. Most of them continuously run rightwards but use heuristics to decide when and how to jump. Some were built on one of the standard heuristic controllers supplied with the competition software. The following are very brief (due to space limitations) characterizations of each:

1) Trond Ellingsen: Rule-based controller. Determines a "danger level" for each gap or enemy and acts based on this.

2) Sergio Lopez: Rule-based. Answers the questions "should I jump?" and "which type of jump?" heuristically, by evaluating danger values and possible landing points.

3) Spencer Schumann: A standard reactive heuristic controller from the example software, augmented with a calculation of desired jump length based on internal simulation of Mario's movement. Incomplete tracking of enemy positions.

4) Mario Perez: Subsumption controller, inspired by controllers used in behaviour-based robotics.

5) Michal Tulacek: Finite state machine with four states: walk-forward, walk-backward, jump and jump-hole.

6) Rafael Oliveira: Reactive controller; did not submit any documentation.

7) Glenn Hartmann: Based on an example controller. Shoots continuously, jumps whenever needed.

D. Learning-based controllers

A number of controllers were submitted that were developed using learning algorithms in one way or another. These controllers exhibit a fascinating diversity; still, due to space considerations and the paucity of submitted documentation, these controllers too will only be described summarily.

1) Matthew Erickson: Controller represented as an expression tree, evolved with fairly standard crossover-heavy Genetic Programming, using CompetitionScore as fitness. The tree evaluates to four boolean values: left/right, jump, run and duck. Nonterminal nodes were chosen among standard arithmetic and conditional functions. The terminals were simple hard-coded feature detectors.

2) Douglas Hawkins: Based on a stack-based virtual machine, evolved with a genetic algorithm.

3) Alexandru Paler: An intriguing combination of imitation learning, based on data acquired from human playing, and path-finding. A* is used to find the route to the end of the screen; the intermediate positions form inputs to a neural network (trained on human playing) which returns the number and type of key presses necessary to get there.
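The time-based heuristic, the w = 1.11 inflation factor and the best-node-so-far cutoff described above for the A*-based controllers combine into a fairly generic time-budgeted weighted A* loop. The following is a minimal, self-contained sketch of that loop, not the actual competition code: the 1-D physics, the constants and the function names are illustrative stand-ins, and a fixed node budget replaces the real 40 ms wall-clock limit.

```python
import heapq
import itertools

# Toy stand-in for the game's forward model: Mario moves in 1-D, velocity
# capped at MAX_SPEED, one action per 40 ms game update. The real
# controllers instead simulate the full engine physics, enemies included.
MAX_SPEED = 4
GOAL_X = 22            # right border of the observation window


def simulate(state, accel):
    """Advance (x, v) by one game update under acceleration accel."""
    x, v = state
    v = max(-MAX_SPEED, min(MAX_SPEED, v + accel))
    return (x + v, v)


def heuristic(state, w=1.11):
    # Optimistic time-to-goal: remaining distance at top speed, inflated
    # by w >= 1 (the winning entry reported w = 1.11 as the best value).
    x, _ = state
    return 0.0 if x >= GOAL_X else w * (GOAL_X - x) / MAX_SPEED


def plan(start, actions=(-1, 0, 1), max_expansions=10_000):
    # In the competition the cutoff was the 40 ms budget; a node budget
    # makes this sketch deterministic. Cost g counts game updates.
    counter = itertools.count()          # heap tie-breaker
    frontier = [(heuristic(start), 0, next(counter), start, [])]
    best_h, best_path = heuristic(start), []
    for _ in range(max_expansions):
        if not frontier:
            break
        f, g, _, state, path = heapq.heappop(frontier)
        h = f - g
        if h < best_h:                   # remember best node so far
            best_h, best_path = h, path
        if state[0] >= GOAL_X:           # reached the right border
            return path
        for a in actions:
            nxt = simulate(state, a)
            heapq.heappush(frontier, (g + 1 + heuristic(nxt), g + 1,
                                      next(counter), nxt, path + [a]))
    return best_path                     # budget exhausted: best-so-far
```

With w > 1 the search trades strict optimality for speed; since path costs here are whole game updates, the returned plan is still within a factor w of optimal, and the best-so-far fallback mirrors the behaviour of the time-limited controllers when no goal node is reached in time.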
4) Sergey Polikarpov: Based on the "Cyberneuron" architecture (http://arxiv.org/abs/0907.0229) and trained with a form of reinforcement learning. A number of action sequences are generated, and each is associated with a neuron; this neuron is penalized or rewarded depending on Mario's performance while the action sequence is being executed.

5) Erek Speed: Rule-based controller, evolved with a GA. Maps the whole observation space (22 × 22) onto the action space, resulting in a genome of more than 100 Mb.

VII. RESULTS

The first phase of the competition was associated with the ICE-GIC conference in London, and the results of the competition were presented at the conference. The results are presented in Table I, and show that Robin Baumgarten's controller performed best, very closely followed by Peter Lawford's controller and closely followed by Andy Sloane et al.'s controller. We also include a simple evolved neural network controller and a very simple hard-coded heuristic controller (ForwardJumpingAgent, which was included with the competition software and served as inspiration for some of the competitors) for comparison; none of the agents that were not based on A* outperformed the heuristic controller.

TABLE I
Results of the ICE-GIC phase of the competition.

Competitor            progress   ms/step
Robin Baumgarten      17264      5.62
Peter Lawford         17261      6.99
Andy Sloane           16219      15.19
Sergio Lopez          12439      0.04
Mario Perez           8952       0.03
Rafael Oliveira       8251       ?
Michal Tulacek        6668       0.03
Erek Speed            2896       0.03
Glenn Hartmann        1170       0.06
Evolved neural net    7805       0.04
ForwardJumpingAgent   9361       0.0007

The second phase of the competition was associated with the CIG conference in Milan, Italy, and the results were presented there. For this phase, we had changed the scoring procedure as detailed in section IV-A. This was a wise move, as both Robin Baumgarten's and Peter Lawford's agents managed to finish all of the levels, and Andy Sloane et al.'s came very close. In compliance with our own rules, Robin rather than Peter was declared the winner because his agent was faster (having more in-game time left at the end of all levels). It should be noted that Peter's controller was better at killing enemies, though.

The best controller that was not based on A*, that of Trond Ellingsen, scored less than half of what the A* agents scored. The best agent developed using some form of learning or optimization, that of Matthew Erickson, was even further down the list. This suggests a massive victory of classic AI approaches over CI techniques. (At least as long as one does not care much about computation time; if score is divided by average time taken per time step, the extremely simple heuristic ForwardJumpingAgent wins the competition...)

TABLE II
Results of the CIG phase of the competition. Explanation of the acronyms in the "approach" column: RB: rule-based, GP: genetic programming, NN: neural network, SM: state machine, Lrs: layered controller, GA: genetic algorithm.

Competitor          approach   progress   levels   time left   kills   mode
Robin Baumgarten    A*         46564.8    40       4878        373     76
Peter Lawford       A*         46564.8    40       4841        421     69
Andy Sloane         A*         44735.5    38       4822        294     67
Trond Ellingsen     RB         20599.2    11       5510        201     22
Sergio Lopez        RB         18240.3    11       5119        83      17
Spencer Schumann    RB         17010.5    8        6493        99      24
Matthew Erickson    GP         12676.3    7        6017        80      37
Douglas Hawkins     GP         12407.0    8        6190        90      32
Sergey Polikarpov   NN         12203.3    3        6303        67      38
Mario Perez         SM, Lrs    12060.2    4        4497        170     23
Alexandru Paler     NN, A*     7358.9     3        4401        69      43
Michal Tulacek      SM         6571.8     3        5965        52      14
Rafael Oliveira     RB         6314.2     1        6692        36      9
Glenn Hartmann      RB         1060.0     0        1134        8       71
Erek Speed          GA         out of memory

VIII. DISCUSSION AND ANALYSIS

While the objective for the competitors was to design or learn a controller that played Infinite Mario Bros as well as possible, the objective for the organizers was to organize a competition that accurately tested the efficacy of various controller representations and learning algorithms for controlling an agent in a platform game. Here we remark on what we have learned in each of these respects, on using the software in your own teaching, and on the future of the competition.

A. AI for platform games

By far the most surprising outcome of the competition was how well the A*-based controllers performed compared to all other controller architectures, including controllers based on learning algorithms. As A* is a commonly used algorithm for path-finding in many types of commercial computer games (e.g. RTS and MMORPG games), one could see this as a victory of "classical" AI over fancier CI techniques, which are rarely used in the games industry. However, one could also observe that the levels generated by the level generator are much less demanding and deceptive than those found in the real Super Mario Bros games and other similar games. All the levels could be cleared by constantly running right and jumping at the right moments, there were no hidden objects or passages, and in particular there were no dead ends that would require backtracking. All the A*-based agents consume considerably more processing time when in front of vertical walls, where most action sequences would not lead to the right end of the screen, suggesting that A* would break down when faced with a dead end.

While the sort of search in game-state space that the A* algorithm provides is likely to be an important component in any agent capable of playing arbitrary Mario levels, it will likely need to be complemented by some other mechanism for higher-level planning, and the architecture will probably benefit from tuning by e.g. evolutionary algorithms. Further, the playing style exhibited by the A*-based agents is nothing like that exhibited by a human player (the creepy exactness and absence of deliberation seem to be part of what made the YouTube video of Robin's agent so popular). How to create a controller that can (learn to) play a platform game in a human-like style is an industrially relevant (cf. Demo Play) problem which has not been addressed by this competition.

B. Competition organization

Looking at the objectives enumerated in section IV, we consider that the first two objectives (ease of participation and transparency) have been fulfilled, that the third (ease of finding a winner) could have been met better, and that we largely failed at meeting the fourth (depth of challenge).

Ease of participation was mainly achieved through having a simple web page, simple interfaces and letting all competition software be open source. Participation was greatly increased through the very successful media campaign, built on social media. Transparency was achieved through forcing all submissions to be open source and publishing them on the web site after the end of the competition. However, many competitors did not describe their agents in detail. In future competitions, the structure of such descriptions should be specified, and submissions that are not followed by satisfactory descriptions should be disqualified.

C. Using the Mario AI Competition in your own teaching

The Mario AI Competition web page, complete with the competition software, rules and all submitted controllers, will remain in place for the foreseeable future. We actively encourage use of the rules and software for your own events, and have noted that at least one local Mario AI Competition has already launched (at UC San Diego). Additionally, the software is used for class projects in a number of AI courses around the world. When organizing such events, it is worth remembering that the existing Google Group and its archive can serve as a useful technical resource, and that the result tables in this paper provide a useful point of reference. We would appreciate it if any such events link back to the original Mario AI Competition web page, and students are encouraged to submit their agents to the next iteration of the competition.

D. Future competitions

The 2010 Mario AI Championship, like the 2009 competition, uses a single website (http://www.marioai.org) but is divided into three tracks:

1) The gameplay track: Similarly to the 2009 competition, this track is aimed at producing the controller that gets furthest on a sequence of levels. However, the competition software is modified so that some of the levels are substantially harder than the hardest levels in last year's competition.

2) The learning track: This track is similar to the gameplay track, but favours controllers that perform online learning. Each controller will be tested a large number (e.g. 1000) of times on a single level, but only the score on the last attempt will count. The level will contain e.g. hidden blocks, shortcuts and dead ends. Thus, the scoring will reward controllers that learn the ins and outs of a particular level.

3) The level generation track: The level generation track differs substantially from the other tracks in that what is tested is not controllers for the agent, but level generators that create new Mario levels that should be fun for particular players. The generated levels will be evaluated by letting a set of human game testers play them live at the competition event.

REFERENCES

[1] J. Togelius, S. M. Lucas, H. Duc Thang, J. M. Garibaldi, T. Nakashima, C. H. Tan, I. Elhanany, S. Berant, P. Hingston, R. M. MacCallum, T. Haferlach, A. Gowrisankar, and P. Burrow, "The 2007 IEEE CEC simulated car racing competition," Genetic Programming and Evolvable Machines, 2008. [Online]. Available: http://dx.doi.org/10.1007/s10710-008-9063-0
[2] D. Loiacono, J. Togelius, P. L. Lanzi, L. Kinnaird-Heether, S. M. Lucas, M. Simmerson, D. Perez, R. G. Reynolds, and Y. Saez, "The WCCI 2008 simulated car racing competition," in Proceedings of the IEEE Symposium on Computational Intelligence and Games, 2008.
[3] J. Togelius, S. Karakovskiy, J. Koutnik, and J. Schmidhuber, "Super Mario evolution," in Proceedings of the IEEE Symposium on Computational Intelligence and Games (CIG), 2009.
[4] K. Compton and M. Mateas, "Procedural level design for platform games," in Proceedings of the Artificial Intelligence and Interactive Digital Entertainment International Conference (AIIDE), 2006.
[5] G. Smith, M. Treanor, J. Whitehead, and M. Mateas, "Rhythm-based level generation for 2D platformers," in Proceedings of the International Conference on Foundations of Digital Games, 2009.
[6] C. Pedersen, J. Togelius, and G. Yannakakis, "Modeling player experience in Super Mario Bros," in Proceedings of the IEEE Symposium on Computational Intelligence and Games (CIG), 2009.
[7] P. Hart, N. Nilsson, and B. Raphael, "A formal basis for the heuristic determination of minimum cost paths," IEEE Transactions on Systems Science and Cybernetics, vol. 4, no. 2, pp. 100–107, 1968.
[8] I. Millington and J. Funge, Artificial Intelligence for Games. Morgan Kaufmann, 2009.
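For readers who want to use the benchmark in a course, the reactive baseline controllers discussed in this paper (such as ForwardJumpingAgent) reduce to a few lines of rules. The sketch below is a hypothetical illustration, not the competition API: the 22 × 22 grid of 0/1 cells and the (right, jump) action pair are simplified stand-ins for the real observation and action interfaces.

```python
# A minimal reactive controller in the spirit of the supplied
# ForwardJumpingAgent: always run right, and jump when either blocked
# by a wall ahead or facing a gap in the ground. The grid encoding
# (0 = empty, 1 = solid) is an assumption made for this sketch.

def act(grid, mario_row=11, mario_col=11):
    """Return (right, jump) button presses for one game update."""
    ahead = grid[mario_row][mario_col + 1]            # cell in front of Mario
    below_ahead = grid[mario_row + 1][mario_col + 1]  # ground in front
    blocked = ahead == 1       # wall directly ahead -> jump over it
    gap = below_ahead == 0     # no ground ahead     -> jump the hole
    return (True, blocked or gap)
```

The hand-coded entries described in this paper essentially refine these two rules, adding danger levels, jump-length estimation and enemy tracking.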