BINAURAL VERSUS STEREO AUDIO IN NAVIGATION IN A 3D GAME: DIFFERENCES IN PERCEPTION AND LOCALIZATION OF SOUND - LUDVIG WIDMAN - DIVA
Binaural versus Stereo Audio in Navigation in a 3D Game: Differences in Perception and Localization of Sound Ludvig Widman Audio Technology, bachelor's level 2021 Luleå University of Technology Department of Social Sciences, Technology and Arts
Ludvig Widman S0038F mail@ludvigwidman.se Spring 2021

Abstract

Recent advancements in audio technology for computer games have made implementations with binaural audio possible. Compared to regular stereo sound, binaural audio offers possibilities for a player to experience spatial sound, including sounds along the vertical plane, using their own headphones. A computer game prototype called “Crystal Gatherer” was created for this study to explore the possibilities of binaural audio implementation regarding localization and perception of objects that make sound in a 3D game. The game featured two similar game levels, with the difference that one used binaural sound and the other stereo sound. The levels consisted of a dark space that the player could navigate freely with the objective of finding objects that make sound, called “crystals”, as fast as they could. An experiment was conducted with 14 test subjects who played the game. Qualitative and quantitative data were collected, including the time the players took to complete each game level and answers about how they experienced the levels. A majority of test subjects reported that they perceived a difference between the levels. No significant difference was found between the levels in terms of efficacy of finding the objects that made sound. Some test subjects stated that they found localization to be better in the binaural level of the game, while others found the stereo level to be better in this respect. The study indicates that binaural audio has the potential to change the perception of audio in computer games.
Acknowledgements

I would like to thank everyone who participated in and supported this study, and in particular:

My supervisor, Jon Allan, whose enthusiasm and thoughtful feedback were of great help to this study.

My kind and supportive classmates, whom you could always turn to if you hit a snag. Thank you!
Table of Contents

Abstract
1. Introduction
1.1. Theoretical Background
1.1.1. HRTF and Binaural Sound
1.1.2. Resources in Technology for Implementing Binaural Sound in Computer Games
1.2. Previous Work on Binaural Audio in Navigation in Computer Games
1.2.1. Findings in Efficacy and User Experience for Navigation Supported by Binaural Sound, and Future Work
1.3. Research Question
1.4. Purpose
2. Method
2.1. The Game Prototype “Crystal Gatherer”
2.1.1. Introduction to the Game Prototype
2.1.2. Construction of the Game Levels
2.1.3. Crystal Arrangement and Packaging of the Game
2.1.4. Sounds for “Crystal Gatherer”
2.1.5. Baseline Test
2.2. Informal Pre-Study
2.3. Experiment Execution & Data Collection
2.4. Regarding the COVID-19 Recommendations
2.5. Research Ethics
2.6. Data Analysis Methods
3. Results & Analysis
3.1. Time Results for Levels X and Z
3.1.1. Analysis of Time Results for Levels X and Z
3.2. Level Difficulty Results
3.2.1. Level Difficulty Analysis
3.3. Perception of Level Difference, Results and Coding
3.3.1. Analysis of Qualitative Level Difference Perception
3.4. Quantitative Level Preference Results
3.4.1. Quantitative Level Preference Analysis
3.5. Qualitative Level Preference, Results and Coding
3.5.1. Analysis of Qualitative Level Preference
3.6. Baseline Test Results and Overview
3.6.1. Baseline Test Analysis
4. Conclusion
5. Discussion
6. References
7. Appendix
Binaural versus Stereo Audio in Navigation in a 3D Game: Differences in Perception and Localization of Sound

Degree project, bachelor’s level, by Ludvig Widman

1. Introduction

In recent years, audio technology for computer games has made notable advancements (Farkaš, 2018), moving from standard stereo audio with relatively simple implementations to intricate technologies such as realistic reverb, occlusion and air absorption (Gustafson & Cancar, 2020). Today, there are freely available plug-ins for major game development engines that provide tools for building complex spatial audio environments, in triple-A and indie-level games alike. Spatialized audio has become an important aspect to consider when designing a new game, as it may give the player a more life-like and engaging experience. In a study by Gustafson and Cancar (2020) about localizing audio sources in a first-person shooter (FPS) game, binaural audio is investigated as a way to enhance the experience of localizing sound sources, and perhaps increase the performance of the player (that is, let the player find more efficiently where opponents are located in the game). The advancements in binaural audio technology for computer games, and the previous research presented in this thesis, motivate further work in the research area.

1.1. Theoretical Background

Our ability to localize sound can be broken down into interaural time difference (ITD), interaural intensity difference (IID) and spectral cues (Cheng & Wakefield, 1999). ITD is defined as the difference in arrival time between the ears for a wavefront, and similarly, IID is defined as the difference in amplitude between the ears for a sound. The spectral cues refer to the filtering effect of our ears and head on a sound from a given angle. A sound heard from the front, for example, is filtered differently by our pinnae than a sound heard from the rear.
Additionally, our head causes acoustic shadowing for sounds arriving from relevant angles. We have learned to interpret all of these spectral cues, in conjunction with the ITD and IID, which is why we can localize sound with relative accuracy. It is worth mentioning that the ability to localize sound differs between individuals. For example, a person with degraded hearing may have more difficulty localizing sound because they do not fully hear the spectral cues.

1.1.1. HRTF and Binaural Sound

Binaural audio is a broad term for describing human hearing. A central concept in binaural hearing is the head-related transfer function (HRTF). The HRTF is a model of the sonic characteristics that arise when sound is projected onto our heads and ears (Rumsey & McCormick, 2009). Every angle of incidence produces a different timbre and time difference between our ears. In traditional stereo audio, only the level and time difference between the two channels enable us to perceive direction. When the timbral differences of our ears are present, however, a more complex spatial reproduction emerges. Techniques for mimicking HRTF conditions in audio reproduction are often called binaural (Rumsey & McCormick, 2009).
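The ITD and IID cues described in section 1.1 can be illustrated with a small numerical sketch. The code below is not taken from the thesis. It assumes Woodworth's spherical-head approximation for the ITD, an average head radius of 8.75 cm, and a deliberately crude IID model in which the far ear is attenuated by at most 6 dB. All function names and constants are hypothetical. The sketch leaves out the spectral (timbral) cues, which are exactly what an HRTF adds on top of time and level differences.

```python
import math

HEAD_RADIUS_M = 0.0875        # assumed average head radius
SPEED_OF_SOUND_M_S = 343.0    # speed of sound in air at roughly 20 °C


def woodworth_itd(azimuth_deg):
    """Approximate ITD in seconds for a distant source.

    azimuth_deg is in [-90, 90]; positive values mean the source is to the right.
    Uses Woodworth's formula ITD = (a / c) * (theta + sin(theta)).
    """
    theta = math.radians(abs(azimuth_deg))
    itd = HEAD_RADIUS_M / SPEED_OF_SOUND_M_S * (theta + math.sin(theta))
    return math.copysign(itd, azimuth_deg)


def apply_itd_iid(mono, azimuth_deg, sample_rate=48000, max_iid_db=6.0):
    """Return (left, right) sample lists with the far ear delayed and attenuated.

    The IID model (attenuation scaled by sin(azimuth)) is an illustrative
    assumption, not a measured head-shadow curve.
    """
    delay = round(abs(woodworth_itd(azimuth_deg)) * sample_rate)
    attenuation_db = max_iid_db * abs(math.sin(math.radians(azimuth_deg)))
    far_gain = 10 ** (-attenuation_db / 20)
    near = list(mono) + [0.0] * delay
    far = [0.0] * delay + [sample * far_gain for sample in mono]
    # A source to the right reaches the right ear first, so the left ear is the far ear.
    return (far, near) if azimuth_deg > 0 else (near, far)


left, right = apply_itd_iid([1.0, 0.0], azimuth_deg=90)
print(f"ITD at 90 degrees: {woodworth_itd(90) * 1000:.3f} ms")
```

Under these assumptions the largest ITD, for a source directly to one side, is roughly two thirds of a millisecond, which corresponds to about 31 samples at 48 kHz.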
One way to record binaural sound is by using a so-called “dummy head” (Georg Neumann GmbH, 2020), made in a shape and material with qualities similar to those of a human head. Microphones are placed inside the artificial pinnae of the dummy head. The pinnae are made to resemble those of an average person. Sound recorded with the dummy head will have the filtering of the head and the pinnae, and is intended to produce a life-like sensation of hearing the sound source with one’s own ears. Of course, every human pair of ears is different, and because of this, the binaural effect of a dummy head may be partly or fully unheard by a listener, since their own ears and head may produce drastically different filtering. Using small microphones that are inserted into the ears of a listener, custom HRTF recordings can also be made (ITA HRTF-database, 2020). This is done by playing signals from speakers that move around the listener at all angles, and recording the signals at the microphones. The recording is then analyzed to produce a custom HRTF profile. This makes for potentially very believable spatial sound reproductions.

Binaural audio reproduction has its limitations, as stated by Rumsey & McCormick (2014). As mentioned above, every person’s ears are different. Additionally, when visual cues are not present, binaural audio can be confusing. Moreover, headphones have different equalizations and frequency responses, which can distort the HRTF’s frequency spectrum. Also, if a head-tracking solution is not incorporated (such as in a VR headset application), the cues created when one moves one’s head, which are important for localization, are lost. Binaural audio over speakers also introduces problems, as the natural crossfeed between stereo speakers, and room reflections, distort the phase and spectrum of HRTF-filtered sound.
In this study, the word binaural is used in the sense that objects that make sound are HRTF-filtered by a computer program. This is a limited view of binaural audio, as real-life binaural hearing has a larger range of factors impacting the sound, including a listener’s unique physical properties such as head, pinnae and body shape.

1.1.2. Resources in Technology for Implementing Binaural Sound in Computer Games

Steam Audio (Valve Software, 2020) supports HRTF rendering, ambisonic decoding, auditory occlusion and reverb, and is available as a plugin for the game development engines Unreal Engine and Unity, as well as for the industry-standard audio middleware FMOD and Wwise. Another plugin is Resonance Audio, an open source project that started at Google (Gorzel, 2019). It uses ambisonic technology and a repository of head-related impulse responses (HRIRs) derived from the SADIE database (SADIE, 2020). The SADIE database provides HRIRs of a Neumann KU100 dummy head microphone. Resonance Audio is available as a plugin for Unreal Engine, Unity, FMOD and Wwise. Both the Steam Audio and Resonance Audio plug-ins offer the option to replace the HRIRs used, opening up the possibility of customized HRTF experiences.

HRTF-related and binaural sound has made great progress with the increase in computing power available for audio processing in games. Compared to surround sound, which relies on multiple speakers and complicated audio interfaces, binaural audio can use two audio channels (as used for regular stereo audio) and, in most cases, headphones. This is a relatively inexpensive and simple way for a consumer to experience spatial audio. The recent technology enabling the experience of localized sound in games poses questions about how binaural audio can be used as a tool for in-game navigation and spatial awareness. In the previous work reviewed below, three separate studies are considered, in which researchers have made different implementations of binaural audio, showing possibilities for binaural sound as support for navigation in a variety of games.

1.2. Previous Work on Binaural Audio in Navigation in Computer Games

In a study that is part of the Inclusive Game Design research, Bergqvist (2014) investigates the difference between using stereo and binaural audio in an auditory display while navigating a computer game. An important aspect of the study is to investigate ways to make computer games accessible to all, including those with visual impairment. An experiment is constructed for the study and is run with 22 sighted test subjects wearing blindfolds. The experiment makes use of two prototypes, which are games with stereo and binaural audio respectively. The game displays a room, and the player is meant to find different objects that make sound within the room. The author hypothesizes that it should be possible to increase the speed and accuracy of the user’s navigation in a non-visual system by using binaural audio. An experiment is conducted, comparing the two prototypes, which are similar in every aspect except the sound. The games are run on a tablet device with a screen showing different rooms filled with objects. In each room, some objects make sound, such as a purring cat. The player is instructed to find different objects through dialogue in the game. When the player runs their finger across the screen, the sound grows louder the closer they get to the object. When the player’s finger is located on an object, a notification sound is played to let them know that they have found it.
In the binaural version, the sounds are HRTF-filtered, which, according to the hypothesis of the paper, may help the test subject find the object faster. The participants in the study were divided into two groups. One group played the binaural version of the game, while the other played the stereo version. Several aspects of the gameplay were recorded, such as the number of touches on the screen and the time elapsed in each room of the game. Upon analyzing these results, no significant difference in efficacy of interaction was found between the groups. It is discussed that this might be because the participants did not perceive the binaural sound, or because they relied more on the volume difference when finding the objects. It is also mentioned that some test subjects did not understand what the sounds in the game were meant to represent, such as when the sound of a radio was interpreted as background noise.

In another study, by Drossos et al. (2015), which focuses on games for blind or visually impaired people, a computer game resembling the well-known game of Tic-Tac-Toe is constructed. The authors argue that computer games are a large part of the social lives of children and that those with visual impairment are largely left out of this social context. The study has a clear focus on blind or visually impaired young users, as children with visual impairment participate in the study and contribute their specific experiences to the result. The game uses binaural sounds to display information about the state of the game to the user, as a blind person needs to hold the current game state in memory. To help them revisit the game state, sounds are played to inform them when they move the cursor on the game board. A visual interface is also constructed to help those with a partial visual impairment. In contrast to Bergqvist (2014), this study does not compare the game to a version with regular stereo sound.
The game was played by twelve test subjects and interviews were conducted afterwards. The result of the interviews and the following analysis was that a significant majority of the test subjects found the game interesting and the interface of the game usable. The authors discuss that the interviews also showed that the result is highly affected by the difference in spatial awareness between visually impaired people, and that there is a need for direct audio feedback of any change in the game in order for it to make sense to blind people.

In a study by Gustafson & Cancar (2020), the authors investigate how binaural audio changes the perception of localization by players in a first-person shooter (FPS) game. An example is given that players of a popular online FPS game find it difficult to know whether their enemies are above or below them. A 3D game is constructed for the study in two versions, one with regular stereo sound and one with binaural sound. The goal of the game is to locate sound sources on different vertical levels. Test subjects play both versions of the game. Quantitative data is collected while they play, and qualitative data is collected in interviews after they have finished playing. The constructed game takes the form of a 3D model of a multi-story building that the player can walk inside. There are four stories in the building, including a cellar which is below the player. The vertical levels are joined by a staircase, which acts as an opening for sound to travel between the floors. However, the player is restricted to moving on the ground floor of the building. According to the authors, this is because the player is meant to answer questions about which story they think the sound is coming from. The FPS-standard controls of using the “WASD” keyboard keys for walking and mouse movement for looking around are applied in the game. The game is made with the Unity game engine and Steam Audio to enable binaural sound.
The sound of footsteps was recorded and edited to play in a loop inside the game, and was placed on different stories for each round of the game. The test was conducted twice for each test subject, once with stereo and once with binaural sound. The test subjects were not told what was tested or what was changed during the test. In the game, questions were posed, asking which story the sound seemed to be coming from. The test subjects then gave their answer verbally to the researcher conducting the test. The quantitative data showed that the players did not localize the sound significantly better with binaural sound. The qualitative interviews showed that a majority of the players found it confusing to locate the sound in the game. It is discussed that since all test subjects used their own computers and headphones for the test, results may have been affected, as the players might not have fully perceived the binaural sound. Furthermore, the authors discuss that if the sound source (the footsteps) had moved around in the game, the players might have localized it more easily, because of the additional changes in binaural timbre.

1.2.1. Findings in Efficacy and User Experience for Navigation Supported by Binaural Sound, and Future Work

Binaural audio, a central concept in all three studies above, can convey information about where objects are located in both a two- and a three-dimensional system. Regular stereo recordings can do this as well, but unlike stereo, binaural audio offers more detailed positioning of sound, including localization on the vertical axis. The three studies described above all investigate navigation supported by binaural sound. They resemble each other in the way they use binaural audio to convey information about the location of game elements in the game environment, but the game dynamics of the studies’ respective games are very different.
Bergqvist (2014) uses a “find the spot on the screen” type of interaction, while Drossos et al. (2015) use a game that is more like a strategic tabletop game. Gustafson and Cancar (2020), however, use binaural audio in the context of an FPS game.

As shown by Drossos et al. (2015), spatial awareness varies to a large degree between non-sighted or visually impaired test subjects. Comparably, as seen in the study by Bergqvist (2014), sighted test subjects using blindfolds also differ in their perception of binaural sound. Some do not perceive the binaural effect, perhaps due to differences between people’s pinnae and the models used by binaural sound processing programs. Gustafson and Cancar (2020) also describe that the hardware and software setup used by a potential player of a game with binaural audio can affect the perception of the binaural audio. For example, the quality of headphones might influence the result, or, as Farkaš (2018) mentions, binaural audio does not translate well over speakers unless treated with special filtering. In the study by Drossos et al. (2015), it is mentioned that binaural audio does not work equally well for all directions and sounds. This seems plausible, since binaural audio uses filters that make minute changes to a sound’s frequency spectrum (Rumsey & McCormick, 2014). Sounds poor in overtones (a sine-wave-like sound, for example) give the binaural filter little material to work with, and thus it is harder for a human to interpret the intended direction of the sound. Cheng & Wakefield (1999) state that a sound needs to be broadband and contain frequencies over 7 kHz for its elevation in the median plane to be localizable.

In the three studies, significant evidence of binaural audio heightening the efficacy of navigating within a game cannot be observed.
The studies by Bergqvist (2014) and by Gustafson and Cancar (2020) both conclude that there are no results that can be interpreted as binaural sound being more efficient than stereo with respect to localizing sound sources. However, in terms of gaming experience, the test subjects in Gustafson and Cancar’s (2020) study preferred the version with binaural sound. Bergqvist (2014) did not account for preference between the stereo and binaural versions, since the study used an unpaired t-test and the test subjects did not play both versions of the game.

The game model used by Gustafson and Cancar (2020) raises the question of whether a better method exists to determine if binaural audio can improve navigation in 3D computer games. In their game, the player was constrained to move only on the ground level of the building. The objective was then to guess which floor the footsteps were coming from. This is quite a hard task, even in reality, as reflections and absorption by walls, floor and ceiling can produce a very complex auditory situation. For example, one can appreciate the difficulty in localizing the sound of a neighbor’s music playing from an adjoining apartment. If the player in the game by Gustafson and Cancar (2020) could move freely and navigate towards the sound source, and in so doing validate their guesses, it is possible that a difference in efficacy of navigation between the binaural and stereo versions would show in the results.

Bergqvist (2014) suggests that future work could investigate how binaural audio could make a difference in navigation in a three-dimensional environment. This, together with the questions posed by Gustafson and Cancar’s (2020) study, suggests that studies about the experience of navigation in a three-dimensional environment, with stereo and binaural audio respectively, would be relevant to consider.
It is also notable that these studies say relatively little about the subjective experience of navigation with binaural sound, although it may change how the game is perceived when played.
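The difference between the unpaired comparison attributed to Bergqvist (2014) above and a within-subject comparison, in which every test subject plays both versions, can be sketched numerically. The completion times below are invented purely for illustration and are not data from any of the cited studies or from this thesis. The function names are hypothetical, and the unpaired statistic is computed in Welch's form as an assumption about how such a test might be set up.

```python
import math
import statistics


def paired_t(a, b):
    """t statistic for a paired (within-subject) design: one difference per subject."""
    diffs = [x - y for x, y in zip(a, b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


def unpaired_t(a, b):
    """t statistic for two independent groups (Welch's form, unequal variances)."""
    standard_error = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


# Invented completion times in seconds for four hypothetical subjects.
stereo_times = [100, 150, 200, 250]
binaural_times = [92, 139, 191, 238]

print("paired t:  ", round(paired_t(stereo_times, binaural_times), 3))
print("unpaired t:", round(unpaired_t(stereo_times, binaural_times), 3))
```

With these made-up numbers, the subjects differ a lot from one another but each is consistently a little faster in one condition. The paired statistic removes the between-subject spread and is therefore much larger than the unpaired one, which is one reason a within-subject design can be attractive when subjects vary widely in skill.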
1.3. Research Question

The research question for this thesis is how binaural audio in a 3D computer game can change a player’s perception or game experience, whether binaural audio is preferred over standard stereo, and whether binaural audio could aid the player in localizing objects that make sound. To inform this research question, a game prototype with binaural and stereo sound respectively will be created. In the form of a hypothesis, the research question may be stated as: The binaural version of the game prototype will be preferred by the test subjects, improve their gaming experience and aid them in localizing the objects in the game. Test subjects will play this game and the data collected will be analyzed to confirm or reject this hypothesis.

1.4. Purpose

In the previous work described in detail in this paper, there are no significant results which point to binaural audio being more effective for localizing objects in a game, or to binaural audio altering the gaming experience considerably. However, by drawing on the results shared in these studies and using a different approach to investigate this area, other research outcomes may be reached. The experiment design is a significant part of this study, as it is intended to build on the experiences of the previous work in this research area and to be ecologically valid to a large degree.

2. Method

2.1. The Game Prototype “Crystal Gatherer”

2.1.1. Introduction to the Game Prototype

As stimuli, the study uses a prototype in the form of a 3D game with two levels, with binaural and stereo audio respectively. The game consists of a space that the player can navigate freely, floating or flying, using standard WASD-and-mouse computer controls. The objective is to navigate to objects, resembling crystals, that emit sound, and “gather” them, thus confirming where they are in the space, as fast as possible.
The game outputs the time elapsed to finish the level at the end of the game. The location of the objects differs between the stereo and binaural versions of the game, so that the players cannot memorize their locations. The game uses a very dark space and a limited light radius around the player, so that the player only sees what is in their proximity and thus uses their hearing more actively to find objects. The player is only able to move slowly, which discourages the potential strategy of rapidly flying at random to find the objects.

As mentioned above, the game prototype is meant to be ecologically valid in the context of a computer game. However, the goal of the game prototype is not to resemble the most popular FPS games of today, but rather to look and feel like a commercial or indie-level computer game in terms of visual interface, controls, graphics and sound. This is partly achieved by using a popular game engine (Unreal Engine, 2021) and by giving attention to developing the graphical and audible aspects of the game. In addition, the sound emitted by the objects in the game is made to be broadband, so that it can be HRTF-filtered effectively (Cheng & Wakefield, 1999). The sounds are also intended to be engaging, reminiscent of the commercial game sound design that the players are used to. Half of the crystals have droning, continuous sounds, and half emit intermittent bursts of sound, so that both types of sound emission influence the test. The game starts with a training level where no crystals make sound, to let the player learn the game controls and prepare them for the objective of the game. The order of the binaural and the stereo level is also switched for each test subject, so that the effect of the order is counterbalanced.

2.1.2. Construction of the Game Levels

The game was built on the First Person Example Map template, provided by Unreal Engine version 4.26 (Epic Games, 2021) in the standard installation of the game engine. Resonance Audio v.1.0 (Resonance Audio, 2021) was activated as a plugin in the Unreal Engine project. The choice of Unreal Engine as platform was influenced by the author’s previous experience of development with this game engine. Resonance Audio was chosen because of its comprehensive documentation and compatibility with both Windows and Macintosh computer systems.

For the movement, the player was set to play as the Unreal Engine asset Spectator Pawn, and the max speed parameter was set to 500 in order to enforce slow movement. The Spectator Pawn enables free movement in the game space, using WASD and mouse control. Additionally, it features collision, which is used to keep the player inside the bounds of the game space and also to notify the game when a crystal has been acquired. A point light was attached to the Spectator Pawn to create a spherical light that lets the player see the crystals when they are in the player’s immediate proximity.

A game menu was created, with a looping soundtrack. This menu provides access to the Trial Level and the levels X and Z (which are the levels with binaural and stereo sound respectively).

Figure 1. The menu screen for the game prototype.

The levels (trial level excluded) have the basic shape of a rectangular room with sides that are 8000 Unreal units long and a height of 5000 Unreal units. The walls were then clad in polygon shapes, which make up the cave-like space.
The result was a roughly spherical space that is enclosed by the outer walls. The polygon shapes’ collision parameters are set to limit the player’s movement, so that the player cannot end up outside the room.
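To give a sense of scale, the room dimensions and the max speed stated above can be converted into metres and travel times. This is a back-of-the-envelope sketch rather than part of the prototype. It assumes the default Unreal Engine scale of one Unreal unit per centimetre and assumes that the max speed parameter is expressed in Unreal units per second. The names are hypothetical, and the polygon cladding makes the actual navigable space smaller than the bounding room used here.

```python
import math

CM_PER_UNREAL_UNIT = 1.0      # assumption: default Unreal Engine world scale
ROOM_SIDE_UU = 8000.0         # side length of the rectangular room, from the text
ROOM_HEIGHT_UU = 5000.0       # room height, from the text
MAX_SPEED_UU_PER_S = 500.0    # max speed setting, assumed to be in units per second


def to_metres(distance_uu):
    """Convert a distance in Unreal units to metres under the assumed scale."""
    return distance_uu * CM_PER_UNREAL_UNIT / 100.0


def travel_time_s(distance_uu, speed_uu_per_s=MAX_SPEED_UU_PER_S):
    """Time in seconds to cover a distance at constant maximum speed."""
    return distance_uu / speed_uu_per_s


# Longest straight line inside the bounding room (corner to opposite corner).
diagonal_uu = math.sqrt(2 * ROOM_SIDE_UU ** 2 + ROOM_HEIGHT_UU ** 2)

print(f"Room side: {to_metres(ROOM_SIDE_UU):.0f} m, "
      f"crossing time: {travel_time_s(ROOM_SIDE_UU):.0f} s")
print(f"Room diagonal: {to_metres(diagonal_uu):.1f} m, "
      f"crossing time: {travel_time_s(diagonal_uu):.1f} s")
```

Under these assumptions, crossing the room from wall to wall takes about 16 seconds, which gives an idea of how strongly the speed limit penalizes searching at random.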
Figure 2. Wireframe view of the geometry making up the space in the game, with the rectangular room and the cladding of polygon shapes. The magenta shapes are the crystals placed in the game level.

At the start of the game, a timer is activated in both levels, and the count of elapsed seconds is shown on the screen. An Unreal event checks continuously whether all crystals have been eliminated. Upon eliminating (or “gathering”) all crystals, the game ends and the timer stops. When the level is completed, the game pauses and the time elapsed is shown on the screen. A message on the screen tells the player to note down the time elapsed, in order to fill in the survey form.

Figure 3. Overview of a crystal, with its diamond-like shape and sound component in the list to the left.
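The timer and completion logic described above can be summarized in a few lines. The prototype itself was built in Unreal Engine, so the Python below is only a language-neutral sketch of the described behaviour. The class and method names are hypothetical and do not correspond to Unreal Engine assets or API calls.

```python
class LevelState:
    """Tracks elapsed time and remaining crystals for one game level."""

    def __init__(self, crystal_ids):
        self.remaining = set(crystal_ids)
        self.elapsed_s = 0.0
        self.finished = False

    def gather(self, crystal_id):
        """Called when the player collides with a crystal."""
        self.remaining.discard(crystal_id)

    def tick(self, delta_s):
        """Called every frame: stop the timer once no crystals remain."""
        if self.finished:
            return
        if not self.remaining:
            # All crystals gathered: the level ends and the timer stops.
            self.finished = True
            return
        self.elapsed_s += delta_s


level = LevelState(["crystal_1", "crystal_2"])
level.tick(1.0)
level.gather("crystal_1")
level.tick(1.0)
level.gather("crystal_2")
level.tick(1.0)  # the completion check fires here, so no more time is added
print(level.finished, level.elapsed_s)
```

Checking for completion before adding the frame time means that the frame in which the last crystal is detected as gathered does not count towards the result.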
Each crystal consists of a diamond-like shape and has a Sound Cue embedded in it that enables it to play a looping sound. The crystals in the binaural level share an Attenuation Settings file that enables the Resonance Audio plugin and controls the properties of the sound playback, such as volume and spatialization. The crystals in the stereo level have a corresponding Attenuation Settings file. Both the binaural and stereo settings use a shared Falloff Distance. The Falloff Distance dictates the change in volume as the player moves closer to or further away from a sound source. The value chosen allows the crystals’ sounds to be heard from a large portion of the space that the player can move in. The Spatialization Method, which sets the sound’s spatial playback function, is set to Panning in the stereo Attenuation Settings and Binaural in the binaural Attenuation Settings.

Figure 4. Wireframe view of the game level with falloff distance visualized by the yellow circle around the crystal in the game.

Furthermore, both Attenuation Settings use the Natural Sound setting for Attenuation Function. This provides a volume falloff that is meant to be similar to the physical volume falloff (Epic Games, 2021). Apart from the Spatialization Method and the properties described above, the settings are left at their default values, provided by the game template First Person Example Map, see figures 5 and 6.
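As a point of reference for the physical volume falloff mentioned above, a point source in a free field follows the inverse distance law: the sound pressure level drops by about 6 dB for each doubling of distance. The Python sketch below illustrates that law only. It is not the curve Unreal Engine uses for Natural Sound, and the function names and the reference distance of 100 units are assumptions made for the example.

```python
import math

def free_field_gain(distance, reference_distance=100.0):
    """Linear gain of a point source relative to its level at reference_distance.

    Inverse distance law (1/r) for sound pressure. Distances shorter than the
    reference are clamped so that the gain never exceeds 1.0.
    """
    d = max(distance, reference_distance)
    return reference_distance / d

def to_decibels(gain):
    """Convert a linear amplitude gain to decibels."""
    return 20.0 * math.log10(gain)

# Each doubling of distance lowers the level by roughly 6 dB.
for d in (100, 200, 400, 800, 1600):
    print(f"{d:5d} units: {to_decibels(free_field_gain(d)):6.1f} dB")
```

A game engine typically bounds such a curve with a maximum distance, which is the role the Falloff Distance plays in the Attenuation Settings.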
Figure 5. Attenuation settings for the binaural crystals in the game.

Figure 6. Attenuation settings for the stereo crystals in the game.
In the settings file for the Resonance Audio plugin, the Spatialization Method is set to HRTF. The Pattern and Sharpness parameters are left at their default values, generating an omnidirectional sound source. In the plugin’s settings, the Quality Mode is set to Binaural High Quality.

Figure 7. Settings for Resonance Audio plugin.

Figure 8. Flowchart diagram of settings for audio in Unreal Engine for "Crystal Gatherer".
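To illustrate why a source with default Pattern and Sharpness radiates equally in all directions, the sketch below evaluates a first-order directivity pattern in Python. The formula is an assumption about how such pattern and sharpness controls are commonly defined, not a statement of the plugin's exact implementation, and the function name and the default values 0 and 1 are chosen for the example.

```python
import math

def directivity_gain(angle_deg, pattern, sharpness):
    """Gain of a source heard at angle_deg off its forward axis.

    Assumed form: |(1 - pattern) + pattern * cos(angle)| ** sharpness.
    pattern = 0 gives an omnidirectional source, 0.5 a cardioid and
    1 a figure-eight; sharpness narrows the lobes.
    """
    base = (1.0 - pattern) + pattern * math.cos(math.radians(angle_deg))
    return abs(base) ** sharpness

for angle in (0, 90, 180):
    omni = directivity_gain(angle, pattern=0.0, sharpness=1.0)
    cardioid = directivity_gain(angle, pattern=0.5, sharpness=1.0)
    print(f"{angle:3d} deg  omni={omni:.2f}  cardioid={cardioid:.2f}")
```

Under this assumption, an omnidirectional source has the same gain regardless of how the crystal is oriented relative to the player, so only position, distance and the HRTF filtering shape what is heard.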
2.1.3. Crystal Arrangement and Packaging of the Game

The sound crystals are arranged around two cubes (invisible in the game), so that the distances between the crystals are the same in both versions of the game. However, in the stereo version, the cube is rotated and the sound sources shift positions, so that the player cannot memorize the locations. Since it is dark in the game, and very hard to memorize the characteristics of the space, this amount of sound source location randomization was deemed adequate.

The game was built and packaged as an executable for the Windows operating system, and the option Include Prerequisites was checked in the Packaging Settings in Unreal Engine. This provides an installer for the game that checks whether the programs necessary to run the game exist on the player’s computer, and offers to install them, thus increasing the usability of the game.

2.1.4. Sounds for “Crystal Gatherer”

The sounds in the game were constructed to evoke interest in the player and contribute to the ecological validity of the game. The sounds for the crystals are seamless loops of synthesized sounds, created to be broad in frequency spectrum and rich in texture. This makes it possible to localize the sounds in elevation in the median plane (Cheng & Wakefield, 1999). All of the sounds created are rich in overtone harmonics to provide good material for the HRTF to work with. Half of the six sounds are continuous droning sounds and the other half are intermittent, burst-like sounds. They were all mastered to have a similar frequency spectrum and a loudness of -14 LUFS. The sounds are monophonic to correspond with the point source sound in the game.

Figure 9. Overview of the spectrum of each sound used for the crystals. Three are constant droning and three are intermittent sounding. A similar spectral balance can be observed between the sounds. The sounds are mastered to have a loudness of -14 LUFS.

A short sound for when a crystal is “gathered” was also added to convey feedback on the game mechanics to the player. This sound was synthesized from filtered noise, making a “swoosh”-like sound. It plays when the player comes in contact with a crystal and “gathers” it. The sound is not localized in the game, and uses the Play Sound 2D function in Unreal Engine. A music piece for the menu of the game was also created and mastered to -14 LUFS.

2.1.5. Baseline Test

In the findings of the previous work presented above, there have been doubts whether the game itself or some problem in perceiving binaural audio may have impacted the results. Therefore, a baseline test was constructed to determine whether the location conveyed by the binaural audio is perceived by the player. The baseline test featured two recordings of sound played from different locations in the game setting, followed by questions to verify which locations the sound originated from. The baseline test was taken by the test subjects after the experiment, so that it did not introduce preconceptions about the theme of the experiment.

The sound for the baseline test is that of a voice saying “Strawberry”. The sound was equalized to have a full spectrum and clear treble and was mastered to a loudness of -14 LUFS. The sound was recorded from Unreal Engine in two positions, above and to the left, and behind, with the Resonance Audio plugin activated. Four repetitions of the word were recorded for each position, with very slight alterations in viewing angle in between, to introduce small variations in the HRTF filtering effect. A version of the sound without the HRTF filtering from Resonance Audio was also included as a reference.

Figure 10. Locations of the baseline test sounds. The player is represented by the image of the lightbulb, located in the center of the cube, used as a guide grid. The spheres represent the sound. In the image to the left, it is behind the player, and in the image to the right, it is above and to the left of the player.

Figure 11. The multiple-choice answers for the baseline test in the Google Forms (Google, 2021) survey.
2.2. Informal Pre-Study

Feedback on the test was gathered in an informal setting from four colleagues in the Sound Engineering Programme. Two of them pointed out that the game was very hard to finish. To aid the player a little, and prevent fatigue, the light radius of the player, which allows them to see the crystals that are close by, was extended somewhat. One player reported that they found it difficult to understand their movements in the complete dark of the levels. To lessen this confusion, a random scattering of small triangle shapes was put in the levels. They only show in the immediate proximity of the player, and give visual cues about the player’s movement, but do not contribute to the localization of the crystals. Another player suggested that the menu could have music, so that the player automatically adjusts their volume when entering the game. This was also implemented. Another comment was that the game needed to specify that headphones should be used. Thereafter, a high visibility banner with the text "Use headphones!" was put into the menu.

In conversation with supervisor J. Allan (personal communication, 9 March 2021), after Allan had been testing the game, it was brought up that sounds in Unreal Engine seem to drop in perceived volume when played from the front, both in the stereo and the binaural level. The same phenomenon could be experienced on two different computers with different headphones. It would seem that this is an inherent property of the game engine, and modifications to the sound plugins both for stereo and binaural sound would have to be implemented to amend this. Considering the scope of this project, this potential source of error was accepted into the experiment design.

2.3. Experiment Execution & Data Collection

To replicate the setting in which computer games are played, the game was played on a computer with headphones, and the test subject could freely adjust the volume. For the tests taking place at Piteå Musikhögskola, computer lab L131, the test equipment consisted of a Dell workstation computer, mouse and keyboard, and Beyerdynamic DT-770 closed-back headphones. This was meant to resemble a typical gaming environment. The remote tests used the participants’ own hardware setups. Seven test subjects out of the fourteen in total chose to participate remotely.

The test was executed by giving the player instructions through a Google survey form (Google, 2021), which informed the test subject about what level in the game to play first. After completing a level (Trial Level excluded) the player was asked to return to the form to fill out questions about the level they had just played, before continuing to the next. After the game levels were completed, and the questions regarding these were answered, the test subject was asked to play the sounds for the baseline test. The test subjects were first asked to play the test sound used as reference (as it was not HRTF-filtered), and then the two binaural sounds. The test subjects were informed that they were only permitted to listen to these once (each test sound was repeated four times in the recording).

To avoid the result being affected by preconceptions about the audio technology, the difference between the two game levels was not mentioned in the survey. Considering test subject demography, test subjects with some experience of first-person games were encouraged to participate. A question about the test subjects’ experience with
first-person perspective computer games was asked, to make sure that the participants were familiar with the controls for this type of game. The test subjects were students at Piteå Musikhögskola and contacts on the author’s social media platforms.

Both quantitative and qualitative data were gathered from the test subjects’ interaction with the game in order to inform the research question. Quantitative data was gathered by measuring the time the players used to gather the objects. Qualitative data was collected by letting test subjects write about their experiences playing the levels. The table below is an overview of the data collected in the survey:

Table 1. Data collection overview.

Survey item | Answer format
Time elapsed for completing each level (data acquired from the game) | Level X; Level Z
Level of difficulty, Level X/Z | 1 (very easy); 2; 3; 4; 5 (very hard)
Did you experience any difference between the levels? | Yes; No
If yes, please explain the difference using a few sentences. | Free-text answer
Preference | 1 (Strong preference for X); 2; 3 (No preference); 4; 5 (Strong preference for Z)
Please explain your preference, or lack thereof. Please answer using a few sentences. | Free-text answer
Where does Sound 2/3 appear to come from? (Baseline Test) | Multiple choices of directions
How long have you been playing First-Person Perspective computer games? | One year or less; Two years or more

2.4. Regarding the COVID-19 Recommendations

In order to follow the recommendations by the Swedish authorities and Folkhälsomyndigheten (www.folkhalsomyndigheten.se), the test was designed to work from the test subjects’ own computers, should they choose to participate in the test remotely. The test on location was carried out with social distancing and the equipment was disinfected between each test subject.

2.5. Research Ethics

The test participants were informed of the conditions of the test, and their free choice to end their participation at any time during the test.
The test was conducted according to the guidelines of Vetenskapsrådets Forskningsetiska Principer (2002). The game used in the test contains no depiction of violence that might be stressful or harmful to test participants.

2.6. Data Analysis Methods

A paired t-test, two-tailed, was used to analyze the time data to see whether a significant difference exists between the game levels. A null hypothesis of a mean difference of 0 was used, thus assuming that no difference exists between the elapsed times for the levels. Furthermore, an α-value of 0.05 was used.

The qualitative data content was analyzed using a grounded approach (Denscombe, 2016). Themes in the answers from the test subjects were coded and categorized. The answers were then put into a table, sorted by code, whereafter common phenomena could be identified between the answers. The answers from those test subjects who chose to answer in Swedish were translated to English. Some typing errors identified in the answers were also corrected for readability. Translation and editing were done with the intention not to distort the answers. The full, unedited answers are available in the Appendix of this paper.

3. Results & Analysis

Sixteen test subjects participated in the test. Two test subjects were excluded from the results. One was excluded because of demographic selection, as this test subject reported less than a year’s experience with first-person perspective computer games. The other was excluded because of reported technical issues (lag), which greatly hindered their playing of the game. See 7. Appendix for the full survey results.

The binaural level is referred to as “Level X” and the stereo level is referred to as “Level Z”. The order in which the test subjects played the levels results in two groups of test subjects: “X Group”, which played Level X first, and “Z Group”, which played Level Z first. In the following table and diagrams, areas marked in blue are “X Group” and areas marked in green are “Z Group”.

3.1. Time Results for Levels X and Z

Table 2. Time results for completing Levels X and Z. Results marked in blue are “X Group”, that played Level X first, and green are “Z group” that played Level Z first.
Test subject no. | Time elapsed, binaural level (Level X) (s) | Time elapsed, stereo level (Level Z) (s)
1 | 148 | 68
2 | 104 | 100
3 | 114 | 64
4 | 304 | 274
5 | 184 | 88
6 | 300 | 134
7 | 238 | 214
8 | 88 | 110
9 | 244 | 232
10 | 114 | 232
11 | 80 | 140
12 | 232 | 124
13 | 218 | 272
14 | 156 | 172
Mean | 180 | 159
Median | 170 | 137
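The summary values in Table 2 and the paired, two-tailed t-test described in section 2.6 can be re-computed from the listed times. The Python sketch below is an illustration using only the standard library, not the spreadsheet calculation used in the study; the function names are chosen for the example, and the p-value is obtained by numerically integrating the density of Student's t distribution. Run on the times above, it gives a p-value of about 0.31.

```python
import math
import statistics

# Completion times in seconds, test subjects 1 to 14, copied from Table 2.
level_x = [148, 104, 114, 304, 184, 300, 238, 88, 244, 114, 80, 232, 218, 156]
level_z = [68, 100, 64, 274, 88, 134, 214, 110, 232, 232, 140, 124, 272, 172]

def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=10_000):
    """Two-tailed p-value, using Simpson's rule on the density from 0 to |t|."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3          # P(0 < T < |t|)
    return 1.0 - 2.0 * central_area

def paired_t_test(a, b):
    """Return (t statistic, degrees of freedom, two-tailed p) for paired samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    t = statistics.mean(diffs) / std_err
    return t, n - 1, two_tailed_p(t, n - 1)

t, df, p = paired_t_test(level_x, level_z)
print(f"mean X = {statistics.mean(level_x):.0f} s, mean Z = {statistics.mean(level_z):.0f} s")
print(f"median X = {statistics.median(level_x):.0f} s, median Z = {statistics.median(level_z):.0f} s")
print(f"t({df}) = {t:.2f}, two-tailed p = {p:.2f}")
```

The test works on the per-subject differences between the two levels, which is why each test subject's two times must stay paired in the same list position.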
3.1.1. Analysis of Time Results for Levels X and Z

A paired t-test, executed in Excel (Microsoft, 2021), rendered a P-value of 0.31 (two-tailed), which is larger than the α-value of 0.05 used in this study. Therefore, the null hypothesis could not be rejected, and no significant difference between the elapsed times for the levels can be stated.

3.2. Level Difficulty Results

Table 3. Level difficulty results. Results marked in blue are “X Group”, that played Level X first, and green are “Z group” that played Level Z first.

Level X difficulty | Number of test subjects (X Group) | Number of test subjects (Z Group)
1 (Very easy) | 0 | 0
2 | 0 | 5
3 | 4 | 1
4 | 3 | 0
5 (Very hard) | 0 | 1
Median | Difficulty 3 | Difficulty 2

Level Z difficulty | Number of test subjects (X Group) | Number of test subjects (Z Group)
1 (Very easy) | 1 | 0
2 | 4 | 1
3 | 1 | 3
4 | 1 | 3
5 (Very hard) | 0 | 0
Median | Difficulty 2 | Difficulty 3

Figure 12. Perceived difficulty for Level X (bar chart: number of test subjects per difficulty rating, 1 (very easy) to 5 (very hard), for X Group and Z Group).
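The medians in Table 3 follow directly from the frequency counts. As a minimal check, the Python sketch below expands each row of counts into individual ratings and takes the median; the dictionary layout and the function name are assumptions made for the example.

```python
import statistics

ratings = [1, 2, 3, 4, 5]  # 1 = very easy, 5 = very hard

# Number of test subjects per rating, copied from Table 3.
counts = {
    ("Level X", "X Group"): [0, 0, 4, 3, 0],
    ("Level X", "Z Group"): [0, 5, 1, 0, 1],
    ("Level Z", "X Group"): [1, 4, 1, 1, 0],
    ("Level Z", "Z Group"): [0, 1, 3, 3, 0],
}

def median_rating(freq):
    """Expand a row of frequency counts into single ratings and take the median."""
    expanded = [r for r, n in zip(ratings, freq) for _ in range(n)]
    return statistics.median(expanded)

medians = {key: median_rating(freq) for key, freq in counts.items()}
for (level, group), m in medians.items():
    print(f"{level}, {group}: median difficulty {m}")
```

With seven test subjects per group, the median is simply the fourth rating when the ratings are sorted.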
Figure 13. Perceived difficulty for Level Z (bar chart: number of test subjects per difficulty rating, 1 (very easy) to 5 (very hard), for X Group and Z Group).

3.2.1. Level Difficulty Analysis

Perceived level difficulty can be viewed as centered around the middle difficulty. As can be seen in the medians of the groups X and Z, the test subjects found the first level they played somewhat more difficult. There does not seem to exist a clear difference between the levels’ perceived difficulty with regard to stereo or binaural sound.

3.3. Perception of Level Difference, Results and Coding

Figure 14. Number of test subjects that experienced a difference between the levels (answers to “Did you experience any difference between the levels X and Z?”: Yes, 12; No, 2).
The qualitative answers from the test subjects explaining their perception of difference between the game levels were categorized into five codes, derived from themes identified in the answers, see table 4.

• Code: Vertical aspect
Answers that point to a perceived difference in localization along the vertical axis.
• Code: Easier to locate in X
Answers that point to Level X being perceived as easier for localizing the crystals.
• Code: Easier to locate in Z
Answers that point to Level Z being perceived as easier for localizing the crystals.
• Code: Learning factor
Answers that reflect on the player having learned the interaction with the game in the previous level.
• Code: Lack of perceived difference
This code is used for answers that express a lack of perceived difference.
Table 4. Code Table: Perceived Level Difference. Results marked in blue are “X Group”, and green are “Z group”. (1)=test subject no. (N)=Did not perceive a difference between the levels.

Code: Vertical aspect
(8) “It was easier to find the crystals in X, because one could easier understand where the crystals where vertically.” (translated from Swedish)
(11) “In level Z it was difficult to hear if the crystals were above or under me. So I used a method where I traveled towards the sound to make the sound as loud as possible, then I tried going up or down based on guessing, to see if it was there. In Level X This was different. I thought I could locate the sound by looking up or down which made things easier.”
(12) “I perceived a difference in height where the crystals in X had greater distance in levels up or down. There may have been less lighting in X as well but I am unsure.”

Code: Easier to locate in X
(8) “It was easier to find the crystals in X, because one could easier understand where the crystals where vertically.” (translated from Swedish)
(10) “I thought that Level X was much easier, because I had gotten the hang of how to play and where the crystals tended to be when I heard them in different ways.” (translated from Swedish)
(11) “In level Z it was difficult to hear if the crystals were above or under me. So I used a method where I traveled towards the sound to make the sound as loud as possible, then I tried going up or down based on guessing, to see if it was there. In Level X This was different. I thought I could locate the sound by looking up or down which made things easier.”
(13) “X felt somewhat easier and had bit clearer sound direction could also be because I played X last.”
(14) “Difficult to pin-point, but it felt like the sound in level X was more three-dimensional, making it easier to locate the crystals.”

Code: Easier to locate in Z
(3) “In level Z it felt easier to hear where the sounds came from. During level X I could distinguish the direction of crystals to some extent, but it was very hard to figure out if they were above, below or behind me.”
(4) “I found Z easier in tracking down the crystals with the sounds. I don't really know why, but I found it easier to navigate in this level.”
(5) “In Z it was easier to locate sounds for some reason. In X I thought that the sounds originated from above where the crystals were.”
(6) “Some of the sounds differed that was being used, though some stayed the same. Otherwise mostly that level Z seemed to be louder. It was easier to hear the different crystals from anywhere in the room, so perhaps the decay curve of distance was longer in Z, in that the sound could travel longer more than it actually being louder. Though it might have been it being louder.”
(7) “Z felt easier to localize in, it might've been due to the positioning of the crystals - or I was just more used to the sounds after playing level X to begin with. It was hard to say whether or not there was anything particularly different between the levels - or if it due to the experience of having played X that made Z easier.”

Code: Learning factor
(13) “X felt somewhat easier and had bit clearer sound direction could also be because I played X last.”

Code: Lack of perceived difference
(9)(N) “I did not.”
(2)(N) “I didn't feel a difference between the levels, I noticed a big difference in how hard it was to find them depending on if the sound source was constantly making a sound or if it was pulsating instead. Pulsating sound was harder to locate.”
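The coding in Table 4 can be summarized by counting the answers under each code. The Python sketch below is a hypothetical illustration of such a tally, using the test subject numbers listed in the table; the variable names are assumptions, and the tally is not part of the original analysis procedure.

```python
from collections import Counter

# Test subject numbers listed under each code in Table 4.
coded_answers = {
    "Vertical aspect": [8, 11, 12],
    "Easier to locate in X": [8, 10, 11, 13, 14],
    "Easier to locate in Z": [3, 4, 5, 6, 7],
    "Learning factor": [13],
    "Lack of perceived difference": [9, 2],
}

# How many answers fall under each code.
answers_per_code = {code: len(subjects) for code, subjects in coded_answers.items()}

# An answer may carry several codes, so count codes per test subject as well.
codes_per_subject = Counter(s for subjects in coded_answers.values() for s in subjects)
multi_coded = sorted(s for s, n in codes_per_subject.items() if n > 1)

for code, n in answers_per_code.items():
    print(f"{code}: {n} answer(s)")
print("Test subjects coded under more than one theme:", multi_coded)
```

Because one answer can belong to several codes, the counts per code do not have to add up to the number of test subjects.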
3.3.1. Analysis of Qualitative Level Difference Perception

A majority of test subjects answered that they perceived a difference between the game levels. As in the results in table 5, there are answers that seem to indicate that the test subjects found the level they played last to be different in that it was easier to complete. There are three answers which point to test subjects noticing a difference in localization in the vertical axis in Level X. Two of these seem to find the localization in the vertical axis to be better using binaural sound.

There are five answers that indicate a perception of Level Z being easier. One of these notes that Level Z seemed to be louder, and that this made localization easier. The perception of increased loudness would seem to be correct, since the sounds in Level Z are not HRTF-filtered. HRTF filtering causes attenuation in specific frequency bands at some angles, so without it the sum of the crystals’ sounds at a given position in the game would be louder, even though the same .wav files are used in both levels.

3.4. Quantitative Level Preference Results

Figure 15. Level preference (bar chart: number of test subjects per preference rating, from strong preference for Z to strong preference for X, for X Group and Z Group).
Figure 16. Simplified level preference (bar chart: number of test subjects with preference for Z, no preference and preference for X, for X Group and Z Group). All test subjects with some preference for either level were put into their corresponding category. Test subjects with no preference were put into the “No Preference” category.

3.4.1. Quantitative Level Preference Analysis

As can be seen from figure 15, there is a relatively even distribution of preference between the levels: of the fourteen test subjects, six had some preference for each level and two had no preference. There may be a connection between the order of levels played and preference, since most of the test subjects who preferred Level X played Level Z first, and vice versa. This could be a result of the players getting familiar with the sounds in the game, and also having familiarized themselves with playing the dark levels in the game.

3.5. Qualitative Level Preference, Results and Coding

The answers from the test subjects explaining their preference were coded into five codes, derived from themes identified in the answers, see table 5.

• Code: Confusion in Z
Answers from test subjects who seem to find the localization in Level Z confusing in the vertical axis.
• Code: X being easier/more fair
Answers that point to Level X being easier, either in localizing the crystals or in some other respect.
• Code: Z being easier/more fair
Answers that point to Level Z being easier, either in localizing the crystals or in some other respect.
• Code: Lack of preference due to lack of perceived difference
This code is used for answers that seem to express a lack of preference, due to not finding a clear difference between the levels.
• Code: Learning factor
Answers that reflect on the player having learned the interaction with the game in the previous level.
Table 5. Code Table: Level Preference. Results marked in blue are “X Group”, and green are “Z group”. (1)=test subject no. (SX)=Strong preference for X, (MX)=Moderate preference for X, (NP)=No preference, (MZ)=Moderate preference for Z, (SZ)=Strong preference for Z.

Code: Confusion in Z
(8)(SX) “I experienced that one became confused much easier in Level Z in the case where the crystal was above or below you” (Translated from Swedish)

Code: X being easier/more fair
(10)(SX) “I liked that it [Level X] was a bit easier. I also think that the level was a bit easier because I got stuck a few times on Level Z, which was annoying.” (Translated from Swedish)
(11)(SX) “It was a lot easier to find the crystals in level X with my hearing, it felt more fair and realistic.”
(13)(MX) “It felt like it was easier to follow the direction of the sound in X”
(14)(MX) “Level X felt more like a game where I could use my ears to find the approximate correct direction, while I had to use more trial and error in Level Z.”
(6)(MX) “I would prefer a little more of a challenge which the level X gave me. Though it was the first level I tried, I took more notes during that play through, which may have affected the time. Though people that like to just play the game with lower skill challenge and play the game less intense, I think they would prefer Z, though my choice would be X.”

Code: Z being easier/more fair
(12)(MZ) “It was difficult to know where the sound was coming from as there are bigger differences in the sound level on a horizontal plane than vertical. I didn’t feel as if the increased difficulty was as gratifying as I had no control over this "issue".”
(1)(MZ) “I had an easier time on Z, but I also think that it gave me more help and tools to find the targets.”
(3)(SZ) “For the same reason as the difference I noted earlier between level Z and X. (In level Z it felt easier to hear where the sounds came from. During level X I could distinguish the direction of crystals to some extent, but it was very hard to figure out if they were above, below or behind me.)”
(4)(MZ) “I like the idea of the game, and I felt Z was a bit easier to "understand" and locate than X.”
(5)(MZ) “Just easier to locate the crystals by sound. I don't really know why.”
(7)(MZ) “As mentioned above, they felt very similar - but Z was easier to localize in and it felt like I could easier find my way to the crystals. Given this, I guess I slightly prefer Z, just because it feels good to being able to localize yourself in a dark space with nothing but sound. I would probably have the play the levels through to determine any particular "strong" preference between the two.”

Code: Lack of preference due to lack of perceived difference
(9)(NP) “I did not”
(2)(NP) “Didn't notice a big difference in the levels, most of the difference was the different sound sources, if they were pulsating or constant and I feel like both levels had the same amount of both.”

Code: Learning factor
(1)(MZ) “I had an easier time on Z, but I also think that it gave me more help and tools to find the targets.”