BINAURAL VERSUS STEREO AUDIO IN NAVIGATION IN A 3D GAME: DIFFERENCES IN PERCEPTION AND LOCALIZATION OF SOUND - LUDVIG WIDMAN - DIVA

Binaural versus Stereo Audio in
  Navigation in a 3D Game:
 Differences in Perception and
     Localization of Sound

                Ludvig Widman

       Audio Technology, bachelor's level
                     2021

              Luleå University of Technology
      Department of Social Sciences, Technology and Arts

Ludvig Widman S0038F
mail@ludvigwidman.se Spring 2021

Abstract
Recent advancements in audio technology for computer games have made implementations
with binaural audio possible. Compared to regular stereo sound, binaural audio offers
possibilities for a player to experience spatial sound, including sounds along the
vertical plane, using their own headphones. A computer game prototype called “Crystal
Gatherer” was created for this study to explore the possibilities of binaural audio
implementation regarding localization and perception of objects that make sound in a 3D
game. The game featured two similar game levels, with the difference that one used
binaural sound and the other stereo sound. The levels consisted of a dark space that
the player could navigate freely, with the objective of finding objects that make sound,
called “crystals”, as fast as they could. An experiment was conducted with 14 test
subjects who played the game. Qualitative and quantitative data were collected, including
the time the players took to complete each game level and answers about how they
experienced the levels. A majority of test subjects reported that they perceived a
difference between the levels. No significant difference was found between the levels in
terms of efficacy of finding the objects that made sound. Some test subjects stated that
they found localization to be better in the binaural level of the game, while others
found the stereo level to be better in this respect. The study indicates that binaural
audio has the potential to change the perception of audio in computer games.

Acknowledgements

I would like to thank everyone who participated in and supported this study, and in
particular:

My supervisor, Jon Allan, whose enthusiasm and thoughtful feedback were of great help
to this study.

My kind and supportive classmates, who you could always turn to if you hit a snag,
thank you!

Table of Contents

Abstract
1. Introduction
1.1. Theoretical Background
1.1.1. HRTF and Binaural Sound
1.1.2. Resources in Technology for Implementing Binaural Sound in Computer Games
1.2. Previous Work on Binaural Audio in Navigation in Computer Games
1.2.1. Findings in Efficacy and User Experience for Navigation Supported by Binaural Sound, and Future Work
1.3. Research Question
1.4. Purpose
2. Method
2.1. The Game Prototype “Crystal Gatherer”
2.1.1. Introduction to the Game Prototype
2.1.2. Construction of the Game Levels
2.1.3. Crystal Arrangement and Packaging of the Game
2.1.4. Sounds for “Crystal Gatherer”
2.1.5. Baseline Test
2.2. Informal Pre-Study
2.3. Experiment Execution & Data Collection
2.4. Regarding the COVID-19 Recommendations
2.5. Research Ethics
2.6. Data Analysis Methods
3. Results & Analysis
3.1. Time Results for Levels X and Z
3.1.1. Analysis of Time Results for Levels X and Z
3.2. Level Difficulty Results
3.2.1. Level Difficulty Analysis
3.3. Perception of Level Difference, Results and Coding
3.3.1. Analysis of Qualitative Level Difference Perception
3.4. Quantitative Level Preference Results
3.4.1. Quantitative Level Preference Analysis
3.5. Qualitative Level Preference, Results and Coding
3.5.1. Analysis of Qualitative Level Preference
3.6. Baseline Test Results and Overview
3.6.1. Baseline Test Analysis
4. Conclusion
5. Discussion
6. References
7. Appendix

Binaural versus Stereo Audio in Navigation in a 3D Game:
Differences in Perception and Localization of Sound
Degree project, bachelor’s level

by Ludvig Widman

1. Introduction
In recent years, audio technology for computer games has made notable advancements
(Farkaš, 2018), moving from standard stereo audio with relatively simple implementations
to intricate technologies such as realistic reverb, occlusion and air absorption
(Gustafson & Cancar, 2020). Today, there are freely available plug-ins for major
game development engines that provide tools for building complex spatial audio
environments, in triple-A and indie-level games alike. Spatialized audio has become
an important aspect to consider when designing a new game, as it may give the player
a more life-like and engaging experience. In a study by Gustafson and Cancar (2020)
about localizing audio sources in a first-person shooter (FPS) game, binaural audio is
investigated as a way to enhance the experience of localizing sound sources, and
perhaps increase the performance of the player (that is, help them find more efficiently
where opponents are located in the game). The advancements in binaural audio technology
for computer games, and the previous research presented in this thesis, motivate further
work in the research area.

1.1. Theoretical Background
Our ability to localize sound can be broken down into interaural time difference (ITD),
interaural intensity difference (IID) and spectral cues (Cheng & Wakefield, 1999). ITD is
defined as the difference in arrival time between the ears for a wavefront, and similarly,
IID is defined as the difference in amplitude between the ears for a sound. The spectral
cues refer to the filtering effect of our ears and head on a sound from a given angle. A
sound heard from the front, for example, is filtered differently by our pinnae than a
sound heard from the rear. Additionally, our head causes acoustic shadowing for
sounds incoming from relevant angles. We have learned to interpret all of these spectral
cues, in conjunction with the ITD and IID, which is why we can localize sound with
relative accuracy. It is worth mentioning that the ability to localize sound differs
between individuals. For example, a person with degraded hearing may have more
difficulty localizing sound because they do not fully hear the spectral cues.
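
To give a sense of the magnitude of the ITD, the sketch below evaluates a common rigid-sphere approximation (often attributed to Woodworth), ITD = (a / c)(θ + sin θ), for a distant source on the horizontal plane. This formula is not taken from the sources cited in this thesis. The head radius, the speed of sound and the function name are assumptions chosen for illustration only.

```python
import math

SPEED_OF_SOUND_M_S = 343.0   # assumed: air at roughly 20 degrees Celsius
HEAD_RADIUS_M = 0.0875       # assumed: a commonly used average head radius


def woodworth_itd(azimuth_deg, head_radius=HEAD_RADIUS_M, c=SPEED_OF_SOUND_M_S):
    """Approximate ITD in seconds for a distant source on the horizontal plane.

    azimuth_deg is measured from straight ahead (0) towards one ear (90).
    Uses the rigid-sphere approximation ITD = (a / c) * (theta + sin(theta)).
    """
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth_deg must be between 0 and 90")
    theta = math.radians(azimuth_deg)
    return (head_radius / c) * (theta + math.sin(theta))


if __name__ == "__main__":
    for angle in (0, 30, 60, 90):
        itd_us = woodworth_itd(angle) * 1e6
        print(f"{angle:>2} degrees: {itd_us:6.1f} microseconds")
```

Under these assumptions the ITD grows from zero for a source straight ahead to well under a millisecond for a source directly to the side, which illustrates how small the timing differences are that the auditory system relies on.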

1.1.1. HRTF and Binaural Sound
Binaural audio is a broad term for describing human hearing. A central concept in
binaural hearing is the head-related transfer function (HRTF). The HRTF is a model of the
sonic characteristics that arise when sound is projected onto our heads and ears (Rumsey
& McCormick, 2009). Every angle of incidence produces a different timbre and time
difference between our ears. In traditional stereo audio, only the level and time
differences between the two channels enable us to perceive direction. When the timbral
differences of our ears are present, however, a more complex spatial reproduction
emerges. Techniques for mimicking HRTF conditions in audio reproduction are often called
binaural (Rumsey & McCormick, 2009).

One way to record binaural sound is by using a so-called “dummy head” (Georg Neumann
GmbH, 2020), made in a shape and material with qualities similar to those of a
human head. Microphones are placed inside the artificial pinnae of the dummy head.
The pinnae are made to resemble those of an average person. Sound recorded with
the dummy head will have the filtering of the head and the pinnae, and is intended to
produce a life-like sensation of hearing the sound source with one’s own ears. Of
course, every pair of human ears is different, and because of this, the binaural effect
of a dummy head may be partly or fully unheard by a listener, since their own ears and
head may produce drastically different filtering. Using small microphones that
are inserted into the ears of a listener, custom HRTF recordings can also be made (ITA
HRTF-database, 2020). This is done by playing signals from speakers that move
around the listener at all angles, and recording the signals at the microphones. The
recording is then analyzed to produce a custom HRTF profile. This makes for potentially
very believable spatial sound reproductions.

Binaural audio reproduction has its limitations, as stated by Rumsey & McCormick
(2014). First, as mentioned above, every person’s ears are different. Additionally, when
visual cues are not present, binaural audio can be confusing. Moreover, headphones
have different equalizations and frequency responses, which can distort the
HRTF’s frequency spectrum. Also, if a head-tracking solution is not incorporated (such
as in a VR-headset application), the cues that are created when one moves one’s head,
which are important for localization, are lost. Binaural audio over loudspeakers also
introduces problems, as the natural crossfeed between stereo speakers, and room
reflections, distort the phase and spectrum of HRTF-filtered sound.

In the case of this study, the word binaural is used in the sense that objects that make
sound are HRTF-filtered by a computer program. This is a limited view of binaural
audio, as real-life binaural audio has a larger range of factors impacting the sound,
including a listener’s unique physical properties such as head, pinnae and body shape.
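
The basic idea of HRTF filtering by a computer program can be sketched as convolving a mono signal with one head-related impulse response (HRIR) per ear. The sketch below is a minimal illustration under stated assumptions and is not how any particular plugin is implemented. The two short impulse responses are hypothetical toy values, and the function names are made up for this example. Measured HRIRs are far longer and depend on the direction of the source.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sequences of floats."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Filter one mono signal with a left and a right HRIR.

    Returns a (left, right) pair of equally long sample lists.
    """
    left = convolve(mono, hrir_left)
    right = convolve(mono, hrir_right)
    length = max(len(left), len(right))
    left += [0.0] * (length - len(left))
    right += [0.0] * (length - len(right))
    return left, right


# Hypothetical toy HRIRs for a source to the listener's left: the right-ear
# response is delayed by two samples and quieter than the left-ear response.
HRIR_LEFT = [1.0, 0.3]
HRIR_RIGHT = [0.0, 0.0, 0.5, 0.2]

if __name__ == "__main__":
    click = [1.0, 0.0, 0.0, 0.0]
    left, right = render_binaural(click, HRIR_LEFT, HRIR_RIGHT)
    print("left :", left)
    print("right:", right)
```

In this toy case the delay between the channels plays the role of the ITD, the difference in gain plays the role of the IID, and the shape of each impulse response stands in for the spectral cues.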

1.1.2. Resources in Technology for Implementing Binaural Sound in
Computer Games
Steam Audio (Valve Software, 2020) supports HRTF rendering, ambisonic decoding,
auditory occlusion and reverb, and is available as a plugin for the game development
engines Unreal Engine and Unity, as well as for the industry-standard audio middleware
Fmod and Wwise. Another plugin is Resonance Audio, an open source project that
started at Google (Gorzel, 2019). It uses ambisonic technology and a repository of
head-related impulse responses (HRIRs) derived from the SADIE database (SADIE,
2020). The SADIE database makes available HRIRs of a Neumann KU100 dummy
head microphone. Resonance Audio is likewise available as a plugin for Unreal Engine,
Unity, Fmod and Wwise. Both the Steam Audio and Resonance Audio plug-ins offer the
option to replace the HRIRs used, opening up the possibility of customized HRTF
experiences.

HRTF-related and binaural sound have made great progress with the advancement of
computing power available for audio processing in games. Compared to surround
sound, which requires multiple speakers and complicated audio interfaces, binaural
audio can use two audio channels (as used for regular stereo audio) and, in most cases,
headphones. This is a relatively inexpensive and simple way for a consumer to
experience spatial audio. The recent technology enabling the experience of localized
sound in games poses questions about how binaural audio can be used as a tool for
in-game navigation and spatial awareness. In the previous work, reviewed below, three
separate studies are considered, in which researchers have made different
implementations of binaural audio, showing possibilities for binaural sound as support
for navigation in a variety of games.

1.2. Previous Work on Binaural Audio in Navigation in Computer Games
In a study that is part of the Inclusive Game Design research, Bergqvist (2014)
investigates the difference between using stereo and binaural audio in an auditory
display while navigating a computer game. An important aspect of the study
is to investigate ways to make computer games accessible to all, including those with
visual impairment. The experiment is carried out with 22 test subjects who are sighted
but wear blindfolds. It makes use of two prototypes, which are games with stereo and
binaural audio respectively and are similar in every aspect except the sound. The
author hypothesizes that it should be possible to increase the speed and accuracy of
the user’s navigation in a non-visual system by using binaural audio. The games are run
on a tablet device with a screen showing different rooms filled with objects. Some of
the objects in a room make sound, such as a purring cat. The player is instructed
through dialogue in the game to find different objects. When the player runs their
finger across the screen, the sound becomes louder the closer they get to the object.
When the player’s finger is located on an object, a notification sound is played to let
them know that they have found it. In the binaural version, the sounds are
HRTF-filtered, which, according to the hypothesis of the paper, may help the test
subject find the object faster.

The participants in the study were divided into two groups. One group played the
binaural version of the game, while the other played the stereo version. Several
aspects of the gameplay were recorded, such as the number of touches on
the screen and the time elapsed in each room of the game. Upon analyzing these results,
no significant difference was found in efficacy of interaction between the groups. It is
discussed that this might be because the participants did not perceive the binaural
sound, or because they relied more on the volume difference when finding the objects.
It is also mentioned that some test subjects did not understand what the sounds in the
game were meant to represent, such as when the sound of a radio was interpreted as
background noise.

In another study, by Drossos et al. (2015), which focuses on games for blind or visually
impaired people, a computer game resembling the well-known game of Tic-Tac-Toe
is constructed. The authors argue that computer games are a large part of the social
lives of children, and that those with visual impairment are largely left out of this
social context. The study has a clear focus on young blind or visually impaired users,
as children with visual impairment participate in the study and contribute their
specific experiences to the result. The game uses binaural sounds to convey information
about the state of the game to the user, as a blind person needs to hold the current
game state in memory. To help them revisit the game state, sounds are played to
inform them when they move the cursor on the game board. A visual interface is also
constructed to help those with a partial visual impairment. In contrast to
Bergqvist (2014), this study does not compare the game to a version with regular
stereo sound. The game was played by twelve test subjects and interviews were
conducted afterwards. The result of the interviews and the following analysis was that a
significant majority of the test subjects found the game interesting and the interface
of the game usable. The authors discuss that the interviews also showed that the
result is highly affected by the difference in spatial awareness between visually impaired
people, and that there is a need for direct audio feedback on any change in the game
in order for it to make sense to blind people.

In a study by Gustafson & Cancar (2020), the authors investigate how binaural audio
changes the perception of localization by players in a first-person shooter (FPS) game.
An example is given that players of a popular online FPS game find it difficult to
know where their enemies are when they are above or below the player. A 3D game is
constructed for the study in two versions, one with regular stereo sound and one with
binaural sound. The goal of the game is to locate sound sources on different vertical
levels. Test subjects play both versions of the game. Quantitative data are collected
while they play, and qualitative data are collected via interviews after they have
finished playing. The constructed game takes the form of a 3D model of a multi-story
building that the player can walk inside. There are four stories in the building,
including a cellar which is below the player. The vertical levels are joined by a
staircase, which acts as an opening for sound to travel between the floors. However,
the player is restricted to moving on the ground floor of the building. This is,
according to the authors, because the player is meant to answer questions about which
story they think the sound is coming from. The FPS-standard controls, using the computer
keyboard keys “WASD” for walking and mouse movement for looking around, are applied in
the game. The game is made with the Unity game engine, with Steam Audio enabling
binaural sound. The sound of footsteps was recorded and edited to play in a loop inside
the game, and was placed on different stories for each round of the game. The test was
conducted twice for each test subject, once with stereo and once with binaural sound.
The test subjects were not told what was being tested or what was changed during the
test. In the game, questions were posed, asking which story the sound seemed to be
coming from. The test subjects then gave their answers verbally to the researcher
conducting the test.
The result from the quantitative data was that the players did not localize the sound
significantly better with binaural sound. The qualitative interviews showed that a
majority of the players found it confusing to locate the sound in the game. It is
discussed that since all test subjects used their own computers and headphones for the
test, results may have been impacted, as the players might not have fully perceived the
binaural sound. Furthermore, the authors discuss that if the sound source (the
footsteps) had moved around in the game, the players might have localized it more
easily, because of the additional changes in binaural timbre.

1.2.1. Findings in Efficacy and User Experience for Navigation Supported
by Binaural Sound, and Future Work
Binaural audio, a central concept in all three studies above, can convey information
about where objects are located in both a two- and three-dimensional system, like a
regular stereo recording. Unlike stereo, however, binaural audio offers more detailed
positioning of sound, including localization on the vertical axis. The three studies
described above all investigate navigation supported by binaural sound. They
resemble each other in the way they use binaural audio to convey information
about the location of game elements in the game environment, but the game
dynamics of the studies’ respective games are very different. Bergqvist (2014) uses a
“find the spot on the screen” type of interaction, while Drossos et al. (2015) use a
game that is more like a strategic tabletop game. Gustafson and Cancar (2020), however,
use binaural audio in the context of an FPS game.

As shown by Drossos et al. (2015), spatial awareness varies to a large degree between
non-sighted or visually impaired people. Comparably, as seen in the study by Bergqvist
(2014), sighted test subjects using blindfolds also differ in their perception of
binaural sound. Some do not perceive the binaural effect, perhaps due to differences
between people’s pinnae and the models used by binaural sound processing computer
programs. Gustafson and Cancar (2020) also describe that the hardware and software
setup used by a potential player of a game with binaural audio can
impact the perception of the binaural audio. For example, the quality of headphones
might influence the result, or, as Farkaš (2018) mentions, binaural audio does not
translate well over speakers unless treated with special filtering. In the study by
Drossos et al. (2015), it is mentioned that binaural audio does not work equally well for
all directions and sounds. This seems plausible, since binaural audio uses filters
that make minute changes to a sound’s frequency spectrum (Rumsey & McCormick,
2014). Sounds poor in overtones (a sinewave-like sound, for example) offer little
material for the binaural filter to work with, and thus it is harder for a human to
interpret the intended direction of the sound. Cheng & Wakefield (1999) state
that a sound needs to be broadband and contain frequencies over 7 kHz for its
elevation in the median plane to be localized.
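
The difference between a sound that is poor in overtones and a broadband sound can be illustrated by measuring how much of a signal’s energy lies at or above 7 kHz. The sketch below is an illustration only, using a naive discrete Fourier transform on short synthetic signals. The sample rate, the signal length, the test signals and the function name are assumptions made for this example, and the 7 kHz default simply mirrors the figure quoted above.

```python
import cmath
import math
import random


def high_band_energy_fraction(samples, sample_rate, cutoff_hz=7000.0):
    """Return the fraction of spectral energy at or above cutoff_hz.

    Uses a naive O(N^2) DFT over the positive frequency bins, which is
    fine for short illustrative signals but too slow for real audio files.
    """
    n = len(samples)
    total = 0.0
    high = 0.0
    for k in range(1, n // 2 + 1):  # skip the DC bin
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples)
        )
        energy = abs(coeff) ** 2
        total += energy
        if k * sample_rate / n >= cutoff_hz:
            high += energy
    return high / total if total > 0.0 else 0.0


SAMPLE_RATE = 32000   # assumed sample rate for the illustration
N = 256               # short signal so that the naive DFT stays fast

# A pure 1 kHz tone: essentially no content near 7 kHz.
sine = [math.sin(2 * math.pi * 1000 * i / SAMPLE_RATE) for i in range(N)]

# White noise: energy spread across the whole band.
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(N)]

if __name__ == "__main__":
    print("sine :", round(high_band_energy_fraction(sine, SAMPLE_RATE), 3))
    print("noise:", round(high_band_energy_fraction(noise, SAMPLE_RATE), 3))
```

A check of this kind could in principle be used to screen candidate game sounds, although the fraction of high-frequency energy that is sufficient for localization is not something this sketch can determine.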

In the three studies, significant evidence of binaural audio heightening the efficacy of
navigating within a game cannot be observed. Both Bergqvist (2014) and Gustafson
and Cancar (2020) conclude that there are no results that can be interpreted as
binaural sound being more efficient than stereo with respect to localizing sound
sources. However, in terms of gaming experience, the test subjects in Gustafson and
Cancar’s (2020) study preferred the version with binaural sound. Bergqvist (2014) did
not account for preference between the stereo and binaural versions, since the study
used an unpaired t-test and the test subjects did not play both versions of the game.

The game model used by Gustafson and Cancar (2020) raises the question of whether a
better method exists to determine if binaural audio can improve navigation in 3D
computer games. In their game, the player was constrained only to move on the
ground level of the building. The objective was then to guess which floor the footsteps
were coming from. This is quite a hard task, even in reality, as reflections and absorp-
tion by walls, floor and ceiling can produce a very complex auditory situation. For ex-
ample, one can appreciate the difficulty in localizing the sound of a neighbor’s music,
playing from an adjoining apartment. If the player could move freely and navigate to-
wards the sound source in the game by Gustafson and Cancar (2020), and in so doing,
validate their guesses, it is possible a difference would show in the results regarding ef-
ficacy of navigation, between the binaural and stereo audio version.

Bergqvist (2014) suggests future work could investigate how binaural audio could make
a difference in navigating a three-dimensional environment. This, together with the
questions posed by Gustafson and Cancar’s (2020) study, suggests that studies about
the experience of navigation in a three-dimensional environment, with stereo and binau-
ral audio respectively, would be relevant to consider. It is also notable that there is, in
these studies, relatively little said about the subjective experience of navigation with bin-
aural sound, as it may change how the game is perceived when played.

1.3. Research Question
The research question for this thesis is how binaural audio in a 3D computer game can
change a player’s perception or game experience, whether binaural audio is preferred
over standard stereo, and whether binaural audio can aid the player in localizing
objects that make sound. To inform this research question, a game prototype with
binaural and stereo sound respectively will be created. In the form of a hypothesis,
the research question may be stated as: The binaural version of the game prototype will
be preferred by the test subjects, improve their gaming experience and aid them in
localizing the objects in the game. Test subjects will play this game and the data
collected will be analyzed to confirm or reject this hypothesis.

1.4. Purpose
In the previous work described in detail in this paper, there are no significant results
which point to binaural audio being more effective in localizing objects in a game, or
to binaural audio altering the gaming experience in a considerable fashion. However,
by drawing on the results shared in these studies and using a different approach to
investigate this area, other research outcomes may be reached. The experiment design is
a significant part of this study, as it is intended to build on the experiences of the
previous work in this research area and to be ecologically valid to a large degree.

2. Method
2.1. The Game Prototype “Crystal Gatherer”

2.1.1. Introduction to the Game Prototype
As stimuli, the study uses a prototype in the form of a 3D game with two levels, one
with binaural and one with stereo audio. The game consists of a space that the player
can navigate freely, floating or flying, using standard WASD-and-mouse computer
controls. The objective is to navigate to objects resembling crystals that emit sound
and “gather” them as fast as possible, thus confirming where they are in the space. The
game outputs the time taken to finish the level at the end of the game. The location
of the objects differs between the stereo and binaural versions of the game, so that the
players cannot memorize their locations. The game uses a very dark space and a
limited light radius around the player, so that the player only sees what is in their
proximity and therefore uses their hearing more actively to find objects. The player is
only able to move slowly, so that the potential strategy of rapidly flying at random to
find the objects is discouraged. As mentioned above, the game prototype is meant to be
ecologically valid in the context of a computer game. However, the goal of the game
prototype is not to resemble the most popular FPS games of today, but rather to look
and feel like a commercial- or indie-level computer game in terms of visual interface,
controls, graphics and sound. This is partly achieved by using a popular game engine
(Unreal Engine, 2021) and giving attention to developing the graphical and audible
aspects of the game.

In addition, the sound emitted by the objects in the game is made to be broadband,
so that it can be HRTF-filtered effectively (Cheng & Wakefield, 1999). The sounds are
also intended to be engaging, reminiscent of the commercial game sound design that the
players are used to. One half of the crystals have droning, continuous sounds, and the
other half emit intermittent bursts of sound, so that both types of sound emission
impact the test. The game starts with a training level where no crystals make sound, to
let the player learn the game controls and prepare them for the objective of the game.
The order of the binaural and the stereo level is also switched for each test subject,
so that the effect of the order is negated.
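
Switching the level order for each test subject can be sketched as a simple alternating assignment. The text does not specify how subjects were assigned to an order, so the scheme below, the subject identifiers and the function name are assumptions made for illustration.

```python
def assign_level_orders(subject_ids, levels=("binaural", "stereo")):
    """Alternate the two possible level orders across subjects."""
    first, second = levels
    orders = {}
    for index, subject in enumerate(subject_ids):
        if index % 2 == 0:
            orders[subject] = (first, second)
        else:
            orders[subject] = (second, first)
    return orders


if __name__ == "__main__":
    # Hypothetical identifiers for 14 test subjects.
    subjects = [f"S{n:02d}" for n in range(1, 15)]
    for subject, order in assign_level_orders(subjects).items():
        print(subject, "->", " then ".join(order))
```

With an even number of subjects, this gives each order to exactly half of them, so that learning effects from playing one level first are balanced between the binaural and stereo conditions.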

2.1.2. Construction of the Game Levels
The game was built on the First Person Example Map template, provided by Unreal
Engine version 4.26 (Epic Games, 2021) in the standard installation of the game engine.
Resonance Audio v.1.0 (Resonance Audio, 2021) was activated as a plugin in the Unreal
Engine project. The choice of Unreal Engine as the platform was influenced by the
author’s previous experience of development with this game engine. Resonance Audio was
chosen because of its comprehensive documentation and compatibility with both
Windows and Macintosh computer systems.

For the movement, the player was set to play as the Unreal Engine asset Spectator
Pawn, and the max speed parameter was set to 500 in order to enable slow movement.
The Spectator Pawn enables free movement in the game space, using WASD and
mouse control. Additionally, it features collision, which is used to keep the player inside
the bounds of the game space and to notify the game when a crystal has been
acquired. A point light was attached to the Spectator Pawn to provide a spherical light
that lets the player see the crystals when they are in the player’s immediate proximity.

A game menu was created, with a looping soundtrack. This menu provides access to
the Trial Level, and the levels X and Z (which are the levels with binaural and stereo
sound respectively).

Figure 1. The menu screen for the game prototype.

The levels (trial level excluded) have a basic shape of a rectangular room with sides that
are 8000 Unreal units long, and a height of 5000 Unreal units. The walls were then clad
in polygon shapes, which make up the cave-like space. The result was a roughly
spherical space that is enclosed by the outer walls. The polygon shapes' collision
parameters are set to limit the player's movement, so that the player cannot end up
outside the room.

Figure 2. Wireframe view of the geometry making up the space in the game, with the rectangular
room and the cladding of polygon shapes. The magenta shapes are the crystals placed in the
game level.

At the start of the game, a timer is activated in both levels, and the count of elapsed
seconds is shown on the screen. An Unreal event checks continuously if all crystals
are eliminated. Upon eliminating (or “gathering”) all crystals, the game ends and the
timer stops. When the level is completed, the game pauses and the time elapsed
shows on the screen. A message on the screen tells the player to note down the time
elapsed, in order to fill in the survey form.

Figure 3. Overview of a crystal, with its diamond-like shape and sound component in the list to
the left.

Each crystal consists of a diamond-like shape with an embedded Sound Cue that
enables it to play a looping sound. The crystals in the binaural level
share an Attenuation Settings file that enables the Resonance Audio plugin and
controls the properties of the sound playback, such as volume and spatialization. The
crystals in the stereo level have a corresponding Attenuation Settings file. Both the binaural and
stereo settings use the same Falloff Distance. The Falloff Distance dictates the change
in volume as the player moves closer to or further away from a sound source. It is set
so that the crystals’ sounds can be heard from a large portion of the space that the
player can move in. The Spatialization Method, which sets the sound's spatial playback
function, is set to Panning for the stereo Attenuation Settings and Binaural for the
binaural Attenuation Settings.

Figure 4. Wireframe view of the game level with falloff distance visualized by the yellow circle
around the crystal in the game.

Furthermore, both Attenuation Settings use the Natural Sound setting for Attenuation
Function. This provides a volume falloff that is meant to be similar to the physical
volume falloff (Epic Games, 2021). Apart from the Spatialization Method and the
properties described above, the settings are left at their default values, provided by the
game template First Person Example Map, see figures 5 and 6.
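
As an illustration of what a physical volume falloff looks like, the sketch below applies the free-field inverse-distance law for a point source, where the level drops by about 6 dB for every doubling of distance. This is not Unreal Engine's internal Natural Sound formula, which is not stated here; the function name and the reference distance of 100 units are hypothetical choices for the example.

```python
import math


def inverse_distance_gain_db(distance: float, reference_distance: float = 100.0) -> float:
    """Level change in dB relative to the level at reference_distance.

    Free-field point source: sound pressure is proportional to 1/distance,
    so the level drops by about 6 dB for every doubling of distance.
    The default reference distance is a hypothetical value.
    """
    if distance <= 0 or reference_distance <= 0:
        raise ValueError("distances must be positive")
    return -20.0 * math.log10(distance / reference_distance)


for d in (100.0, 200.0, 400.0, 800.0):
    print(f"{d:6.0f} units: {inverse_distance_gain_db(d):7.2f} dB")
```

A falloff of this shape changes quickly close to the source and slowly far away, which is one reason a sound can remain audible across a large part of a level.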

Figure 5. Attenuation settings for the binaural crystals in the game.

Figure 6. Attenuation settings for the stereo crystals in the game.

In the settings file for the Resonance Audio plugin, the Spatialization Method is set to
HRTF. The Pattern and Sharpness parameters are left at their default values, generating
an omnidirectional sound source. In the plugin’s settings, the Quality Mode is set to
Binaural High Quality.

Figure 7. Settings for Resonance Audio plugin.

Figure 8. Flowchart diagram of settings for audio in Unreal Engine for "Crystal Gatherer".

2.1.3. Crystal Arrangement and Packaging of the Game
The sound crystals are arranged around two cubes (invisible in the game), so that the
distances between them are the same in both versions of the game. However, in
the stereo version, the cube is rotated and the sound sources shift positions, so that
the player cannot memorize the locations. Since it is dark in the game, and very hard to
memorize the characteristics of the space, this amount of sound source location
randomization was deemed adequate.
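
The reason a rotated cube keeps the spacing between the crystals unchanged is that a rotation preserves all pairwise distances. The sketch below checks this numerically. The crystal positions, the cube size and the 90 degree rotation around the vertical axis are hypothetical values chosen for the example, not the layout used in the game.

```python
import itertools
import math


def rotate_z(point, angle_deg):
    """Rotate a 3D point around the vertical (z) axis by angle_deg degrees."""
    x, y, z = point
    a = math.radians(angle_deg)
    return (x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a),
            z)


def pairwise_distances(points):
    """Sorted list of distances between every pair of points."""
    return sorted(math.dist(p, q) for p, q in itertools.combinations(points, 2))


# Hypothetical layout: six crystals on corners of a cube with side 2000 units.
half = 1000.0
corners = list(itertools.product((-half, half), repeat=3))
binaural_layout = corners[:6]
stereo_layout = [rotate_z(p, 90.0) for p in binaural_layout]

same_spacing = all(
    math.isclose(a, b, abs_tol=1e-6)
    for a, b in zip(pairwise_distances(binaural_layout),
                    pairwise_distances(stereo_layout))
)
print(same_spacing)
```

The positions themselves change under the rotation, while the set of distances between crystals stays the same, which is the property the arrangement relies on.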

The game was built and packaged as an executable for the Windows operating system,
and the option Include Prerequisites was checked in the Packaging Settings in Unreal
Engine. This provides an installer for the game that checks whether the programs needed
to run the game exist on the player's computer, and offers to install them, thus
increasing the usability of the game.

2.1.4. Sounds for “Crystal Gatherer”
The sounds in the game were constructed to evoke interest in the player and
contribute to the ecological validity of the game. The sounds for the crystals are seamless
loops of synthesized sounds, created to be broad in frequency spectrum and rich in
texture. This contributes to making the sounds possible to localize in elevation in the
median plane (Cheng & Wakefield, 1999). All of the sounds created are rich in overtone
harmonics to provide good material for the HRTF to work with. Half of the six sounds
are continuous droning sounds and the other half are intermittent, burst-like sounds.
They were all mastered to have a similar frequency spectrum and a loudness of -14
LUFS. The sounds are monophonic to correspond with the point source sound in the
game.
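
Bringing a sound to a loudness target such as -14 LUFS can, to a good approximation, be done with a static gain equal to the difference between the target and the measured integrated loudness. The sketch below shows that arithmetic only. The measured value of -20 LUFS is a hypothetical example, and the sketch does not describe the mastering workflow that was actually used.

```python
def gain_to_target(measured_lufs: float, target_lufs: float = -14.0):
    """Return (gain in dB, linear amplitude factor) needed to reach target_lufs."""
    gain_db = target_lufs - measured_lufs
    linear = 10.0 ** (gain_db / 20.0)
    return gain_db, linear


# Hypothetical measurement: a loop that measures -20 LUFS before mastering.
gain_db, linear = gain_to_target(-20.0)
print(f"{gain_db:+.1f} dB, amplitude factor {linear:.3f}")
```

Matching loudness in this way keeps one crystal from being easier to find simply because its sound file is louder than the others.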

Figure 9. Overview of the spectrum of each sound used for the crystals. Three are constant
droning and three are intermittent sounding. A similar spectral balance can be observed be-
tween the sounds. The sounds are mastered to have a loudness of -14 LUFS.

A short sound for when a crystal is “gathered” was also added for conveying feedback
of game mechanics to the player. This sound was synthesized from filtered noise, mak-
ing a “swoosh”-like sound. This sound plays when the player comes in contact with a
crystal and “gathers” it. The sound is not localized in the game, and uses the Play
Sound 2D function in Unreal Engine. A music piece for the menu of the game was also
created and mastered to -14 LUFS.

2.1.5. Baseline Test
In the previous work presented above, there have been doubts about whether
the game itself or some problem in perceiving binaural audio may have impacted the
results. Therefore, a baseline test was constructed to determine whether the
location conveyed by the binaural audio is perceived by the player. The baseline test
featured two recordings of sound played from different locations in the game setting,
followed by questions to verify which locations the sound originated from. The baseline
test was taken by the test subjects after the experiment, so that it did not introduce
preconceptions about the theme of the experiment. The sound for the baseline test is
that of a voice saying “Strawberry”. The sound was equalized to have a full spectrum
and clear treble and was mastered to a loudness of -14 LUFS. The sound was rec-
orded from Unreal Engine in two positions, above and to the left, and behind, with the
Resonance Audio plugin activated. Four repetitions of the word were recorded for each
position, with very slight alterations in viewing angle in between, to introduce small vari-
ations in the HRTF filtering effect. A version of the sound without the HRTF-filtering from
Resonance Audio was also included as a reference.

Figure 10. Locations of the baseline test sounds. The player is represented by the image of the
lightbulb, located in the center of the cube, used as a guide grid. The spheres represent the
sound. In the image to the left, it is behind the player, and in the image to the right, it is above
and to the left of the player.

Figure 11. The multiple-choice answers for the baseline test in the Google Forms (Google, 2021)
survey.

2.2. Informal Pre-Study
Test feedback was given in an informal setting by four colleagues in the Sound
Engineering Programme. Two of them pointed out that the game was very hard to finish. To
aid the player a little, and prevent fatigue, the light radius of the player, which allows
them to see the crystals that are close by, was extended somewhat. One player
reported that they found it difficult to understand their movements in the complete dark
of the levels. To lessen this confusion, a random scattering of small triangle shapes was
put in the levels. They only show in the immediate proximity of the player, and give
visual cues about the player's movement, but do not contribute to the localization of the
crystals. Another player suggested that the menu could have music, so that the player
automatically adjusts their volume when entering the game. This was also
implemented. Another comment was that the game needed to specify that headphones
should be used; therefore, a high-visibility banner with the text "Use headphones!" was
put into the menu.

In conversation with supervisor J. Allan (personal communication, 9 March 2021), after
Allan had been testing the game, it was brought up that the panning in Unreal Engine
seems to drop in perceived volume when a sound is played from the front, both in the
stereo and the binaural level. This same phenomenon could be experienced on two dif-
ferent computers with different headphones. It would seem that this is an inherent
property of the game engine, and modifications to the sound plugins both for stereo
and binaural sound would have to be implemented to amend this. Considering the
scope of this project this potential source of error was accepted into the experiment
design.

2.3. Experiment Execution & Data Collection
To replicate the setting in which computer games are played, the game was played on
a computer with headphones whose volume the test subject could adjust freely.
For the tests that took place at Piteå Musikhögskola, computer lab
L131, the equipment consisted of a Dell workstation computer, mouse and keyboard, and Beyerdynamic
DT-770 closed-back headphones. This was meant to resemble a typical gaming
environment. The remote tests used the participants' own hardware setups. Seven test
subjects out of the fourteen in total chose to participate remotely.

The test was executed by giving the player instructions through a Google survey form
(Google, 2021), which informed the test subject about what level in the game to play
first. After completing a level (Trial Level excluded) the player was asked to return to the
form to fill out questions about the level they just had played, before continuing to the
next. After the game levels were completed, and the questions regarding these were
answered, the test subject was asked to play the sounds for the baseline test.

The baseline test was taken after playing the game levels. The test subjects were first
asked to play the test sound as reference (as it was not HRTF-filtered), and then the
two binaural sounds. The test subjects were informed that they only were permitted to
listen to these once (each test sound was repeated four times in the recording).

To avoid the result being affected by preconceptions about the audio technology, the
difference between the two game levels was not mentioned in the survey. Considering
test subject demography, test subjects with some experience of first-person games
were encouraged to participate. A question about the test subjects’ experience with
first-person perspective computer games was asked, to make sure that the
participants were familiar with the controls for this type of game. The test subjects were
students at Piteå Musikhögskola and contacts on the author's social media platforms.

Both quantitative and qualitative data were gathered from the test subjects’ interaction
with the game in order to inform the research question. Quantitative data was gathered
by measuring the time the players used to gather the objects. Qualitative data was
collected by letting test subjects write about their experiences playing the levels. The table
below is an overview of the data collected in the survey:

Table 1. Data collection overview.
 Survey item                                                                                   Answer options
 Time elapsed for completing each level (data acquired from the game)                          Level X; Level Z
 Level of difficulty, Level X/Z                                                                1 (very easy); 2; 3; 4; 5 (very hard)
 Did you experience any difference between the levels?                                         Yes; No
 If yes, please explain the difference using a few sentences.
 Preference                                                                                    1 (Strong preference for X); 2; 3 (No preference); 4; 5 (Strong preference for Z)
 Please explain your preference, or lack thereof. Please answer using a few sentences.
 Where does Sound 2/3 appear to come from? (Baseline Test)                                     Multiple choices of directions.
 How long have you been playing First-Person Perspective computer games?                       One year or less; Two years or more.

2.4. Regarding the COVID-19 Recommendations
In order to follow the recommendations by the Swedish authorities and
Folkhälsomyndigheten (www.folkhalsomyndigheten.se), the test was designed to work on the test
subjects' own computers, should they choose to participate in the test remotely. The
test on location was carried out with social distancing and the equipment was
disinfected between each test subject.

2.5. Research Ethics
The test participants were informed of the conditions of the test, and their free choice
to end their participation at any time during the test. The test was conducted according
to the guidelines of Vetenskapsrådets Forskningsetiska Principer (2002). The game
used in the test contains no depiction of violence that might be stressful or harmful to
test participants.

2.6. Data Analysis Methods
A paired t-test, two-tailed, was used to analyze the time data to see whether a
significant difference exists between the game levels. The null hypothesis was a mean
difference of 0, thus assuming that no significant difference between the time elapsed
for the levels exists. Furthermore, an α-value of 0.05 was used.

The qualitative data content was analyzed by using a grounded approach (Denscombe,
2016). Themes in the answers between the test subjects were coded and categorized.
The answers were then put into a table, sorted by code, after which common
phenomena could be identified between the answers. The answers from those test
subjects who chose to answer in Swedish were translated to English. Some typing er-
rors identified in the answers were also corrected for readability. Translation and editing
were done with the intention to not distort the answers. The full, unedited, answers are
available in the Appendix of this paper.

3. Results & Analysis
Sixteen test subjects participated in the test. Two test subjects were excluded from the
results: one because of demographic selection, as this test subject reported less than a
year’s experience with first-person perspective computer games, and the other
because of reported technical issues playing the game (lag), which
greatly hindered their playing of the game. See 7. Appendix for the full survey results.
The binaural level is referred to as “Level X” and the stereo level is referred to as “Level
Z”. The order in which the test subjects played the levels results in two groups of test
subjects: “X Group”, which played Level X first, and “Z Group”, which
played Level Z first. In the following tables and diagrams, areas marked in blue are “X
Group” and areas marked in green are “Z Group”.

3.1. Time Results for Levels X and Z
Table 2. Time results for completing Levels X and Z. Results marked in blue are “X Group”, that
played Level X first, and green are “Z group” that played Level Z first.
 Test Subject no   Time Elapsed Binaural Level (Level X) (in seconds)   Time Elapsed Stereo Level (Level Z) (in seconds)
 1                 148                                                  68
 2                 104                                                  100
 3                 114                                                  64
 4                 304                                                  274
 5                 184                                                  88
 6                 300                                                  134
 7                 238                                                  214
 8                 88                                                   110
 9                 244                                                  232
 10                114                                                  232
 11                80                                                   140
 12                232                                                  124
 13                218                                                  272
 14                156                                                  172
 Mean              180 s                                                159 s
 Median            170 s                                                137 s

3.1.1. Analysis of Time Results for Levels X and Z
A paired t-test, executed in Excel (Microsoft, 2021), rendered a P-value of 0.31
(two-tailed), which is larger than the α-value of 0.05 used in this study. Therefore, the
null hypothesis could not be rejected, and no significant difference between the elapsed
times for the levels can be stated.
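
The paired t-test can be retraced from the times in table 2. The sketch below is not the Excel procedure used in the study; it is an independent illustration that uses only the Python standard library, with the two-tailed P-value obtained by numerically integrating the density of Student's t distribution (the helper names `t_pdf` and `two_tailed_p` are hypothetical). It gives a result in line with the reported P-value of 0.31.

```python
import math
import statistics

# Completion times in seconds from table 2 (test subjects 1-14).
level_x = [148, 104, 114, 304, 184, 300, 238, 88, 244, 114, 80, 232, 218, 156]
level_z = [68, 100, 64, 274, 88, 134, 214, 110, 232, 232, 140, 124, 272, 172]


def t_pdf(t, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value via Simpson's rule on the density from 0 to |t|."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 <= T <= |t|)
    return 1 - 2 * central_area


# Paired test: work on the per-subject differences (Level X minus Level Z).
diffs = [x - z for x, z in zip(level_x, level_z)]
n = len(diffs)
mean_diff = statistics.mean(diffs)
std_err = statistics.stdev(diffs) / math.sqrt(n)
t_stat = mean_diff / std_err
p_value = two_tailed_p(t_stat, n - 1)

print(f"mean difference = {mean_diff:.1f} s, t({n - 1}) = {t_stat:.2f}, p = {p_value:.2f}")
```

Because each test subject played both levels, the test is run on the within-subject differences rather than on the two columns separately, which removes the large variation in overall speed between players from the comparison.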

3.2. Level Difficulty Results
Table 3. Level difficulty results. Results marked in blue are “X Group”, that played Level X first,
and green are “Z group” that played Level Z first.
 Level X Difficulty                          Amount of test subjects (X Group)   Amount of test subjects (Z Group)
 1 (Very easy)                               0                                   0
 2                                           0                                   5
 3                                           4                                   1
 4                                           3                                   0
 5 (Very hard)                               0                                   1
 Median                                      Difficulty 3                        Difficulty 2
 Level Z Difficulty
 1 (Very Easy)                               1                                   0
 2                                           4                                   1
 3                                           1                                   3
 4                                           1                                   3
 5 (Very hard)                               0                                   0
 Median                                      Difficulty 2                        Difficulty 3
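
The medians in table 3 can be retraced by expanding the counts per rating into individual ratings. The sketch below is an illustration written for this purpose, with the counts copied from the table; the function name `median_rating` is hypothetical.

```python
import statistics

# Number of test subjects per difficulty rating (1-5), taken from table 3.
level_x_counts = {
    "X Group": [0, 0, 4, 3, 0],
    "Z Group": [0, 5, 1, 0, 1],
}
level_z_counts = {
    "X Group": [1, 4, 1, 1, 0],
    "Z Group": [0, 1, 3, 3, 0],
}


def median_rating(counts):
    """Expand per-rating counts into individual ratings and take the median."""
    ratings = [rating
               for rating, count in enumerate(counts, start=1)
               for _ in range(count)]
    return statistics.median(ratings)


for level, table in (("Level X", level_x_counts), ("Level Z", level_z_counts)):
    for group, counts in table.items():
        print(level, group, median_rating(counts))
```

Each group has seven test subjects, so the median is the fourth rating in sorted order and is always one of the whole-number ratings.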

Figure 12. Perceived difficulty for Level X. [Bar chart titled “Level X Difficulty”: amount of test subjects for each difficulty rating from 1 (Very easy) to 5 (Very hard), shown separately for X Group and Z Group.]

Figure 13. Perceived difficulty for Level Z. [Bar chart titled “Level Z Difficulty”: amount of test subjects for each difficulty rating from 1 (Very easy) to 5 (Very hard), shown separately for X Group and Z Group.]

3.2.1. Level Difficulty Analysis
Perceived level difficulty can be viewed as centered around the middle difficulty. As can
be seen in the median of the groups X and Z, the test subjects found their first level
played somewhat more difficult. There does not seem to exist a clear difference be-
tween the levels’ perceived difficulty with regard to stereo or binaural sound.

3.3. Perception of Level Difference, Results and Coding

Figure 14. Amount of test subjects that experienced a difference between the levels. [Pie chart for the question “Did you experience any difference between the levels X and Z?”: Yes, 12; No, 2.]

The qualitative answers from the test subjects explaining their perception of difference
between the game levels were categorized into five codes, derived from themes identi-
fied in the answers, see table 4.

    • Code: Vertical aspect
Answers that point to a perceived difference in localization along the vertical axis.
    • Code: Easier to locate in X
Answers that point to Level X being perceived as easier for localizing the crystals.
    • Code: Easier to locate in Z
Answers that point to Level Z being perceived as easier for localizing the crystals.
    • Code: Learning factor
Answers that reflect on the player having learned the interaction with the game in
the previous level.
    • Code: Lack of perceived difference
This code is used for answers that express lack of perceived difference.

Table 4. Code Table: Perceived Level Difference. Results marked in blue are “X Group”, and green are “Z group”.
(1)=test subject no. (N)=Did not perceive a difference between the levels.
Code: Vertical aspect
(8) “It was easier to find the crystals in X, because one could easier understand where the crystals where vertically.” (translated from Swedish)
(11) “In level Z it was difficult to hear if the crystals were above or under me. So I used a method where I traveled towards the sound to make the sound as loud as possible, then I tried going up or down based on guessing, to see if it was there. In Level X This was different. I thought I could locate the sound by looking up or down which made things easier.”
(12) “I perceived a difference in height where the crystals in X had greater distance in levels up or down. There may have been less lighting in X as well but I am unsure.”

Code: Easier to locate in X
(8) “It was easier to find the crystals in X, because one could easier understand where the crystals where vertically.” (translated from Swedish)
(10) “I thought that Level X was much easier, because I had gotten the hang of how to play and where the crystals tended to be when I heard them in different ways.” (translated from Swedish)
(11) “In level Z it was difficult to hear if the crystals were above or under me. So I used a method where I traveled towards the sound to make the sound as loud as possible, then I tried going up or down based on guessing, to see if it was there. In Level X This was different. I thought I could locate the sound by looking up or down which made things easier.”
(13) “X felt somewhat easier and had bit clearer sound direction could also be because I played X last.”
(14) “Difficult to pin-point, but it felt like the sound in level X was more three-dimensional, making it easier to locate the crystals.”

Code: Easier to locate in Z
(3) “In level Z it felt easier to hear where the sounds came from. During level X I could distinguish the direction of crystals to some extent, but it was very hard to figure out if they were above, below or behind me.”
(4) “I found Z easier in tracking down the crystals with the sounds. I don't really know why, but I found it easier to navigate in this level.”
(5) “In Z it was easier to locate sounds for some reason. In X I thought that the sounds originated from above where the crystals were.”
(6) “Some of the sounds differed that was being used, though some stayed the same. Otherwise mostly that level Z seemed to be louder. It was easier to hear the different crystals from anywhere in the room, so perhaps the decay curve of distance was longer in Z, in that the sound could travel longer more than it actually being louder. Though it might have been it being louder.”
(7) “Z felt easier to localize in, it might've been due to the positioning of the crystals - or I was just more used to the sounds after playing level X to begin with. It was hard to say whether or not there was anything particularly different between the levels - or if it due to the experience of having played X that made Z easier.”

Code: Learning factor
(13) “X felt somewhat easier and had bit clearer sound direction could also be because I played X last.”

Code: Lack of perceived difference
(9)(N) “I did not.”
(2)(N) “I didn't feel a difference between the levels, I noticed a big difference in how hard it was to find them depending on if the sound source was constantly making a sound or if it was pulsating instead. Pulsating sound was harder to locate.”

3.3.1. Analysis of Qualitative Level Difference Perception
A majority of test subjects answered that they perceived a difference between the
game levels. As with the results in table 5, there are answers that seem to
indicate that the test subjects found the last level they played different in that it was easier to
complete. There are three answers which point to test subjects noticing a difference in
localization in the vertical axis in Level X; two of these seem to find the localization in
the vertical axis to be better using binaural sound.

There are five answers that indicate a perception of Level Z being easier. One of these
notes that Level Z seemed to be louder, and that this made localization easier. The
perception of increased loudness would seem to be correct, since the sounds in Level Z are
not HRTF-filtered. HRTF filtering attenuates specific frequency bands at some angles,
so without it the sum of the crystals’ sounds in a given position in the
game would be louder, even though the same .wav files are used in both levels.

3.4. Quantitative Level Preference Results

Figure 15. Level preference. [Bar chart titled “Level Preference”: amount of test subjects per category (Strong preference for Z, Moderate preference for Z, No preference, Moderate preference for X, Strong preference for X), shown separately for X Group and Z Group.]

Figure 16. Simplified level preference: All test subjects with some preference for either level were
put into their corresponding category. Test subjects with no preference were put into the “No
Preference” category. [Bar chart titled “Simplified Level Preference”: amount of test subjects per category (Preference for Z, No Preference, Preference for X), shown separately for X Group and Z Group.]

3.4.1. Quantitative Level Preference Analysis
As can be seen from figure 15 there is a relatively even distribution of preference for
each level, with five test subjects with any preference for either level. Two test subjects
had no preference. There may be a connection between the order of levels played and
preference, since most of the test subjects who preferred Level X played Level Z first,
and vice versa. This could be a result of the players getting familiar with the sounds in
the game, and also having familiarized themselves with playing the dark levels in the
game.

3.5. Qualitative Level Preference, Results and Coding
The answers in which the test subjects explained their preference were coded into five
codes, derived from themes identified in the answers; see Table 5.

• Code: Confusion in Z
Answers from test subjects who seem to find localization along the vertical axis
confusing in Level Z.
• Code: X being easier/more fair
Answers that point to Level X being easier, either in localizing the crystals or in some
other respect.
• Code: Z being easier/more fair
Answers that point to Level Z being easier, either in localizing the crystals or in some
other respect.
• Code: Lack of preference due to lack of perceived difference
Answers that seem to express a lack of preference because the test subject did not find
a clear difference between the levels.
• Code: Learning factor
Answers that reflect on the player having learned how to interact with the game in
the previous level.

Table 5. Code Table: Level Preference. Results marked in blue are “X Group”, and green are “Z Group”.
(1)=test subject no., (SX)=Strong preference for X, (MX)=Moderate preference for X, (NP)=No preference, (MZ)=Moderate preference for Z, (SZ)=Strong preference for Z.
• Code: Confusion in Z
(8)(SX) “I experienced that one became confused much easier in Level Z in the case where the crystal was above or below you” (Translated from Swedish)

• Code: X being easier/more fair
(10)(SX) “I liked that it [Level X] was a bit easier. I also think that the level was a bit easier because I got stuck a few times on Level Z, which was annoying.” (Translated from Swedish)
(11)(SX) “It was a lot easier to find the crystals in level X with my hearing, it felt more fair and realistic.”
(13)(MX) “It felt like it was easier to follow the direction of the sound in X”
(14)(MX) “Level X felt more like a game where I could use my ears to find the approximate correct direction, while I had to use more trial and error in Level Z.”
(6)(MX) “I would prefer a little more of a challenge which the level X gave me. Though it was the first level I tried, I took more notes during that play through, which may have affected the time. Though people that like to just play the game with lower skill challenge and play the game less intense, I think they would prefer Z, though my choice would be X.”

• Code: Z being easier/more fair
(12)(MZ) “It was difficult to know where the sound was coming from as there are bigger differences in the sound level on a horizontal plane than vertical. I didn’t feel as if the increased difficulty was as gratifying as I had no control over this "issue".”
(1)(MZ) “I had an easier time on Z, but I also think that it gave me more help and tools to find the targets.”
(3)(SZ) “For the same reason as the difference I noted earlier between level Z and X. (In level Z it felt easier to hear where the sounds came from. During level X I could distinguish the direction of crystals to some extent, but it was very hard to figure out if they were above, below or behind me.)”
(4)(MZ) “I like the idea of the game, and I felt Z was a bit easier to "understand" and locate than X.”
(5)(MZ) “Just easier to locate the crystals by sound. I don't really know why.”
(7)(MZ) “As mentioned above, they felt very similar - but Z was easier to localize in and it felt like I could easier find my way to the crystals. Given this, I guess I slightly prefer Z, just because it feels good to being able to localize yourself in a dark space with nothing but sound. I would probably have the play the levels through to determine any particular "strong" preference between the two.”

• Code: Lack of preference due to lack of perceived difference
(9)(NP) “I did not”
(2)(NP) “Didn't notice a big difference in the levels, most of the difference was the different sound sources, if they were pulsating or constant and I feel like both levels had the same amount of both.”

• Code: Learning factor
(1)(MZ) “I had an easier time on Z, but I also think that it gave me more help and tools to find the targets.”
