EMERGENT USER INTERFACES - CS-E4200 LECTURE 4 SOUND AND AUDITORY INTERFACES 4 FEB 2021 - MYCOURSES
What is Sound
• Vibrations (pressure variations) in an elastic medium (such as air), detected by a receiver (ears, microphone, …)
• A sound signal can be analysed with the Fourier transform and characterized by its frequency spectrum
• Two different types:
  – tonal sounds, consisting of sinusoidal components with frequencies in harmonic (integer) relation
  – noises, with variable spectra
• Human-made sounds are often tonal (vowels, singing, musical instruments); most natural sounds are not
Sinusoidal Sound
• Elementary component in sound analysis, rare in nature
• Two key properties:
  – frequency f, measured in Hertz (Hz); 1 Hz = one cycle per second; the inverse of the cycle period T ( f = 1/T )
  – amplitude (pressure level, A)
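The two properties above can be illustrated with a short sketch: generate one second of a sinusoid and recover its frequency with a Fourier transform (the 440 Hz frequency and 8 kHz sampling rate are illustrative choices, not from the slides).

```python
import numpy as np

fs = 8000           # sampling rate (Hz), illustrative
f = 440.0           # sinusoid frequency (Hz)
T = 1.0 / f         # cycle period: f = 1/T
t = np.arange(fs) / fs                # one second of sample times
x = 0.5 * np.sin(2 * np.pi * f * t)   # amplitude A = 0.5

# Fourier analysis: a pure sinusoid has a single spectral peak at f
spectrum = np.abs(np.fft.rfft(x))
peak_hz = np.argmax(spectrum) * fs / len(x)
print(peak_hz)  # 440.0
```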
Hearing Sounds
• Humans can hear approx. 20 Hz – 20 kHz; the upper limit reduces with age to about 15 kHz
• Pressure level (amplitude) is measured relative to a "standard" sound pressure level, on a logarithmic scale in decibels (dB)
• Examples: normal conversation 65 dB, chainsaw 120 dB; above 85 dB is to be avoided
• Note: this is not the same as "loudness", which means the subjectively perceived strength of the sound
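The logarithmic dB scale above can be sketched as follows. The reference pressure 20 µPa is the standard value for air; the example pressures are illustrative, chosen to land in the conversation range mentioned on the slide.

```python
import math

P0 = 20e-6  # standard reference sound pressure in air, 20 micropascals

def spl_db(p_rms: float) -> float:
    """Sound pressure level in decibels relative to P0 (logarithmic scale)."""
    return 20 * math.log10(p_rms / P0)

# Each tenfold increase in pressure adds 20 dB; doubling adds about 6 dB.
print(spl_db(0.02))  # 60.0 -- roughly conversation level
print(spl_db(0.2))   # 80.0
```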
Auditory Thresholds
• Humans hear frequencies from 20 – 22,000 Hz
• Most everyday sounds fall within 40 – 80 dB
Tonal Pitch and Timbre
• Non-sinusoidal but repeating waveform: base frequency (F0) + harmonic overtones (n × F0)
• pitch = perceived base frequency
• timbre = sound "color", defined by spectral proportions
• Loudness is not constant over time: described by the sound envelope
Non-tonal Sounds
• Non-repeating waveform
• a) Discrete spectral components with non-harmonic frequencies: bells, resonating everyday objects
• b) Continuous spectrum = noise, a random (unpredictable) signal
• Noises differ from each other: the distribution of frequencies and signal values matters (white noise, pink/blue noise, popcorn noise, etc.)
Voice and Speech
• Formed by the vocal tract
• Vowels (a, e, i, …) have a harmonic spectrum characterized by formants (peaks at certain kHz frequencies)
• Consonants:
  – harmonic: voiced consonants and nasals (b, m, n, j, …)
  – transients (k, p, t)
  – continuous noise (f, s, …)
• https://en.wikipedia.org/wiki/Articulation_(phonetics)
Hearing
Anatomy of the Ear
How the Ear Works
https://www.youtube.com/watch?v=pCCcFDoyBxM
How the Ear Works
• The cochlea performs a kind of Fourier analysis: different frequencies are distributed along the basilar membrane and sensed by different hair cells
• Important for sound perception is the instantaneous spectral content over a short time (ca. 5–50 ms): the signal sonogram
• Example: "cochlea" from http://www.neuroreille.com/promenade/english/sound/fsound.htm
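The short-time spectral analysis described above can be sketched as a simple sonogram: split the signal into short overlapping frames and take the magnitude spectrum of each. The 25 ms frame and 10 ms hop are illustrative choices within the 5–50 ms range mentioned on the slide.

```python
import numpy as np

def sonogram(x, fs, frame_ms=25, hop_ms=10):
    """Magnitude spectra of short overlapping windowed frames: a rough
    software analogue of the frequency analysis along the basilar membrane."""
    frame = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    window = np.hanning(frame)
    frames = [x[i:i + frame] * window
              for i in range(0, len(x) - frame, hop)]
    return np.abs(np.fft.rfft(frames, axis=1))  # (num_frames, frame//2 + 1)

fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)   # one second of a 440 Hz tone
S = sonogram(x, fs)
bin_hz = fs / int(fs * 0.025)     # frequency resolution of one frame
print(np.argmax(S[0]) * bin_hz)   # 440.0 -- each frame peaks at the tone
```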
Distance to Listener
• Relationship between sound intensity and distance to the listener: the inverse-square law
• The intensity varies inversely with the square of the distance from the source: if the distance from the source is doubled (increased by a factor of 2), then the intensity is quartered (decreased by a factor of 4)
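The inverse-square law follows from a point source's power spreading over a sphere, and can be checked with a few lines of code (the 1 W source power is an arbitrary illustrative value):

```python
import math

def intensity(power_w: float, distance_m: float) -> float:
    """Sound intensity (W/m^2) of a point source: the power spreads over
    a sphere of area 4*pi*r^2, giving I = P / (4*pi*r^2)."""
    return power_w / (4 * math.pi * distance_m ** 2)

i1 = intensity(1.0, 1.0)
i2 = intensity(1.0, 2.0)
print(i1 / i2)  # 4.0 -- doubling the distance quarters the intensity
```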
Sound Localization
• Humans have two ears, which lets them localize sound in space
• Sound can be localized using 3 coordinates: azimuth, elevation, distance
Sound Localization
• Azimuth cues:
  – difference in the time of sound reaching the two ears: interaural time difference (ITD)
  – difference in the sound intensity reaching the two ears: interaural level difference (ILD)
• Elevation cues:
  – monaural cues derived from the pinna (ear shape): head-related transfer function (HRTF)
• Range cues:
  – difference in sound relative to the range from the observer
  – head movements (otherwise ITD and ILD stay the same)
Sound Localization
https://www.youtube.com/watch?v=FIU1bNSlbxk
Sound Localization (Azimuth Cues)
Interaural Time Difference (ITD)
Interaural Level Difference (ILD)
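The ITD cue can be sketched with the classic Woodworth spherical-head approximation. This model and the constants below (speed of sound, a typical adult head radius) are assumptions for illustration, not from the slides.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s in air (assumed typical value)
HEAD_RADIUS = 0.0875     # m, typical adult head radius (assumed)

def itd_seconds(azimuth_deg: float) -> float:
    """Woodworth approximation of the interaural time difference:
    ITD = (r / c) * (theta + sin(theta)), azimuth theta in radians,
    0 = straight ahead, 90 = directly to one side."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

print(itd_seconds(0))                # 0.0 -- no delay for a frontal source
print(round(itd_seconds(90) * 1e6))  # 656 -- max ITD, in microseconds
```

The sub-millisecond maximum explains why ITD is useful mainly at lower frequencies, where this delay is a fraction of the waveform's period.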
HRTF (Elevation Cue)
• Pinna and head shape affect frequency intensities
• Sound intensities measured with microphones in the ear are compared to intensities at the sound source
• The difference is the HRTF, which gives a clue to the sound source's location
Accuracy of Sound Localization
• People locate sound most accurately in front of them: 2–3° error in front of the head
• Least accurately to the sides and behind the head: up to 20° error to the side of the head
• The largest errors occur at elevations above/below and behind the head
• Front/back confusion is an issue: up to 10% of sounds presented in the front are perceived as coming from behind, and vice versa (more so with headphones)
• Butean, A., Bălan, O., Negoi, I., Moldoveanu, F., & Moldoveanu, A. (2015). Comparative research on sound localization accuracy in the free-field and virtual auditory displays. In Conference Proceedings of eLearning and Software for Education (eLSE) (No. 01, pp. 540–548). Universitatea Nationala de Aparare Carol I.
Sound Synthesis
• Abstract algorithms:
  – additive: sum up a number of sinusoids (+ noise)
  – subtractive: reduce a white-noise spectrum with filters
  – any computational signal generator: wavetable, FM, etc.
• Physically based models simulate vibrations in a material:
  – excitation: impact, friction, air flow, …
  – resonances of the object
  – damping in the material
• Sound envelope design choice: stationary vs. variable amplitude/spectrum
• https://en.wikipedia.org/wiki/Category:Sound_synthesis_types
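The additive approach above can be sketched in a few lines: sum sinusoids at integer multiples of a base frequency and shape them with an envelope. The harmonic amplitudes and the exponential decay are illustrative choices, not a specific instrument model.

```python
import numpy as np

def additive_tone(f0, harmonics, fs=8000, dur=0.5):
    """Additive synthesis: sum sinusoids at n * f0 with amplitudes a,
    then apply a decaying envelope so the tone is not stationary."""
    t = np.arange(int(fs * dur)) / fs
    tone = sum(a * np.sin(2 * np.pi * n * f0 * t)
               for n, a in harmonics)
    envelope = np.exp(-3 * t)      # simple exponential decay
    return tone * envelope

# base frequency 220 Hz plus three harmonic overtones (n x F0)
x = additive_tone(220, [(1, 1.0), (2, 0.5), (3, 0.25), (4, 0.125)])
print(len(x))  # 4000 samples = 0.5 s at 8 kHz
```

Changing the amplitude proportions of the harmonics changes the timbre; changing the envelope changes how the tone "speaks" and fades.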
Sound Reproduction
• Loudspeakers:
  – a single speaker: sound source at one point
  – stereo: two speakers, sound may move along the left-right axis
  – multichannel: more directional / spatial effects; surround sound in theaters gives horizontal directionality (L-R, front-rear); general spatialization (next slide)
• Headphones:
  – normal stereo signal: L/R directionality; sound may appear to come from inside the head
  – spatialized signal by filtering with the HRTF (head-related transfer function)
Vector Base Amplitude Panning
• Extension of usual stereo: divide a virtual sound source's signal among the three loudspeakers nearest to the source direction
• Can be realized with any number of loudspeakers
• http://legacy.spa.aalto.fi/research/cat/vbap/
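As a two-speaker warm-up for the idea VBAP generalizes, here is a constant-power amplitude panning sketch. This is not the full VBAP algorithm (which solves for gains of a speaker triplet in 3-D); the ±45° speaker angles are an illustrative layout.

```python
import math

def pan_gains(angle_deg: float):
    """Constant-power stereo panning between speakers at -45 and +45
    degrees: gains satisfy gL^2 + gR^2 = 1, so the perceived power of
    the virtual source stays constant as it moves left to right."""
    phi = math.radians(angle_deg + 45.0)   # map -45..+45 to 0..90 degrees
    return math.cos(phi), math.sin(phi)

gl, gr = pan_gains(0)               # virtual source in the center
print(round(gl, 3), round(gr, 3))   # 0.707 0.707
print(round(gl**2 + gr**2, 6))      # 1.0
```

VBAP extends this pairwise idea to triplets of speakers in 3-D, picking the three loudspeakers nearest the source direction, as the slide describes.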
Sound in the User Interface
• Output: auditory display
  – analogous to a visual display, but sound is a temporal medium
  – sonification = representing information with sound; continuous sound for system state / object properties, transients for events
• Input: sound recognition
  – analysing the sound structure
  – interpreting it as events / input values
more info & some examples: http://www.icad.org/audio.php
Sonic Finder demo – https://vimeo.com/158610127
Example of mapping features to sounds to help navigation
https://sonification.de/handbook/chapters/chapter2/
https://dl.acm.org/doi/10.1145/1978942.1979357
https://www.youtube.com/watch?v=dplpCW-P77o
http://dnasonification.org
Example: signals at traffic crosswalks (auditory cues that do not require visual attention)
• https://www.brantfordexpositor.ca/2013/05/23/resident-fed-up-with-audible-crosswalk-beeping/wcm/696e8e66-8e0c-9cc6-88d5-f11a0780175c
• https://nationalpost.com/news/canada/chirping-sound-at-intersections
• http://www.apsguide.org/appendix_c_signal.cfm
Sound as Input
• What sounds to use?
  – active: voice, tactile (e.g. hand clapping), various instruments
  – ambient: environment noises, traffic, footsteps, etc.
• Input device: microphone
  – highly variable amplitude: auto-adjusting sound level
  – ambient noise is often a problem
• Analyzing the input signal: find patterns in…
  – amplitude variation, envelope
  – frequency content (spectrum)
  – temporal structure (rhythm)
• Interpret the found patterns as events or continuous values
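A minimal sketch of the pattern-finding idea above: track the short-time RMS envelope of a signal and report loud transients as events. The frame length, relative threshold, and the synthetic "tap" test signal are all illustrative assumptions.

```python
import numpy as np

def detect_events(x, fs, frame_ms=20, threshold=0.1):
    """Find transients by tracking the short-time RMS envelope. The
    threshold is relative to the loudest frame, so the detector
    auto-adjusts to the overall input level."""
    frame = int(fs * frame_ms / 1000)
    rms = np.array([np.sqrt(np.mean(x[i:i + frame] ** 2))
                    for i in range(0, len(x) - frame, frame)])
    rms = rms / rms.max()            # normalize to the signal's own level
    loud = rms > threshold
    # an event = a frame that crosses the threshold upward
    onsets = np.flatnonzero(loud[1:] & ~loud[:-1]) + 1
    return onsets * frame / fs       # onset times in seconds

fs = 8000
x = np.zeros(fs)                                     # one second of silence...
burst = 0.5 * np.sin(2 * np.pi * 1000 * np.arange(400) / fs)
x[2000:2400] = burst                                 # ...with a "tap" at 0.25 s
print(detect_events(x, fs))  # one onset near 0.25 s
```

Replacing the RMS envelope with frame spectra or inter-onset intervals extends the same loop to the spectrum and rhythm patterns listed on the slide.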
Voice Input
• A complex problem, developed for decades; available as software libraries or web services
• Based on analyzing various features of the sound signal (short-time spectra, envelope, etc.)
  – word-based: the input is compared to prototype words; works in simple cases
  – phoneme-based: more general, detecting phonemes and their combinations in words
• Has to be tuned for different speakers
Practical Issues
• Don't limit your ideas to only the most obvious features (amplitude and frequency)
• Pattern recognition is easier the fewer options there are to detect (e.g. just numbers or a few commands vs. a full language vocabulary)
• Sound is a temporal medium: detecting input takes time (e.g. while driving: turning a physical wheel vs. pronouncing left/right commands)
Demos
• Audio libraries in Processing: Sound and Minim
  – examples: Libraries/Sound/Analysis, …/IO, …/Soundfile/Keyboard; Contributed Libraries/Minim/Basics/SynthesizeSound
• Sonogram analysis
• Theremin controlled with Arduino
  – What is a theremin? https://www.youtube.com/watch?v=-QgTF8p-284
Next Steps
• Get used to the different technologies: try out the examples demonstrated in the lectures (camera, sound, sensors)
• Arduino packages will be available next week at Aalto → check the Doodle form in MyCourses for suitable times
• Start thinking about what technologies and/or application cases would interest you → check the questionnaire in MyCourses
• Seek ideas on the net (links in the lecture slides; more will appear in MyCourses)