MP3 Based Digital Audio Announcement For Mass Transit Systems

Tobias Maisch
INIT Innovations in Transportation, Inc.
Chesapeake, VA

ABSTRACT

     One measure to make mass transit systems more user-friendly and
to comply with the requirements of the Americans With Disabilities Act
(ADA) is to provide information to passengers. The name of the next
stop is one of the most important types of information for passengers
using public transit. This information is supplied by an audio and/or
visual “next stop information system”. Digital audio announcement
systems that are controlled by an on-board computer are flexible and
therefore a commonly used solution. Digital audio announcement
systems, however, often have the disadvantage of offering poor audio
quality. The reason is that good digital audio requires a large
storage capacity, and storage is expensive. Therefore, smaller amounts
of storage are often used, at the cost of audio quality. Audio
compression algorithms based on digital signal processing, such as
MPEG II Layer 3 (MP3), make a new technology available. This new type
of digital audio announcement system combines excellent audio quality
with only a moderate amount of storage and is therefore a very
reasonable solution.
     This article briefly describes the history and development of
announcement technology in transit vehicles, from announcements made
by operators/drivers to the newest MP3 based digital announcement
system.
     Different digital announcement concepts and methods are
discussed, with their pros and cons concerning audio quality, use of
storage capacity, effort for announcement production, and integration
in an Intelligent Transportation System (ITS). Finally, the article
points out the benefits of the latest MP3 announcement systems and
their integration in a Computer Aided Dispatch / Automatic Vehicle
Location (CAD/AVL) system.

REQUIREMENTS

     Automatic audio announcement of the next stop is a favorable way
to make mass transit systems easier to use and to comply with the
requirements of the ADA. The major requirements are:

     •    Excellent audio quality
     •    Moderate use of storage
     •    No additional workload for the operator
     •    Easy way to produce and edit/modify recorded announcements
     •    Fast, easy and automated way to download new/edited
          announcements from the Control Center to vehicles with no or
          minimal workload for service personnel or operator

HISTORY

     Next stop announcements made by the vehicle operator were the
original way to implement a next stop audio announcement system. But
having the vehicle operator announce the next stop or other
information had many disadvantages. First, the workload of the driver
was increased. He or she had to track the particular route and had to
know at which point of the trip the announcements were to be made.
This could interfere with concentrating on driving and thus raised the
danger of an accident. Additionally, the driver often forgot to make
the announcement, or the announcement was not loud enough or not
clearly spoken. Satisfaction with this system was low for both vehicle
operators and passengers.
     The next step was to play back pre-recorded announcements from
tape. In the first generation of this system the announcement had to
be triggered by the vehicle operator, who therefore still had to keep
track of the route and know at which point of the trip to trigger the
announcements. Later generations of tape-based announcement systems
had an optional interface to AVL systems, and the announcement was
triggered by the AVL. The audio quality was good, but the disadvantage
of this system was the huge effort to maintain it.
     Tapes wore out, the tape drive had to be cleaned regularly, and
to change one announcement or the chronological order all tapes had to
be re-recorded.
     Progress was made in the field of automated audio announcement
systems by digitizing and storing the
announcement in digital semiconductor storage media for play back.
Once the audio was digitized, editing, adding or changing the
announcements and maintaining the whole system was much easier. The
disadvantage of those systems was the huge amount of storage needed.
To reduce the amount of storage to a moderate size, different methods
were used. All of these methods had the disadvantage of poor play back
audio quality. With the use of digital signal processing algorithms
like MP3, combined with the availability of high processing
performance (with embedded Digital Signal Processors), excellent audio
quality in conjunction with a moderate amount of storage is achieved
for the first time.

CONCEPT AND ALGORITHMS OF

     Generally the whole system consists of three parts:

     •    Recording subsystem (see figure 1)
     •    Play back subsystem (see figure 2)
     •    Transfer interface (included in figures 1 and 2)

RECORDING SUBSYSTEM

     Based on a PC with a high quality soundcard, the recording
subsystem digitizes and quantizes the audio signal with an Analog
Digital Converter (ADC) integrated in the soundcard. The sampling rate
and resolution of the ADC set the basis for excellent audio quality.
This raw data, representing the recorded audio signal in digital
format, is stored on hard disk for further editing such as cutting,
equalizing, level scaling, time scaling and so on. The processed and
arranged audio announcements are transformed to a suitable digital
format by the next processing stage, called the coder. Concerning the
amount of storage needed for the final audio data and the achievable
audio quality, the coder is the key element of the whole processing
chain. After control and management data have been added, the
announcement data is ready to be transferred to the play back
subsystem via the transfer interface. Throughout all steps of
processing the announcements, the audio can be monitored through the
audio output chain of the recording subsystem. It consists of the
Digital Analog Converter (DAC) of the soundcard and the active monitor
loudspeakers.

PLAYBACK SUBSYSTEM FOR VEHICLES

     Based on a highly sophisticated embedded hardware platform, the
playback subsystem consists of a storage unit, decoder, DAC and a
control unit. This subsystem is connected to the Public Address (PA)
system, consisting of amplifier(s) and inside and/or outside
loudspeakers.
     Once the coded audio announcement data is loaded into the storage
device, it is available to be played over the PA system. To do so, the
AVL system, which knows the precise location of the vehicle, sends a
command to the control unit. After a play back command is received,
the decoder is fed with the coded data stream from the storage unit.
The decoded data stream is fed into the DAC, where it is converted
into an analog signal, which is amplified and fed into the
loudspeakers.

AUDIO CODEC

     The word CODEC is an abbreviation made up of the two words CODER
and DECODER. The task of the CODER is to code and compress digitized
raw audio data into a formatted data stream. The task of the DECODER
is the reverse: it decodes and expands the formatted data stream back
to digitized raw audio data. There are many CODEC algorithms, but the
most popular are Pulse Code Modulation (PCM), (Adaptive) Differential
Pulse Code Modulation (DPCM/ADPCM) and MPEG II Layer 3 (MP3).
     A comparison of compression ratio, required bandwidth and amount
of storage per second for the coded data stream is shown in table 1.

PCM

     This CODEC only formats the raw digital audio data for
transmission and storage. No compression is carried out. The most
popular example of PCM is the Compact Disc (CD). Music (audio data)
stored on a CD is sampled at 44100 samples per second (44.1 ksps) and
digitized with a resolution of 16 bit for each of the two stereo
channels. The required bandwidth is therefore 1411.2 kbit/s (705.6
kbit/s for each channel) and the required storage is 176.4 kbyte/s.
For audio announcement systems this amount of storage was too high.
Therefore the sampling rate and quantization were reduced. Reducing
the quantization from 16 to 8 bit led to significantly lower audio
quality, while reducing the sampling rate was a compromise to lower
the amount of storage needed. But the amount of memory needed was
still too high for use in mass transit systems.

DPCM/ADPCM

     While PCM only formats raw audio data, DPCM/ADPCM reduces the
bandwidth of the data stream by a factor of
2. This is done by omitting redundant data. To do so, DPCM/ADPCM uses
a prediction based on the trend of the recent samples. Only the
difference between the prediction and the real value of the sample is
transmitted or stored. Because coding the difference needs a lower
quantization, a reduction of bandwidth can be achieved. For better
audio quality, the quantization is set dynamically when ADPCM is used.

MP3

     This CODEC uses a completely different algorithm (1,2,3,4). The
raw audio data is transformed from the time domain to the frequency
domain. All processing, transmission and storage of the data is done
in the frequency domain. To reach a compression factor of 10 to 14, an
algorithm called perceptual noise shaping, based on the psychoacoustic
hearing model of humans, is used. The “perceptual” part of the name
means that MP3 uses characteristics of the human ear to design the
compression algorithm, such as:

     •    There are certain sounds that the human ear cannot hear
     •    There are certain sounds that the human ear hears much
          better than others
     •    If two sounds are playing simultaneously we hear the louder
          one but cannot hear the softer one. This is called the
          masking effect (figure 3).
     •    If there is a louder tone, we cannot hear a softer one for a
          short time (approx. 2-5 ms) before the beginning of the
          louder tone and for a much longer time (up to 100 ms) after
          the end of the louder tone. These are called the pre-masking
          and post-masking effects (figure 4).

     Using facts like these about the human ear, certain parts of an
announcement can be eliminated without significantly hurting the
quality of the announcement for the listener.

SYSTEMS

     Based on an INIT demonstration installation of an MP3
announcement system (figure 5) that uses the INIT WIRELESSlan network
for uploading and downloading MP3 files from the digital voice
recording workstation to the MP3 audio announcement system INIT
MRI/MP3, there are the following benefits:

High Quality Audio
     Based on a 32 Mbyte Multi Media Card (MMC) for storing the MP3
files, there is room for approx. 4000 sec. of near CD quality audio or
6500 sec. of FM radio quality audio. This corresponds to approximately
1300 or 2150 different audio announcements, respectively, in high
audio quality.

Worldwide standard
     MP3 is a worldwide standard that was first used for Internet
applications but is now established in a broad range of audio
recording, play back and storage applications and devices. It is
robust, proven and, most important, it is not proprietary.

Off the Shelf Solution
     Using off-the-shelf components for the digital voice recording
workstation hardware, software, storage media like MMC and MP3 decoder
chips, a very competitive solution is available. Combining it with
WIRELESSlan based on IEEE 802.11b gives a transparent solution for
automatically downloading the newest announcement files into the
vehicle without any need for manual interaction.

ACKNOWLEDGMENTS

     Thanks to Michael Wittemann who provided technical background.

REFERENCES

1. ISO/IEC 11172-3:1993. Information technology — Coding of moving
   pictures and associated audio for digital storage media at up to
   about 1,5 Mbit/s — Part 3: Audio
2. ISO/IEC 13818-3:1998. Information technology — Generic coding of
   moving pictures and associated audio information — Part 3: Audio
   (available in English only)
3. Home page of MPEG Group
4. Homepage with MP3 basics


Figure 1. Structure of recording subsystem.

Figure 2. Structure of play back subsystem.

Figure 3. Masking effect.

Figure 4. Pre and post masking effect.

Figure 5. Example of a fully integrated audio announcement system.
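
The DPCM principle described in the DPCM/ADPCM section (predict each sample and store only the difference) can be illustrated with a minimal sketch. It assumes the simplest possible predictor, the previously reconstructed sample, and a fixed 8 bit difference; the paper does not specify the predictor or the quantizer used in any real product, and the function names are hypothetical.

```python
def dpcm_encode(samples, diff_bits=8):
    """Encode integer samples as clipped differences to a running prediction.

    Assumption: the prediction is simply the previously reconstructed
    sample. Differences outside the diff_bits range are clipped.
    """
    lo = -(1 << (diff_bits - 1))
    hi = (1 << (diff_bits - 1)) - 1
    prediction = 0
    diffs = []
    for sample in samples:
        diff = max(lo, min(hi, sample - prediction))
        diffs.append(diff)
        # Track what the decoder will reconstruct, not the true sample,
        # so that clipping errors do not accumulate unnoticed.
        prediction += diff
    return diffs


def dpcm_decode(diffs):
    """Rebuild the samples by accumulating the stored differences."""
    value = 0
    samples = []
    for diff in diffs:
        value += diff
        samples.append(value)
    return samples


if __name__ == "__main__":
    original = [0, 50, 120, 100, -20]
    coded = dpcm_encode(original)
    print(coded)
    print(dpcm_decode(coded))
```

Storing 8 bit differences instead of 16 bit samples halves the data rate, which corresponds to the factor of 2 given in the text. When a signal changes faster than the difference range allows, the reconstruction lags behind the input, which is the quality limitation that ADPCM addresses by adapting the quantization.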


                 PCM                        DPCM/ADPCM                 MP3
                 Compression factor 1       Compression factor 2       Compression factor 10 to 14
Sampling rate    Bandwidth      Approx.     Bandwidth      Approx.     Bandwidth      Approx.
(16 bit mono)    [kbit/s]       recording   [kbit/s]       recording   [kbit/s]       recording
[ksps]           ([kbyte/s])    time for    ([kbyte/s])    time for    ([kbyte/s])    time for
                                32 Mbyte                   32 Mbyte                   32 Mbyte
                                [sec]                      [sec]                      [sec]
44.1             705.6 (88.2)   371         352.8 (44.1)   743         64.0 (8.0)     4096
32.0             512.0 (64.0)   512         256.0 (32.0)   1024        40.0 (5.0)     6553
22.05            352.8 (44.1)   743         176.4 (22.05)  1486        32.0 (4.0)     8192

Table 1. Comparison of compression ratio and needed bandwidth of CODEC algorithms.
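
The arithmetic behind the PCM figures in the text and the recording times in Table 1 can be sketched as follows. This is a minimal illustration, assuming 1 Mbyte = 1024 kbyte, 1 kbyte/s = 8 kbit/s, and recording times rounded down to whole seconds; the function names are hypothetical, and the MP3 bandwidths are taken from the table as inputs rather than derived.

```python
# Assumption: a 32 Mbyte card holds 32 * 1024 kbyte.
CARD_KBYTE = 32 * 1024


def pcm_kbit_per_s(ksps, bits=16, channels=1):
    """Raw PCM bandwidth in kbit/s for a sampling rate given in ksps."""
    return ksps * bits * channels


def recording_seconds(kbit_per_s, card_kbyte=CARD_KBYTE):
    """Whole seconds of audio that fit on the card at the given bandwidth."""
    return int(card_kbyte * 8 / kbit_per_s)


if __name__ == "__main__":
    # CD audio: 44.1 ksps, 16 bit, two channels.
    print(pcm_kbit_per_s(44.1, channels=2))

    # MP3 bandwidths per sampling rate, as listed in Table 1.
    mp3_kbit = {44.1: 64.0, 32.0: 40.0, 22.05: 32.0}
    for ksps, mp3 in mp3_kbit.items():
        pcm = pcm_kbit_per_s(ksps)   # 16 bit mono
        dpcm = pcm / 2               # compression factor 2
        print(ksps,
              recording_seconds(pcm),
              recording_seconds(dpcm),
              recording_seconds(mp3))
```

Dividing the PCM bandwidth by the MP3 bandwidth for each row gives ratios of roughly 11 to 13, which lies inside the compression factor of 10 to 14 stated for MP3.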

