The algorithm uses time-frequency (T-F) representations
of the keywords in the MRT. The T-F pattern of an impaired
speech signal is correlated, in multiple AI bands, with the
corresponding patterns of the six unimpaired rhyming words
in the keyword's list (Figure 2).
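As a rough illustration of band-wise pattern correlation, the sketch below computes a Pearson correlation between two T-F patterns in each of several frequency bands. The function names, the list-of-frames data layout, and the use of Pearson correlation are assumptions made for this example; the actual ABC-MRT correlation measure and band edges are not given here.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant pattern carries no shape to compare
    return cov / math.sqrt(var_x * var_y)


def band_correlations(test_pattern, template_pattern, bands):
    """Correlate two T-F patterns band by band.

    Each pattern is a list of time frames; each frame is a list of
    magnitudes, one per frequency bin. `bands` is a list of
    (low_bin, high_bin) pairs, high_bin exclusive (hypothetical layout).
    """
    result = []
    for lo, hi in bands:
        test_vals = [v for frame in test_pattern for v in frame[lo:hi]]
        tmpl_vals = [v for frame in template_pattern for v in frame[lo:hi]]
        result.append(pearson(test_vals, tmpl_vals))
    return result
```

Both patterns must cover the same number of frames and bins, which is why the keyword has to be located and extracted from the test signal before it is compared with the templates.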
The top image in Figure 2 is a spectrogram of the
sentence “Please select the word went,” as spoken by
Female 1 in the Institute for Telecommunication Sciences
(ITS) ABC-MRT speech database. The frequency axis spans
0 to 10 kHz and the time span is approximately 2
seconds. Signal levels are shown in color, from blue
(low) to red (high). The portion of the spectrogram
containing the keyword “went” is highlighted with a red
rectangle. The bottom images show the spectrograms of
the six keywords from the list, plotted on the same time
and frequency scales.
For each trial, a sentence containing the carrier phrase
and keyword is generated and passed through the system,
or Device Under Test (DUT). The signal is acquired from
the output of the DUT and transformed to a T-F pattern
using the same technique applied to the 1,200 isolated
keyword recordings in the database.
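The text does not specify how the T-F pattern is computed. A minimal sketch, assuming a generic short-time Fourier magnitude with a Hann window, is shown below. The function name, frame length, and hop size are illustrative choices, not the ABC-MRT parameters.

```python
import cmath
import math


def tf_pattern(samples, frame_len=64, hop=32):
    """Return a list of magnitude spectra, one per Hann-windowed frame.

    This is a plain DFT written for clarity, not speed. Frame length and
    hop size are hypothetical values chosen for the example.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # bins from 0 Hz up to the Nyquist frequency
    pattern = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(n_bins):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        pattern.append(spectrum)
    return pattern
```

Whatever transform is used, the point made in the text is that the DUT output and the stored keyword templates must go through the same one, so that their patterns are directly comparable.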
The keyword is located within the T-F pattern by
cross-correlating it in two of the AI bands with the T-F
pattern of the original keyword in the template file. The
portion of the DUT’s T-F pattern containing the keyword
is extracted and matched with one of the six rhyming
words in the AI bands of interest. Eliminating negative
results leaves 17 narrowband and 21 full-band AI band
correlations for each candidate keyword. Keywords in
the 16 AI bands with the highest correlation values are
selected and compared to the known correct keyword.
The average number of correct keyword selections over
the 16 AI bands provides the success rate. The average
success rate across all the trials is corrected to account
for the effect of guessing in MRT tests, and provides the
ABC-MRT Intelligibility Score.
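The scoring steps can be sketched as follows. The sketch assumes that, in each AI band, the candidate with the highest correlation is the one selected, and that the guessing correction is the usual chance correction for a six-alternative test (chance level 1/6 maps to 0, perfect selection maps to 1). Neither formula is stated in the text, and all function names are hypothetical.

```python
def select_keywords(correlations):
    """Pick the best-matching candidate in each band.

    `correlations[band][candidate]` holds one correlation value per
    candidate keyword for each band (hypothetical layout).
    """
    return [max(range(len(row)), key=lambda i: row[i]) for row in correlations]


def trial_success_rate(correlations, correct_index):
    """Fraction of bands in which the correct keyword was selected."""
    picks = select_keywords(correlations)
    return sum(1 for p in picks if p == correct_index) / len(picks)


def guess_corrected_score(mean_success_rate, n_options=6):
    """Rescale a success rate so that pure guessing scores 0 (assumed form)."""
    chance = 1.0 / n_options
    return (mean_success_rate - chance) / (1.0 - chance)


# Two candidates and four bands, purely for illustration.
example = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.2, 0.6]]
rate = trial_success_rate(example, correct_index=1)  # correct in 3 of 4 bands
```

Under this assumed correction, a system so impaired that the algorithm picks at random among the six words averages a score near zero rather than near 1/6.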
An ITS evaluation of ABC-MRT and ABC-MRT16 found
extremely high correlation with subjective MRT and low
estimation error. ABC-MRT16 performed significantly
better than ABC-MRT for the 139 narrowband conditions,
indicating the importance of the attention model.
STI is used in high-noise environments where
intelligibility is paramount. Examples include Public
Address (PA) systems, aircraft Voice Announcements
(VA) and emergency communication systems, in-vehicle
communication systems, auditorium systems, and assistive
STI should not be used for measuring systems with
transmission channels that contain vocoders (i.e., codecs
which operate only on speech elements), but may be used
for digital codecs that operate on the entire signal. In
systems with aggressive noise-suppression algorithms, the
STI signal is likely to be suppressed by the algorithm.
The ABC-MRT algorithm works with voice codecs and
noise suppression systems, and can be used for any of the
standard speech bandwidths.
Having ABC-MRT measurement capability in an
audio analyzer complements its traditional audio test
functionality. Depending upon the analyzer, it can also
provide access to multiple audio interfaces (analog,
acoustic, AES3-SP/DIF, digital serial, Bluetooth, PDM,
HDMI, and ASIO), a built-in test sequencer and
reporting engine, test limits, multiple channels, and a
wide variety of specialized audio measurements.
Figure 3: Test setup with anechoic chamber.
The U.S. National Fire Protection Association (NFPA) 1981
standard covers self-contained breathing apparatus (SCBA)
for emergency services. It requires a minimum STI of 0.55 for
non-electronic systems, and 0.60 for supplementary systems.
The test methods for both non-electronic and supplementary
systems are similar. Both require a hemi-anechoic chamber
and a Head and Torso Simulator (HATS) with mouth simulator
NFPA test methods require measuring STI at a test
microphone that is 1.5 m in front of the artificial mouth, while
simultaneously generating pink noise from a separate speaker
located below the test microphone. (Pink noise power per hertz
decreases as the frequency increases; white noise has an equal
power per hertz through all frequencies.)
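The difference between the two noise types can be checked numerically. The sketch below integrates an idealized power spectral density over a frequency band: constant for white noise, proportional to 1/f for pink noise. The function names and the unit scaling constants are assumptions made for illustration.

```python
import math


def white_band_power(f_lo, f_hi, density=1.0):
    """Power of white noise in a band: constant density times bandwidth."""
    return density * (f_hi - f_lo)


def pink_band_power(f_lo, f_hi, k=1.0):
    """Power of pink noise in a band: the integral of k / f from f_lo to f_hi."""
    return k * math.log(f_hi / f_lo)


# One-octave bands at two different positions in the spectrum.
low_octave = (100.0, 200.0)
high_octave = (1000.0, 2000.0)

pink_low = pink_band_power(*low_octave)
pink_high = pink_band_power(*high_octave)      # same as pink_low
white_low = white_band_power(*low_octave)
white_high = white_band_power(*high_octave)    # ten times white_low
```

Pink noise therefore carries equal power in every octave (and in every 1/3-octave band), while white noise puts progressively more power into the wider high-frequency bands.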
NFPA 1981 requires that the mouth simulator be equalized
to a specific 1/3-octave spectral shape at its mouth reference
point. For convenience when adjusting
the EQ by trial and error, the output EQ curve in the audio
analyzer software can be specified at standard 1/3-octave
The standard suggests using a STIPA signal generator with
an equalizer to drive the mouth simulator and a separate pink
noise generator with an equalizer to drive the pink noise speaker.
Both signals can be generated simultaneously by an APx audio
analyzer, using its ability to generate stereo waveforms with
different levels on each channel.
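The idea of a single stereo waveform carrying two independently scaled signals can be illustrated generically. The sketch below is not APx software code. It interleaves two mono sample lists into one stereo buffer and applies a separate gain, in decibels, to each channel. The function names and the assignment of test signal to the left channel and noise to the right are assumptions.

```python
def db_to_gain(level_db):
    """Convert a level in decibels to a linear amplitude factor."""
    return 10 ** (level_db / 20)


def interleave_stereo(left, right, left_db=0.0, right_db=0.0):
    """Build an interleaved stereo buffer: L0, R0, L1, R1, ...

    `left` would hold the test signal for the mouth simulator and `right`
    the noise for the separate speaker (hypothetical channel assignment).
    The shorter input determines the output length.
    """
    gain_l = db_to_gain(left_db)
    gain_r = db_to_gain(right_db)
    frames = []
    for l_sample, r_sample in zip(left, right):
        frames.append(gain_l * l_sample)
        frames.append(gain_r * r_sample)
    return frames


# Left channel at full level, right channel 20 dB lower.
buffer = interleave_stereo([1.0, 1.0], [1.0, 1.0], left_db=0.0, right_db=-20.0)
```

Keeping both signals in one file guarantees that they start together and stay sample-aligned, while the per-channel levels remain independently adjustable.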