--- Forwarded Message from Philippe Delcloque <[log in to unmask]> ---
>From: Philippe Delcloque <[log in to unmask]>
>To: "[log in to unmask]" <[log in to unmask]>
>Cc: "[log in to unmask]" <[log in to unmask]>
>Subject: Headsets and RESOLUTION
>Date: Wed, 6 Dec 2000 13:53:18 -0000

A much more authoritative answer from one of InSTIL's most knowledgeable new members. IT IS A LOT MORE THAN YOU NEED BUT MAY INTEREST SOME PEOPLE ON THE LIST. CAPITAL ANNOTATIONS FROM PHILIPPE.

Philippe,

Thank you for forwarding Deanne's questions:

> What would be considered a low resolution for a headset microphone and conversely a high resolution? What do these numbers tell me?

Just as your video monitor has a certain number of pixels and also a fixed number of colors each pixel may display, there are really two kinds of resolution. But as Philippe indicated, there are other factors that can be more important. For digital audio, sound is stored as a certain number of samples per second, and each sample consists of a certain number of bits. Typical values are 44K, 22K, 16K, 11K, or 8K samples per second, and either 8 or 16 bits per sample. THAT IS REFERRED TO AS SAMPLING (OR SAMPLE) RATE, BTW. Some digital audio is stereophonic, requiring two distinct recordings -- one for each channel -- but speech intended for language learning is usually monophonic. For speech data, typical sample rates are 16000, 11025, or 8000 samples per second. The choice of sample rate determines the highest frequency you will be able to record (for more information, search on the "Nyquist sampling theorem"). At 16000 samples per second, the highest frequency that can be recorded is 8000 Hz; the recordable frequency limit is half the sampling rate. YES, FREQUENCY IS MEASURED IN HERTZ (LOW-BASS FREQUENCIES SUCH AS 100 Hz) AND KILOHERTZ (HIGH FREQUENCIES). DON'T CONFUSE WITH SAMPLING RATE.
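To make the arithmetic above concrete, here is a minimal sketch (the helper names are mine, not from any audio library) that computes the Nyquist limit and the raw storage cost for the sample rates and bit depths mentioned:

```python
def nyquist_limit_hz(sample_rate_hz: int) -> float:
    """Highest frequency capturable at a given sample rate (half the rate)."""
    return sample_rate_hz / 2

def bytes_per_second(sample_rate_hz: int, bits_per_sample: int,
                     channels: int = 1) -> int:
    """Raw (uncompressed) storage cost of a PCM recording."""
    return sample_rate_hz * (bits_per_sample // 8) * channels

# 16000 samples/s captures frequencies up to 8000 Hz, as noted above.
print(nyquist_limit_hz(16000))        # 8000.0
# Mono speech at 11025 samples/s, 16 bits per sample:
print(bytes_per_second(11025, 16))    # 22050 bytes/s
```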
As a practical matter, there is virtually no important speech information between 5.5K and 8K Hz, so 11025 samples per second is almost always acceptable. Given time and effort, humans can usually tell the difference between recordings at 16K and 11K samples per second, but not when the recordings are free from background noise and they are given only one attempt to listen. There is some important speech information between 4K and 5.5K Hz, so 8000 samples per second sometimes causes problems. The most notable of these is the increased difficulty in discerning the letter "f" /ef/ from the letter "s" /es/, but it is safe to say that only consonants are affected by the bandwidth limitation imposed by 8000 samples per second, not vowels. Sometimes hardware limitations will only allow for 8000 samples per second, especially if international telephone equipment is being used. HENCE, SOME VOICE OR SPEECH RECOGNITION ACTUALLY WORKS THROUGH A TELEPHONE SYSTEM, WHICH ALLOWS FOR INSTANCE ROBUST SPOKEN LANGUAGE TESTING; SEE ORDINATE.COM, JARED BERNSTEIN'S NEW COMPANY.

The number of bits per sample makes a huge difference in comparison. 16 bits per sample is always acceptable (unless there are the kind of hardware problems mentioned below). 8 bits per sample, in my experience, is usually unacceptable. Even with very high quality hardware, 8 bits of resolution usually results in overflow (audio clipping by the analog-to-digital (A/D) converter) or underflow (elimination of audio information during quiet sounds by the A/D converter). So, a good rule of thumb is to always make sure you have 16 bits per sample and 11K or 16K samples per second (with 8K samples per second at 16 bits per sample a moderately close second, and any rate at 8 bits per sample a distant third). Audio engineers refer to the number of bits per sample as the "dynamic range" of the sound, which is usually expressed in decibels for analog signals.
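The relationship between bits per sample and dynamic range, and the clipping (overflow) behavior described above, can be sketched as follows; the function names are mine and the quantizer is a simplified model of what an A/D converter does, not any particular card's implementation:

```python
import math

def dynamic_range_db(bits_per_sample: int) -> float:
    """Approximate dynamic range of linear PCM: 20*log10(2**bits), about 6 dB per bit."""
    return 20 * math.log10(2 ** bits_per_sample)

def quantize(sample: float, bits: int) -> int:
    """Quantize a sample in [-1.0, 1.0] to signed integer PCM, clipping on overflow."""
    max_level = 2 ** (bits - 1) - 1     # e.g. 127 for 8 bits, 32767 for 16
    min_level = -(2 ** (bits - 1))
    return max(min_level, min(max_level, round(sample * max_level)))

print(round(dynamic_range_db(16), 1))   # 96.3 dB -- plenty of headroom
print(round(dynamic_range_db(8), 1))    # 48.2 dB -- why 8 bits is usually unacceptable
print(quantize(1.2, 8))                 # 127: an over-driven input clips (overflow)
```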
AS AN AUDIO ENGINEER IN ONE OF MY PREVIOUS LIVES (Philippe speaking), I do recognise the expression Dynamic Range expressed in dB, but I'm still none the wiser about what is meant by resolution. I presume IT MUST REFER TO THE SAMPLING QUALITY OR RATE, SO THE HIGHER THE RATE, THE BETTER (AS I SUGGESTED INTUITIVELY). I AM COPYING THIS TO JAMES SO HE CAN CORRECT ME IF I GOT THIS WRONG.

Now, as Philippe mentioned, many of the microphones intended for speech input are equipped with analog hardware to perform noise cancellation. These days, this often doesn't add more than a few dollars to the cost of the microphone, but it is very useful and effective in environments with a moderate amount of background noise. Noise cancellation can actually make things worse in a windy environment, SO USE WINDSHIELDS; I AGREE THERE AREN'T ENOUGH LANGUAGE LABS IN THE OPEN AIR! :-) So if you plan on going outside you might want to have both kinds of microphones available (and even if you stay inside it is always good to have a spare mic, not just as a backup, but also so you can compare the two to check for damage). A few years ago, the best place to get a high quality noise canceling speech mic within the U.S. was Andrea Electronics, which sold such headsets for $15, or $12 for headsets without the "noise canceling" feature. Desktop boom mics are similarly priced these days. SOUNDS LIKE A VERY GOOD PRICE TO ME!

Really, the only way to check for damage in one microphone is to try different microphones in the same jack. Usually mic damage manifests as a decrease in gain (which makes 8 bits per sample all that much worse), but it is also possible for mic damage to cause low-pass or high-pass filtering of the signal. I forget which of those two is the most likely; I think it is high-pass. Anyway, the possible causes of microphone damage are: Plugging the microphone into the wrong jack, especially if the jack is an amplified or D.C.
output; Exposing the microphone to high heat; Exposing the microphone element to direct sunlight (through a windowpane is usually okay, since the ultraviolet is the source of degradation, unless it gets too hot); and Allowing too much dust or debris to accumulate against the microphone element.

Obviously, if the background noise includes other people speaking, or sounds in the same frequency range as the human voice, it will be worse than a simple hum confined to a tight frequency range with quiet harmonics. THAT'S EXACTLY WHY USING PROFESSIONAL HEADSETS IN PC LANGUAGE LABS IS SO IMPORTANT! I'VE SEEN DREADFUL PROBLEMS OCCUR WITH AURALANG COURSEWARE WHEN USED WITH INAPPROPRIATE HEADSETS! There are a few ways to shield against background noise, such as carpets, heavy curtains and other direct barriers; check what "soundproofing" finds in your favorite local index.

These days, if speech quality is important to you, it is really much better to get some type of external microphone than an internal mic, primarily because most of the internal mics are too close to sources of radio-frequency interference (RFI), either from a CRT display and its associated large electromagnets, or from the fluorescent lights that are usually used for back-lighting in flat-panel (e.g., laptop) displays. PC INTERNAL MICS ARE ALSO USUALLY POOR QUALITY. RFI is hard to characterize and even harder to predict. Just because you can't hear it in a test recording that you play back on a laptop doesn't mean that it isn't filtering out critical bands from the audio channel. Before relying on an internal microphone, record a frequency sweep from 20-4000 Hz and examine it closely with a frequency analysis program, in comparison with the same frequency sweep recorded with a well-shielded external microphone on a non-laptop computer system known to be free from interference.
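A test sweep like the one described can be generated in software and played back for re-recording. This is a minimal sketch using only the Python standard library; the filename and the linear-sweep shape are my own choices, not part of any recommended test procedure:

```python
import math
import struct
import wave

def write_sweep(path: str, f_start: float = 20.0, f_end: float = 4000.0,
                seconds: float = 5.0, rate: int = 16000) -> None:
    """Write a linear sine sweep from f_start to f_end Hz as 16-bit mono WAV."""
    n = int(seconds * rate)
    frames = bytearray()
    phase = 0.0
    for i in range(n):
        # Interpolate the instantaneous frequency, then advance the phase.
        freq = f_start + (f_end - f_start) * i / n
        phase += 2 * math.pi * freq / rate
        sample = int(0.8 * 32767 * math.sin(phase))  # 0.8 leaves clipping headroom
        frames += struct.pack('<h', sample)          # little-endian 16-bit PCM
    with wave.open(path, 'wb') as w:
        w.setnchannels(1)    # mono
        w.setsampwidth(2)    # 16 bits per sample
        w.setframerate(rate)
        w.writeframes(bytes(frames))

write_sweep('sweep_20_4000.wav')
```

Play the file through speakers, re-record it via the internal mic, and compare the two spectrograms in a frequency analysis program.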
My experience is that only one in five laptop internal microphones is free from RFI, and that the problem varies from one individual laptop to the next, even within the same make and model. ABSOLUTELY CORRECT. My favorite frequency analysis program for the PC is CoolEdit: http://www.syntrillium.com FOR MORE SOPHISTICATED LANGUAGE-BASED SPEECH ANALYSIS AND PEDAGOGY, TRY WINPITCH FROM PHILIPPE MARTIN.

Another thing to keep in mind is that some audio equipment, such as PC sound cards, optionally provides "automatic gain control" (AGC). AGC is an algorithmic attempt to re-normalize the volume of the sound so that it has about the same loudness no matter where it was recorded. This can wreak havoc with some speech recognition systems, but it can also sometimes help, especially with human-to-human communication systems. If your recordings start out quiet or loud, and then become more moderate after the first few seconds while your voice remained at about the same level, then there is probably an AGC algorithm in the channel somewhere. You can sometimes toggle AGC with the "Mixer" or "Multimedia Audio Control Panel" or "Microphone Preferences" programs.

Finally, there are a whole lot of audio compression methods, some of which are popular and well-known (e.g., MP3), popular and obscure (e.g., GSM 06.10), etc., and they all have different effects on the quality of speech signals recovered from them. The encodings most likely to be encountered by people dealing with speech input are those used by cellular telephones. For example, the GSM 06.10 vocoder adds short bursts of high-frequency energy in an unpredictable fashion. But when it comes to cellular telephones, radio link problems such as RFI and blocked signals, and the hand-off transition from one cell to another, will almost always cause more noise than the compression method.

Cheers,
James

THANK YOU, JAMES, FOR SUCH A DETAILED RESPONSE/SPEECH TUTORIAL