--- Forwarded Message from Philippe Delcloque <[log in to unmask]> ---
>From: Philippe Delcloque <[log in to unmask]>
>To: "[log in to unmask]" <[log in to unmask]>
>Cc: "[log in to unmask]" <[log in to unmask]>
>Subject: Headsets and RESOLUTION
>Date: Wed, 6 Dec 2000 13:53:18 -0000

A much more authoritative answer from one of InSTIL's most knowledgeable new members. IT IS A LOT MORE THAN YOU NEED BUT MAY INTEREST SOME PEOPLE ON THE LIST. CAPITAL ANNOTATIONS FROM PHILIPPE.

Philippe,

Thank you for forwarding Deanne's questions:

> What would be considered a low resolution for a headset microphone and conversely a high resolution? What do these numbers tell me?

Just as your video monitor has a certain number of pixels and also a fixed number of colors each pixel may display, there are really two kinds of resolution. But as Philippe indicated, there are other factors that can be more important. For digital audio, sound is stored as a certain number of samples per second, and each sample consists of a certain number of bits. Typical values are 44K, 22K, 16K, 11K, or 8K samples per second, and either 8 or 16 bits per sample. THAT IS REFERRED TO AS SAMPLING (OR SAMPLE) RATE, BTW. Some digital audio is stereophonic, requiring two distinct recordings -- one for each channel -- but speech intended for language learning is usually monophonic. For speech data, typical sample rates are 16000, 11025, or 8000 samples per second. The choice of sample rate determines the highest frequency you will be able to record (for more information, search on the "Nyquist sampling theorem"). At 16000 samples per second, the highest frequency that can be recorded is 8000 Hz; the recordable frequency limit is half the sampling rate. YES, FREQUENCY IS MEASURED IN HERTZ (LOW-BASS FREQUENCIES SUCH AS 100 Hz) AND KILOHERTZ (HIGH FREQUENCIES). DON'T CONFUSE WITH SAMPLING RATE.
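To make the arithmetic above concrete, here is a minimal sketch (the helper names are mine, not from any audio library) that computes the Nyquist limit and the raw storage cost for the sample rates and bit depths mentioned:

```python
def nyquist_limit_hz(sample_rate_hz: int) -> float:
    """Highest frequency capturable at a given sample rate (half the rate)."""
    return sample_rate_hz / 2

def bytes_per_second(sample_rate_hz: int, bits_per_sample: int,
                     channels: int = 1) -> int:
    """Raw (uncompressed) storage cost of a PCM recording."""
    return sample_rate_hz * (bits_per_sample // 8) * channels

# 16000 samples/s captures frequencies up to 8000 Hz, as noted above.
print(nyquist_limit_hz(16000))        # 8000.0
# Mono speech at 11025 samples/s, 16 bits per sample:
print(bytes_per_second(11025, 16))    # 22050 bytes/s
```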
As a practical matter, there is virtually no important speech information between 5.5K and 8K Hz, so 11025 samples per second is almost always acceptable. Given time and effort, humans can usually tell the difference between recordings at 16K and 11K samples per second, but not when the recordings are free from background noise and they are given only one attempt to listen. There is some important speech information between 4K and 5.5K Hz, so 8000 samples per second sometimes causes problems. The most notable of these is the increased difficulty in discerning the letter "f" /ef/ from the letter "s" /es/, but it is safe to say that only consonants are affected by the bandwidth limitation imposed by 8000 samples per second, not vowels. Sometimes hardware limitations will only allow for 8000 samples per second, especially if international telephone equipment is being used. HENCE, SOME VOICE OR SPEECH RECOGNITION ACTUALLY WORKS THROUGH A TELEPHONE SYSTEM, WHICH ALLOWS FOR INSTANCE ROBUST SPOKEN LANGUAGE TESTING; SEE ORDINATE.COM, JARED BERNSTEIN'S NEW COMPANY.

The number of bits per sample makes a huge difference in comparison. 16 bits per sample is always acceptable (unless there are the kind of hardware problems mentioned below). 8 bits per sample, in my experience, is usually unacceptable. Even with very high quality hardware, 8 bits of resolution usually results in overflow (audio clipping by the analog-to-digital (A/D) converter) or underflow (elimination of audio information during quiet sounds by the A/D converter). So, a good rule of thumb is to always make sure you have 16 bits per sample and 11K or 16K samples per second (with 8K samples per second at 16 bits per sample a moderately close second, and any rate at 8 bits per sample a distant third). Audio engineers refer to the number of bits per sample as the "dynamic range" of the sound, which is usually expressed in decibels for analog signals.
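The relationship between bits per sample and dynamic range, and the clipping (overflow) behavior described above, can be sketched as follows; the function names are mine and the quantizer is a simplified model of what an A/D converter does, not any particular card's implementation:

```python
import math

def dynamic_range_db(bits_per_sample: int) -> float:
    """Approximate dynamic range of linear PCM: 20*log10(2**bits), about 6 dB per bit."""
    return 20 * math.log10(2 ** bits_per_sample)

def quantize(sample: float, bits: int) -> int:
    """Quantize a sample in [-1.0, 1.0] to signed integer PCM, clipping on overflow."""
    max_level = 2 ** (bits - 1) - 1     # e.g. 127 for 8 bits, 32767 for 16
    min_level = -(2 ** (bits - 1))
    return max(min_level, min(max_level, round(sample * max_level)))

print(round(dynamic_range_db(16), 1))   # 96.3 dB -- plenty of headroom
print(round(dynamic_range_db(8), 1))    # 48.2 dB -- why 8 bits is usually unacceptable
print(quantize(1.2, 8))                 # 127: an over-driven input clips (overflow)
```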
AS AN AUDIO ENGINEER IN ONE OF MY PREVIOUS LIVES (Philippe speaking), I do recognise the expression Dynamic Range expressed in dB, but I'm still none the wiser about what is meant by resolution. I presume IT MUST REFER TO THE SAMPLING QUALITY OR RATE, SO THE HIGHER THE RATE, THE BETTER (AS I SUGGESTED INTUITIVELY). I AM COPYING THIS TO JAMES SO HE CAN CORRECT ME IF I GOT THIS WRONG.

Now, as Philippe mentioned, many of the microphones intended for speech input are equipped with analog hardware to perform noise cancellation. These days, this often doesn't add more than a few dollars to the cost of the microphone, but it is very useful and effective in environments with a moderate amount of background noise. Noise cancellation can actually make things worse in a windy environment, SO USE WINDSHIELDS; I AGREE THERE AREN'T ENOUGH LANGUAGE LABS IN THE OPEN AIR! :-) So if you plan on going outside you might want to have both kinds of microphones available (and even if you stay inside it is always good to have a spare mic, not just as a backup, but also so you can compare the two to check for damage). A few years ago, the best place to get a high quality noise canceling speech mic within the U.S. was Andrea Electronics, which sold such headsets for $15, or $12 for headsets without the "noise canceling" feature. Desktop boom mics are similarly priced these days. SOUNDS LIKE A VERY GOOD PRICE TO ME!

Really, the only way to check for damage in one microphone is to try different microphones in the same jack. Usually mic damage manifests as a decrease in gain (which makes 8 bits per sample all that much worse), but it is also possible for mic damage to cause low-pass or high-pass filtering of the signal. I forget which of those two is the most likely; I think it is high-pass. Anyway, the possible causes of microphone damage are: Plugging the microphone into the wrong jack, especially if the jack is an amplified or D.C.
output; Exposing the microphone to high heat; Exposing the microphone element to direct sunlight (through a windowpane is usually okay, since the ultraviolet is the source of degradation, unless it gets too hot); and Allowing too much dust or debris to accumulate against the microphone element.

Obviously, if the background noise includes other people speaking, or sounds in the same frequency range as the human voice, it will be worse than a simple hum confined to a tight frequency range with quiet harmonics. THAT'S EXACTLY WHY USING PROFESSIONAL HEADSETS IN PC LANGUAGE LABS IS SO IMPORTANT! I'VE SEEN DREADFUL PROBLEMS OCCUR WITH AURALANG COURSEWARE WHEN USED WITH INAPPROPRIATE HEADSETS! There are a few ways to shield against background noise, such as carpets, heavy curtains and other direct barriers; check what "soundproofing" finds in your favorite local index.

These days, if speech quality is important to you, it is really much better to get some type of external microphone than an internal mic, primarily because most of the internal mics are too close to sources of radio-frequency interference (RFI), either from a CRT display and its associated large electromagnets, or from the fluorescent lights that are usually used for back-lighting in flat-panel (e.g., laptop) displays. PC INTERNAL MICS ARE ALSO USUALLY POOR QUALITY. RFI is hard to characterize and even harder to predict. Just because you can't hear it in a test recording that you play back on a laptop doesn't mean that it isn't filtering out critical bands from the audio channel. Before relying on an internal microphone, record a frequency sweep from 20-4000 Hz and examine it closely with a frequency analysis program, in comparison with the same frequency sweep recorded with a well-shielded external microphone on a non-laptop computer system known to be free from interference.
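A test sweep like the one described can be generated in software and played back for re-recording. This is a minimal sketch using only the Python standard library; the filename and the linear-sweep shape are my own choices, not part of any recommended test procedure:

```python
import math
import struct
import wave

def write_sweep(path: str, f_start: float = 20.0, f_end: float = 4000.0,
                seconds: float = 5.0, rate: int = 16000) -> None:
    """Write a linear sine sweep from f_start to f_end Hz as 16-bit mono WAV."""
    n = int(seconds * rate)
    frames = bytearray()
    phase = 0.0
    for i in range(n):
        # Interpolate the instantaneous frequency, then advance the phase.
        freq = f_start + (f_end - f_start) * i / n
        phase += 2 * math.pi * freq / rate
        sample = int(0.8 * 32767 * math.sin(phase))  # 0.8 leaves clipping headroom
        frames += struct.pack('<h', sample)          # little-endian 16-bit PCM
    with wave.open(path, 'wb') as w:
        w.setnchannels(1)    # mono
        w.setsampwidth(2)    # 16 bits per sample
        w.setframerate(rate)
        w.writeframes(bytes(frames))

write_sweep('sweep_20_4000.wav')
```

Play the file through speakers, re-record it via the internal mic, and compare the two spectrograms in a frequency analysis program.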
My experience is that only one in five laptop internal microphones is free from RFI, and that the problem varies from one individual laptop to the next, even within the same make and model. ABSOLUTELY CORRECT. My favorite frequency analysis program for the PC is CoolEdit: http://www.syntrillium.com FOR MORE SOPHISTICATED LANGUAGE-BASED SPEECH ANALYSIS AND PEDAGOGY, TRY WINPITCH FROM PHILIPPE MARTIN.

Another thing to keep in mind is that some audio equipment, such as PC sound cards, optionally provides "automatic gain control" (AGC). AGC is an algorithmic attempt to re-normalize the volume of the sound so that it has about the same loudness no matter where it was recorded. This can wreak havoc with some speech recognition systems, but it can also sometimes help, especially with human-to-human communication systems. If your recordings start out quiet or loud, and then become more moderate after the first few seconds while your voice remained at about the same level, then there is probably an AGC algorithm in the channel somewhere. You can sometimes toggle AGC with the "Mixer" or "Multimedia Audio Control Panel" or "Microphone Preferences" programs.

Finally, there are a whole lot of audio compression methods, some of which are popular and well-known (e.g., MP3), popular and obscure (e.g., GSM 06.10), etc., and they all have different effects on the quality of speech signals recovered from them. The encodings most likely to be encountered by people dealing with speech input are those used by cellular telephones. For example, the GSM 06.10 vocoder adds short bursts of high-frequency energy in an unpredictable fashion. But when it comes to cellular telephones, radio link problems such as RFI and blocked signals, and the hand-off transition from one cell to another, will almost always cause more noise than the compression method.

Cheers,
James

THANK YOU, JAMES, FOR SUCH A DETAILED RESPONSE/SPEECH TUTORIAL