Videoconferencing at UVa

Audio Quality

The human visual system tolerates a great deal of interference. A video image that is grainy, has untrue colors, or is jerky can still be comprehended. An audio signal, however, must be of high quality for a listener to make out the words. A speech signal can tolerate some peak clipping (loss of amplitude above a certain threshold), which is perceived as pops or clicks. The "picket fence" effect (gaps of 50 to 100 ms in the signal) can cause the loss of an entire phoneme or syllable and, with it, comprehension. Even a time lag in an otherwise good signal is intolerable to listeners during conversation. The worst form of audio signal degradation, in terms of speech intelligibility, is reduction of the frequency range. The speech signal is composed of frequencies ranging between 500 and 4000 Hz. When the bandwidth is reduced, which is heard as a "muffled" quality, suprasegmental aspects such as speaker identity and affect are the first to suffer. Further reduction impairs intelligibility. Female speakers are affected more severely than male speakers because their voices are composed of higher frequencies.
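The two time-domain impairments described above, peak clipping and the "picket fence" effect, can be imitated on a synthetic signal. The following is a minimal sketch, not a measurement tool: the 8,000 samples-per-second rate, the 1 kHz test tone, the 0.5 clipping threshold, and the 75 ms gap every 300 ms are all assumptions chosen for illustration.

```python
import math

SAMPLE_RATE = 8000  # samples per second; an assumed, telephone-style rate


def make_tone(freq_hz, duration_s, rate=SAMPLE_RATE):
    """Return a sine tone as a list of floats in [-1.0, 1.0]."""
    n = int(duration_s * rate)
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]


def peak_clip(samples, threshold):
    """Limit every sample to the range [-threshold, +threshold]."""
    return [max(-threshold, min(threshold, s)) for s in samples]


def picket_fence(samples, gap_ms, period_ms, rate=SAMPLE_RATE):
    """Silence the first gap_ms of every period_ms window."""
    gap = int(gap_ms * rate / 1000)
    period = int(period_ms * rate / 1000)
    return [0.0 if (i % period) < gap else s for i, s in enumerate(samples)]


tone = make_tone(1000, 1.0)            # one second of a 1 kHz tone
clipped = peak_clip(tone, 0.5)         # amplitude above 0.5 is lost
gapped = picket_fence(tone, 75, 300)   # a 75 ms gap every 300 ms
```

Writing `clipped` and `gapped` to a sound file and listening to them is one way to hear the pops and dropouts the text describes.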

These factors make audio quality an extremely important component of a video conference. The old ISDN-based video conferencing software from PictureTel, for example, devoted a full 64 kbit/s of the 128 kbit/s connection to the audio signal by default. The application could, however, be configured to use progressively smaller shares of the bandwidth at the user's discretion.
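The arithmetic behind that default split can be sketched as follows. It assumes the ISDN figures refer to kilobits per second (a basic-rate ISDN line offers two 64 kbit/s channels), and it notes that 64 kbit/s is what 8-bit samples taken 8,000 times per second require. The function names are hypothetical, and nothing here is meant to describe the codec PictureTel actually used.

```python
def audio_bitrate_kbps(sample_rate_hz, bits_per_sample):
    """Bit rate of uncompressed single-channel audio, in kbit/s."""
    return sample_rate_hz * bits_per_sample / 1000


def split_connection(total_kbps, audio_kbps):
    """Divide a connection between audio and everything else."""
    if audio_kbps > total_kbps:
        raise ValueError("audio share exceeds the connection's capacity")
    return {
        "audio_kbps": audio_kbps,
        "other_kbps": total_kbps - audio_kbps,
        "audio_share": audio_kbps / total_kbps,
    }


telephone_rate = audio_bitrate_kbps(8000, 8)   # 8 kHz sampling, 8-bit samples
budget = split_connection(128, telephone_rate)
print(budget)
```

Under these assumptions half of the connection goes to audio, leaving 64 kbit/s for video and data.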

Video conferencing sessions conducted over a LAN can devote a significant amount of bandwidth to the audio signal so long as the bandwidth is available. If it is not, applications respond in different ways. The ProShare application, for example, drops both the video and audio signals and provides a data-only conference if the bandwidth of the connection will not support a minimal frame rate. NetMeeting does not control the frame rate programmatically, but a user who decides that the video signal is unusable can turn off the video and keep the audio. If the audio also becomes unusable, the user can turn it off as well and maintain a data-only conference.
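The two fallback behaviors can be contrasted in a few lines. This is only a sketch of the policies as the text describes them, not code from either product: the `Mode` names, the function names, and the frame-rate figures in the usage lines are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    AUDIO_VIDEO_DATA = "audio + video + data"
    AUDIO_DATA = "audio + data"
    DATA_ONLY = "data only"


def automatic_fallback(achievable_fps, min_fps):
    """All-or-nothing policy: below the minimum frame rate, keep data only."""
    if achievable_fps >= min_fps:
        return Mode.AUDIO_VIDEO_DATA
    return Mode.DATA_ONLY


def manual_fallback(video_usable, audio_usable):
    """User-driven policy: give up video first, then audio."""
    if not audio_usable:
        return Mode.DATA_ONLY
    if not video_usable:
        return Mode.AUDIO_DATA
    return Mode.AUDIO_VIDEO_DATA


print(automatic_fallback(achievable_fps=3, min_fps=5).value)
print(manual_fallback(video_usable=False, audio_usable=True).value)
```

The difference that matters for speech is the middle state: the user-driven policy can keep the audio channel alive after video has been given up, while the all-or-nothing policy cannot.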

The quality of speech transmitted over a LAN is affected by the way the information is divided into packets at the source and then reassembled at the destination. The data may also be compressed and decompressed along the way. For these reasons, the audio signal may suffer in quality and may lag. Decades of research by the telephone industry document how specific types of signal degradation affect perception. This research has been used to produce a telephone network that is optimized for transmission of high quality human speech. LANs are not optimized for this activity.
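A toy model shows where both the lag and the loss of quality come from. In this sketch, audio samples are cut into numbered packets and the receiver rebuilds the stream in sequence order, substituting silence for any packet that never arrives. The packet sizes, the 8,000 samples-per-second rate, and the silence-filling strategy are assumptions for illustration, not a description of any particular conferencing product.

```python
def packetize(samples, packet_size):
    """Split samples into (sequence_number, payload) packets."""
    return [
        (seq, samples[start:start + packet_size])
        for seq, start in enumerate(range(0, len(samples), packet_size))
    ]


def reassemble(packets, total_packets, packet_size):
    """Rebuild the stream in sequence order; missing packets become silence."""
    by_seq = dict(packets)
    out = []
    for seq in range(total_packets):
        out.extend(by_seq.get(seq, [0] * packet_size))
    return out


def packetization_delay_ms(packet_size, sample_rate_hz):
    """Time needed to collect one packet's worth of samples before sending."""
    return 1000 * packet_size / sample_rate_hz


samples = list(range(1, 13))         # 12 stand-in sample values
packets = packetize(samples, 4)      # three packets of 4 samples each
arrived = [packets[2], packets[0]]   # packet 1 lost, the rest out of order

print(reassemble(arrived, 3, 4))
# [1, 2, 3, 4, 0, 0, 0, 0, 9, 10, 11, 12]
print(packetization_delay_ms(160, 8000))
# 20.0
```

The run of zeros is a gap in the received speech, and the second figure shows that a 160-sample packet adds 20 ms of delay before any network transit time is counted.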

Another dimension of the audio signal to be considered is whether it is transmitted in full or half duplex mode. In half duplex mode, the signal from only one source at a time is transmitted. This is efficient because the same channel can be used for both participants in the conversation. However, the two speakers must be willing to speak in discrete turns. If both participants speak at the same time, which is a common occurrence in human conversation, the electronics may switch the transmission back and forth between the two speakers in such a way that neither is transmitted well enough for comprehension. Systems that attempt to prevent this may impose a lag or lose the first few phonemes during the interval when the electronics are "deciding" whether to commit to transmitting from that source. A full duplex signal allows both signals to be transmitted and received at the same time. This is optimal for human comprehension, but the supporting electronics must be able to separate the sound of a person speaking into the microphone from the sound that the loudspeakers feed back into the same microphone. If this is not done, the proximity of microphone and loudspeakers causes a feedback loop that sounds like a squeal and can even be painful to the listener.
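The cost of that "deciding" interval can be seen in a simple frame-by-frame model of a half duplex channel. This is a hypothetical arbiter written for illustration, not the logic of any real device: a side gains the idle channel only after it has been talking for more than `commit_frames` consecutive frames, and it keeps the channel until it falls silent.

```python
def half_duplex(a_talking, b_talking, commit_frames=2):
    """Return, for each frame, which side ('A', 'B', or None) is transmitted.

    A side's first commit_frames frames of speech are lost while the
    arbiter decides, and anything said while the other side holds the
    channel is not transmitted at all.
    """
    owner = None
    streak = {"A": 0, "B": 0}
    transmitted = []
    for a, b in zip(a_talking, b_talking):
        active = {"A": a, "B": b}
        for side in streak:
            streak[side] = streak[side] + 1 if active[side] else 0
        if owner is not None and not active[owner]:
            owner = None                     # the current talker stopped
        if owner is None:
            for side in ("A", "B"):
                if streak[side] > commit_frames:
                    owner = side             # commit to this talker
                    break
        transmitted.append(owner)
    return transmitted


a = [True] * 5 + [False] * 3    # A talks during frames 0-4
b = [False] * 3 + [True] * 5    # B starts at frame 3, overlapping A
print(half_duplex(a, b))
# [None, None, 'A', 'A', 'A', 'B', 'B', 'B']
```

In this run A loses its first two frames to the commit delay, and B's frames 3 and 4, spoken while A still holds the channel, are never heard. A full duplex channel would carry both sides in every frame.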

The best use of available technologies to conduct a useful videoconference could well be to use an actual, full duplex voice telephone circuit to carry the audio portion of the conference. This independent channel can also be helpful for troubleshooting if the participants experience difficulty with the LAN or with their collaboration application during the meeting.

Page Updated: 2012-02-16