Sampling is one of the most important steps in audio processing. Recently, a lot of people have been asking me how audio signals are represented and sampled, so I thought it would be appropriate to pen down a few lines about audio signal representation in this post.
Sampling of speech signals
A speech signal is an analog signal that varies with time. When it is plotted, the x-axis represents time and the y-axis represents amplitude. It is a continuous-time, continuous-amplitude signal. Sampling is the process of converting a continuous-time signal into a discrete-time signal.
Sampling makes storage, processing and transmission of audio much easier. When a signal of duration 100 seconds is sampled at regular intervals, say at a sampling frequency of 1 Hz, there is one sample every second. That means the amplitude of the signal is measured once every second and stored. The entire signal will then consist of 100 samples.
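To make the 100-second example concrete, here is a minimal sketch in Python. The names sample_signal and slow_wave are my own, and the slow sine wave is just a hypothetical stand-in for the analog signal; any function of time would do.

```python
import math


def sample_signal(signal, duration_s, fs_hz):
    """Return (times, amplitudes) of `signal` sampled every 1/fs_hz seconds."""
    num_samples = int(round(duration_s * fs_hz))
    times = [n / fs_hz for n in range(num_samples)]
    return times, [signal(t) for t in times]


def slow_wave(t):
    # Hypothetical "analog" signal: a 0.01 Hz sine, defined for any time t.
    return math.sin(2 * math.pi * 0.01 * t)


# 100 seconds sampled at 1 Hz gives one stored amplitude per second.
times, amplitudes = sample_signal(slow_wave, duration_s=100, fs_hz=1)
print(len(amplitudes))  # 100
print(times[:3])        # [0.0, 1.0, 2.0]
```

Raising fs_hz to 8000 in the same call would produce 800,000 samples for the same 100 seconds, which is why the choice of sampling frequency matters for storage.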
The sampling theorem:
A bandlimited signal which has no spectral components above fm cycles per second can be uniquely reconstructed from its samples,
provided they are taken at equal intervals no more than 1/(2fm) seconds apart.
Therefore, to uniquely reconstruct a signal from its samples, the sampling frequency should be at least twice the highest frequency of the signal.
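A quick way to see why the sampling rate has to keep up with the highest frequency is to sample too slowly on purpose. The sketch below uses a 3 Hz tone sampled at only 4 Hz, numbers I picked for illustration, and sample_sine is a hypothetical helper. The samples come out identical to those of a phase-inverted 1 Hz tone, so the original can no longer be recovered uniquely.

```python
import math


def sample_sine(freq_hz, fs_hz, num_samples, amplitude=1.0):
    """Samples of amplitude * sin(2*pi*freq_hz*t) taken at t = n / fs_hz."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * n / fs_hz)
        for n in range(num_samples)
    ]


fs = 4.0  # too low for a 3 Hz tone, which needs at least 2 * 3 = 6 Hz

tone_3hz = sample_sine(3.0, fs, 8)
alias_1hz = sample_sine(1.0, fs, 8, amplitude=-1.0)  # inverted 1 Hz tone

# The two sample sequences agree up to floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(tone_3hz, alias_1hz))
print(max_diff < 1e-9)  # True
```

This effect is called aliasing: once the samples are taken, the 3 Hz tone is indistinguishable from the 1 Hz one.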
Selection of sampling frequency
For voiced sounds, the spectrum above 4 kHz is more than 40 dB below its peak. On the other hand, for unvoiced sounds, the spectrum has not fallen off appreciably even above 8 kHz. Thus, accurately representing all speech sounds would require a sampling rate greater than 20 kHz. In most applications, however, such a high sampling rate is not required. For most phonemes, almost all of the energy is contained in the 5 Hz to 4 kHz range.
Therefore, if the speech is filtered by a sharp cut-off analog filter prior to sampling, so that the Nyquist frequency is 4 kHz, then a sampling rate of 8 kHz is possible.
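The filter described above is an analog one, sitting in front of the sampler. As a rough digital stand-in, suppose the speech has already been captured at 48 kHz and is to be reduced to 8 kHz; the same idea applies, with a low-pass filter removing everything near and above 4 kHz first. The sketch below designs a windowed-sinc FIR filter with a Hamming window. The function names, the 48 kHz starting rate, the 3.4 kHz cutoff and the 151 taps are all my assumptions, not values from any particular system.

```python
import cmath
import math


def lowpass_taps(cutoff_hz, fs_hz, num_taps=151):
    """Windowed-sinc (Hamming) low-pass FIR taps, scaled to unit gain at DC."""
    fc = cutoff_hz / fs_hz           # cutoff as a fraction of the sampling rate
    mid = (num_taps - 1) / 2         # centre of the (odd-length) filter
    taps = []
    for n in range(num_taps):
        k = n - mid
        # Ideal low-pass impulse response; the k == 0 limit is 2 * fc.
        if k == 0:
            ideal = 2 * fc
        else:
            ideal = math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def gain_at(taps, freq_hz, fs_hz):
    """Magnitude of the filter's frequency response at freq_hz."""
    w = 2 * math.pi * freq_hz / fs_hz
    return abs(sum(t * cmath.exp(-1j * w * n) for n, t in enumerate(taps)))


fs = 48000
taps = lowpass_taps(cutoff_hz=3400, fs_hz=fs)

# Speech-band content should pass almost unchanged, while content above
# the 4 kHz Nyquist frequency of the 8 kHz target rate is strongly reduced.
for f in (1000, 3000, 6000, 12000):
    print(f, "Hz gain:", gain_at(taps, f, fs))
```

After filtering with these taps, keeping every sixth sample would give an 8 kHz signal. The cutoff sits a little below 4 kHz because a practical filter needs some room to roll off.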
When it is necessary to capture audio covering the entire 20–20,000 Hz range of human hearing, such as when recording music or many types of acoustic events, audio waveforms are typically sampled at 44.1 kHz (CD), 48 kHz (professional audio), or 96 kHz.
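As a quick check on these numbers, the snippet below lists the Nyquist frequency for each rate mentioned in this post and tests whether it covers a given band. It is only a sketch; COMMON_RATES_HZ, nyquist_frequency and covers_band are names I made up for illustration.

```python
# Sampling rates mentioned in this post, in Hz.
COMMON_RATES_HZ = {
    "filtered speech": 8000,
    "CD": 44100,
    "professional audio": 48000,
    "96 kHz audio": 96000,
}


def nyquist_frequency(fs_hz):
    """Highest frequency that can be represented at sampling rate fs_hz."""
    return fs_hz / 2


def covers_band(fs_hz, highest_freq_hz):
    """True if fs_hz is at least twice the highest frequency of interest."""
    return fs_hz >= 2 * highest_freq_hz


for name, fs in COMMON_RATES_HZ.items():
    print(
        name,
        "| Nyquist:", nyquist_frequency(fs), "Hz",
        "| covers 4 kHz speech band:", covers_band(fs, 4000),
        "| covers 20 kHz hearing range:", covers_band(fs, 20000),
    )
```

Only the 8 kHz rate fails the 20 kHz test, which is why it is used for speech but not for music.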