Face-to-face meetings and conversations in our business and personal lives are becoming less frequent as we try to save cost and time. Mobile phones and voice over Internet Protocol (VoIP) technology have enabled this significant change in how we interact with one another. Sound quality and noise suppression are crucial to a good user experience with voice communications.
by David Coode, Manager – Audio DSP, ON Semiconductor
Rarely do we experience true quiet, and we have become so accustomed to noise that most of us don’t notice it anymore. The human brain can do an outstanding job of filtering out noise that we are exposed to; we hear it all, but only listen to what we’re interested in. However, as the world becomes a noisier place and the use of mobile phones, laptops and webcams increases, it is more difficult to filter out the noise.
Due to rapidly advancing electronics technology, several approaches and potential solutions now exist to manage noise and improve voice clarity. The effectiveness of various solutions can vary dramatically, and what constitutes good clean communications is contextual and subjective. It can be difficult to understand how one solution compares to another, and which is better for a given application.
The value of a technology solution to improve the communication capability of a laptop, for instance, is highly related to the context in which that laptop will be used. Someone using a notebook for a Skype call will want it to pick up only their voice and suppress background noise, whereas a student using the same notebook to record a lecture will want it to pull speech from any location out of the ambient noise of the lecture hall. A solution may be judged effective in one scenario and a failure in another. A compromise solution may perform suboptimally in both, but still provide value to both users.
Mapping the available technical solutions to use cases is hard enough, but it is even more challenging to explain audio differentiation to retail consumers when every product can boast “great audio performance” in its marketing material. With few opportunities to hear an audio demo, consumers are often left to chance on first purchases.
Comparing noise reduction technologies
The technology that provides noise reduction solutions breaks into three broad classes: electro acoustic, analog and digital.
Electro acoustic solutions involve microphone element design, selection and placement of the microphone and the acoustic coupling design for microphone mounts. Noise cancelling or gradient microphones are a simple example of an inexpensive solution that can give a moderate benefit in some situations. Good electro acoustic design is important to get good performance out of any voice communications device, but that base performance can be further improved through the use of modern digital and analog circuits.
Analog solutions involve direct manipulation of the electrical signals that are produced by a microphone or an array of microphones. Simple solutions such as compression or directional ‘time of arrival’ processing may be more efficient in an analog form since they avoid the digital conversion stages. However, manufacturing variances inherent in semiconductor processes directly affect the performance of an analog solution in a way that digital processes are designed to avoid. As analog solutions become more complex in a bid to deliver more value, the performance variance at each processing step will compound with each subsequent step. This has effectively kept any successful analog audio products relatively simple. Analog solutions also lack the functional flexibility possible with a digital solution, since analog systems implement processing within the silicon design, rather than as a software layer over a flexible foundation.
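The compounding of variance across analog stages can be illustrated with a little arithmetic. The sketch below is only an illustration under simple assumptions: every stage has the same gain tolerance, the stages are cascaded so that their gains multiply, and only the worst case is considered. The function name and the 5 % figure are hypothetical and are not taken from any particular product.

```python
def worst_case_gain_error(stage_tolerance, num_stages):
    """Worst-case relative gain error after cascading identical stages.

    Each stage's gain may deviate from nominal by up to stage_tolerance
    (for example 0.05 for 5 %). Gains multiply along the chain, so in the
    worst case the deviations multiply as well.
    """
    return (1.0 + stage_tolerance) ** num_stages - 1.0


if __name__ == "__main__":
    # Assumed 5 % tolerance per stage; print how the worst case grows.
    for stages in (1, 2, 4, 8):
        error_percent = 100.0 * worst_case_gain_error(0.05, stages)
        print(f"{stages} stage(s): up to {error_percent:.1f} % gain error")
```

Under these assumptions the worst-case error grows faster than linearly with the number of stages, which is one way to see why complex analog processing chains are hard to keep within specification.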
Digital solutions involve sampling and quantizing the electrical signal from the microphone so that processors can apply repeatable algorithms to it. The processed signal is then either transmitted in digital form or reconstructed as an improved analog representation of the captured speech. Since digital solutions have many inherent benefits with today’s silicon technology, it is not surprising that most of the available solutions fall into this class.
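As a rough illustration of those two steps, the sketch below samples a test tone at discrete instants and rounds each sample to a signed integer code. The tone frequency, sample rate and bit depths are assumptions chosen for the example, and the function names are hypothetical; a real converter involves anti-alias filtering and other details that are left out here.

```python
import math


def sample_and_quantize(freq_hz, sample_rate_hz, num_samples, bits):
    """Sample a unit-amplitude sine and round each sample to an integer code."""
    max_code = 2 ** (bits - 1) - 1            # e.g. 32767 for 16 bits
    codes = []
    for n in range(num_samples):
        t = n / sample_rate_hz                # sampling: discrete time instants
        x = math.sin(2 * math.pi * freq_hz * t)
        codes.append(round(x * max_code))     # quantization: discrete levels
    return codes, max_code


def reconstruct(codes, max_code):
    """Map integer codes back to amplitudes in the range -1.0 to 1.0."""
    return [c / max_code for c in codes]


if __name__ == "__main__":
    # Assumed example: 1 kHz tone, 16 kHz sample rate, 16-bit codes.
    codes, max_code = sample_and_quantize(1000, 16000, 8, 16)
    print(codes)
    print(reconstruct(codes, max_code))
```

Once the signal is a list of integers, every unit processes it in exactly the same way, which is the repeatability that analog circuits struggle to match. The price is a small rounding error per sample, which shrinks as the number of bits increases.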
Digital solutions can, in principle, realize any algorithm that reduces noise or improves the quality of speech. Typically, these algorithms rely on spatial selectivity (where is the speech coming from?), temporal selectivity (when is there speech or not speech?) and frequency selectivity (is the speech at higher or lower pitch than the noise?). Some solutions focus on only one of these aspects, but the best will use a combination of them. Additional refinements can be added in the form of gain control, advanced environmental modeling, or other concepts.
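To make the temporal question (when is there speech?) concrete, here is a deliberately simple sketch of an energy-based speech detector. It assumes the noise floor energy is already known and that speech is simply louder than the noise; the function names, frame length and threshold ratio are hypothetical choices for illustration, and commercial detectors are far more sophisticated.

```python
import math


def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(s * s for s in frame) / frame_len)
    return energies


def detect_speech_frames(samples, frame_len, noise_energy, threshold_ratio=4.0):
    """Flag frames whose energy exceeds the assumed noise floor by a ratio."""
    limit = threshold_ratio * noise_energy
    return [energy > limit for energy in frame_energies(samples, frame_len)]


if __name__ == "__main__":
    # Synthetic input: quiet background, a louder 200 Hz burst, quiet again.
    quiet = [0.01 * (-1) ** n for n in range(160)]
    burst = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(160)]
    print(detect_speech_frames(quiet + burst + quiet, 160, noise_energy=1e-4))
```

A detector like this fails as soon as the noise is as loud as the speech, which is one reason the better solutions combine temporal cues with spatial and frequency cues.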
A solution that relies heavily on spatial selectivity – also known as beamforming or directional processing – will be well suited to an application where the speaker is in a known location relative to the microphones. Such approaches are used in notebooks and mobile phones, but carry inherent disadvantages. In notebooks, this approach is well suited to video calling, where the sound pickup is confined to the direction of the camera, but it doesn’t allow the same computer to be used as a conference phone with several people around a table. In mobile phones, the location of the speech is usually very tightly constrained to enable ambient noise reduction, but this means that the voice may be dropped if the phone isn’t held in the correct position.
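The simplest form of directional processing is a two-microphone delay-and-sum beamformer, sketched below. This is a textbook technique shown under idealized assumptions (a single pure tone, whole-sample delays, no reverberation) and is not a description of any specific product. The helper names and the eight-sample tone period are hypothetical.

```python
import math


def tone(n, period=8):
    """Value of a test tone at sample index n (period given in samples)."""
    return math.sin(2 * math.pi * n / period)


def delay_and_sum(mic_a, mic_b, steer_delay):
    """Two-microphone delay-and-sum beamformer.

    steer_delay is the number of samples by which sound from the wanted
    direction reaches mic_b later than mic_a. mic_a is delayed by that
    amount so the wanted sound lines up in both channels before averaging.
    """
    return [0.5 * (mic_a[n - steer_delay] + mic_b[n])
            for n in range(steer_delay, len(mic_b))]


def mean_power(samples):
    """Average squared amplitude of a block of samples."""
    return sum(s * s for s in samples) / len(samples)


if __name__ == "__main__":
    n_samples = 64
    # Talker in front: the sound reaches both microphones at the same time.
    front_a = [tone(n) for n in range(n_samples)]
    front_b = [tone(n) for n in range(n_samples)]
    # Source off to the side: it reaches mic_b four samples (half a period) late.
    side_a = [tone(n) for n in range(n_samples)]
    side_b = [tone(n - 4) for n in range(n_samples)]
    print("steered front, talker in front:", mean_power(delay_and_sum(front_a, front_b, 0)))
    print("steered front, source at side: ", mean_power(delay_and_sum(side_a, side_b, 0)))
    print("steered side, talker in front: ", mean_power(delay_and_sum(front_a, front_b, 4)))
```

In this idealized case the source at the side cancels almost completely when the beam is steered to the front. The third line shows the flip side: when the beam is steered away from the talker, the talker's voice is suppressed just as effectively, which corresponds to the dropped-voice problem when a phone is held in the wrong position.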
In contrast, a solution that relies on the statistics of human speech to make ongoing instantaneous decisions about what should be kept as speech and what should be filtered out as noise can handle a broader scope of uses effectively. Unfortunately, these solutions are never completely accurate in classifying the speech against the noise. The more aggressively they are tuned, the more distortions a user will hear as parts of the speech get filtered out as a result of misclassifications. Typically, intelligibility is maintained while naturalness can suffer. On a mobile phone, this may not matter since naturalness is already degraded by the wireless network, but in other applications such as a voice recorder, it may be of critical importance.
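The trade-off between aggressive tuning and distortion can be illustrated with a per-band gain rule in the style of power spectral subtraction. This is a simplified stand-in for the statistical classifiers described above, not a model of any particular product. It assumes the noise power in each frequency band has already been estimated, and the function name, parameters and band values are hypothetical.

```python
def suppression_gains(band_powers, noise_powers, aggressiveness=1.0, floor=0.1):
    """Per-band power gain: max(floor, 1 - aggressiveness * noise / observed).

    band_powers  : observed power in each band (speech plus noise)
    noise_powers : assumed estimate of the noise power in each band
    A higher aggressiveness removes more noise but also attenuates bands
    in which the speech is only slightly stronger than the noise.
    """
    gains = []
    for observed, noise in zip(band_powers, noise_powers):
        if observed <= 0.0:
            gains.append(floor)
            continue
        gains.append(max(floor, 1.0 - aggressiveness * noise / observed))
    return gains


if __name__ == "__main__":
    # Assumed bands: strong speech, weak speech (e.g. a soft consonant), noise only.
    observed = [10.0, 1.5, 1.0]
    noise = [1.0, 1.0, 1.0]
    print("moderate:  ", suppression_gains(observed, noise, aggressiveness=1.0))
    print("aggressive:", suppression_gains(observed, noise, aggressiveness=2.0))
```

With the moderate setting the weak-speech band is attenuated but still passed above the floor. With the aggressive setting it is pushed down to the same floor as the noise-only band, which is the kind of misclassification that a listener hears as a loss of naturalness.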
The best digital solutions tend to be blended algorithms that combine pieces of all the approaches in an intelligent way. These solutions can often adapt to different circumstances, but also often add a heavier burden of tuning or customizing a more complex technology.
Products such as ON Semiconductor’s BelaSigna R261 single-chip digital noise reduction processor are representative of the latest digital technology available to give clean voice capture. The device’s ultra-miniature SoC format and low power consumption fit well with the needs of small form factor portable voice products. BelaSigna R261 has an advanced two-microphone noise reduction algorithm that improves perceived speech quality while preserving naturalness. Offered with a range of prototyping tools, it is an example of the easier-to-design-in solutions that are becoming readily available to consumer electronics product designers.
An engineer selecting a technical solution to improve voice quality needs to consider its impact beyond audio performance alone. Some solutions will demand specific microphone types, microphone placement or acoustic design, any of which can compromise the overall industrial or mechanical design of the product. Some solutions may draw an unacceptably high level of power from the battery or not fit into the available space on a PCB. In almost any design, cost will also be a deciding factor.
Currently there is no universal standard by which solutions can be compared. Product designers have the challenging task of interpreting the audio performance needs of their market and translating those into the best technology choice for their product. However, the latest digital noise reduction solutions offer small chips, low power consumption and advanced algorithms. These solutions give designers excellent options to choose from when designing products for clear, high-quality voice communications.