Dissertation Writing for International Students in Computer Science and Artificial Intelligence - The Classification of Speech Enhancement
Speech enhancement is attracting more and more attention in speech processing, so many technologies have emerged to process the speech signal. The methods of speech enhancement can be classified into four main types:
Parameter Method: such as the Wiener filter, comb filter, Kalman filter and so on (see the sketch after this list for a simplified Wiener-style example). The method relies on a model of speech production (such as the AR model) and extracts the model parameters (such as the pitch and LPC coefficients) by iterative methods. However, if the real noise or speech environment differs greatly from the model, the parameters are hard to obtain, so the method is not very general.
Non-Parameter Method: such as spectral subtraction, adaptive filtering and so on. Because it is not necessary to estimate model parameters from the noisy speech, non-parameter methods are suitable for a wider range of applications. However, they impose fewer constraints and use little statistical information, so the results are not optimal.
Statistical Method: such as the hidden Markov model, statistical models of the clean and noisy speech spectra, maximum a posteriori (MAP) estimation, minimum mean-square error estimation and so on. Statistical methods can take full advantage of the statistical characteristics of speech and noise. First, a model library is established, and then the statistical parameters are obtained during training, which is closely related to the speech recognition system.
The Other Methods: such as wavelet filtering, the Karhunen-Loeve transform, artificial neural networks and so on. These methods are not as widely used as the above three types, so we can call them non-mainstream methods.
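For illustration only, the sketch below shows a simplified frequency-domain Wiener-style gain rather than the full parametric, AR-model-based Wiener filter described above; the function names and the assumption that a noise power spectrum has already been estimated (for example from a speech-free segment) are ours.

```python
# Illustrative sketch, not a definitive implementation of any method above.
import numpy as np

def wiener_gain(noisy_power, noise_power, eps=1e-10):
    """Per-frequency Wiener-style gain H = SNR / (1 + SNR),
    with the SNR estimated as (noisy - noise) / noise and floored at zero."""
    snr = np.maximum(noisy_power - noise_power, 0.0) / (noise_power + eps)
    return snr / (1.0 + snr)

def enhance_frame(noisy_frame, noise_power):
    """Apply the gain to one windowed frame and return the time-domain result.
    noise_power is assumed to have the same length as the rfft spectrum."""
    spectrum = np.fft.rfft(noisy_frame)
    gain = wiener_gain(np.abs(spectrum) ** 2, noise_power)
    return np.fft.irfft(gain * spectrum, n=len(noisy_frame))
```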
In recent years, the statistical method has shown great power in industrial applications.
The Challenges of Speech Enhancement
Speech enhancement is a key step in speech signal detection, and there are many challenges in speech enhancement. In this paper, the challenges are divided into several types.
Diversity of Noise
In common usage, the word noise means any unwanted sound. This paper focuses on improving microphone speech for a speech recognition system, which means microphone noise is the research focus of this paper. Microphone noise generally comes from two sources: the background, and complications that arise while recording or transferring the sound. Background noise (sometimes also called ambient noise), which belongs to acoustic noise, can be any sound from the environment that interferes with the monitored speech signal. Meanwhile, microphones are sensitive and are influenced by a number of variables ranging from jack connections to electromagnetic interference. The noise that arises in the recording or transfer process is usually produced by electrical equipment, which is why it is called electrical noise. Correspondingly, the types of distortion introduced by speech enhancement algorithms can be broadly divided into two categories: distortions that affect the speech signal itself (called speech distortion) and distortions that affect the noise (called noise distortion). Of these two types, listeners are more easily influenced by speech distortion when judging overall quality.
Noise comes from real life and takes many forms. According to how it is generated, noise can be divided into internal noise and external noise. Internal noise is caused by the devices inside the equipment as well as the equipment material itself, while external noise refers to the noise introduced into the signal processing apparatus by the external environment. Noise can be additive or non-additive; some non-additive noise can be transformed into additive noise, as the sketch below illustrates. If we classify noise by its stationarity, it can be divided into stationary and non-stationary noise.
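As a toy example of our own (not from the original text), a multiplicative noise model is one kind of non-additive noise that becomes additive after a log transform; the signals below are arbitrary placeholders.

```python
# If y = s * n (multiplicative, i.e. non-additive), then log y = log s + log n,
# which is additive in the log domain.
import numpy as np

rng = np.random.default_rng(0)
s = np.abs(rng.normal(1.0, 0.2, 1000))   # stand-in for a positive signal envelope
n = np.abs(rng.normal(1.0, 0.1, 1000))   # multiplicative noise factor
y = s * n                                 # non-additive observation

log_y = np.log(y)                         # log-domain observation
assert np.allclose(log_y, np.log(s) + np.log(n))   # now strictly additive
```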
So speech enhancement can differ when the noise differs. Speech is a time-varying, non-stationary and random signal. The shapes of the human vocal tract and vocal cords are relatively stable, so the change in physiological structure is limited. In this case, the speech signal can be assumed to be relatively stationary over a short period of time (10-30 ms). The study of the short-term spectrum of speech is therefore based on this short-time stationary characteristic.
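A minimal sketch of the short-time analysis implied above: the signal is cut into overlapping windowed frames short enough (the 25 ms frame and 10 ms hop below are assumed, typical values within the 10-30 ms range) for the stationarity assumption to hold.

```python
import numpy as np

def frame_signal(x, fs, frame_ms=25.0, hop_ms=10.0):
    """Split x into overlapping Hamming-windowed frames
    (x is assumed to be at least one frame long)."""
    frame_len = int(fs * frame_ms / 1000)
    hop_len = int(fs * hop_ms / 1000)
    win = np.hamming(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop_len
    return np.stack([x[i * hop_len: i * hop_len + frame_len] * win
                     for i in range(n_frames)])
```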
The purpose of speech enhancement technology is to eliminate noise interference through speech processing techniques. The common way to eliminate noise pollution is to pass the contaminated signal through a filter that suppresses the noise while keeping the signal relatively unchanged. The signal-to-noise ratio is improved by processing the signal so as to offset the original noise.
The noisy speech signal Y(t) is assumed to be the sum of the clean speech signal s(t) and the uncorrelated additive noise n(t). The relationship can be written as follows:
Y(t) = s(t) + n(t), 0 ≤ t ≤ T.
In this equation, Y(t) represents the noisy speech signal, s(t) is the pure speech signal, and n(t) represents the noise signal. As the noise differs, n(t) will vary a lot. So the primary challenge of speech enhancement is to deal with the noise, and we can adopt different methods for different applications.
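A small worked example of the additive model above: we build Y(t) = s(t) + n(t) from placeholder signals (a sine tone standing in for clean speech, white noise standing in for n(t)) and measure the resulting signal-to-noise ratio; the sampling rate and amplitudes are assumed values.

```python
import numpy as np

fs = 16000                                    # assumed sampling rate
t = np.arange(0, 1.0, 1.0 / fs)
s = 0.5 * np.sin(2 * np.pi * 440 * t)         # stand-in for the clean speech s(t)
n = 0.05 * np.random.default_rng(1).normal(size=t.shape)   # additive noise n(t)
y = s + n                                     # noisy observation Y(t)

snr_db = 10 * np.log10(np.sum(s ** 2) / np.sum(n ** 2))
print(f"input SNR = {snr_db:.1f} dB")
```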
Music Noise
If the noise is removed from the speech imperfectly, the speech can acquire "music noise", which makes it less intelligible. Spectral subtraction is widely used in the frequency domain. Wherever the noisy speech spectrum exceeds the estimated noise spectrum, intermittent, short-lived spectral peaks remain after subtraction. These sudden components stand out from the original signal and are more unpleasant to hear. More specifically, the noise generated by spectral subtraction is called residual noise, and because it sounds like "music", it is named "music noise".
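A minimal sketch of magnitude spectral subtraction for a single frame, assuming a pre-computed noise magnitude estimate of the same length as the rfft spectrum; the function name is ours. Flooring the negative differences at zero is what leaves the isolated residual peaks heard as "music noise".

```python
import numpy as np

def spectral_subtraction_frame(noisy_frame, noise_mag):
    """Subtract an estimated noise magnitude from one frame's spectrum."""
    spectrum = np.fft.rfft(noisy_frame)
    mag, phase = np.abs(spectrum), np.angle(spectrum)
    clean_mag = np.maximum(mag - noise_mag, 0.0)     # half-wave rectification
    return np.fft.irfft(clean_mag * np.exp(1j * phase), n=len(noisy_frame))
```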
Time-Varying Noise During Processing
In many applications, the noise is stable from start to finish, so if we want to obtain the pure signal, we can remove the stable noise from the noisy speech. In daily life, however, stable noise rarely occurs.
Besides the above challenges, there is a more important one: when we process the speech, the noise varies during processing.
Take the smart home as an example. While speech recognition is in progress, the TV set may be on or there may be a knock on the door. In this situation, in order to obtain a purer signal, we should remove the specific noise of the smart-home environment.