Noisereducer Code Learning

Handling missing noise audio data - spectralgate/stationary.py

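noisereduce accepts a noise clip through the "y_noise" parameter. When none is supplied, the conditional at line 47 detects that y_noise is missing, and line 48 sets the noise audio to the entire input audio. The noise statistics are then estimated from the whole signal, so its stationary component is treated as the noise to be removed.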


if y_noise is None:
    self.y_noise = self.y
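
The fallback can be tried in isolation with a minimal sketch. The class name StationaryGateSketch and its constructor are hypothetical stand-ins, not the library's actual class; only the y_noise branch is modeled.

```python
import numpy as np


class StationaryGateSketch:
    """Hypothetical stand-in that models only the y_noise fallback."""

    def __init__(self, y, y_noise=None):
        self.y = np.asarray(y, dtype=float)
        if y_noise is None:
            # No noise clip supplied: reuse the whole signal as the noise reference.
            self.y_noise = self.y
        else:
            self.y_noise = np.asarray(y_noise, dtype=float)


signal = np.array([0.1, -0.2, 0.3])
gate_default = StationaryGateSketch(signal)
gate_custom = StationaryGateSketch(signal, y_noise=[0.01, -0.01])
```

In the default case y_noise and y refer to the same array, so no copy of the signal is made.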

Setting the reduction rate - spectralgate/stationary.py

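The reduction rate is set by the "prop_decrease" parameter, which takes values from 0 to 1. At line 108, the mask is blended with an all-ones array according to prop_decrease: the higher the specified noise cut rate, the closer the noise portions are scaled to 0, and the lower the rate, the closer they stay to the level of the original audio.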


sig_mask = sig_mask * self._prop_decrease + np.ones(np.shape(sig_mask)) * (
        1.0 - self._prop_decrease
)
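
A small worked example may make the blending concrete. It assumes that a mask value of 1 marks a bin kept as signal and 0 marks a bin judged to be noise; the function name blend_mask is hypothetical and only wraps the arithmetic shown above.

```python
import numpy as np


def blend_mask(sig_mask, prop_decrease):
    """Same arithmetic as the excerpt above, wrapped in a standalone function."""
    return sig_mask * prop_decrease + np.ones(np.shape(sig_mask)) * (
        1.0 - prop_decrease
    )


# Assumption: 1 = bin kept as signal, 0 = bin judged to be noise.
binary_mask = np.array([[1.0, 0.0], [0.0, 1.0]])

full = blend_mask(binary_mask, 1.0)  # noise bins scaled by 0
half = blend_mask(binary_mask, 0.5)  # noise bins scaled by 0.5
off = blend_mask(binary_mask, 0.0)   # every bin scaled by 1 (no reduction)
```

Signal bins always end up at 1, while noise bins end up at 1 - prop_decrease.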

Smoothing

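At line 112, if smooth_mask, which decides whether smoothing is applied, is enabled, the noise-reduction mask is smoothed. Smoothing is controlled by "freq_mask_smooth_hz" and "time_mask_smooth_ms", whose defaults are 500 and 50, respectively. The smoothing filter is defined as a function at line 7 of base.py, which generates an N-point triangular filter. N is set separately for the frequency axis and the time axis from those two parameters, so the outer product of the two triangles is a two-dimensional, pyramid-shaped filter, normalized to sum to 1. The mask is convolved with this filter and the result is assigned back to sig_mask, which is then multiplied with the spectrogram of the original audio.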


if self.smooth_mask:
    # convolve the mask with a smoothing filter
    sig_mask = fftconvolve(sig_mask, self._smoothing_filter, mode="same")


def _smoothing_filter(n_grad_freq, n_grad_time):
    """Generates a filter to smooth the mask for the spectrogram

    Arguments:
        n_grad_freq {[type]} -- [how many frequency channels to smooth over with the mask.]
        n_grad_time {[type]} -- [how many time channels to smooth over with the mask.]
    """
    smoothing_filter = np.outer(
        np.concatenate(
            [
                np.linspace(0, 1, n_grad_freq + 1, endpoint=False),
                np.linspace(1, 0, n_grad_freq + 2),
            ]
        )[1:-1],
        np.concatenate(
            [
                np.linspace(0, 1, n_grad_time + 1, endpoint=False),
                np.linspace(1, 0, n_grad_time + 2),
            ]
        )[1:-1],
    )
    smoothing_filter = smoothing_filter / np.sum(smoothing_filter)
    return smoothing_filter
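
To see the pyramid shape, the filter can be built for small sizes and applied to a toy mask. This is a sketch: triangle and pyramid_filter are hypothetical helper names that re-express the function above, and the sizes 2 and 1 are arbitrary illustration values, not values derived from the Hz and ms parameters.

```python
import numpy as np
from scipy.signal import fftconvolve


def triangle(n):
    """1-D ramp up to 1 and back down, length 2 * n + 1, as in the function above."""
    return np.concatenate(
        [
            np.linspace(0, 1, n + 1, endpoint=False),
            np.linspace(1, 0, n + 2),
        ]
    )[1:-1]


def pyramid_filter(n_grad_freq, n_grad_time):
    """Outer product of two triangles, normalized so the weights sum to 1."""
    filt = np.outer(triangle(n_grad_freq), triangle(n_grad_time))
    return filt / np.sum(filt)


filt = pyramid_filter(2, 1)  # shape (5, 3), peak at the center

# Toy mask: a single "on" bin in the middle of a 7 x 5 grid.
mask = np.zeros((7, 5))
mask[3, 2] = 1.0

# mode="same" keeps the mask's shape; the single bin is spread into a pyramid.
smoothed = fftconvolve(mask, filt, mode="same")
```

Because the filter sums to 1, smoothing spreads mask values over neighboring bins without changing their total as long as the filter fits inside the grid.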
