comirva.audio.util
Class MFCC

java.lang.Object
  extended by comirva.audio.util.MFCC

public class MFCC
extends java.lang.Object

Mel Frequency Cepstrum Coefficients - MFCCs

Description:

Computes the MFCC representation of a PCM signal. The signal is cut into short overlapping frames, and for each frame a feature vector is computed, which consists of Mel Frequency Cepstrum Coefficients.
The cepstrum is the inverse Fourier transform of the log-spectrum. The mel-cepstrum is the cepstrum computed after a non-linear frequency warping onto a perceptual frequency scale, the Mel-frequency scale. Since it is obtained by an inverse Fourier transform, the resulting coefficients are called Mel frequency cepstrum coefficients (MFCCs). Only the first few coefficients are used to represent a frame, so the number of coefficients is an important parameter. MFCCs therefore provide a low-dimensional, smoothed version of the log spectrum and are a good, compact representation of the spectral shape. They are widely used as features for speech recognition, and have also proved useful in musical instrument recognition [1].

[1] Aucouturier, Pachet, "Improving Timbre Similarity: How high's the sky?", in Journal of Negative Results in Speech and Audio Sciences, 1(1), 2004.
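
The frequency warping onto the Mel-frequency scale described above can be illustrated with a small sketch. The class MelScaleSketch and its methods are hypothetical helpers, not part of comirva.audio.util, and the conversion formula (2595 * log10(1 + f / 700)) is a commonly used one that is only assumed here; this documentation does not state which formula the class applies.

```java
/** Hypothetical helper for illustration; not part of comirva.audio.util. */
final class MelScaleSketch {
    private MelScaleSketch() {}

    /** Converts a frequency in Hz to mel, assuming the formula 2595 * log10(1 + f / 700). */
    static double hzToMel(double hz) {
        return 2595.0 * Math.log10(1.0 + hz / 700.0);
    }

    /** Inverse of hzToMel: converts a mel value back to Hz. */
    static double melToHz(double mel) {
        return 700.0 * (Math.pow(10.0, mel / 2595.0) - 1.0);
    }
}
```

Under this assumed formula the scale is close to linear at low frequencies and compresses higher frequencies, which is why equally spaced points in mel are increasingly far apart in Hz.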


Constructor Summary
MFCC(float sampleRate)
          Creates a new MFCC object with default window size of 512 for the given sample rate.
MFCC(float sampleRate, int windowSize, int numberCoefficients, boolean useFirstCoefficient)
          Creates a new MFCC object. 40 mel-filters are placed in the range from 20 to 16000 Hz.
MFCC(float sampleRate, int windowSize, int numberCoefficients, boolean useFirstCoefficient, double minFreq, double maxFreq, int numberFilters)
          Creates a new MFCC object with numberFilters mel-filters placed in the range from minFreq to maxFreq.
 
Method Summary
 int getWindowSize()
          Returns the window size.
 java.util.Vector<double[]> process(AudioPreProcessor in)
          Performs the transformation of the input data to MFCCs.
 double[][] process(double[] input)
          Performs the transformation of the input data to MFCCs.
 double[] processWindow(double[] window, int start)
          Transforms one window of samples to MFCCs.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MFCC

public MFCC(float sampleRate)
     throws java.lang.IllegalArgumentException
Creates a new MFCC object with a default window size of 512 for the given sample rate. The overlap of the windows is fixed at 50 percent. The number of coefficients is set to 20 and the first coefficient is in use. The 40 mel-filters are placed in the range from 20 to 16000 Hz.

Parameters:
sampleRate - float samples per second; must be greater than zero; non-integer values are rounded
Throws:
java.lang.IllegalArgumentException - raised if method contract is violated

MFCC

public MFCC(float sampleRate,
            int windowSize,
            int numberCoefficients,
            boolean useFirstCoefficient)
     throws java.lang.IllegalArgumentException
Creates a new MFCC object. 40 mel-filters are placed in the range from 20 to 16000 Hz.

Parameters:
sampleRate - float samples per second; must be greater than zero; non-integer values are rounded
windowSize - int size of window; must be 2^n and at least 32
numberCoefficients - int must be greater than or equal to 1 and smaller than the number of filters
useFirstCoefficient - boolean indicates whether the first coefficient of the DCT process should be used in the MFCC feature vector or not
Throws:
java.lang.IllegalArgumentException - raised if method contract is violated
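
The parameter contract listed above can be sketched as a stand-alone check. MfccContractSketch is a hypothetical class written for illustration only; the actual validation code and exception messages of MFCC are not shown in this documentation, and numberFilters is passed explicitly here (it would be 40 for this constructor).

```java
/** Hypothetical helper for illustration; not part of comirva.audio.util. */
final class MfccContractSketch {
    private MfccContractSketch() {}

    /** True if n is a positive power of two (exactly one bit set). */
    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    /** Throws IllegalArgumentException if the documented constraints are violated. */
    static void check(float sampleRate, int windowSize, int numberCoefficients, int numberFilters) {
        if (!(sampleRate > 0)) {
            throw new IllegalArgumentException("sample rate must be greater than zero");
        }
        if (windowSize < 32 || !isPowerOfTwo(windowSize)) {
            throw new IllegalArgumentException("window size must be 2^n and at least 32");
        }
        if (numberCoefficients < 1 || numberCoefficients >= numberFilters) {
            throw new IllegalArgumentException(
                "number of coefficients must be at least 1 and smaller than the number of filters");
        }
    }
}
```

For example, a window size of 512 with 20 coefficients and 40 filters passes this check, while a window size of 500 does not because it is not a power of two.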

MFCC

public MFCC(float sampleRate,
            int windowSize,
            int numberCoefficients,
            boolean useFirstCoefficient,
            double minFreq,
            double maxFreq,
            int numberFilters)
     throws java.lang.IllegalArgumentException
Creates a new MFCC object with numberFilters mel-filters placed in the range from minFreq to maxFreq.

Parameters:
sampleRate - float samples per second; must be greater than zero; non-integer values are rounded
windowSize - int size of window; must be 2^n and at least 32
numberCoefficients - int must be greater than or equal to 1 and smaller than the number of filters
useFirstCoefficient - boolean indicates whether the first coefficient of the DCT process should be used in the MFCC feature vector or not
minFreq - double start of the interval to place the mel-filters in
maxFreq - double end of the interval to place the mel-filters in
numberFilters - int number of mel-filters to place in the interval
Throws:
java.lang.IllegalArgumentException - raised if method contract is violated
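
One common way to place numberFilters mel-filters in the interval from minFreq to maxFreq is to space their edges equally on the mel scale. The sketch below illustrates that idea only; MelFilterPlacementSketch is a hypothetical class, and both the mel formula and the equal spacing are assumptions, since this documentation does not describe how MFCC lays out its filters internally.

```java
/** Hypothetical helper for illustration; not part of comirva.audio.util. */
final class MelFilterPlacementSketch {
    private MelFilterPlacementSketch() {}

    /** Hz to mel, assuming the common formula 2595 * log10(1 + f / 700). */
    static double hzToMel(double hz) {
        return 2595.0 * Math.log10(1.0 + hz / 700.0);
    }

    /** Mel to Hz, inverse of hzToMel. */
    static double melToHz(double mel) {
        return 700.0 * (Math.pow(10.0, mel / 2595.0) - 1.0);
    }

    /**
     * Returns numberFilters + 2 edge frequencies in Hz, equally spaced in mel.
     * In this sketch, triangular filter i would span edges[i]..edges[i + 2]
     * with its peak at edges[i + 1].
     */
    static double[] filterEdges(double minFreq, double maxFreq, int numberFilters) {
        double melMin = hzToMel(minFreq);
        double melMax = hzToMel(maxFreq);
        double step = (melMax - melMin) / (numberFilters + 1);
        double[] edges = new double[numberFilters + 2];
        for (int i = 0; i < edges.length; i++) {
            edges[i] = melToHz(melMin + i * step);
        }
        return edges;
    }
}
```

With minFreq = 20, maxFreq = 16000 and numberFilters = 40, this yields 42 edge frequencies that start at 20 Hz, end at 16000 Hz, and grow further apart as the frequency increases.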
Method Detail

process

public java.util.Vector<double[]> process(AudioPreProcessor in)
                                   throws java.lang.IllegalArgumentException,
                                          java.io.IOException
Performs the transformation of the input data to MFCCs. This is done by splitting the given data into windows and processing each of these windows with processWindow().

Parameters:
in - AudioPreProcessor input data is a complete audio stream; must have the same sample rate as this MFCC object; must not be null
Returns:
Vector this vector contains one double array of MFCC values for each window
Throws:
java.io.IOException - if there are any problems regarding the input stream
java.lang.IllegalArgumentException - raised if method contract is violated

process

public double[][] process(double[] input)
                   throws java.lang.IllegalArgumentException,
                          java.io.IOException
Performs the transformation of the input data to MFCCs. This is done by splitting the given data into windows and processing each of these windows with processWindow().

Parameters:
input - double[] input data is an array of samples; its length must be a multiple of the hop size; must not be null
Returns:
double[][] an array containing one double array of MFCC values for each window
Throws:
java.io.IOException - if there are any problems regarding the input stream
java.lang.IllegalArgumentException - raised if method contract is violated
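
The splitting of the input into overlapping windows can be sketched as follows. FramingSketch is a hypothetical class for illustration; it assumes a hop size of windowSize / 2 (derived from the 50 percent overlap stated for the constructor) and copies each window, whereas the actual process() implementation and the exact number of windows it returns are not specified in this documentation.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of comirva.audio.util. */
final class FramingSketch {
    private FramingSketch() {}

    /** Cuts the input into windows of windowSize samples with 50 percent overlap. */
    static double[][] frames(double[] input, int windowSize) {
        int hop = windowSize / 2; // assumed hop size for 50 percent overlap
        if (input == null || input.length < windowSize || input.length % hop != 0) {
            throw new IllegalArgumentException(
                "input must be non-null, hold at least one window, and be a multiple of the hop size");
        }
        int count = (input.length - windowSize) / hop + 1;
        double[][] out = new double[count][];
        for (int i = 0; i < count; i++) {
            int start = i * hop;
            out[i] = Arrays.copyOfRange(input, start, start + windowSize);
        }
        return out;
    }
}
```

In this sketch each window would then be handed to a per-window transform, in the same way that process() delegates to processWindow().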

getWindowSize

public int getWindowSize()
Returns the window size.

Returns:
int the window size in samples

processWindow

public double[] processWindow(double[] window,
                              int start)
                       throws java.lang.IllegalArgumentException
Transforms one window of samples to MFCCs. The following steps are performed:

(1) normalized power FFT with Hanning window function
(2) conversion to Mel scale by applying a mel filter bank
(3) conversion to dB
(4) finally, a DCT is performed to get the MFCCs

This process is mathematically identical to the process described in [1].

Parameters:
window - double[] data to be converted, must contain enough data for one window
start - int start index of the window data
Returns:
double[] the MFCC representation of the window
Throws:
java.lang.IllegalArgumentException - raised if method contract is violated
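
Parts of the steps listed for processWindow can be sketched in isolation: the Hanning window from step (1), the dB conversion from step (3), and the DCT from step (4). WindowStepsSketch is a hypothetical class for illustration; the FFT and the mel filter bank are omitted, and the window formula, the 1e-10 floor, and the unnormalized DCT-II are assumptions rather than the scaling MFCC actually uses.

```java
/** Hypothetical helper for illustration; not part of comirva.audio.util. */
final class WindowStepsSketch {
    private WindowStepsSketch() {}

    /** Part of step (1): multiplies the samples by a Hanning window (needs at least 2 samples). */
    static double[] hann(double[] frame) {
        int n = frame.length;
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            double w = 0.5 - 0.5 * Math.cos(2.0 * Math.PI * i / (n - 1));
            out[i] = frame[i] * w;
        }
        return out;
    }

    /** Step (3): converts band energies to dB; the assumed floor avoids log10(0). */
    static double[] toDb(double[] energies) {
        double[] out = new double[energies.length];
        for (int i = 0; i < energies.length; i++) {
            out[i] = 10.0 * Math.log10(Math.max(energies[i], 1e-10));
        }
        return out;
    }

    /** Step (4): unnormalized DCT-II, keeping only the first numberCoefficients outputs. */
    static double[] dct(double[] x, int numberCoefficients) {
        int n = x.length;
        double[] c = new double[numberCoefficients];
        for (int k = 0; k < numberCoefficients; k++) {
            double sum = 0.0;
            for (int j = 0; j < n; j++) {
                sum += x[j] * Math.cos(Math.PI * k * (j + 0.5) / n);
            }
            c[k] = sum;
        }
        return c;
    }
}
```

A flat set of band energies produces a non-zero first coefficient and near-zero higher coefficients, which illustrates why the first coefficient mainly reflects overall level and why useFirstCoefficient lets callers leave it out.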