comirva.audio.util
Class Sone

java.lang.Object
  extended by comirva.audio.util.Sone

public class Sone
extends Object

Specific Loudness Sensation - Sone

Description:

Computes a sonogram from a PCM signal. A sonogram of an audio segment consists of the specific loudness sensation (Sone) per critical band (bark) in short time intervals[1]. Each Sone object supports exactly one sample rate and one window size.

[1] Rauber, Pampalk, Merkl "Using Psycho-Acoustic Models and Self-Organizing Maps to Create a Hierarchical Structuring of Music by Sound Similarity", in Proceedings of ISMIR, 2002.

[2] Schroeder, Atal, Hall "Optimizing digital speech coders by exploiting masking properties of the human ear", JASA, 1979.

[3] Terhardt "Calculating virtual pitch", Hearing Research, 1979.

[4] Zwicker, Fastl "Psychoacoustics, Facts and Models", Springer, 2nd edition.

[5] Bladon, Lindblom "Modeling the judgment of vowel quality differences", JASA, 1981.

[6] Pampalk, Dixon, Widmer "Exploring Music Collections by Browsing Different Views", Computer Music Journal, Vol. 28, Issue 2, 2004.


Field Summary
protected  double baseFreq
           
protected  int hopSize
           
protected  FFT normalizedPowerFFT
           
protected  float sampleRate
           
protected  int windowSize
           
 
Constructor Summary
Sone(float sampleRate)
          Creates a Sone object with default window size of 256 for the given sample rate.
Sone(int windowSize, float sampleRate)
          Creates a Sone object with the given window size and sample rate.
 
Method Summary
 int[] getBarkUpperBoundaries(double sampleRate)
          Returns an array with the upper boundaries of the bark bands.
 int getHopSize()
          Returns the number of samples skipped between two windows.
 double[][] getSpreadMatrix(int barkSize)
          Creates a matrix for computation of spectral masking effects for the used bark bands.
 double[] getTerhardtWeights(double baseFrequency, int vectorSize)
          Creates a weight vector according to the outer ear formula of Terhardt.
 Vector<double[]> process(AudioPreProcessor in)
          Performs the transformation of the input data to Sone.
 double[][] process(double[] input)
          Performs the transformation of the input data to Sone.
 double[] processWindow(double[] window, int start)
          Transforms one window of samples to Sone.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

windowSize

protected int windowSize

hopSize

protected int hopSize

sampleRate

protected float sampleRate

baseFreq

protected double baseFreq

normalizedPowerFFT

protected FFT normalizedPowerFFT
Constructor Detail

Sone

public Sone(float sampleRate)
     throws IllegalArgumentException
Creates a Sone object with the default window size of 256 for the given sample rate. The overlap of the windows is fixed at 50 percent.

Parameters:
sampleRate - float samples per second, must be greater than zero; non-integer values are rounded
Throws:
IllegalArgumentException - raised if method contract is violated

Sone

public Sone(int windowSize,
            float sampleRate)
     throws IllegalArgumentException
Creates a Sone object with the given window size and sample rate. The overlap of the windows is fixed at 50 percent. The window size must be a power of two (2^n) and at least 32. The sample rate must be at least 1.

Parameters:
windowSize - int size of a window
sampleRate - float samples per second, must be greater than zero; non-integer values are rounded
Throws:
IllegalArgumentException - raised if method contract is violated
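
The argument constraints described above can be sketched as follows. This is only an illustration of the stated contract, not the class's actual code; the class name SoneArgsSketch and both helper methods are hypothetical, and rounding via Math.round is an assumption.

```java
final class SoneArgsSketch {
    /** True if windowSize is a power of two and at least 32. */
    static boolean isValidWindowSize(int windowSize) {
        // A power of two has exactly one bit set, so n & (n - 1) is zero.
        return windowSize >= 32 && (windowSize & (windowSize - 1)) == 0;
    }

    /** Rounds the sample rate to a whole number and rejects values below 1. */
    static int checkedSampleRate(float sampleRate) {
        int rounded = Math.round(sampleRate); // assumption: round to nearest
        if (rounded < 1) {
            throw new IllegalArgumentException("sample rate must be at least 1");
        }
        return rounded;
    }

    public static void main(String[] args) {
        System.out.println(isValidWindowSize(256));
        System.out.println(checkedSampleRate(11025f));
    }
}
```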
Method Detail

getHopSize

public int getHopSize()
Returns the number of samples skipped between two windows. Since the overlap is fixed at 50 percent, the hop size is half the window size.

Returns:
int hop size
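
As a small worked example of this relation, the sketch below derives the hop size and the time step between consecutive windows. The class and method names are hypothetical; only the "half the window size" rule comes from the description above.

```java
final class HopTimingSketch {
    /** Hop size for a fixed 50 percent overlap. */
    static int hopSize(int windowSize) {
        return windowSize / 2;
    }

    /** Time between the starts of two consecutive windows, in seconds. */
    static double hopSeconds(int windowSize, float sampleRate) {
        return hopSize(windowSize) / (double) sampleRate;
    }

    public static void main(String[] args) {
        // Default window size of 256 at an example rate of 11025 Hz.
        System.out.println(hopSize(256));
        System.out.println(hopSeconds(256, 11025f));
    }
}
```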

process

public Vector<double[]> process(AudioPreProcessor in)
                         throws IllegalArgumentException,
                                IOException
Performs the transformation of the input data to Sone. This is done by splitting the given data into windows and processing each of these windows with processWindow().

Parameters:
in - AudioPreProcessor input data is a complete audio stream, must have the same sample rate as this Sone object, must not be a null value
Returns:
Vector containing one double array of Sone values for each window
Throws:
IOException - if there are any problems regarding the input stream
IllegalArgumentException - raised if method contract is violated

process

public double[][] process(double[] input)
                   throws IllegalArgumentException,
                          IOException
Performs the transformation of the input data to Sone. This is done by splitting the given data into windows and processing each of these windows with processWindow().

Parameters:
input - double[] input data is an array of samples, its length must be a multiple of the hop size, must not be a null value
Returns:
double[][] an array containing one double array of Sone values for each window
Throws:
IOException - if there are any problems regarding the input stream
IllegalArgumentException - raised if method contract is violated
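
The windowing described above can be sketched as follows: with a 50 percent overlap, window i starts at sample i * hopSize. The names are hypothetical, and how the class itself treats trailing samples that do not fill a whole window is not documented here; this sketch simply counts full windows only.

```java
final class FramingSketch {
    /** Number of full windows that fit into numSamples with hop = windowSize / 2. */
    static int countWindows(int numSamples, int windowSize) {
        int hop = windowSize / 2;
        if (numSamples < windowSize) {
            return 0; // not enough data for a single window
        }
        return (numSamples - windowSize) / hop + 1;
    }

    /** Start index of every full window, suitable as the start argument of a per-window call. */
    static int[] windowStarts(int numSamples, int windowSize) {
        int hop = windowSize / 2;
        int[] starts = new int[countWindows(numSamples, windowSize)];
        for (int i = 0; i < starts.length; i++) {
            starts[i] = i * hop;
        }
        return starts;
    }

    public static void main(String[] args) {
        for (int start : windowStarts(1024, 256)) {
            System.out.println(start);
        }
    }
}
```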

processWindow

public double[] processWindow(double[] window,
                              int start)
                       throws IllegalArgumentException
Transforms one window of samples to Sone. The following steps are performed:

(1) Normalized power FFT with a Hann window function.

(2) Models the influence of the outer ear by emphasizing some frequencies (model by Terhardt[3]).

(3) Conversion to the bark scale to reduce the data to the critical bands of human hearing[4].

(4) Calculates the influence of spectral masking effects, since the human ear needs some regeneration time and cannot perceive similar tones that follow each other with a short delay[2]. The conversion to dB is also done in this step.

(5) Finally, the dB values are converted to loudness values (Sone, a psychoacoustic scale). This loudness scale represents the human perception of loudness better than the dB scale does[5].

Parameters:
window - double[] data to be converted, must contain enough data for one window
start - int start index of the window data
Returns:
double[] the window representation in Sone
Throws:
IllegalArgumentException - raised if method contract is violated
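
Steps (1), (4) and (5) can be illustrated with a few helper functions. This is a sketch under assumptions, not the class's implementation: the symmetric Hann formula, the floor of 1 before taking the logarithm, and the dB to Sone mapping (a commonly used form attributed to [5]) are all assumed here, and every name is hypothetical.

```java
final class LoudnessSketch {
    /** Symmetric Hann window of length n (assumes n >= 2). */
    static double[] hann(int n) {
        double[] w = new double[n];
        for (int i = 0; i < n; i++) {
            w[i] = 0.5 - 0.5 * Math.cos(2.0 * Math.PI * i / (n - 1));
        }
        return w;
    }

    /** Power to dB; values below 1 are clamped so the result is never negative (assumption). */
    static double powerToDb(double power) {
        return 10.0 * Math.log10(Math.max(power, 1.0));
    }

    /** dB to Sone: doubles every 10 dB above 40 dB, power law below (assumed mapping). */
    static double dbToSone(double db) {
        return db >= 40.0
                ? Math.pow(2.0, (db - 40.0) / 10.0)
                : Math.pow(db / 40.0, 2.642);
    }

    public static void main(String[] args) {
        System.out.println(dbToSone(powerToDb(1.0e6)));
    }
}
```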

getTerhardtWeights

public double[] getTerhardtWeights(double baseFrequency,
                                   int vectorSize)
                            throws IllegalArgumentException
Creates a weight vector according to the outer ear formula of Terhardt. The k-th component of the weight vector is the weight for the frequency k*baseFrequency. For details, see [3].

Parameters:
baseFrequency - The base frequency (Hz) the weights are based on. The frequency of the first component of the weight vector. The base frequency must be a positive value.
vectorSize - dimension of the vector to compute, must be a positive value or zero
Returns:
a vector with weights for multiples of the base frequency.
Throws:
IllegalArgumentException - raised if method contract is violated
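
A minimal sketch of such a weight vector, using the widely cited form of Terhardt's outer ear formula (attenuation in dB, frequency in kHz). Several points are assumptions rather than documented behaviour: that array index i (zero-based) corresponds to the frequency (i + 1) * baseFrequency, that the weights are applied to power values (hence the division by 10), and all names used.

```java
final class TerhardtSketch {
    /** Outer ear attenuation in dB at the given frequency in Hz. */
    static double attenuationDb(double hz) {
        double f = hz / 1000.0; // the formula expects kHz
        return -3.64 * Math.pow(f, -0.8)
                + 6.5 * Math.exp(-0.6 * (f - 3.3) * (f - 3.3))
                - 1.0e-3 * Math.pow(f, 4.0);
    }

    /** Linear power weights for multiples of baseFrequency. */
    static double[] weights(double baseFrequency, int vectorSize) {
        if (baseFrequency <= 0.0 || vectorSize < 0) {
            throw new IllegalArgumentException("invalid base frequency or vector size");
        }
        double[] w = new double[vectorSize];
        for (int i = 0; i < vectorSize; i++) {
            double db = attenuationDb((i + 1) * baseFrequency); // assumed index mapping
            w[i] = Math.pow(10.0, db / 10.0); // dB to linear power ratio
        }
        return w;
    }

    public static void main(String[] args) {
        double[] w = weights(100.0, 40);
        System.out.println(w[0] + " " + w[32]);
    }
}
```

With this formula, frequencies around 3 to 4 kHz are emphasized while very low and very high frequencies are attenuated.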

getSpreadMatrix

public double[][] getSpreadMatrix(int barkSize)
                           throws IllegalArgumentException
Creates a matrix for the computation of spectral masking effects for the bark bands in use. Masking effects can be calculated by matrix multiplication. For details, see [2].

Parameters:
barkSize - the number of bark bands in use for the calculation of the masking effects, must be a positive value
Returns:
square matrix with dimensions according to the given number of bark bands
Throws:
IllegalArgumentException - raised if method contract is violated
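
One way to build such a matrix is with the spreading function commonly attributed to [2], evaluated on the distance between two bark bands. The sketch below is an assumption about the general technique, not a copy of this method: the row and column orientation, the linear (power) scaling, and all names are illustrative.

```java
final class SpreadSketch {
    /** Square matrix whose entry [i][j] is the spread from band j into band i. */
    static double[][] spreadMatrix(int barkSize) {
        if (barkSize <= 0) {
            throw new IllegalArgumentException("barkSize must be positive");
        }
        double[][] m = new double[barkSize][barkSize];
        for (int i = 0; i < barkSize; i++) {
            for (int j = 0; j < barkSize; j++) {
                double dz = (i - j) + 0.474; // band distance in bark, shifted
                double db = 15.81 + 7.5 * dz - 17.5 * Math.sqrt(1.0 + dz * dz);
                m[i][j] = Math.pow(10.0, db / 10.0); // dB to linear power
            }
        }
        return m;
    }

    /** Applies the masking model as a matrix-vector product. */
    static double[] apply(double[][] m, double[] bark) {
        double[] out = new double[m.length];
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < bark.length; j++) {
                out[i] += m[i][j] * bark[j];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] spread = apply(spreadMatrix(3), new double[] {0.0, 1.0, 0.0});
        System.out.println(java.util.Arrays.toString(spread));
    }
}
```

The diagonal is close to 1, so each band largely keeps its own energy, while the asymmetric off-diagonal entries let energy leak more strongly towards higher bands than towards lower ones.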

getBarkUpperBoundaries

public int[] getBarkUpperBoundaries(double sampleRate)
                             throws IllegalArgumentException
Returns an array with the upper boundaries of the bark bands. Only bark bands with a lower frequency than the sampling frequency are considered. For details, see [4].

Parameters:
sampleRate - sample rate (Hz), must be a positive value
Returns:
an array containing the upper boundaries of the bark bands; the length of the array equals the number of bark bands considered
Throws:
IllegalArgumentException - raised if method contract is violated
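
The selection of bark bands can be sketched with the standard table of critical band upper edges from [4]. The cutoff rule used below (keep every band whose upper edge does not exceed a caller-supplied limit in Hz) is an assumption for illustration; the exact rule this method applies to the sample rate is not spelled out above, and all names are hypothetical.

```java
import java.util.Arrays;

final class BarkSketch {
    /** Upper edges of the 24 critical bands in Hz. */
    static final int[] UPPER_HZ = {
        100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720,
        2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000, 15500
    };

    /** Returns the upper edges of all bands that end at or below limitHz. */
    static int[] upperBoundaries(double limitHz) {
        if (limitHz <= 0.0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        int n = 0;
        while (n < UPPER_HZ.length && UPPER_HZ[n] <= limitHz) {
            n++;
        }
        return Arrays.copyOf(UPPER_HZ, n);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(upperBoundaries(5512.5)));
    }
}
```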