comirva.mlearn
Class SOM

java.lang.Object
  extended by comirva.mlearn.SOM
All Implemented Interfaces:
VisuListItem, Serializable
Direct Known Subclasses:
GHSOM

public class SOM
extends Object
implements Serializable, VisuListItem

This class implements a Self-Organizing Map (SOM) together with algorithms for initializing and training it.

See Also:
Serialized Form

Field Summary
protected  Vector<String> altLabels
           
protected  boolean circular
           
protected  DataMatrix codebook
           
protected  DataMatrix coOccMatrix
           
protected  Vector<String> coOccMatrixLabels
           
 DataMatrix data
           
static int INIT_GRADIENT
           
static int INIT_LINEAR
           
static int INIT_RANDOM
           
static int INIT_SLC
           
protected  int intMUCols
           
protected  int intMURows
           
protected  Vector<String> labels
           
protected  int method
           
 JLabel statusBar
          statusBar represents the status bar of the calling MainUI-instance and is used to update the status bar while performing training.
static int TRAIN_BATCH
           
static int TRAIN_SEQ
           
protected  int trainingLength
           
 Vector<Vector<Integer>> voronoiSet
           
 
Constructor Summary
SOM(DataMatrix trainData)
          Creates a SOM-instance with the training data contained in the DataMatrix trainData.
SOM(DataMatrix trainData, int rows, int cols)
          Creates a SOM-instance with the training data contained in the DataMatrix trainData.
 
Method Summary
 void clearAltLabels()
           
 void clearLabels()
          Clears all labels assigned to the SOM.
 void createVoronoiSet()
          Calculates the Voronoi-Set of the SOM and stores the result in the internal voronoiSet Vector, which holds, for each map unit, a nested Vector containing the Integer indices of the data items mapped to that unit.
static double euclideanDistance(Vector<Double> item1, Vector<Double> item2)
          Calculates and returns the Euclidean distance between the data vectors item1 and item2.
 String getAltLabel(int dataItemIndex)
           
 Vector<String> getAltLabels()
           
 int getBMU(Vector<Double> dataItem)
          Calculates the best matching unit for the data vector dataItem and returns its index in the codebook.
 DataMatrix getCodebook()
           
 DataMatrix getCoOccMatrix()
           
 Vector<String> getCoOccMatrixLabels()
           
 DataMatrix getDataset()
          Returns the data set of the SOM (that is used for training).
 Color[] getGridcolors()
           
 String getLabel(int dataItemIndex)
          Returns the label for the data item whose index is dataItemIndex.
 Vector<String> getLabels()
           
 MDM getMDM()
           
 int getNumberOfColumns()
          Returns the number of map units of the SOM in horizontal direction.
 int getNumberOfDataItems()
          Returns the number of data items in the training set.
 int getNumberOfRows()
          Returns the number of map units of the SOM in vertical direction.
 TreeMap<Double,Integer> getOrderedBMUs(Vector<Double> dataItem)
          Calculates a set of best matching units for the data vector dataItem and returns the codebook-indices of these units.
 Vector<String> getPrototypesForMU(int idxMU, int maxNumber)
          Calculates for the given map unit its most "representative" data items.
 boolean[][] getVoronoiMatrix()
          Calculates a boolean matrix containing, for each map unit, the data items which are mapped to the unit.
 void gradientInit()
          Initializes the SOM based on a gradient from min to max.
 void init(int initMethod)
           
protected  void initWithCorners(Vector<Double> upperLeft, Vector<Double> upperRight, Vector<Double> lowerLeft, Vector<Double> lowerRight)
          The SOM is initiated by giving specific values to the corner units of the map.
 boolean isColorByPCA()
           
 void linearInit()
          Initializes the SOM with the linear initialization algorithm as proposed by T. Kohonen.
protected  double mapunitDistance(int mu1, int mu2)
          Calculates and returns the Euclidean distance between two map units in the output space, i.e. their distance on the SOM-grid.
 void printVoronoiSet()
          Prints the Voronoi-set of all map units to java.lang.System.out.
 void randomInit()
          Initializes the SOM based on random values.
 void setAltLabels(Vector<String> labels)
           
 void setCircular(boolean circular)
           
 void setCoOccMatrix(DataMatrix coOccMatrix)
           
 void setCoOccMatrixLabels(Vector<String> coOccMatrixLabels)
           
 void setLabels(Vector<String> labels)
          Sets the labels, i.e. the descriptions for the data items, of the SOM.
 void setMDM(MDM mdm)
           
 void setSOMSize(int mapUnitsInRow, int mapUnitsInColumn)
          Sets the number of map units in each row and each column to the argument values.
 void setTrainingLength(int trainingLength)
           
 void showCurrentFeatureState()
          This method does the same as showCurrentFeatureState(int feature), but it prints all available features.
 void showCurrentFeatureState(int feature)
          This method is useful for debugging SOM-initialisation-algorithms.
 void slcInit()
          Initializes the SOM with the algorithm proposed by Su, Liu and Chang in "Improving the Self-Organizing Feature Map Algorithm Using an Efficient Initialization Scheme". (Created by MSt.)
 void train(int method, int length)
          Trains the SOM using the method given in the parameter method.
protected  void trainBatch()
          Performs batch training.
protected  void trainSequential()
          Performs a very simple sequential training based on the equation: m_i(t+1) = m_i(t) + alpha(t) * h_{bmu,i}(t) * [x - m_i(t)].
 Vector<Double> vectorDistance(Vector<Double> item1, Vector<Double> item2)
          Calculates and returns a Vector containing the element-wise differences between the data vectors item1 and item2.
 Vector<Double> vectorDistanceMultiply(Vector<Double> item1, Vector<Double> item2, double multi)
          Calculates a Vector containing the element-wise differences between the data vectors item1 and item2.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

INIT_RANDOM

public static final int INIT_RANDOM
See Also:
Constant Field Values

INIT_GRADIENT

public static final int INIT_GRADIENT
See Also:
Constant Field Values

INIT_LINEAR

public static final int INIT_LINEAR
See Also:
Constant Field Values

INIT_SLC

public static final int INIT_SLC
See Also:
Constant Field Values

TRAIN_SEQ

public static final int TRAIN_SEQ
See Also:
Constant Field Values

TRAIN_BATCH

public static final int TRAIN_BATCH
See Also:
Constant Field Values

data

public DataMatrix data

codebook

protected DataMatrix codebook

labels

protected Vector<String> labels

altLabels

protected Vector<String> altLabels

coOccMatrix

protected DataMatrix coOccMatrix

coOccMatrixLabels

protected Vector<String> coOccMatrixLabels

intMURows

protected int intMURows

intMUCols

protected int intMUCols

voronoiSet

public Vector<Vector<Integer>> voronoiSet

trainingLength

protected int trainingLength

method

protected int method

circular

protected boolean circular

statusBar

public JLabel statusBar
statusBar represents the status bar of the calling MainUI-instance and is used to update the status bar while performing training.

Constructor Detail

SOM

public SOM(DataMatrix trainData)
Creates a SOM-instance with the training data contained in the DataMatrix trainData.
The size of the SOM (the codebook) is determined with a heuristic function.

Parameters:
trainData - a DataMatrix containing the data for training the SOM

SOM

public SOM(DataMatrix trainData,
           int rows,
           int cols)
Creates a SOM-instance with the training data contained in the DataMatrix trainData.
The size of the SOM (the codebook) is set to the values specified by the arguments rows and cols.

Parameters:
trainData - a DataMatrix containing the data for training the SOM
rows - the number of map units in vertical direction
cols - the number of map units in horizontal direction
Method Detail

getNumberOfRows

public int getNumberOfRows()
Returns the number of map units of the SOM in vertical direction.

Returns:
the number of map units in vertical direction

getNumberOfColumns

public int getNumberOfColumns()
Returns the number of map units of the SOM in horizontal direction.

Returns:
the number of map units in horizontal direction

getNumberOfDataItems

public int getNumberOfDataItems()
Returns the number of data items in the training set.

Returns:
the number of data items of the SOM

setSOMSize

public void setSOMSize(int mapUnitsInRow,
                       int mapUnitsInColumn)
Sets the number of map units in each row and each column to the argument values.

Parameters:
mapUnitsInRow - the number of map units per row
mapUnitsInColumn - the number of map units per column

setLabels

public void setLabels(Vector<String> labels)
               throws SizeMismatchException
Sets the labels, i.e. the descriptions for the data items, of the SOM. The argument labels must contain as many String-instances as there are data items in the training set.

Parameters:
labels - a Vector containing one String for every data item in the training set
Throws:
SizeMismatchException

clearLabels

public void clearLabels()
Clears all labels assigned to the SOM.


getAltLabel

public String getAltLabel(int dataItemIndex)

getAltLabels

public Vector<String> getAltLabels()

setAltLabels

public void setAltLabels(Vector<String> labels)
                  throws SizeMismatchException
Throws:
SizeMismatchException

clearAltLabels

public void clearAltLabels()

getLabel

public String getLabel(int dataItemIndex)
Returns the label for the data item whose index is dataItemIndex. If no labels were specified, the dataItemIndex is returned as a String.

Parameters:
dataItemIndex - the index of the data item for which the label is requested
Returns:
a String containing the description (label) of the data item

getDataset

public DataMatrix getDataset()
Returns the data set of the SOM (that is used for training).

Returns:
a DataMatrix representing the data set

init

public void init(int initMethod)

randomInit

public void randomInit()
Initializes the SOM based on random values.
To do so, the minimum and maximum values in the training data are determined, and the codebook is filled with values in the range [min, max).
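
The range-based fill described above can be sketched as follows. This is a minimal, standalone illustration and not the library's implementation: the class RandomInitSketch, the use of plain double arrays instead of DataMatrix, and the seed parameter are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Random;

class RandomInitSketch {
    // Hypothetical helper: data is items x features, the result is units x features.
    static double[][] randomInit(double[][] data, int units, long seed) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        // Find the global minimum and maximum over all values of the training data.
        for (double[] item : data) {
            for (double value : item) {
                min = Math.min(min, value);
                max = Math.max(max, value);
            }
        }
        Random rnd = new Random(seed);
        double[][] codebook = new double[units][data[0].length];
        for (double[] unit : codebook) {
            for (int f = 0; f < unit.length; f++) {
                // nextDouble() is uniform in [0, 1), so the value falls in [min, max)
                // up to floating-point rounding.
                unit[f] = min + rnd.nextDouble() * (max - min);
            }
        }
        return codebook;
    }

    public static void main(String[] args) {
        double[][] data = {{1.0, 2.0}, {3.0, 9.0}};
        for (double[] unit : randomInit(data, 4, 42L)) {
            System.out.println(Arrays.toString(unit));
        }
    }
}
```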


gradientInit

public void gradientInit()
Initializes the SOM based on a gradient from min to max.
For every feature of the input vectors, the minimum and maximum are determined, and from these values two vectors are created:
minVec, which contains all min-values and is written to the upper left;
maxVec, which contains all max-values and is written to the lower right.
All the other vectors are then generated by interpolation.
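
A minimal sketch of such a gradient, under stated assumptions: the page does not say how the interpolation is carried out, so the example below assumes a linear blend whose weight grows with row + column. The class GradientInitSketch and the rows x cols x features array layout are hypothetical.

```java
import java.util.Arrays;

class GradientInitSketch {
    // Hypothetical helper: returns a rows x cols x features codebook.
    static double[][][] gradientInit(double[][] data, int rows, int cols) {
        int dim = data[0].length;
        double[] minVec = new double[dim];
        double[] maxVec = new double[dim];
        Arrays.fill(minVec, Double.POSITIVE_INFINITY);
        Arrays.fill(maxVec, Double.NEGATIVE_INFINITY);
        // Per-feature minimum and maximum over the training data.
        for (double[] item : data) {
            for (int f = 0; f < dim; f++) {
                minVec[f] = Math.min(minVec[f], item[f]);
                maxVec[f] = Math.max(maxVec[f], item[f]);
            }
        }
        double[][][] codebook = new double[rows][cols][dim];
        int steps = rows + cols - 2; // grid steps from upper left to lower right
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                // Assumed weight: 0 at the upper left, 1 at the lower right.
                double t = steps == 0 ? 0.0 : (double) (r + c) / steps;
                for (int f = 0; f < dim; f++) {
                    codebook[r][c][f] = minVec[f] + t * (maxVec[f] - minVec[f]);
                }
            }
        }
        return codebook;
    }

    public static void main(String[] args) {
        double[][] data = {{0.0, 20.0}, {1.0, 10.0}};
        double[][][] codebook = gradientInit(data, 3, 3);
        System.out.println(Arrays.toString(codebook[1][1])); // prints [0.5, 15.0]
    }
}
```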


linearInit

public void linearInit()
Initializes the SOM with the linear initialization algorithm as proposed by T. Kohonen.
First, the (square) autocorrelation matrix of the input data is calculated.


initWithCorners

protected void initWithCorners(Vector<Double> upperLeft,
                               Vector<Double> upperRight,
                               Vector<Double> lowerLeft,
                               Vector<Double> lowerRight)
The SOM is initialized by giving specific values to the corner units of the map. The initial values of the other map units are interpolated. This initialisation method can be used to give a map a specific orientation (e.g. for SubSOM-orientation in HSOMs). In the one-dimensional case, only one vector per "end" is used.
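
One common way to interpolate from four corners is bilinear interpolation; the sketch below assumes that scheme, which the page does not confirm. The class CornerInitSketch and the array-based signature are hypothetical, and the special handling of the one-dimensional case mentioned above is not modelled.

```java
class CornerInitSketch {
    // Hypothetical helper: returns a rows x cols x features codebook.
    static double[][][] initWithCorners(double[] upperLeft, double[] upperRight,
                                        double[] lowerLeft, double[] lowerRight,
                                        int rows, int cols) {
        int dim = upperLeft.length;
        double[][][] codebook = new double[rows][cols][dim];
        for (int r = 0; r < rows; r++) {
            double v = rows > 1 ? (double) r / (rows - 1) : 0.0; // vertical weight
            for (int c = 0; c < cols; c++) {
                double h = cols > 1 ? (double) c / (cols - 1) : 0.0; // horizontal weight
                for (int f = 0; f < dim; f++) {
                    // Blend along the top and bottom edges, then between the two.
                    double top = upperLeft[f] + h * (upperRight[f] - upperLeft[f]);
                    double bottom = lowerLeft[f] + h * (lowerRight[f] - lowerLeft[f]);
                    codebook[r][c][f] = top + v * (bottom - top);
                }
            }
        }
        return codebook;
    }

    public static void main(String[] args) {
        double[][][] codebook = initWithCorners(
                new double[]{0.0}, new double[]{2.0},
                new double[]{4.0}, new double[]{6.0}, 3, 3);
        System.out.println(codebook[1][1][0]); // prints 3.0
    }
}
```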


slcInit

public void slcInit()
Initializes the SOM with the algorithm proposed by Su, Liu and Chang in "Improving the Self-Organizing Feature Map Algorithm Using an Efficient Initialization Scheme". (Created by MSt.)


train

public void train(int method,
                  int length)
Trains the SOM using the method given in the parameter method.

Parameters:
method - the training method used
length - the training length in epochs
See Also:
TRAIN_SEQ, TRAIN_BATCH

trainSequential

protected void trainSequential()

Performs a very simple sequential training based on the equation: m_i(t+1) = m_i(t) + alpha(t) * h_{bmu,i}(t) * [x - m_i(t)].

The number of iterations equals the number of data items in the training data set. The learning rate is alpha(t) = 1 - current_iteration/iterations. The neighborhood function is calculated as h_{bmu,i}(t) = exp(-||r_bmu - r_i|| / (2*sigma(t)^2)), where r_bmu and r_i are the grid positions of the best matching unit and of unit i, and the learning rate alpha(t) is used for sigma(t).
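
The update rule can be sketched as below. This is an illustration under assumptions, not the library code: the class SequentialTrainSketch, the double-array codebook, the row-major unit index (index = row * cols + col), a zero-based iteration counter, and presenting the data items in their stored order are all choices made for the example. The neighborhood term uses the unsquared grid distance, as the formula is written on this page.

```java
import java.util.Arrays;

class SequentialTrainSketch {
    // Index of the codebook vector closest to x (squared Euclidean distance).
    static int bmu(double[][] codebook, double[] x) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < codebook.length; i++) {
            double dist = 0.0;
            for (int f = 0; f < x.length; f++) {
                double d = x[f] - codebook[i][f];
                dist += d * d;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return best;
    }

    // Grid distance, assuming unit index = row * cols + col (an assumption).
    static double gridDistance(int a, int b, int cols) {
        int dr = a / cols - b / cols;
        int dc = a % cols - b % cols;
        return Math.sqrt(dr * dr + dc * dc);
    }

    // Updates the codebook in place, one iteration per data item.
    static void trainSequential(double[][] codebook, int cols, double[][] data) {
        int iterations = data.length;
        for (int t = 0; t < iterations; t++) {
            double alpha = 1.0 - (double) t / iterations; // learning rate, never 0 here
            double sigma = alpha;                         // alpha doubles as sigma
            double[] x = data[t];
            int winner = bmu(codebook, x);
            for (int i = 0; i < codebook.length; i++) {
                double h = Math.exp(-gridDistance(winner, i, cols) / (2 * sigma * sigma));
                for (int f = 0; f < x.length; f++) {
                    codebook[i][f] += alpha * h * (x[f] - codebook[i][f]);
                }
            }
        }
    }

    public static void main(String[] args) {
        double[][] codebook = {{0.0}, {10.0}}; // a 1 x 2 map with one feature
        trainSequential(codebook, 2, new double[][]{{2.0}});
        System.out.println(Arrays.deepToString(codebook));
    }
}
```

With a single data item the learning rate is 1 and the best matching unit moves exactly onto the item, while its neighbor moves only part of the way.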


trainBatch

protected void trainBatch()
Performs batch training. Basically, the standard batch SOM algorithm is used. Its update rule is:
m_i(t+1) = sum(j=1..N, h_{bmu(j),i}(t) * x_j) / sum(j=1..N, h_{bmu(j),i}(t))
where x_j is the j-th of the N data items and bmu(j) is its best matching unit.
The neighborhood function is calculated as: h_{bmu,i}(t) = exp(-||r_bmu - r_i|| / (2*sigma(t)^2))
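
A single batch epoch can be sketched as a neighborhood-weighted average. The example makes several assumptions that the page does not state: sigma is passed in as a fixed value, units are indexed row-major, and a unit whose weights all underflow to zero keeps its old vector. The class BatchTrainSketch and its method names are hypothetical.

```java
import java.util.Arrays;

class BatchTrainSketch {
    // Index of the codebook vector closest to x (squared Euclidean distance).
    static int bmu(double[][] codebook, double[] x) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < codebook.length; i++) {
            double dist = 0.0;
            for (int f = 0; f < x.length; f++) {
                double d = x[f] - codebook[i][f];
                dist += d * d;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return best;
    }

    // Grid distance, assuming unit index = row * cols + col (an assumption).
    static double gridDistance(int a, int b, int cols) {
        int dr = a / cols - b / cols;
        int dc = a % cols - b % cols;
        return Math.sqrt(dr * dr + dc * dc);
    }

    // One batch epoch with a fixed sigma; returns the new codebook.
    static double[][] batchEpoch(double[][] codebook, int cols, double[][] data, double sigma) {
        int dim = data[0].length;
        int[] bmus = new int[data.length];
        for (int j = 0; j < data.length; j++) {
            bmus[j] = bmu(codebook, data[j]); // all BMUs use the old codebook
        }
        double[][] updated = new double[codebook.length][dim];
        for (int i = 0; i < codebook.length; i++) {
            double[] numerator = new double[dim];
            double denominator = 0.0;
            for (int j = 0; j < data.length; j++) {
                double h = Math.exp(-gridDistance(bmus[j], i, cols) / (2 * sigma * sigma));
                denominator += h;
                for (int f = 0; f < dim; f++) {
                    numerator[f] += h * data[j][f];
                }
            }
            for (int f = 0; f < dim; f++) {
                // Keep the old value if every weight underflowed to zero.
                updated[i][f] = denominator > 0 ? numerator[f] / denominator : codebook[i][f];
            }
        }
        return updated;
    }

    public static void main(String[] args) {
        double[][] codebook = {{0.0}, {10.0}};
        double[][] data = {{1.0}, {9.0}};
        System.out.println(Arrays.deepToString(batchEpoch(codebook, 2, data, 0.1)));
    }
}
```

With a small sigma each unit is pulled almost entirely toward the items it wins; with a large sigma all units drift toward the mean of the data.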


getBMU

public int getBMU(Vector<Double> dataItem)
Calculates the best matching unit for the data vector dataItem and returns its index in the codebook.

Parameters:
dataItem - the Vector containing the data item for which the BMU should be determined
Returns:
the index of the map unit which is the best matching unit for the dataItem
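
A best-matching-unit search is a nearest-neighbor scan over the codebook. The sketch below assumes Euclidean distance and a codebook stored as a units x features double array; the class BmuSketch is hypothetical and stands in for the DataMatrix-based method.

```java
class BmuSketch {
    // Returns the index of the codebook vector with the smallest distance to dataItem.
    static int bestMatchingUnit(double[][] codebook, double[] dataItem) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int u = 0; u < codebook.length; u++) {
            double dist = 0.0;
            for (int f = 0; f < dataItem.length; f++) {
                double d = dataItem[f] - codebook[u][f];
                dist += d * d; // squared distance preserves the ordering
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = u;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] codebook = {{0.0, 0.0}, {1.0, 1.0}, {5.0, 5.0}};
        System.out.println(bestMatchingUnit(codebook, new double[]{0.9, 1.2})); // prints 1
    }
}
```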

getOrderedBMUs

public TreeMap<Double,Integer> getOrderedBMUs(Vector<Double> dataItem)
Calculates a set of best matching units for the data vector dataItem and returns the codebook-indices of these units.

Parameters:
dataItem - the Vector containing the data item for which the BMUs should be determined
Returns:
a TreeMap containing the indices of the map units which are the best matching units for the dataItem
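
The page does not say what the Double keys of the returned TreeMap hold. The sketch below assumes they are distances, so that iterating the map visits units from best to worst match; the class OrderedBmuSketch is hypothetical. Note that under this assumption two units at exactly the same distance would share a key, and the later one would overwrite the earlier one.

```java
import java.util.TreeMap;

class OrderedBmuSketch {
    // Maps distance -> unit index; TreeMap keeps the keys in ascending order.
    static TreeMap<Double, Integer> orderedUnits(double[][] codebook, double[] dataItem) {
        TreeMap<Double, Integer> byDistance = new TreeMap<>();
        for (int u = 0; u < codebook.length; u++) {
            double sum = 0.0;
            for (int f = 0; f < dataItem.length; f++) {
                double d = dataItem[f] - codebook[u][f];
                sum += d * d;
            }
            byDistance.put(Math.sqrt(sum), u);
        }
        return byDistance;
    }

    public static void main(String[] args) {
        double[][] codebook = {{0.0, 0.0}, {3.0, 4.0}, {1.0, 0.0}};
        TreeMap<Double, Integer> ordered = orderedUnits(codebook, new double[]{0.0, 0.0});
        System.out.println(ordered.values()); // prints [0, 2, 1]
    }
}
```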

getPrototypesForMU

public Vector<String> getPrototypesForMU(int idxMU,
                                         int maxNumber)
Calculates the most "representative" data items for the given map unit. That is, the data items with minimal distance to the map unit's model vector are determined and returned in order of decreasing similarity.

Parameters:
idxMU - the index of the map unit
maxNumber - the maximum number of returned data items
Returns:
a Vector containing the labels of the data items which are most similar to the map unit's model vector (in order of decreasing similarity)
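
A sketch of such a ranking, assuming that every data item is a candidate and that the result lists the closest item first. Whether the library restricts the candidates to items mapped to the unit is not stated here. The class PrototypeSketch and its array-based signature are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Vector;

class PrototypeSketch {
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int f = 0; f < a.length; f++) {
            double d = a[f] - b[f];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Returns up to maxNumber labels, closest data item first.
    static Vector<String> prototypes(double[] modelVector, double[][] data,
                                     Vector<String> labels, int maxNumber) {
        Integer[] order = new Integer[data.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort data item indices by their distance to the model vector.
        Arrays.sort(order, Comparator.comparingDouble(
                (Integer i) -> distance(modelVector, data[i])));
        Vector<String> result = new Vector<>();
        for (int k = 0; k < Math.min(maxNumber, order.length); k++) {
            result.add(labels.get(order[k]));
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] data = {{5.0}, {1.0}, {3.0}};
        Vector<String> labels = new Vector<>(Arrays.asList("a", "b", "c"));
        System.out.println(prototypes(new double[]{0.0}, data, labels, 2)); // prints [b, c]
    }
}
```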

euclideanDistance

public static double euclideanDistance(Vector<Double> item1,
                                       Vector<Double> item2)
                                throws SizeMismatchException
Calculates and returns the Euclidean distance between the data vectors item1 and item2.
item1 and item2 must contain the same number of Double-instances, otherwise a SizeMismatchException is thrown.

Parameters:
item1 - a Vector representing the first data vector
item2 - a Vector representing the second data vector
Returns:
a double value which is the Euclidean distance between data vector item1 and item2
Throws:
SizeMismatchException
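
The computation can be sketched as follows. The class EuclideanSketch is a hypothetical stand-in, and IllegalArgumentException is used in place of the library's SizeMismatchException so that the example compiles on its own.

```java
import java.util.Arrays;
import java.util.Vector;

class EuclideanSketch {
    // Square root of the summed squared element-wise differences.
    static double euclideanDistance(Vector<Double> item1, Vector<Double> item2) {
        if (item1.size() != item2.size()) {
            // Stand-in for SizeMismatchException (an assumption of this sketch).
            throw new IllegalArgumentException("vectors differ in length");
        }
        double sum = 0.0;
        for (int i = 0; i < item1.size(); i++) {
            double d = item1.get(i) - item2.get(i);
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        Vector<Double> a = new Vector<>(Arrays.asList(0.0, 0.0));
        Vector<Double> b = new Vector<>(Arrays.asList(3.0, 4.0));
        System.out.println(euclideanDistance(a, b)); // prints 5.0
    }
}
```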

vectorDistance

public Vector<Double> vectorDistance(Vector<Double> item1,
                                     Vector<Double> item2)
                              throws SizeMismatchException
Calculates and returns a Vector containing the element-wise differences between the data vectors item1 and item2.

Parameters:
item1 - a Vector representing the first data vector
item2 - a Vector representing the second data vector
Returns:
a Vector containing the result of the subtraction item1 - item2
Throws:
SizeMismatchException

vectorDistanceMultiply

public Vector<Double> vectorDistanceMultiply(Vector<Double> item1,
                                             Vector<Double> item2,
                                             double multi)
                                      throws SizeMismatchException
Calculates a Vector containing the element-wise differences between the data vectors item1 and item2. Each difference is multiplied by multi before the result is returned.

Parameters:
item1 - a Vector representing the first data vector
item2 - a Vector representing the second data vector
multi - the multiplier applied to (item1 - item2)
Returns:
a Vector containing the result of the calculation (item1 - item2) * multi
Throws:
SizeMismatchException
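
The calculation (item1 - item2) * multi can be sketched element by element. The class VectorDiffSketch and the method name differenceTimes are hypothetical, and IllegalArgumentException again stands in for SizeMismatchException. Passing 1.0 as the multiplier gives the plain difference that vectorDistance is documented to return.

```java
import java.util.Arrays;
import java.util.Vector;

class VectorDiffSketch {
    // Returns (item1 - item2) * multi, computed per element.
    static Vector<Double> differenceTimes(Vector<Double> item1, Vector<Double> item2,
                                          double multi) {
        if (item1.size() != item2.size()) {
            // Stand-in for SizeMismatchException (an assumption of this sketch).
            throw new IllegalArgumentException("vectors differ in length");
        }
        Vector<Double> result = new Vector<>(item1.size());
        for (int i = 0; i < item1.size(); i++) {
            result.add((item1.get(i) - item2.get(i)) * multi);
        }
        return result;
    }

    public static void main(String[] args) {
        Vector<Double> a = new Vector<>(Arrays.asList(4.0, 1.0));
        Vector<Double> b = new Vector<>(Arrays.asList(2.0, 3.0));
        System.out.println(differenceTimes(a, b, 0.5)); // prints [1.0, -1.0]
    }
}
```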

mapunitDistance

protected double mapunitDistance(int mu1,
                                 int mu2)
Calculates and returns the Euclidean distance between two map units in the output space, i.e. their distance on the SOM-grid.

Parameters:
mu1 - the codebook-index of the first map unit
mu2 - the codebook-index of the second map unit
Returns:
a double value representing the Euclidean distance between the two map units mu1 and mu2 in the output space
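
A sketch of the grid distance, assuming that codebook indices are laid out row by row (index = row * cols + col) and that the map is not circular. Neither assumption is confirmed on this page, and the class GridDistanceSketch with its explicit cols parameter is hypothetical.

```java
class GridDistanceSketch {
    // Euclidean distance between the grid positions of two map units.
    static double mapunitDistance(int mu1, int mu2, int cols) {
        int rowDiff = mu1 / cols - mu2 / cols; // assumed row-major layout
        int colDiff = mu1 % cols - mu2 % cols;
        return Math.sqrt(rowDiff * rowDiff + colDiff * colDiff);
    }

    public static void main(String[] args) {
        // On a grid with 4 columns, unit 0 is at (0, 0) and unit 19 at (4, 3).
        System.out.println(mapunitDistance(0, 19, 4)); // prints 5.0
    }
}
```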

createVoronoiSet

public void createVoronoiSet()
Calculates the Voronoi-Set of the SOM and stores the result in the internal voronoiSet Vector, which holds, for each map unit, a nested Vector containing the Integer indices of the data items mapped to that unit.


getVoronoiMatrix

public boolean[][] getVoronoiMatrix()
Calculates a boolean matrix containing, for each map unit, the data items which are mapped to the unit.

Returns:
a two-dimensional boolean[][] array indexed by codebook-index in its first dimension and by data item in its second.
If the data item is mapped to the map unit, the value in the boolean matrix is true, otherwise false.
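
The relation between the Voronoi set built by createVoronoiSet and this boolean matrix can be sketched as follows. The example assumes that each data item is mapped to its nearest codebook vector; the class VoronoiSketch and the double-array inputs are hypothetical.

```java
import java.util.Vector;

class VoronoiSketch {
    // Index of the codebook vector closest to x (squared Euclidean distance).
    static int bmu(double[][] codebook, double[] x) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int u = 0; u < codebook.length; u++) {
            double dist = 0.0;
            for (int f = 0; f < x.length; f++) {
                double d = x[f] - codebook[u][f];
                dist += d * d;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = u;
            }
        }
        return best;
    }

    // One inner Vector per map unit, holding the indices of its data items.
    static Vector<Vector<Integer>> voronoiSet(double[][] codebook, double[][] data) {
        Vector<Vector<Integer>> set = new Vector<>();
        for (int u = 0; u < codebook.length; u++) {
            set.add(new Vector<Integer>());
        }
        for (int j = 0; j < data.length; j++) {
            set.get(bmu(codebook, data[j])).add(j);
        }
        return set;
    }

    // matrix[unit][item] is true when the item is mapped to the unit.
    static boolean[][] voronoiMatrix(Vector<Vector<Integer>> set, int items) {
        boolean[][] matrix = new boolean[set.size()][items];
        for (int u = 0; u < set.size(); u++) {
            for (int j : set.get(u)) {
                matrix[u][j] = true;
            }
        }
        return matrix;
    }

    public static void main(String[] args) {
        double[][] codebook = {{0.0}, {10.0}};
        double[][] data = {{1.0}, {9.0}, {2.0}};
        System.out.println(voronoiSet(codebook, data)); // prints [[0, 2], [1]]
    }
}
```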

printVoronoiSet

public void printVoronoiSet()
Prints the Voronoi-set of all map units to java.lang.System.out. This set contains all data items that are mapped to a specific map unit.


showCurrentFeatureState

public void showCurrentFeatureState()
This method does the same as showCurrentFeatureState(int feature), but it prints all available features. (Created by MSt.)


showCurrentFeatureState

public void showCurrentFeatureState(int feature)
This method is useful for debugging SOM-initialisation-algorithms. It outputs the arrangement of the feature with index feature from the codebook. The visualised array shows the values of exactly this feature. (Created by MSt.)


getLabels

public Vector<String> getLabels()

getCodebook

public DataMatrix getCodebook()

getMDM

public MDM getMDM()

setMDM

public void setMDM(MDM mdm)

isColorByPCA

public boolean isColorByPCA()

getGridcolors

public Color[] getGridcolors()

setTrainingLength

public void setTrainingLength(int trainingLength)

setCircular

public void setCircular(boolean circular)

getCoOccMatrix

public DataMatrix getCoOccMatrix()

setCoOccMatrix

public void setCoOccMatrix(DataMatrix coOccMatrix)

getCoOccMatrixLabels

public Vector<String> getCoOccMatrixLabels()

setCoOccMatrixLabels

public void setCoOccMatrixLabels(Vector<String> coOccMatrixLabels)