comirva.web.crawling
Class PageCountsRetriever

java.lang.Object
  extended by java.lang.Thread
      extended by comirva.web.crawling.PageCountsRetriever
All Implemented Interfaces:
Runnable

public class PageCountsRetriever
extends Thread

This class implements functions for retrieving page counts from any search engine with a Google-like interface.


Nested Class Summary
 
Nested classes/interfaces inherited from class java.lang.Thread
Thread.State, Thread.UncaughtExceptionHandler
 
Field Summary
 
Fields inherited from class java.lang.Thread
MAX_PRIORITY, MIN_PRIORITY, NORM_PRIORITY
 
Constructor Summary
PageCountsRetriever(PageCountsRetrieverConfig pcrCfg, Vector searchWords, Vector ml, DefaultListModel lm, JLabel statusBar)
          Creates a PageCountsRetriever for accessing Google-like search engines and calculating a page count matrix for the co-occurrence of the terms in the searchWords Vector.
 
Method Summary
 DataMatrix getPageCountMatrix()
          Returns the page count matrix for the co-occurrence of the search terms on web pages.
 void run()
          This method is called when the thread is started.
 
Methods inherited from class java.lang.Thread
activeCount, checkAccess, countStackFrames, currentThread, destroy, dumpStack, enumerate, getAllStackTraces, getContextClassLoader, getDefaultUncaughtExceptionHandler, getId, getName, getPriority, getStackTrace, getState, getThreadGroup, getUncaughtExceptionHandler, holdsLock, interrupt, interrupted, isAlive, isDaemon, isInterrupted, join, join, join, resume, setContextClassLoader, setDaemon, setDefaultUncaughtExceptionHandler, setName, setPriority, setUncaughtExceptionHandler, sleep, sleep, start, stop, stop, suspend, toString, yield
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

PageCountsRetriever

public PageCountsRetriever(PageCountsRetrieverConfig pcrCfg,
                           Vector searchWords,
                           Vector ml,
                           DefaultListModel lm,
                           JLabel statusBar)
Creates a PageCountsRetriever for accessing Google-like search engines and calculating a page count matrix for the co-occurrence of the terms in the searchWords Vector.

Parameters:
pcrCfg - a PageCountsRetrieverConfig instance containing the configuration for the web crawls
searchWords - a Vector containing the search words for which the (joint) appearance on web pages should be determined
ml - the Vector to which the name of the DataMatrix should be added after the matrix has been computed by the web crawl
lm - the DefaultListModel through which the name of the matrix is added to the UI
statusBar - the JLabel representing the status bar (for writing current loading progress)
Method Detail

run

public void run()
This method is called when the thread is started. It creates an AnySearch instance for each query, issues the query, and stores the retrieved page counts in a DataMatrix.

Specified by:
run in interface Runnable
Overrides:
run in class Thread
See Also:
Runnable.run()
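
The description of run() can be illustrated with a minimal sketch of how a co-occurrence page count matrix might be filled. This is not the actual implementation: the class PageCountMatrixSketch, the method buildMatrix, the query format, and the use of a plain long[][] in place of DataMatrix are all assumptions made for illustration, and the search engine is replaced by a caller-supplied function so the example runs offline.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToLongFunction;

public class PageCountMatrixSketch {

    /**
     * Fills an n x n matrix with page counts (hypothetical helper).
     * Assumption: cell (i, j) holds the count for a query containing both
     * term i and term j; the diagonal holds the count for term i alone.
     */
    static long[][] buildMatrix(List<String> terms, ToLongFunction<String> pageCount) {
        int n = terms.size();
        long[][] matrix = new long[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                // Assumed query format: quoted terms separated by a space.
                String query = (i == j)
                        ? "\"" + terms.get(i) + "\""
                        : "\"" + terms.get(i) + "\" \"" + terms.get(j) + "\"";
                // One query per matrix cell; the result is stored directly.
                matrix[i][j] = pageCount.applyAsLong(query);
            }
        }
        return matrix;
    }

    public static void main(String[] args) {
        // Fake "search engine" for demonstration: the count is the query length.
        ToLongFunction<String> fakeEngine = q -> q.length();
        long[][] m = buildMatrix(Arrays.asList("jazz", "blues"), fakeEngine);
        System.out.println(Arrays.deepToString(m));
    }
}
```

In a real crawl the function passed as pageCount would issue an HTTP request and parse the reported number of hits, which is the role the AnySearch instances appear to play here.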

getPageCountMatrix

public DataMatrix getPageCountMatrix()
Returns the page count matrix for the co-occurrence of the search terms on web pages.

Returns:
a DataMatrix with the page count for the co-occurrence of the search terms
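
Because PageCountsRetriever extends Thread, a caller would typically start it and wait for it to finish before reading the matrix. This page does not state what getPageCountMatrix() returns while the crawl is still running, so the sketch below shows only the general start/join pattern. It uses a hypothetical stand-in class (CountingWorker) with placeholder work instead of the real constructor, whose configuration object is not described here.

```java
import java.util.Arrays;
import java.util.List;

public class RetrieverUsageSketch {

    /** Hypothetical stand-in for a Thread subclass that computes a matrix in run(). */
    static class CountingWorker extends Thread {
        private final List<String> searchWords;
        private int[][] result; // written in run(), read by the caller after join()

        CountingWorker(List<String> searchWords) {
            this.searchWords = searchWords;
        }

        @Override
        public void run() {
            int n = searchWords.size();
            int[][] m = new int[n][n];
            // Placeholder work: a real retriever would query a search engine here.
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    m[i][j] = searchWords.get(i).length() + searchWords.get(j).length();
                }
            }
            result = m;
        }

        int[][] getResult() {
            return result;
        }
    }

    /** Starts the worker, waits for it to finish, then returns its result. */
    static int[][] runAndWait(List<String> words) {
        CountingWorker worker = new CountingWorker(words);
        worker.start(); // executes run() on a new thread; calling run() directly would not
        try {
            worker.join(); // blocks until run() returns; makes the worker's writes visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
            throw new IllegalStateException("Interrupted while waiting for worker", e);
        }
        return worker.getResult();
    }

    public static void main(String[] args) {
        int[][] m = runAndWait(Arrays.asList("jazz", "blues"));
        System.out.println(m.length + "x" + m[0].length);
    }
}
```

With the real class, the equivalent calls would be start(), join(), and then getPageCountMatrix(), assuming the matrix is complete once run() has returned.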