comirva.web.crawling
Class PageCountsRetriever_InvalidPCRequerier

java.lang.Object
  extended by java.lang.Thread
      extended by comirva.web.crawling.PageCountsRetriever_InvalidPCRequerier
All Implemented Interfaces:
Runnable

public class PageCountsRetriever_InvalidPCRequerier
extends Thread

This class implements functions for retrieving page counts from any search engine with a Google-like interface. It is used to requery invalid entries (-1) in an existing page count matrix.
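
The requery idea described above can be illustrated with a minimal, self-contained sketch. It is not the CoMIRVA implementation: a plain double[][] stands in for DataMatrix, a List<String> stands in for the search word Vector, and the lookup function is a hypothetical placeholder for a real search engine request. Only the -1 marker for invalid entries is taken from the description.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToLongBiFunction;

public class InvalidEntryRequerySketch {
    /** Marker for an invalid page count, as stated in the class description. */
    public static final double INVALID = -1.0;

    /**
     * Visits every cell of the matrix, re-queries those equal to INVALID and
     * overwrites them with the value returned by the lookup function.
     * Returns the number of cells that were re-queried.
     * The lookup parameter is an assumption of this sketch (a stand-in for a web query).
     */
    public static int requeryInvalid(double[][] counts,
                                     List<String> terms,
                                     ToLongBiFunction<String, String> lookup) {
        int requeried = 0;
        for (int i = 0; i < counts.length; i++) {
            for (int j = 0; j < counts[i].length; j++) {
                if (counts[i][j] == INVALID) {
                    // Row and column indices are assumed to map to the search terms.
                    counts[i][j] = lookup.applyAsLong(terms.get(i), terms.get(j));
                    requeried++;
                }
            }
        }
        return requeried;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("jazz", "blues");
        double[][] counts = { {120, -1}, {-1, 80} };
        // Hypothetical lookup that returns a fixed count instead of contacting a search engine.
        int n = requeryInvalid(counts, terms, (a, b) -> 42L);
        System.out.println(n + " entries re-queried: " + Arrays.deepToString(counts));
    }
}
```

Valid entries are left untouched, so only the failed queries of an earlier run are repeated.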


Nested Class Summary
 
Nested classes/interfaces inherited from class java.lang.Thread
Thread.State, Thread.UncaughtExceptionHandler
 
Field Summary
 
Fields inherited from class java.lang.Thread
MAX_PRIORITY, MIN_PRIORITY, NORM_PRIORITY
 
Constructor Summary
PageCountsRetriever_InvalidPCRequerier(PageCountsRetrieverConfig pcrCfg, Vector searchWords, DataMatrix pageCountMatrix, JLabel statusBar)
          Creates a PageCountsRetriever_InvalidPCRequerier for accessing Google-like search engines and calculating a page count matrix for the co-occurrence of the terms in the searchWords Vector.
 
Method Summary
 DataMatrix getPageCountMatrix()
          Returns the page count matrix for the co-occurrence of the search terms on web pages.
 void run()
          This method is called when the thread is started.
 
Methods inherited from class java.lang.Thread
activeCount, checkAccess, countStackFrames, currentThread, destroy, dumpStack, enumerate, getAllStackTraces, getContextClassLoader, getDefaultUncaughtExceptionHandler, getId, getName, getPriority, getStackTrace, getState, getThreadGroup, getUncaughtExceptionHandler, holdsLock, interrupt, interrupted, isAlive, isDaemon, isInterrupted, join, join, join, resume, setContextClassLoader, setDaemon, setDefaultUncaughtExceptionHandler, setName, setPriority, setUncaughtExceptionHandler, sleep, sleep, start, stop, stop, suspend, toString, yield
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

PageCountsRetriever_InvalidPCRequerier

public PageCountsRetriever_InvalidPCRequerier(PageCountsRetrieverConfig pcrCfg,
                                              Vector searchWords,
                                              DataMatrix pageCountMatrix,
                                              JLabel statusBar)
Creates a PageCountsRetriever_InvalidPCRequerier for accessing Google-like search engines and calculating a page count matrix for the co-occurrence of the terms in the searchWords Vector.

Parameters:
pcrCfg - a PageCountsRetrieverConfig instance containing the configuration for the web crawls
searchWords - a Vector containing the search words whose (joint) appearance on web pages should be determined
pageCountMatrix - a DataMatrix containing the page count matrix with invalid entries
statusBar - the JLabel representing the status bar (for writing the current loading progress)
Method Detail

run

public void run()
This method is called when the thread is started. It creates an AnySearch instance for each query, issues the query, and stores the retrieved page counts in a DataMatrix.

Specified by:
run in interface Runnable
Overrides:
run in class Thread
See Also:
Runnable.run()

getPageCountMatrix

public DataMatrix getPageCountMatrix()
Returns the page count matrix for the co-occurrence of the search terms on web pages.

Returns:
a DataMatrix with the page counts for the co-occurrence of the search terms
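
Because the class extends Thread, a caller would presumably start it, wait for run() to finish, and only then read the matrix. The sketch below shows that pattern with a hypothetical stand-in class (RequerierStandIn) and a plain double[][] in place of DataMatrix. The stand-in does no web access and is not the CoMIRVA class. Only start() and join() are real java.lang.Thread calls.

```java
import java.util.Arrays;

public class RequerierUsageSketch {
    /** Hypothetical stand-in for the requerier thread; NOT the CoMIRVA class. */
    static class RequerierStandIn extends Thread {
        private final double[][] matrix;

        RequerierStandIn(double[][] matrix) {
            this.matrix = matrix;
        }

        @Override
        public void run() {
            // Placeholder work: replace each -1 with 0 instead of querying a search engine.
            for (double[] row : matrix) {
                for (int j = 0; j < row.length; j++) {
                    if (row[j] == -1.0) {
                        row[j] = 0.0;
                    }
                }
            }
        }

        double[][] getPageCountMatrix() {
            return matrix;
        }
    }

    /** Starts the worker, waits for it to finish, then returns its matrix. */
    public static double[][] requeryAndWait(double[][] matrix) {
        RequerierStandIn worker = new RequerierStandIn(matrix);
        worker.start(); // executes run() on a new thread
        try {
            worker.join(); // block until run() has returned
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while waiting for the requerier", e);
        }
        return worker.getPageCountMatrix();
    }

    public static void main(String[] args) {
        double[][] result = requeryAndWait(new double[][] { {10, -1}, {-1, 20} });
        System.out.println(Arrays.deepToString(result));
    }
}
```

Reading the matrix only after join() returns avoids observing a partially updated result. A GUI caller would more likely poll or use a callback instead of blocking the event thread.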