Mavuno
A Hadoop-Based Text Mining Toolkit
supported input formats
applications
pattern/context extractors
nlp tool support
installation
Download Mavuno:
1. Unpack the Mavuno archive or clone the Mavuno GitHub repository.
To compile the code, build the Mavuno jar, and set up the Hadoop classpath:
2. Run ant jar from the Mavuno directory to compile the code and build the Mavuno jar.
3. Add the jars in the lib directory to the Hadoop classpath. The easiest way to do this is to copy the jars to your $HADOOP_HOME/lib/ directory.
To run Mavuno applications:
4. Run hadoop jar mavuno-VERSION.jar APPCLASS OPTIONS
where VERSION is the current version of Mavuno, APPCLASS is the class of the application to run, and OPTIONS are any command-line options required by the application.
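Steps 2 through 4 can be strung together in a short shell session. The sketch below is a dry run by default, so it only prints the commands it would execute. The default for HADOOP_HOME and the JARFILE and APPCLASS placeholders are assumptions made for illustration; substitute the jar that ant jar actually produced and the class of a real Mavuno application.

```shell
#!/bin/sh
# Sketch of steps 2-4. Dry run by default: commands are printed, not executed.
# Run from the Mavuno directory with DRY_RUN=0 to execute them for real.
DRY_RUN="${DRY_RUN:-1}"

# Placeholders (assumptions, not real Mavuno names): replace before use.
HADOOP_HOME="${HADOOP_HOME:-/path/to/hadoop}"  # assumed Hadoop install location
MAVUNO_JAR="${MAVUNO_JAR:-JARFILE}"            # the jar that "ant jar" produced
APPCLASS="${APPCLASS:-APPCLASS}"               # class of the application to run

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

run ant jar                                     # step 2: compile and build the jar
run cp lib/*.jar "$HADOOP_HOME/lib/"            # step 3: put the lib jars on the Hadoop classpath
run hadoop jar "$MAVUNO_JAR" "$APPCLASS" "$@"   # step 4: run the application with its options
```

Any arguments passed to the script are forwarded unchanged as the application's OPTIONS.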
pattern/context extractors
Applicable to all text documents:
Applicable to NLP processed documents:
Applicable to Twitter JSON documents:
Other extractors:
- MultiExtractor - Allows multiple extractors to be applied per document.
supported input formats
- LineInputFormat (one input per line)
- TextFileInputFormat (text files)
- TrecInputFormat (TREC-style documents)
- TwitterInputFormat (Twitter JSON)
applications
Basic Text Mining (edu.isi.mavuno.app.mine):
Distributional Similarity (edu.isi.mavuno.app.distsim):
NLP (edu.isi.mavuno.app.nlp):
Information Extraction (edu.isi.mavuno.app.ie):
Utilities (edu.isi.mavuno.app.util):
application parameters
Some applications require that one or more parameters be specified (e.g., input and output paths). Mavuno supports two ways of specifying these parameters:
Specifying parameters on the command line:
-PARAMETER=VALUE
where PARAMETER is the name of the parameter/option and VALUE is the desired value.
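As a small illustration of the -PARAMETER=VALUE form, the sketch below formats two options and prints the resulting command line without running it. InputPath and OutputPath are hypothetical parameter names chosen for the example, and JARFILE and APPCLASS are placeholders; the parameters an application actually accepts depend on that application.

```shell
#!/bin/sh
# Format one option in the -PARAMETER=VALUE form.
to_option() {
    printf '%s=%s\n' "-$1" "$2"
}

# InputPath and OutputPath are hypothetical parameter names for illustration.
OPTIONS="$(to_option InputPath /data/corpus) $(to_option OutputPath /data/out)"

# Print the resulting command line rather than running it.
echo "hadoop jar JARFILE APPCLASS $OPTIONS"
```

Values that contain spaces or shell metacharacters would need to be quoted when the command is actually run.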
Specifying parameters with parameter files:
Each line of a parameter file takes the following form:
PARAMETER [TAB] VALUE
where PARAMETER is the name of the parameter/option, VALUE is the desired value, and [TAB] denotes the tab ("\t") character.
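Because the separator must be a literal tab, it can be safer to generate a parameter file than to type it in an editor that may convert tabs to spaces. The sketch below writes a two-line file and checks that every line has the PARAMETER [TAB] VALUE shape. The parameter names, the file location, and the check_params helper are all hypothetical, and the sketch does not show how the file is handed to an application, since that is not specified here.

```shell
#!/bin/sh
# Write a two-entry parameter file. InputPath and OutputPath are
# hypothetical parameter names used only for illustration.
PARAMS_FILE="${TMPDIR:-/tmp}/mavuno-example.params"
printf '%s\t%s\n' \
    'InputPath'  '/data/corpus' \
    'OutputPath' '/data/out' > "$PARAMS_FILE"

# Succeed only if every line has a non-empty PARAMETER followed by a tab.
# Assumption: a VALUE may itself contain further tabs, so only NF < 2 is rejected.
check_params() {
    awk -F '\t' '
        NF < 2 || $1 == "" {
            printf "line %d: expected PARAMETER<TAB>VALUE\n", NR
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}

if check_params "$PARAMS_FILE"; then
    echo "parameter file looks well-formed: $PARAMS_FILE"
fi
```

A line that uses spaces instead of a tab makes check_params report the line number and return a non-zero status.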
javadoc
The Mavuno javadoc is available here.