Mavuno

supported input formats

applications

pattern/context extractors

nlp tool support

installation

Download Mavuno:

1. Unpack the Mavuno archive or clone the Mavuno GitHub repository.

To build Mavuno and set up the Hadoop classpath:

2. Run ant jar from the Mavuno directory to compile the code and build the Mavuno jar.
3. Add the jars in the lib directory to the Hadoop classpath. The easiest way to do this is to copy them into your $HADOOP_HOME/lib/ directory.

To run Mavuno applications:

4. Run hadoop jar mavuno-VERSION.jar APPCLASS OPTIONS

where VERSION is the current version of Mavuno, APPCLASS is the class of the application to run, and OPTIONS are any command line options that may be required by the application.
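As a rough illustration of how the pieces of the step 4 command fit together, the sketch below assembles the argument list in Java. The class MavunoCommand, the jar path, the application class name ExampleApp, and the option names are all hypothetical placeholders, not part of Mavuno; substitute the jar produced in step 2 and a real application class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Mavuno: assembles the
// "hadoop jar JAR APPCLASS OPTIONS" command as an argument list.
class MavunoCommand {
    static List<String> build(String jar, String appClass, String... options) {
        List<String> command = new ArrayList<>(Arrays.asList("hadoop", "jar", jar, appClass));
        command.addAll(Arrays.asList(options));
        return command;
    }

    public static void main(String[] args) {
        // Placeholder jar path, class name, and options, for illustration only.
        List<String> command = build(
                "path/to/JAR",
                "edu.isi.mavuno.app.mine.ExampleApp",
                "-ExampleInput=/data/corpus",
                "-ExampleOutput=/data/out");
        // Print the command line that would be typed in a shell.
        System.out.println(String.join(" ", command));
    }
}
```

The resulting list could also be handed to java.lang.ProcessBuilder to launch the job from another program, assuming hadoop is on the PATH.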

pattern/context extractors

Applicable to all text documents:

Applicable to NLP processed documents:

Applicable to Twitter JSON documents:

Other extractors:

- MultiExtractor - Allows multiple extractors to be applied per document.
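The idea of applying several extractors to one document can be sketched as a simple composite. This is not Mavuno's actual extractor API: here an "extractor" is modeled, purely as an assumption for illustration, as a function from document text to a list of extracted strings, and MultiExtractorSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch, not Mavuno's MultiExtractor: combines several
// extractor functions into one that runs each of them on the same document.
class MultiExtractorSketch {
    static Function<String, List<String>> combine(
            List<Function<String, List<String>>> extractors) {
        return doc -> {
            List<String> results = new ArrayList<>();
            // Apply every extractor to the document, in order, and pool the output.
            for (Function<String, List<String>> extractor : extractors) {
                results.addAll(extractor.apply(doc));
            }
            return results;
        };
    }

    public static void main(String[] args) {
        // Two toy extractors: whitespace tokens, and the whole text upper-cased.
        Function<String, List<String>> tokens = doc -> Arrays.asList(doc.split("\\s+"));
        Function<String, List<String>> upper = doc -> Arrays.asList(doc.toUpperCase());

        Function<String, List<String>> combined = combine(Arrays.asList(tokens, upper));
        System.out.println(combined.apply("harvest the corpus"));
    }
}
```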

supported input formats

- ClueWarcInputFormat (ClueWeb)
- LineInputFormat (one input per line)
- TextFileInputFormat (text files)
- TrecInputFormat (TREC-style documents)
- TwitterInputFormat (Twitter JSON)

applications

Basic Text Mining (edu.isi.mavuno.app.mine):

Distributional Similarity (edu.isi.mavuno.app.distsim):

NLP (edu.isi.mavuno.app.nlp):

Information Extraction (edu.isi.mavuno.app.ie):

Utilities (edu.isi.mavuno.app.util):

application parameters

Some applications require one or more parameters (e.g., input and output paths). Mavuno supports two ways of specifying them:

Specifying parameters on the command line:

-PARAMETER=VALUE

where PARAMETER is the name of the parameter/option and VALUE is the desired value.
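To make the -PARAMETER=VALUE form concrete, here is a minimal sketch of how such arguments could be split into name/value pairs. It is not Mavuno's own parser; the class CommandLineParams and the parameter names in the example are hypothetical, and splitting at the first "=" is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Mavuno: parses -PARAMETER=VALUE arguments.
class CommandLineParams {
    static Map<String, String> parse(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            // Require a leading '-', a non-empty name, and an '=' separator.
            if (!arg.startsWith("-") || eq < 2) {
                throw new IllegalArgumentException("Expected -PARAMETER=VALUE, got: " + arg);
            }
            // Assumption: split at the first '=', so values may themselves contain '='.
            params.put(arg.substring(1, eq), arg.substring(eq + 1));
        }
        return params;
    }

    public static void main(String[] args) {
        // The parameter names below are made up for illustration.
        String[] example = {"-ExampleInput=/data/corpus", "-ExampleOutput=/data/out"};
        System.out.println(parse(example));
    }
}
```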

Specifying parameters with parameter files:

Each line of a parameter file takes the following form:

PARAMETER [TAB] VALUE

where PARAMETER is the name of the parameter/option, VALUE is the desired value, and [TAB] denotes the tab ("\t") character.
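The parameter file format described above can be read with a few lines of code. The following is a sketch under stated assumptions, not Mavuno's own loader: the class ParameterFileSketch and the parameter names are hypothetical, blank lines are assumed to be skipped, and each line is split at its first tab.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Mavuno: parses "PARAMETER<TAB>VALUE" lines.
class ParameterFileSketch {
    static Map<String, String> parse(Iterable<String> lines) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // assumption: blank lines are ignored
            }
            int tab = line.indexOf('\t');
            // Require a non-empty parameter name followed by a tab.
            if (tab < 1) {
                throw new IllegalArgumentException("Expected PARAMETER<TAB>VALUE, got: " + line);
            }
            params.put(line.substring(0, tab), line.substring(tab + 1));
        }
        return params;
    }

    public static void main(String[] args) {
        // In practice the lines could come from
        // java.nio.file.Files.readAllLines(path); names here are made up.
        List<String> lines = Arrays.asList(
                "ExampleInput\t/data/corpus",
                "ExampleOutput\t/data/out");
        System.out.println(parse(lines));
    }
}
```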

javadoc

The Mavuno javadoc is available here.