A Hadoop-Based Text Mining Toolkit



recent news

- January 20, 2012 - Mavuno version 0.2 released.

- October 8, 2011 - Mavuno web site launched and version 0.1 released.

Mavuno is an open-source, modular, scalable text mining toolkit built on Hadoop. It supports basic natural language processing tasks (e.g., part-of-speech tagging, chunking, parsing, named entity recognition), performs large-scale distributional similarity computations (e.g., synonym, paraphrase, and lexical variant mining), and provides information extraction capabilities (e.g., instance and semantic relation mining). It can easily be adapted to new input formats and text mining tasks.
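For readers new to the term, distributional similarity scores two words as similar when they occur in similar contexts. The following is a minimal, single-machine sketch of that idea, not Mavuno's API or implementation: the class name, the toy corpus, the one-word context window, and the choice of cosine similarity are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Mavuno.
public class DistributionalSimilaritySketch {

    // Toy corpus (made up for this example), already tokenized.
    static final String[][] CORPUS = {
        {"the", "car", "drove", "fast"},
        {"the", "automobile", "drove", "fast"},
        {"the", "banana", "tasted", "sweet"}
    };

    // Adds one to the count stored under key.
    static void increment(Map<String, Integer> counts, String key) {
        Integer old = counts.get(key);
        counts.put(key, old == null ? 1 : old + 1);
    }

    // Counts the words that appear immediately before or after target.
    static Map<String, Integer> contexts(String[][] corpus, String target) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String[] sentence : corpus) {
            for (int i = 0; i < sentence.length; i++) {
                if (!sentence[i].equals(target)) {
                    continue;
                }
                if (i > 0) {
                    increment(counts, sentence[i - 1]);
                }
                if (i + 1 < sentence.length) {
                    increment(counts, sentence[i + 1]);
                }
            }
        }
        return counts;
    }

    // Cosine similarity between two sparse context-count vectors.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            int x = e.getValue();
            normA += x * x;
            Integer y = b.get(e.getKey());
            if (y != null) {
                dot += x * y;
            }
        }
        for (int y : b.values()) {
            normB += y * y;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // a word with no observed contexts is similar to nothing
        }
        return dot / Math.sqrt(normA * normB);
    }

    public static void main(String[] args) {
        Map<String, Integer> car = contexts(CORPUS, "car");
        Map<String, Integer> automobile = contexts(CORPUS, "automobile");
        Map<String, Integer> banana = contexts(CORPUS, "banana");
        System.out.println("car vs. automobile: " + cosine(car, automobile));
        System.out.println("car vs. banana:     " + cosine(car, banana));
    }
}
```

In this toy corpus, "car" and "automobile" share both of their contexts and score higher than "car" and "banana", which share only "the". At scale, the same counting and comparison steps would be distributed across a Hadoop cluster rather than held in in-memory maps.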

getting started

The easiest way to start using Mavuno is to follow these steps:

  0. Install Java (1.6+) and Hadoop (version 0.20.2)

  1. Download Mavuno

  2. Browse the documentation

  3. Try out some examples

If you modify the Mavuno source code, then you'll also want to:

  4. Build Mavuno

  5. Browse the Javadocs

  6. Contribute bug fixes, enhancements, and additional functionality back to the project

coming soon

- Improved documentation.

- More examples.

- Faster distributional similarity implementation.

- More applications (information retrieval, natural language processing, text mining).

- Better error handling.