Mavuno
A Hadoop-Based Text Mining Toolkit
recent news
- January 20, 2012 - Mavuno version 0.2 released.
- October 8, 2011 - Mavuno web site launched and version 0.1 released.
Mavuno is an open-source, modular, scalable text mining toolkit built on Hadoop. It supports basic natural language processing tasks (e.g., part-of-speech tagging, chunking, parsing, and named entity recognition), can perform large-scale distributional similarity computations (e.g., synonym, paraphrase, and lexical variant mining), and provides information extraction capabilities (e.g., instance and semantic relation mining). It can easily be adapted to new input formats and text mining tasks.
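To make "distributional similarity" concrete, here is a minimal single-machine sketch of the general idea. It assumes a simple setup: each word is represented by counts of the words appearing within a fixed window around it, and two words are compared by the cosine of those count vectors. This is not Mavuno's API or its Hadoop implementation. The class `DistributionalSimilarity` and its methods are hypothetical names chosen for illustration. The sketch uses Java 8 features, which is newer than the Java 1.6 minimum listed below.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, in-memory illustration of distributional similarity.
 * Not Mavuno's API: words are compared by the cosine of their
 * window-based context-count vectors.
 */
public class DistributionalSimilarity {

    /** Builds word -> (context word -> co-occurrence count) from tokenized sentences. */
    public static Map<String, Map<String, Integer>> contextVectors(
            List<List<String>> sentences, int window) {
        Map<String, Map<String, Integer>> vectors = new HashMap<>();
        for (List<String> tokens : sentences) {
            for (int i = 0; i < tokens.size(); i++) {
                Map<String, Integer> ctx =
                        vectors.computeIfAbsent(tokens.get(i), k -> new HashMap<>());
                int lo = Math.max(0, i - window);
                int hi = Math.min(tokens.size() - 1, i + window);
                // Count every token within the window, excluding the word itself.
                for (int j = lo; j <= hi; j++) {
                    if (j != i) {
                        ctx.merge(tokens.get(j), 1, Integer::sum);
                    }
                }
            }
        }
        return vectors;
    }

    /** Cosine similarity of two sparse count vectors; returns 0 if either is empty. */
    public static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            int x = e.getValue();
            Integer y = b.get(e.getKey());
            if (y != null) {
                dot += (double) x * y;
            }
            normA += (double) x * x;
        }
        for (int y : b.values()) {
            normB += (double) y * y;
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        List<List<String>> corpus = Arrays.asList(
                Arrays.asList("the", "cat", "sat", "on", "the", "mat"),
                Arrays.asList("the", "dog", "sat", "on", "the", "rug"));
        Map<String, Map<String, Integer>> vectors = contextVectors(corpus, 2);
        // "cat" and "dog" occur in identical contexts here, so this prints 1.000.
        System.out.printf("cat~dog: %.3f%n", cosine(vectors.get("cat"), vectors.get("dog")));
        System.out.printf("cat~on:  %.3f%n", cosine(vectors.get("cat"), vectors.get("on")));
    }
}
```

On a realistic corpus, the co-occurrence counting and the all-pairs comparison are the expensive steps, and those are the parts a MapReduce-based toolkit would need to distribute. This sketch keeps everything in memory for clarity.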
getting started
The easiest way to start using Mavuno is to follow these steps:
1. Install Java (1.6+) and Hadoop (version 0.20.2)
2. Browse the documentation
3. Try out some examples
If you modify the Mavuno source code, then you'll also want to:
4. Build Mavuno
5. Browse the Javadocs
6. Contribute bug fixes, enhancements, and additional functionality back to the project
coming soon
- Improved documentation.
- More examples.
- Faster distributional similarity implementation.
- More applications (information retrieval, natural language processing, text mining).
- Better error handling.