Cloud9 is a Hadoop toolkit for working with big data.

Results: 5 Cloud9 issues

Optimizations to support indexing English Gigaword 5th ed (10M docs).

- Increased language support for Wikipedia to the top 24 languages by number of articles
- Added disambiguation patterns for each of the 24 supported languages
- ExtractWikipediaDisambiguations lets you extract...

Some error checks for parsing Wikipedia dumps and English Wikipedia pages.

With this change, one should be able to process a bzip2 file directly. Let me know if you have any comments.

Very minor change on line 211 to make sure AWS EMR doesn't throw errors.