lightlda

Owner: da03

Distributed LDA, takes raw text as input and outputs topic word table.

Light LDA

A modified version of Microsoft Research's LightLDA, with added preprocessing scripts.

Usage (assuming you are in lightlda/):

make

cd datasets

tar zxf 20news-train.tgz

python scripts/pipeline.py etc/params.config

Note: Parameters are defined in etc/params.config. Results are written to output/model/${timestamp}/snapshot.word_topic_table.${iteration}${client_id}. Run python scripts/parse_word_topic_table.py to obtain a visualization of the result. The <word-id-file> is located at output/datablocks/${timestamp}/word_tf.txt.
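Because each run writes into its own ${timestamp} directory, it can be handy to list which snapshot files exist before calling the parsing script. The following is a minimal sketch, not part of this repository: the helper name list_snapshots is hypothetical, and it only assumes the output/model/${timestamp}/snapshot.word_topic_table.* layout described above. It does not read or interpret the file contents.

```python
from pathlib import Path


def list_snapshots(model_root="output/model"):
    """Map each run directory name to its sorted snapshot files.

    Hypothetical helper: assumes snapshots live directly inside
    <model_root>/<timestamp>/ and start with 'snapshot.word_topic_table.'.
    """
    root = Path(model_root)
    runs = {}
    if not root.is_dir():
        # Nothing has been generated yet (or the path is wrong).
        return runs
    for run_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        files = sorted(run_dir.glob("snapshot.word_topic_table.*"))
        if files:
            # Skip run directories that contain no snapshot files.
            runs[run_dir.name] = files
    return runs


if __name__ == "__main__":
    for run, files in list_snapshots().items():
        print(run)
        for path in files:
            print("  ", path)
```

Run from lightlda/, this prints one block per run directory, which makes it easier to pick the snapshot and the matching output/datablocks/${timestamp}/word_tf.txt.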

Note 2: The machine file defined in etc/params.config only works on cogito, and the whole pipeline assumes a shared filesystem.
