Michael Heilman
The NLTK tokenizer used in the code doesn't handle fancy quotation marks very well. They just end up attached to words rather than being separate tokens. We should probably either...
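One possible workaround, not necessarily either of the options considered above, is to map curly quotation marks to their ASCII equivalents before the text reaches the tokenizer. The sketch below is only an illustration: `QUOTE_MAP` and `normalize_quotes` are hypothetical names, and it assumes that losing the distinction between opening and closing quotes is acceptable.

```python
# Hypothetical pre-tokenization step: map typographic quotes to ASCII ones.
QUOTE_MAP = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (also used as an apostrophe)
})


def normalize_quotes(text):
    """Return `text` with curly quotes replaced by straight ASCII quotes."""
    return text.translate(QUOTE_MAP)
```

The normalized string would then be passed to the tokenizer in place of the raw text.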
Some of the regular expressions are more complicated than necessary (e.g., they include extraneous instances of `.*`), and in some cases `str.startswith` could perhaps be used instead of `re.search`.
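As a rough illustration of the kind of simplification meant here, the two functions below give the same answer for a fixed prefix. The pattern and the function names are made up for this example and are not taken from the code base.

```python
import re


def starts_with_regex(label, prefix):
    """Prefix check written as a regex with a redundant trailing `.*`."""
    return re.search(r"^" + re.escape(prefix) + r".*", label) is not None


def starts_with_plain(label, prefix):
    """The same check using a plain string method."""
    return label.startswith(prefix)
```

The trailing `.*` never changes whether `re.search` finds a match, so dropping it (or the regex entirely) keeps the behavior and makes the intent easier to read.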
Currently, the code just uses `logging.info`, `logging.warning`, etc. to record log messages. It would be better to instantiate one logger for the module, or loggers for each class, etc....
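A minimal sketch of both variants using the standard `logging` module. `ExampleParser` is a hypothetical class used only to show the pattern.

```python
import logging

# One logger for the whole module, named after the module itself.
logger = logging.getLogger(__name__)


class ExampleParser:
    """Hypothetical class that owns a logger named after itself."""

    def __init__(self):
        # Child of the module logger, e.g. "<module name>.ExampleParser".
        self.logger = logging.getLogger(f"{__name__}.{type(self).__name__}")

    def parse(self, text):
        self.logger.info("parsing %d characters", len(text))
        return text.split()
```

With named loggers, callers can adjust verbosity per module or per class instead of changing the root logger.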
Currently, `convert_rst_discourse_tb.py` uses NLTK's POS tagger to create flat trees for sentences that are in the RST treebank but not the Penn Treebank. This dependency should eventually be removed and...
We need some methods/scripts to evaluate parsing performance. We probably want to do two things: a) replicate previous work that uses PARSEVAL so that we can easily report previous results...
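For reference, a PARSEVAL-style score boils down to precision, recall, and F1 over labeled spans. The sketch below is a simplified illustration, not the evaluation used in previous work: it assumes each tree has already been converted to a collection of `(start, end, label)` tuples and treats them as sets, whereas a full implementation would also need to handle duplicate brackets.

```python
def parseval_scores(gold_spans, predicted_spans):
    """Return (precision, recall, F1) over labeled spans.

    Each span is assumed to be a hashable tuple such as (start, end, label).
    """
    gold, pred = set(gold_spans), set(predicted_spans)
    matched = len(gold & pred)
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```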
The code says it implements the version of the LSTM from Graves et al. (2013), which I assume is this http://www.cs.toronto.edu/~graves/icassp_2013.pdf or http://www.cs.toronto.edu/~graves/asru_2013.pdf. However, it looks like the LSTM equations...
In `class_lm_cluster.compute_weight()`, if two words never occur next to each other (i.e., `paircount == 0`), then the function returns 0.0 for the weight. Is this the appropriate behavior, given that it...
It'd be nice to use range headers to avoid re-downloading already-downloaded bytes when an S3 connection error happens. See #273.
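A rough sketch of the idea, independent of any particular HTTP client. `fetch_chunks` is a hypothetical callable that issues the request with the given headers and yields byte chunks, and the server is assumed to honor `Range` requests.

```python
import io


def range_header(offset):
    """Build a Range header asking for bytes from `offset` to the end."""
    return {"Range": f"bytes={offset}-"} if offset > 0 else {}


def download_with_resume(fetch_chunks, buf, max_retries=3):
    """Write the download into `buf`, resuming after connection errors.

    `fetch_chunks(headers)` is a hypothetical callable that yields byte
    chunks and may raise ConnectionError partway through the transfer.
    """
    retries = 0
    while True:
        try:
            # Start from however many bytes are already in the buffer.
            for chunk in fetch_chunks(range_header(buf.tell())):
                buf.write(chunk)
            return buf.tell()
        except ConnectionError:
            retries += 1
            if retries > max_retries:
                raise
```

Because the offset comes from `buf.tell()`, each retry requests only the bytes that have not yet been written.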
It'd be nice to have a progress bar show up for `civis files upload` and `civis files download`. Possibilities:
* https://pypi.python.org/pypi/tqdm
* https://pypi.python.org/pypi/progress
* https://pypi.python.org/pypi/progressbar2
* https://stackoverflow.com/questions/3173320/text-progress-bar-in-the-console
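For comparison with the libraries, a dependency-free text bar takes only a few lines. This is a sketch; `render_bar` is a hypothetical helper and not part of the CLI.

```python
def render_bar(done, total, width=20):
    """Return a text progress bar such as '[#####-----] 50%'."""
    # Treat an empty transfer as complete and clamp to the range [0, 1].
    frac = done / total if total else 1.0
    frac = min(max(frac, 0.0), 1.0)
    filled = int(width * frac)
    return "[" + "#" * filled + "-" * (width - filled) + f"] {frac:.0%}"
```

During a transfer, the bar could be redrawn in place with something like `print("\r" + render_bar(bytes_done, bytes_total), end="", flush=True)` after each chunk.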
It'd be nice to have the ability to do multioutput regression.