Andreas van Cranenburgh issues

Results: 33 issues of Andreas van Cranenburgh

Host this material?

Hi Allen, I refer to this material in my courses, but unfortunately the links are broken (e.g., https://de.dariah.eu/tatom/visualizing_trends.html). I mailed the DARIAH team about this several times, and their response...

treesearch web interface wish list

- [ ] hierarchical subcorpus selection; handle corpora with large number of sections
- [ ] query cancellation: pressing stop in browser should cancel the query.
- [ ] pagination:...

create wheels

https://github.com/explosion/wheelwright

replace multiprocessing.Pool

multiprocessing pools work fine unless any kind of error condition arises...

- [ ] properly detect segmentation faults, out of memory, &c. `concurrent.futures` does this, but doesn't take an `initializer`...

efficiency

robin hood hash table

Tessil/ordered-map might be a better trade off than spp::sparse_hash.

incompatibility with setuptools / easy_install

When these are installed, the installed script is wrong:

```
$ cat `which discodop`
#!/usr/bin/python3
# EASY-INSTALL-SCRIPT: 'disco-dop==0.5rc1','discodop'
__requires__ = 'disco-dop==0.5rc1'
__import__('pkg_resources').run_script('disco-dop==0.5rc1', 'discodop')
```

The workaround is to remove these...

operations on Tree objects may exceed maximum recursion depth

E.g., a pathological sentence with >1000 words will be too deep to recurse when binarized.

- Any function that directly recurs on the children of a tree is affected, as...
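One common way around the recursion limit is to replace direct recursion on children with an explicit stack. The following is only a minimal sketch of that idea: `Node`, `subtrees`, `depth`, and `right_branching` are hypothetical names made up for illustration and are not part of the library's Tree class.

```python
from typing import Iterator, List, Optional


class Node:
    """Hypothetical stand-in for a tree node (not the library's Tree)."""
    __slots__ = ("label", "children")

    def __init__(self, label: str, children: Optional[List["Node"]] = None):
        self.label = label
        self.children = children if children is not None else []


def subtrees(root: Node) -> Iterator[Node]:
    """Yield every node in pre-order using an explicit stack, not recursion."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        # Push children reversed so the leftmost child is popped first.
        stack.extend(reversed(node.children))


def depth(root: Node) -> int:
    """Number of nodes on the longest root-to-leaf path, computed iteratively."""
    deepest = 0
    stack = [(root, 1)]
    while stack:
        node, level = stack.pop()
        deepest = max(deepest, level)
        stack.extend((child, level + 1) for child in node.children)
    return deepest


def right_branching(n: int) -> Node:
    """Build a chain of n binary nodes, resembling a binarized long sentence."""
    node = Node("w%d" % n)
    for i in reversed(range(n)):
        node = Node("X", [Node("w%d" % i), node])
    return node


if __name__ == "__main__":
    tree = right_branching(5000)  # far deeper than the default recursion limit
    print(depth(tree), sum(1 for _ in subtrees(tree)))
```

The trade-off is that each traversal has to manage its own stack, whereas raising the limit with `sys.setrecursionlimit` keeps the code recursive but risks overflowing the C stack on very deep trees.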

bug

inefficiency of treesearch engines

- tgrep2: generally fast, but loads corpus at every invocation, and always returns an exhaustive list of all matches; no support for discontinuous constituents.
- xpath / alpinocorpus: memory hungry,...

efficiency

Re-implement NLTK tree

Would allow a potentially significant speedup for treebank transformations and grammar extraction. Wishlist:

- represent all treebank information: functions, morphology, lemmas, &c.
- combine indices and words in one data structure...

efficiency

Coreference in CoNLL output

I'm trying to run the dcoref system on a plain text file and want to get the output in CoNLL 2012 format. I've tried several things:

```
$ ./corenlp.sh -annotators...
```