Michael Ringgaard

36 comments by Michael Ringgaard

The output looks correct. The silver-annotated Wikipedia documents are in train-*.rec and eval.rec. Together these contain all the Wikipedia articles. They are split into train and eval because I use...

Where did you get the impression that the silver annotations should be in `e/silver/en/silver-0000%d-of-00010.rec`? The silver annotations are in `local/data/e/silver/en/train-?????-of-00010.rec` and `local/data/e/silver/en/eval.rec`. You can take a look at the...
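As a rough illustration of the file layout described above, the sketch below expands the sharded training pattern into concrete file names and appends the eval file. It assumes the shards are numbered from 0 with five-digit zero padding, which is inferred from the `?????-of-00010` pattern and not confirmed here; `shard_names` is a hypothetical helper, not part of SLING.

```python
import fnmatch

def shard_names(base_dir, prefix, num_shards):
    """Expand a sharded file spec into concrete names (hypothetical helper).

    Assumes zero-based shard numbering with five-digit padding.
    """
    return [
        f"{base_dir}/{prefix}-{i:05d}-of-{num_shards:05d}.rec"
        for i in range(num_shards)
    ]

base = "local/data/e/silver/en"
train_files = shard_names(base, "train", 10)
silver_files = train_files + [f"{base}/eval.rec"]

# Every generated training shard matches the pattern quoted in the comment.
pattern = f"{base}/train-?????-of-00010.rec"
all_match = all(fnmatch.fnmatch(name, pattern) for name in train_files)
```

In practice it is probably simpler to list the directory and check which of these files actually exist.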

The problem seems to be that the `distantly_supervise.py` script expects the silver data to be indexed by QIDs, but the silver pipeline assigns random keys in order to shuffle the...

There are basically two solutions: either take the train and eval files and reindex them, or make a new silver workflow that is compatible with the old mode. Let me...
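The first option, reindexing, can be sketched in plain Python. This is only a conceptual illustration using in-memory tuples: it does not use the SLING record file API, and the `qid` field and the `reindex_by_qid` helper are hypothetical names chosen for the example.

```python
def reindex_by_qid(records, qid_of):
    """Re-key (random_key, document) pairs by each document's QID.

    `records` is an iterable of (key, document) pairs whose keys are
    arbitrary (e.g. random shuffle keys); `qid_of` extracts the QID
    from a document, or returns None if the document has none.
    """
    reindexed = {}
    for _random_key, doc in records:
        qid = qid_of(doc)
        if qid is None:
            continue  # documents without a QID cannot be looked up later
        reindexed[qid] = doc
    # Sort by key so the result resembles an indexed record set.
    return dict(sorted(reindexed.items()))

# Toy stand-ins for records read from the train and eval files.
shuffled = [
    ("8f3a", {"qid": "Q42", "text": "Douglas Adams ..."}),
    ("01bc", {"qid": "Q1", "text": "Universe ..."}),
    ("77de", {"text": "document without an item"}),
]

by_qid = reindex_by_qid(shuffled, lambda doc: doc.get("qid"))
```

With real data, the same loop would read from the train and eval record files and write the re-keyed documents to new record files, so that a lookup by QID succeeds.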

With the Python script below you should be able to produce the silver-*.rec output that should be compatible with distantly_supervise.py:

```
import sling
import sling.flags as flags
import sling.log as...
```

Hmm... My test run seems to indicate that the script above does not read the stopwords and blacklists correctly, resulting in many spammy annotations. Let me try to fix this.

Is there a stack trace below the "Check failed:" line?

The CHECK fault indicates that some invalid date is being processed. You could just comment out the CHECK in line 41 of calendar.cc. It would cause some invalid dates in...

I have updated the Python script above to include the configuration of stopwords and blacklists. The following lines were missing:

```
config = corpora.repository("data/wiki/" + language + "/silver.sling")
mapper.attach_input("commons", wf.resource(config,...
```

Like other deep models, SLING parser training is "data hungry". It needs a fair amount of training data, and more data is better data! We are working on...