Michael Ringgaard comments

Results: 36 comments of Michael Ringgaard

How to run the silver annotation pipeline

The output looks correct. The silver-annotated Wikipedia documents are in train-*.rec and eval.rec. Together these contain all the Wikipedia articles. They are split into train and eval because I use...

How to run the silver annotation pipeline

Where did you get the impression that the silver annotations should be in `e/silver/en/silver-0000%d-of-00010.rec`? The silver annotations are in `local/data/e/silver/en/train-?????-of-00010.rec` and `local/data/e/silver/en/eval.rec`. You can take a look at the...
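As a rough illustration of how the `train-?????-of-00010.rec` pattern maps to concrete files, the sketch below enumerates the shard names it would match. It assumes shards are numbered from 0 with a zero-padded five-digit index, which the comment does not state; `shard_names` is a hypothetical helper, not part of SLING.

```python
import fnmatch


def shard_names(prefix, num_shards):
    """Build shard file names like <prefix>-00000-of-00010.rec.

    Assumption: shards are numbered from 0 and zero-padded to five digits.
    """
    return ["%s-%05d-of-%05d.rec" % (prefix, i, num_shards)
            for i in range(num_shards)]


# Paths taken from the comment above; the numbering scheme is assumed.
pattern = "local/data/e/silver/en/train-?????-of-00010.rec"
names = shard_names("local/data/e/silver/en/train", 10)

# Every generated name should match the wildcard pattern.
matching = [n for n in names if fnmatch.fnmatch(n, pattern)]

for name in matching:
    print(name)
```

The eval set is a single file (`eval.rec`), so it is not covered by this pattern and has to be listed separately.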

How to run the silver annotation pipeline

The problem seems to be that the `distantly_supervise.py` script expects the silver data to be indexed by QIDs but the silver pipeline assigns random keys in order to shuffle the...

How to run the silver annotation pipeline

There are basically two solutions: either take the train and eval files and reindex them, or make a new silver workflow that is compatible with the old mode. Let me...
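The reindexing option can be pictured with a small, self-contained sketch. It deliberately does not use the SLING record API: records are modeled as a plain dictionary from key to document, and the `qid` field, the sample documents, and both helper functions are hypothetical stand-ins chosen for illustration.

```python
import random


def shuffle_keys(docs, seed=0):
    """Mimic a pipeline that stores documents under random keys to shuffle them."""
    rng = random.Random(seed)
    return {"%016x" % rng.getrandbits(64): doc for doc in docs}


def reindex_by_qid(records):
    """Re-key records by the QID stored inside each document.

    Assumption: every document carries its QID in a "qid" field.
    """
    reindexed = {}
    for _, doc in records.items():
        qid = doc["qid"]
        if qid in reindexed:
            raise ValueError("duplicate QID: " + qid)
        reindexed[qid] = doc
    return reindexed


# Hypothetical documents; real silver documents are SLING frames.
docs = [
    {"qid": "Q42", "text": "first article"},
    {"qid": "Q64", "text": "second article"},
    {"qid": "Q90", "text": "third article"},
]

shuffled = shuffle_keys(docs)        # keys are random, lookup by QID fails
by_qid = reindex_by_qid(shuffled)    # keys are QIDs again

print(sorted(by_qid))
```

A consumer that looks documents up by QID fails against `shuffled` but works against `by_qid`, which is the mismatch described in the comment above.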

How to run the silver annotation pipeline

With the Python script below you should be able to produce the silver-*.rec output that should be compatible with distantly_supervise.py:
```
import sling
import sling.flags as flags
import sling.log as...
```

How to run the silver annotation pipeline

Hmm... My test run seems to indicate that the script above does not read the stopwords and blacklists correctly, resulting in many spammy annotations. Let me try to fix this.

How to run the silver annotation pipeline

Is there a stack trace below the "Check failed:" line?

How to run the silver annotation pipeline

The CHECK failure indicates that some invalid date is being processed. You could just comment out the CHECK on line 41 of calendar.cc. It would cause some invalid dates in...

How to run the silver annotation pipeline

I have updated the Python script above to include the configuration of stopwords and blacklists. The following lines were missing:
```
config = corpora.repository("data/wiki/" + language + "/silver.sling")
mapper.attach_input("commons", wf.resource(config,...
```

SLING with many frames on a small dataset

As with other deep models, SLING parser training is "data hungry". It needs a fair amount of training data, and more data is better data! We are working on...