Confusion about the dataset
Hello,
Thanks for your dataset and code! I see from Table 1 that #Docs is much smaller than #Events, indicating that a document can contain multiple events. Is there a clear boundary between these events, or can different events in the same document share arguments?
In addition, I found that the doc_key of each instance in the jsonlines is unique. How do you count the number of documents (3194, 399, and 400)?
Any help would be great.

As you've noticed, given the stats, there are documents with multiple events. In those cases, there's a good chance an argument or two will be shared across events. However, determining the amount of argument overlap would require combining the examples back into full documents, which is a bit tricky.
The number of documents (top row in the table) is the number of unique source document URLs. That is, it's the number of documents that were then processed to create the individual examples.
We added a script to generate the numbers in the table in https://github.com/pitrack/arglinking/pull/11.
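For illustration only (this is not the script from the PR), the counting described above could be sketched roughly as follows. It assumes each line of a jsonlines file is one example and that the source document URL is stored under a field named `source_url`; that field name and the helper `count_documents` are assumptions, so adjust them to match the actual data.

```python
import json


def count_documents(lines, url_field="source_url"):
    """Return (unique source documents, examples) for an iterable of JSON lines.

    `url_field` is an assumed field name; change it if the data uses another key.
    """
    urls = set()
    n_examples = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        example = json.loads(line)
        urls.add(example[url_field])  # examples from one document share a URL
        n_examples += 1
    return len(urls), n_examples
```

Passing an open file handle for each split to `count_documents` should give the document count as the first value and the example count as the second, provided the field name assumption holds.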