Implement feature "revision.is_patrolled"
Given a revision id, "revision.is_patrolled" would return True if there is a log entry saying that this edit was patrolled by some user, and False otherwise.
I believe this would mostly solve Nemo's concerns about reusing the data already provided by recent changes patrollers (on wikis where this MW feature is enabled): https://meta.wikimedia.org/w/index.php?title=Grants_talk:IEG/Revision_scoring_as_a_service&oldid=10089505#Existing_tool
A query like https://pt.wikipedia.org/w/api.php?action=query&list=recentchanges&rcprop=patrolled returns the patrolled attribute for recent changes. For older revisions, the only way I know to check whether a given revision was patrolled is to look at each log entry: https://pt.wikipedia.org/w/api.php?action=query&list=logevents&leprop=details&letype=patrol (at least until T92018 is fixed)
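As a rough illustration of the log-based check, here is a minimal Python sketch. It only builds the query parameters from the logevents URL above and scans an already-fetched JSON response, so it makes no network call. The function names are hypothetical, and the location of the revision id inside each entry (`params["curid"]` in the newer response format, `patrol["cur"]` in the older one) is an assumption that should be checked against a real response.

```python
from typing import Any, Dict, Optional, Set


def patrol_log_params(limit: int = 500) -> Dict[str, Any]:
    """Query parameters matching the logevents URL quoted above."""
    return {
        "action": "query",
        "list": "logevents",
        "letype": "patrol",
        "leprop": "details",
        "lelimit": limit,
        "format": "json",
    }


def patrolled_rev_id(event: Dict[str, Any]) -> Optional[int]:
    """Extract the patrolled revision id from one log entry, if present.

    ASSUMPTION: the id is stored under params["curid"] (newer format)
    or patrol["cur"] (older format). Verify against a real response.
    """
    for container, key in (("params", "curid"), ("patrol", "cur")):
        details = event.get(container)
        if isinstance(details, dict) and key in details:
            return int(details[key])
    return None


def patrolled_rev_ids(response: Dict[str, Any]) -> Set[int]:
    """Collect every revision id mentioned in a patrol log response."""
    events = response.get("query", {}).get("logevents", [])
    ids = set()
    for event in events:
        rev_id = patrolled_rev_id(event)
        if rev_id is not None:
            ids.add(rev_id)
    return ids


def is_patrolled(rev_id: int, response: Dict[str, Any]) -> bool:
    """True if the response contains a patrol entry for rev_id."""
    return rev_id in patrolled_rev_ids(response)
```

A real implementation would send `patrol_log_params()` to the wiki's `api.php` and follow the API's continuation until the log is exhausted, which is exactly the cost that makes this lookup awkward for old revisions.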
Considering that:
- Users patrolling recent changes (are instructed to)
  - mark good revisions as patrolled
  - mark bad revisions as patrolled as soon as the problems are fixed
- The fact that a revision is unpatrolled might be due to:
  - All patrollers who saw that revision not being sure if it was good/bad, or how to deal with it
  - No one noticing the revision in the recent changes feed (e.g. in a low activity period)
- A revision which is getting old and is still unpatrolled seems more likely to be "good" than "bad" (or at least, more likely to be "bad" than "really bad"), because readers who find an article with really bad content tend to report the problem, or to fix it directly (and in both cases, it would get patrolled, ideally).
I think a feature like this could give a machine learning algorithm useful information during training, even if only for relatively old revisions (which are of interest in use cases such as "get a list of likely bad edits which were not reverted so that I can review them").
Assign to me. Claiming task. :-)
I don't think that this feature is a good idea. It's not a stable characteristic of the edit. If we're going to put this anywhere, it should be in the autolabel utility in https://github.com/wikimedia/editquality.
Essentially the idea is that if an edit is not reverted and it is patrolled, we can conclude that it is a good edit (not damaging) saved in goodfaith. Does that sound right?
Yeah, that makes sense.
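To make the rule concrete, here is a minimal Python sketch of the labeling heuristic discussed above. It is not the actual autolabel code in editquality: the function name, the `reverted` and `patrolled` inputs, and the `damaging` label key are assumptions made for illustration. Only the one rule stated in the thread is encoded; every other combination is left unlabeled.

```python
from typing import Dict, Optional


def autolabel(reverted: bool, patrolled: bool) -> Optional[Dict[str, bool]]:
    """Return labels for an edit, or None if no conclusion can be drawn.

    Rule from the discussion: an edit that was patrolled and not
    reverted is taken to be non-damaging and saved in good faith.
    ASSUMPTION: all other cases are left unlabeled, since the thread
    does not say how to treat them.
    """
    if patrolled and not reverted:
        return {"damaging": False, "goodfaith": True}
    return None


# Example: a patrolled edit that was never reverted gets both labels.
labels = autolabel(reverted=False, patrolled=True)
```

Returning `None` rather than guessing keeps unpatrolled or reverted edits out of the automatically labeled set, so they can still be routed to human review.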