Marking solutions that were trained on RAID train set
Is your feature request related to a problem? Please describe. RAID is a great benchmark to measure on, but it has a large open-sourced train set that anyone can train on. Recently, solutions with a 1.0 score have even started to appear, so the leaderboard is no longer a source of "who has the better AI-detection solution" and instead mostly tells you who is better overfit to RAID.
Describe the solution you'd like It would be great if you could mark solutions that were trained on the RAID train data (and maybe even create two separate leaderboards) so that the leaderboard becomes a source of "the most accurate AI detector" information again. This could be done by: a) simply asking in the metadata whether RAID train data was used as part of the training set (though users may provide wrong information), or b) adding some watermarked data to the train set (e.g. with unusual augmentations) that a normal detector should not be triggered by, but that a detector which saw this data during training will definitely trigger on (the specific augmentation should of course not be shared publicly). This second approach could even be combined with the first one and used to catch submitters who lie in their metadata, with appropriate restrictions for them. A rough sketch of option (b) is below.
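To make (b) concrete, here is a rough sketch of how it could work. Everything here is illustrative: the actual augmentation would have to stay private, the zero-width marker is just an example, and `detector` is assumed to be a callable mapping a text to an AI probability.

```python
# Illustrative sketch of the "watermarked train data" idea, not a proposal for
# the exact augmentation (which must stay private). The secret marker here is
# an invisible zero-width space inserted after commas - purely an example.
import random

ZERO_WIDTH = "\u200b"  # hypothetical secret marker

def watermark(text: str) -> str:
    """Apply the secret augmentation (here: an invisible character after commas)."""
    return text.replace(",", "," + ZERO_WIDTH)

def build_canary_records(human_texts, rate=0.05, seed=0):
    """Watermark a small fraction of human-written texts and label them as
    AI-generated before mixing them into the released train split."""
    rng = random.Random(seed)
    sampled = rng.sample(human_texts, max(1, int(rate * len(human_texts))))
    return [{"text": watermark(t), "label": "ai"} for t in sampled]

def overfit_score(detector, probe_human_texts):
    """Probe a submitted detector with watermarked *human* texts.
    A detector that never saw the canaries should almost never flag them,
    while one trained on the RAID train set will have learned to associate
    the watermark with the "ai" label. `detector` maps a text to P(ai)."""
    flags = [detector(watermark(t)) > 0.5 for t in probe_human_texts]
    return sum(flags) / len(flags)
```

A high `overfit_score` on such a probe set would then justify a "trained on RAID" tag or at least a follow-up question to the submitter.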
Describe alternatives you've considered I don't know about your plans for a RAID 2.0 version, but LLMs are developing rapidly, and if you're going to make the next iteration of RAID with an updated model pool, augmentations, and maybe even adjusted text sources, it would be great not to publish the train set. That would create space for fair competition.
Yes I agree with all of this. It looks like performance is saturating on RAID quite quickly and I agree that overfitting is a really big issue here. I've been busy with a few EMNLP submissions but I intend to implement some of the quicker fixes for this soon (such as allowing for other metrics to be reported and attempting to separate out people who have trained on the data).
As for a potential RAID version 2.0 we have been discussing some ideas internally. Nothing to announce yet :)
Thanks for the fast reply! Glad to hear that you've already thought about it and have plans to fix it - separate leaderboards would be really great.
Hi @liamdugan, hope you're doing well. Just wanted to ask how the fixes are going and when we can expect them to be released?
Yes! Sorry, I've been busy moving + starting a new job so I haven't gotten around to this yet. I'll be making these fixes around June 19th-ish. @sergak0
Hi @liamdugan, how are the updates going?
@sergak0 We've finished the initial implementation for toggling metrics (it's on the toggle_metrics branch). We have TPR@1% FPR and AUROC, but plan to support more in the future.
Right now I'm adding functionality to remove submissions that don't correctly meet the FPR thresholds and adding warnings to the evaluation script (a rough sketch of that check is below). Should be done in a day or two. Then we'll add tags for whether people trained on RAID or not.
Should all be done within a week or two. Thanks for the continued patience.
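For reference, a minimal sketch of what the metric toggling and the FPR check could look like, assuming scikit-learn; the function names, thresholds, and example scores below are illustrative and not the actual leaderboard code.

```python
# Illustrative metric computation: AUROC, TPR at a fixed FPR, and a sanity
# check that a submission's hard predictions actually stay under the target
# false-positive rate. Not the real leaderboard code.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def tpr_at_fpr(y_true, scores, target_fpr=0.01):
    """TPR at the largest ROC operating point whose FPR does not exceed
    target_fpr. y_true: 1 = AI-generated, 0 = human; scores: detector P(ai)."""
    fpr, tpr, _ = roc_curve(y_true, scores)
    valid = fpr <= target_fpr
    return float(tpr[valid].max()) if valid.any() else 0.0

def meets_fpr_threshold(y_true, preds, target_fpr=0.05):
    """Check binary predictions: the empirical FPR on human-written texts must
    not exceed the target, otherwise the reported TPR is not comparable."""
    y_true, preds = np.asarray(y_true), np.asarray(preds)
    return preds[y_true == 0].mean() <= target_fpr

# Tiny made-up example:
y = [0, 0, 0, 1, 1, 1]
s = [0.02, 0.10, 0.60, 0.55, 0.80, 0.95]
print(roc_auc_score(y, s), tpr_at_fpr(y, s, target_fpr=0.01))
```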
Hey @sergak0 I just pushed the update for toggling metrics. The update cleared out some of the people at the top since their detectors did not actually meet the FPR threshold. Feel free to check it out. We also made the leaderboard more efficient, which should be a nice bonus.
I'm going to email people who are on the leaderboard now and ask whether they've trained on RAID or not. There are a few extra design decisions that need to be made there (e.g. how much training constitutes "training on RAID"?) but I believe if we get a critical mass of respondents then we should be able to add that information to the leaderboard as well.
Great! As of now I can still see one strange detector with a 99.9% score, but the others are gone. Is AUROC going to be the new default metric, or will you set it back to TPR@FPR=5% as it was in the paper? Also, I think it would be useful to add the metric as a URL parameter, as is done with the other settings (https://raid-bench.xyz/leaderboard?domain=all&decoding=all&repetition=all&attack=none), for consistency.
Ah yes, thank you - the metric not being in the URL is an unfortunate oversight. I'll add that soon.
As for the default metric I didn't put much thought into it. AUROC makes sense as a default (so as not to privilege any one target FPR over another) but there's also an argument to be made for keeping TPR@FPR=5% as the default for consistency with the paper. Let me know if you have a strong preference either way.
I don't have a strong preference, but I'd choose TPR@FPR=5%: in addition to consistency with the paper, it's also more interpretable (it shows the true positive rate at a fixed false-positive rate) than AUROC.
@liamdugan there is still one detector with 0.999 accuracy on the leaderboard, which seems really strange to me. Have you been able to get in touch with its team and ask about training on RAID?
@sergak0 I tried reaching out to them previously and got no response. I'll reach out again.