
Marking solutions that were trained on RAID train set

Open sergak0 opened this issue 8 months ago • 13 comments

Is your feature request related to a problem? Please describe. RAID is a great benchmark to measure against, but it ships a large open-sourced train set that anyone can train on. Recently, solutions with a 1.0 score have even appeared, so the leaderboard is no longer a source of "who has the better AI-detection solution" and instead tells you who is better overfitted to RAID.

Describe the solution you'd like It would be great if you could mark solutions that were trained on the RAID train data (and maybe even create two separate leaderboards) to make it a source of "the most accurate AI detector" information again. This could be done by: a) simply asking in the metadata whether they used RAID train as part of their dataset (though users may provide wrong info); b) adding some watermarked data to the train set (for example, with an unusual augmentation) that a normal detector should not be triggered by, but that a detector which saw such data during training will definitely trigger on (which specific augmentation you use should of course not be shared publicly). This approach can even be combined with the first one and used to catch people who lie in their metadata, with appropriate restrictions for them.
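Option (b) could look something like the following sketch. Everything here is hypothetical illustration, not anything RAID actually does: the function names, the zero-width-space augmentation, and the thresholds are all made up for the example, and a real canary augmentation would have to stay private.

```python
# Sketch of the "watermarked train data" idea (option b above).
ZWSP = "\u200b"  # zero-width space: invisible to human readers

def watermark(text: str, every: int = 3) -> str:
    """Apply a secret, semantically invisible augmentation: append a
    zero-width space to every Nth word."""
    words = text.split()
    return " ".join(
        w + ZWSP if i % every == every - 1 else w
        for i, w in enumerate(words)
    )

def flags_canaries(detector, canaries, threshold=0.5, trigger_rate=0.8):
    """Score watermarked, human-written canary texts. A detector that
    calls most of them 'AI' likely saw the watermarked data in training;
    a normal detector should not react to an invisible character."""
    scores = [detector(watermark(t)) for t in canaries]
    return sum(s > threshold for s in scores) / len(scores) > trigger_rate
```

The key property is that the augmentation leaves the text semantically unchanged, so only a model that memorized the augmented training examples has any reason to separate watermarked human text from plain human text.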

Describe alternatives you've considered I don't know your plans for a RAID 2.0, but LLMs are developing rapidly, and if you make a next iteration of RAID with an updated model pool, new augmentations, and maybe adjusted text sources, it would be great not to publish the train set. That would create space for fair competition.

sergak0 avatar May 18 '25 08:05 sergak0

Yes I agree with all of this. It looks like performance is saturating on RAID quite quickly and I agree that overfitting is a really big issue here. I've been busy with a few EMNLP submissions but I intend to implement some of the quicker fixes for this soon (such as allowing for other metrics to be reported and attempting to separate out people who have trained on the data).

As for a potential RAID version 2.0 we have been discussing some ideas internally. Nothing to announce yet :)

liamdugan avatar May 19 '25 18:05 liamdugan

Thanks for the fast reply! Glad to hear you'd already thought about this and have plans to fix it - separate leaderboards would be really great.

sergak0 avatar May 19 '25 18:05 sergak0

Hi @liamdugan, hope you're doing well. Just wanted to ask how it's going with the fixes and when we can expect them to be released.

sergak0 avatar Jun 09 '25 12:06 sergak0

Yes! Sorry, I've been busy moving + starting a new job so I haven't gotten around to this yet. I'll be making these fixes around June 19th-ish. @sergak0

liamdugan avatar Jun 09 '25 16:06 liamdugan

Hi @liamdugan, how is it going with the updates?

sergak0 avatar Jul 02 '25 19:07 sergak0

@sergak0 We've finished the initial implementation for toggling metrics (it's on the toggle_metrics branch). We have TPR@1% FPR and AUROC, and plan to support more in the future.

Right now I'm adding functionality to remove submissions that don't correctly meet the FPR thresholds and adding warnings to the evaluation script. Should be done in a day or two. Then we'll add tags indicating whether submissions were trained on RAID or not.

Should all be done within a week or two. Thanks for the continued patience.
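For reference, both metrics mentioned above (and the TPR@FPR=5% discussed later in this thread) can be computed from raw detector scores. This is a minimal numpy-only sketch that ignores score ties, not the leaderboard's actual implementation; the `labels`/`scores` names are illustrative.

```python
import numpy as np

def roc_points(labels: np.ndarray, scores: np.ndarray):
    """Compute (FPR, TPR) pairs by sweeping a threshold over the scores.
    labels: 1 = AI-generated, 0 = human-written."""
    order = np.argsort(-scores)          # sort by score, descending
    labels = labels[order]
    tps = np.cumsum(labels)              # true positives at each cutoff
    fps = np.cumsum(1 - labels)          # false positives at each cutoff
    tpr = tps / max(labels.sum(), 1)
    fpr = fps / max((1 - labels).sum(), 1)
    return fpr, tpr

def tpr_at_fpr(labels, scores, target_fpr=0.05):
    """TPR at the most permissive threshold whose FPR stays within target."""
    fpr, tpr = roc_points(np.asarray(labels), np.asarray(scores))
    ok = fpr <= target_fpr
    return float(tpr[ok].max()) if ok.any() else 0.0

def auroc(labels, scores):
    """Area under the ROC curve via the trapezoidal rule."""
    fpr, tpr = roc_points(np.asarray(labels), np.asarray(scores))
    fpr = np.concatenate([[0.0], fpr])
    tpr = np.concatenate([[0.0], tpr])
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
```

A perfectly separating detector scores 1.0 on both metrics, while AUROC aggregates over all operating points and TPR@FPR pins the detector to one fixed false-positive budget.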

liamdugan avatar Jul 07 '25 14:07 liamdugan

Hey @sergak0 I just pushed the update for toggling metrics. The update cleared out some of the people at the top, since their detectors did not actually meet the FPR threshold. Feel free to check it out. We also improved the efficiency of the leaderboard, so it should be noticeably faster to use.

I'm going to email people who are on the leaderboard now and ask whether they've trained on RAID or not. There are a few extra design decisions that need to be made there (e.g. how much training constitutes "training on RAID"?) but I believe if we get a critical mass of respondents then we should be able to add that information to the leaderboard as well.

liamdugan avatar Jul 25 '25 17:07 liamdugan

Great! For now I can still see one strange detector with a 99.9% score, but the others are gone. Is AUC-ROC going to be the new default metric, or will you set it back to TPR@FPR=5% as in the paper? Also, I think it'd be useful to add the metric as a URL parameter, as with the other settings (https://raid-bench.xyz/leaderboard?domain=all&decoding=all&repetition=all&attack=none), for consistency.

sergak0 avatar Jul 30 '25 15:07 sergak0

Ah yes thank you, metric not being in the URL is an unfortunate oversight. I'll add that soon.

As for the default metric I didn't put much thought into it. AUROC makes sense as a default (so as not to privilege any one target FPR over another) but there's also an argument to be made for keeping TPR@FPR=5% as the default for consistency with the paper. Let me know if you have a strong preference either way.

liamdugan avatar Jul 30 '25 18:07 liamdugan

I don't have a strong preference, but I'd choose TPR@FPR=5%: in addition to paper consistency, it's also more interpretable than AUROC (it shows the detection rate at a fixed FPR level).

sergak0 avatar Jul 31 '25 15:07 sergak0

@liamdugan there is still one detector with 0.999 accuracy on the leaderboard, which seems really strange to me. Have you been able to get in touch with its team and ask about training on RAID?

sergak0 avatar Sep 17 '25 17:09 sergak0

@sergak0 I tried reaching out to them previously and got no response. I'll reach out again.

liamdugan avatar Sep 19 '25 14:09 liamdugan