Open-Assistant: Reduce the ability for users to register with fake email accounts

This can be solved in two steps:

[ ] Add an environment variable we can set during deployment that disables email login.
[ ] Add a reCAPTCHA when a user logs in via email
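The first checklist item could look roughly like the minimal Python sketch below. The variable name `DISABLE_EMAIL_LOGIN` and the OAuth provider ids are hypothetical; the issue does not name them, and the real frontend may read its configuration differently.

```python
import os

# Hypothetical variable name: the issue only asks for "an environment
# variable we can set during deployment", without naming it.
FLAG_NAME = "DISABLE_EMAIL_LOGIN"

# Values that count as "switch email login off".
TRUTHY = {"1", "true", "yes", "on"}


def email_login_enabled(env=None) -> bool:
    """Return False when the deployment has switched email login off."""
    env = os.environ if env is None else env
    value = env.get(FLAG_NAME, "")
    return value.strip().lower() not in TRUTHY


def available_providers(env=None) -> list[str]:
    """List the sign-in options to show on the login page."""
    providers = ["discord", "github"]  # hypothetical OAuth provider ids
    if email_login_enabled(env):
        providers.append("email")
    return providers


if __name__ == "__main__":
    print(available_providers({"DISABLE_EMAIL_LOGIN": "true"}))  # ['discord', 'github']
    print(available_providers({}))  # ['discord', 'github', 'email']
```

Because the flag is read at request time rather than baked into the build, operators could flip it during an attack and unset it afterwards.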

Jan 24 '23 06:01 fozziethebeat

Although registering with "fake email accounts" might truly be a serious problem, requiring all users to authenticate via OAuth using accounts strongly tied to their identity is a huge privacy drawback. For example, perhaps there are high-quality volunteers who prefer to use a secondary email (myself included) for logging in.

Perhaps you mean to disable email registration from new users? Then, at least, administrators could still manually create email-based accounts for known-trustworthy volunteers who wish to have them.
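One way to express "close email sign-ups but keep existing and admin-created accounts working" is sketched below. `EmailAuthPolicy` is a hypothetical in-memory stand-in; a real deployment would query its user table instead of a set, and the case folding of addresses is a simplifying assumption.

```python
from dataclasses import dataclass, field


@dataclass
class EmailAuthPolicy:
    # Hypothetical switch: False means new email sign-ups are closed.
    registration_open: bool = True
    # Emails that already have accounts, including admin-created ones.
    known_emails: set[str] = field(default_factory=set)

    @staticmethod
    def _normalize(email: str) -> str:
        # Case-folded for simplicity (an assumption, not a rule of email).
        return email.strip().lower()

    def admin_create_account(self, email: str) -> None:
        """Admins can still add trusted volunteers while sign-ups are closed."""
        self.known_emails.add(self._normalize(email))

    def may_sign_in(self, email: str) -> bool:
        """Existing accounts may always log in; new ones only if open."""
        return self.registration_open or self._normalize(email) in self.known_emails


policy = EmailAuthPolicy(registration_open=False)
policy.admin_create_account("Volunteer@example.org")
print(policy.may_sign_in("volunteer@example.org"))  # True
print(policy.may_sign_in("new@example.org"))  # False
```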

Add a ReCaptcha when user is logging in via email

reCAPTCHA actively discriminates against VPN and Tor users by marking their correct solutions as invalid several times in a row and slowing down image reveals in a way that only hurts humans, not bots. Please consider hCaptcha instead; the host would even be able to earn some rewards for the captcha solving work its users are doing.

Jan 25 '23 07:01 mashdragon

Another alternative is Cloudflare's captcha.

Jan 25 '23 08:01 notmd

This issue is to guard against a worst case scenario where the data collection platform is being flooded by malicious bots.

In that worst-case scenario, pretty much all good users will be negatively affected in some way. Turning off email login while dealing with a bot attack seems reasonable as long as it's temporary.

Regarding captchas that guard against attacks, I'm open to any solution. hCaptcha and Cloudflare's captcha both seem like good alternatives.

If someone wants to set up a captcha, that'd be awesome.

Jan 25 '23 08:01 fozziethebeat

Let's go with Cloudflare. I will send a PR soon.

Jan 25 '23 08:01 notmd

Cloudflare would be good except that it does not function very consistently over VPN or Tor. It gives me HTTP 400 responses and refreshes infinitely. I would be really sad if it took a lot of work for me to log in.

Edit: If you know of a page that uses Cloudflare's captcha (none come to mind off the top of my head), I can try to visit it and relay what the experience is like.

Jan 25 '23 08:01 mashdragon

Is there a reason you need to log into Open Assistant to contribute data via VPN or Tor?

Jan 25 '23 08:01 fozziethebeat

There is a demo here https://demo.turnstile.workers.dev/. Does it work for you?
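Whatever widget is chosen, the token it produces still has to be checked on the server. Below is a minimal Python sketch for Cloudflare Turnstile, the system behind the demo above. Only the `siteverify` URL and the `secret`, `response`, and `remoteip` form fields come from Cloudflare's Turnstile documentation; the function names and the injectable `post` parameter (used so the logic can be tested offline) are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Optional

SITEVERIFY_URL = "https://challenges.cloudflare.com/turnstile/v0/siteverify"


def _post_form(url: str, fields: dict) -> dict:
    """POST url-encoded fields and decode the JSON reply."""
    data = urllib.parse.urlencode(fields).encode()
    with urllib.request.urlopen(url, data=data, timeout=10) as resp:
        return json.load(resp)


def verify_turnstile_token(
    token: str,
    secret: str,
    remote_ip: Optional[str] = None,
    post: Callable[[str, dict], dict] = _post_form,
) -> bool:
    """Return True only if Cloudflare confirms the widget token."""
    if not token:
        return False  # the form was submitted without solving the widget
    fields = {"secret": secret, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip  # optional per the Turnstile docs
    try:
        result = post(SITEVERIFY_URL, fields)
    except (OSError, ValueError):
        # Network errors (URLError is an OSError) or a malformed JSON reply:
        # fail closed rather than let an unverified login through.
        return False
    return bool(result.get("success", False))
```

On the login form, the widget typically submits its token in a hidden `cf-turnstile-response` field, which the email-login handler would pass in as `token`. Failing closed means a Cloudflare outage blocks email logins, which matches the thread's stance that email access may be limited during an attack.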

Jan 25 '23 08:01 notmd

I have a strong personal preference for being anonymous wherever possible, enough that I would not contribute if it were too difficult for me to stay anonymous.

@notmd That seems to work for me. I guess we should go ahead with Cloudflare, then. There are other Cloudflare pages where the captcha box is part of a full-screen page and a proof-of-work puzzle is issued, and those do not load for me. The "Cloudflare box" in your example looks similar to those, so I have to hope it's a different system.

Thank you for listening to my feedback!

Jan 25 '23 08:01 mashdragon

For the time being, we'll do our best to support your login strategy, but if we see a lot of malicious users, we can't guarantee you'll be able to log in during an attack. Hopefully that never happens.

Jan 25 '23 08:01 fozziethebeat

Thanks for bringing up the issue of fake email accounts being used to register on our platform, @fozziethebeat. We understand the importance of safeguarding our data collection platform from malicious bots, and appreciate your suggestions for implementing a solution.

We have taken note of the concerns raised by @mashdragon regarding the privacy implications of requiring OAuth authentication and the potential impact on high-quality volunteers who prefer to use secondary email accounts for logging in.

After careful consideration, we have decided to implement Cloudflare's captcha as a solution for this issue. The team member @notmd has already submitted a pull request for this, and we will closely monitor its performance and effectiveness in preventing fake email account registrations.

We also want to assure users like @mashdragon, who prefer to remain anonymous while contributing data, that we will do our best to support their login strategy. However, in the event of a large-scale bot attack, we may temporarily disable email login access in order to safeguard the platform. We hope that such a scenario never arises, and we will always try to minimize any negative impact on legitimate users.

Additionally, we would like to invite users who face any difficulties with Cloudflare's captcha to reach out to us and provide feedback so that we can continue to improve the user experience. We have also opened a related issue, #932.

Thanks for your continued support and contributions to the Open-Assistant project.

Jan 26 '23 18:01 hemangjoshi37a