express feature: add a GitHub action to quell spam PRs

Problem

I was scrolling through Twitter and someone posted about the spam PRs.

Possible Solution

Implement this GitHub action. It'll lessen the workload for maintainers.

Feb 06 '24 17:02 CBID2

I am not sure we need an action for this unless there is one that does more than that (like spam detection with AI or something). Deleting, closing, or locking takes as many clicks as commenting, and we don't need an additional third-party action (added security risk and maintenance) to do it, right?

Feb 06 '24 17:02 wesleytodd

It's kinda scary, those idiots spamming PRs.

Feb 06 '24 18:02 iammohan01

I agree and we are actively removing their comments and blocking them. I wish that work was not necessary, and I appreciate y'all working to help with suggestions and feedback. We used to have a pretty active triage team in place, maybe we can revive that to be more active, and if we do you all are welcome to help!

Feb 06 '24 18:02 wesleytodd

unless there is an action which does more than that one (like spam detection with ai or something)

Not sure about AI spam detection, but almost all of these PRs update the README, have the default title ("Update filename"), change only a single line, and have no description, so it should be rather easy to close them automatically.

Most authors of these PRs have a repository called "localrepo", so that's another rule that could be used to detect spam from this source (Apna College).

Feb 06 '24 18:02 krzysdz
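The heuristics listed above can be sketched as a small scoring function. This is a minimal sketch, not an existing Action: the function names and the threshold of three signals are hypothetical, `additions`/`deletions` are assumed to come from the pull request object returned by the GitHub REST API, and `author_repo_names` is assumed to be a list of the author's repository names fetched separately.

```python
import re
from typing import List, Optional

# Default title pattern described in the comment above: "Update <filename>".
DEFAULT_TITLE = re.compile(r"^Update \S+$")


def spam_signals(title: str, body: Optional[str], additions: int, deletions: int,
                 author_repo_names: List[str]) -> List[str]:
    """Return the names of the (hypothetical) spam signals this PR matches."""
    signals = []
    if DEFAULT_TITLE.match(title.strip()):
        signals.append("default_title")
    if not (body or "").strip():
        signals.append("empty_description")
    # Editing one line shows up as one addition plus one deletion.
    if max(additions, deletions) <= 1:
        signals.append("single_line_change")
    if "localrepo" in author_repo_names:
        signals.append("localrepo_author")
    return signals


def looks_like_spam(signals: List[str], threshold: int = 3) -> bool:
    # Require several signals at once instead of acting on any single one.
    return len(signals) >= threshold
```

Requiring several signals together means a genuine one-line typo fix with a real description only trips `single_line_change` and is left alone.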

Yeah, this is a one-off issue of the day. Spam PRs have been a problem for years and they come in different varieties. This is why I am hesitant to add an action for this, as I would rather ask GitHub for better spam management tooling.

Feb 06 '24 18:02 wesleytodd

I already reported a couple of users for spam. But this is just mopping with the tap wide open. I hate how this eats into contributors' time.

Feb 06 '24 19:02 TimGels

Hey @wesleytodd. I found this on X: https://twitter.com/github/status/1311772722234560517 Maybe that could work?

Feb 06 '24 20:02 CBID2

This is why I am hesitant to add an action for this, as I would rather ask GitHub for better spam management tooling.

Agree with @wesleytodd. The issue is quite complex because moderation on GitHub is not easy:

Many people (those who are watching the repo) will receive a notification by email or in the app/web once a new PR is created. This happens even before a GitHub Action is triggered.
Closing PRs is an easy action (1 click), so it is not a big time consumer.
Detecting invalid PRs is not the big issue, because with practice you can easily spot them.
Reporting issues/users is complex because the UI requires many steps for each user, so that is a blocker for many maintainers.
A nuke button like the one @CBID2 suggested is great when the community needs to slow down due to a discussion/flame war at certain moments, but it is not a long-term solution, as it would prevent other users from contributing legitimate PRs or asking for help (issues) while using Express.

So, I think we are quite limited in how much we can do with GitHub Actions in this case. There are other scenarios that are less frequent but better suited to moderation with GitHub Actions, for example when people post comments that include offensive content or foul language. In most projects I am involved in, moderation is done by the humans behind the project or a specific team that volunteers to do it, and it is a heavy job, in the same way that it is hard to keep a Slack/Discord/Gitter community channel a safe space by moderating content.

Feb 06 '24 20:02 UlisesGascon

In most projects I am involved in, moderation is done by the humans behind the project or a specific team that volunteers to do it, and it is a heavy job, in the same way that it is hard to keep a Slack/Discord/Gitter community channel a safe space by moderating content.

I think historically the Express project has not needed the same style of moderation as something like Node.js does. There have been fewer contentious discussions, and most CoC violations have come from folks outside the project, so it was relatively simple to ban and move on. I don't think we need to immediately spin up a moderation team, but I do think the Triage team and TC should have the tools to moderate properly. Right now I don't think we have that well in hand. I believe we can add this to the list of TODOs to address after we get next week's TC meeting organized and finished.

Are we in agreement that a GH Action is most likely not the direction we would want to take to solve this problem?

Feb 06 '24 20:02 wesleytodd

I believe we can add this to the list of TODOs to address after we get next week's TC meeting organized and finished.

Yes, we can add it to the TODO list and start working on it in a few weeks.

Are we in agreement that a GH Action is most likely not the direction we would want to take to solve this problem?

+1 from me

Feb 06 '24 20:02 UlisesGascon

Maybe a pull request template would make some people think before creating a PR? On the other hand, it doesn't look like those who spam pull requests will even bother to read it; they'll probably just click "Create pull request" with an unedited description.

Feb 06 '24 20:02 krzysdz

I agree and we are actively removing their comments and blocking them. I wish that work was not necessary, and I appreciate y'all working to help with suggestions and feedback. We used to have a pretty active triage team in place, maybe we can revive that to be more active, and if we do you all are welcome to help!

How can we join the triage team?

Feb 06 '24 21:02 QuantGeekDev

How can we join the triage team?

https://github.com/expressjs/express/blob/master/Contributing.md#becoming-a-triager

Feb 06 '24 21:02 krzysdz

Doesn't need to stop the conversation, but since we specifically don't want to use a GHA for this, this issue is complete. There are more threads to follow up on, but I think we can do that in other discussions more specific to those.

Feb 06 '24 21:02 wesleytodd

A workflow could be an option too. These spam PRs generally don't have more than 2-3 words, so closing PRs with fewer than 3 words sounds reasonable. GitHub does something similar in its documentation repository: GitHub's workflow

Feb 07 '24 15:02 gaurishhs
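The word-count rule can be sketched as a check on the PR description (the `body` field of the pull request). This is a hypothetical standalone check, not a reproduction of the workflow GitHub uses in its docs repository; the threshold of 3 comes from the comment above, and stripping HTML comments (such as those left behind by an unedited PR template) is an added assumption.

```python
import re
from typing import Optional

MIN_WORDS = 3  # threshold suggested in the comment above


def word_count(text: Optional[str]) -> int:
    # Ignore HTML comments, e.g. untouched placeholder text from a PR template.
    cleaned = re.sub(r"<!--.*?-->", "", text or "", flags=re.DOTALL)
    return len(cleaned.split())


def below_word_threshold(body: Optional[str], min_words: int = MIN_WORDS) -> bool:
    """True if the PR description is too short to pass the (hypothetical) check."""
    return word_count(body) < min_words
```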

A workflow could be an option too. These spam PRs generally don't have more than 2-3 words, so closing PRs with fewer than 3 words sounds reasonable. GitHub does something similar in its documentation repository: GitHub's workflow

@wesleytodd, I think you should reopen this issue. @gaurishhs made another point about using GitHub actions

Feb 07 '24 17:02 CBID2

I do agree that a workflow is a bit better suited IMO. I am still hesitant and would like to look at other ways too, but in the meantime let's re-open the issue so we don't end up with multiple issues on the same topic or miss out on good ideas like @gaurishhs'.

Feb 07 '24 17:02 wesleytodd

An even better idea: most spam PRs revolve around README.md. If we block PRs that update the README for now (until the spam PRs die down), we can handle this through GitHub Actions so that good PRs are not affected.

Feb 07 '24 17:02 deepak4566
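A README-only gate could look like the sketch below. It assumes the changed paths come from the `filename` field returned by the "list pull request files" REST endpoint; the function name is hypothetical, and the file-name match is case-insensitive so `Readme.md` and `docs/README.md` both count.

```python
import posixpath
from typing import Iterable


def touches_only_readme(changed_paths: Iterable[str]) -> bool:
    """True if every changed file in the PR is a README (hypothetical gate)."""
    paths = list(changed_paths)
    # An empty file list should not be treated as "README-only".
    return bool(paths) and all(
        posixpath.basename(path).lower() == "readme.md" for path in paths
    )
```

This also catches legitimate README fixes, so it is probably safer as one signal among several than as a hard block.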

Is it possible to have the workflow also incorporate a check for new contributors when doing the x-word check? I feel like people who have contributed in the past, and already have code merged, do not pose much of a threat in terms of spam. Or am I overthinking it?

Feb 07 '24 17:02 TimGels
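One way to exempt previous contributors is the `author_association` field on GitHub's pull request objects, where `CONTRIBUTOR` means the author has previously committed to the repository. The sketch below is hypothetical (function names and the 3-word threshold are assumptions) and skips the word check for trusted associations.

```python
from typing import Optional

# author_association values that indicate an existing relationship with the repo.
TRUSTED_ASSOCIATIONS = frozenset({"OWNER", "MEMBER", "COLLABORATOR", "CONTRIBUTOR"})


def skip_spam_checks(author_association: str) -> bool:
    """True if the PR author should bypass the word-count check."""
    return author_association.upper() in TRUSTED_ASSOCIATIONS


def should_auto_close(author_association: str, body: Optional[str], min_words: int = 3) -> bool:
    # Established authors are never auto-closed; everyone else gets the word check.
    if skip_spam_checks(author_association):
        return False
    return len((body or "").split()) < min_words
```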

Is it possible to have the workflow also incorporate a check for new contributors when doing the x-word check? I feel like people who have contributed in the past, and already have code merged, do not pose much of a threat in terms of spam. Or am I overthinking it?

I don't think you're overthinking it @TimGels. Previous contributors should be spared in some way.

Feb 07 '24 18:02 CBID2

I think the workflow should be something like this:

The expressjs member check can be omitted / replaced with a collaborator check

Feb 07 '24 18:02 gaurishhs
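The collaborator check mentioned above could use the REST endpoint `GET /repos/{owner}/{repo}/collaborators/{username}`, which answers 204 for a collaborator and 404 otherwise. This is a minimal sketch with hypothetical function names; the token is a placeholder (in an Action it would typically be the workflow's token), and only `fetch_is_collaborator` touches the network.

```python
import urllib.error
import urllib.request
from urllib.parse import quote

API = "https://api.github.com"


def collaborator_check_request(owner: str, repo: str, username: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the 'is this user a collaborator?' request."""
    return urllib.request.Request(
        f"{API}/repos/{owner}/{repo}/collaborators/{quote(username)}",
        method="GET",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )


def is_collaborator(status: int) -> bool:
    # 204 No Content: collaborator; 404 Not Found: not a collaborator.
    if status == 204:
        return True
    if status == 404:
        return False
    raise ValueError(f"unexpected status {status}")


def fetch_is_collaborator(req: urllib.request.Request) -> bool:
    try:
        with urllib.request.urlopen(req) as resp:
            return is_collaborator(resp.status)
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx, so the 404 case arrives here.
        return is_collaborator(err.code)
```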

I think the workflow should be something like this:

The expressjs member check can be omitted / replaced with a collaborator check

I don't think word count matters, because some bugs only require minimal changes.

Feb 07 '24 20:02 Pratik-Kumar-621

And if a word count check is added, they will just find some other way to open spam PRs. The problem is that those who make spam PRs have only just started their development journey; when they were introduced to GitHub (by some YouTubers or articles), they tried it out themselves. They had no idea that what they were doing is a headache for someone else.

The only way we can stop this is by creating awareness.

Feb 07 '24 20:02 Pratik-Kumar-621

And if a word count check is added, they will just find some other way to open spam PRs. The problem is that those who make spam PRs have only just started their development journey; when they were introduced to GitHub (by some YouTubers or articles), they tried it out themselves. They had no idea that what they were doing is a headache for someone else.

The only way we can stop this is by creating awareness.

True, but knowing that there are those who create spam PRs out of malice, raising awareness is not enough. Protective measures are a must too.

Feb 07 '24 22:02 CBID2

And if a word count check is added, they will just find some other way to open spam PRs. The problem is that those who make spam PRs have only just started their development journey; when they were introduced to GitHub (by some YouTubers or articles), they tried it out themselves. They had no idea that what they were doing is a headache for someone else. The only way we can stop this is by creating awareness.

True, but knowing that there are those who create spam PRs out of malice, raising awareness is not enough. Protective measures are a must too.

There are REST APIs for getting PR info and also for Actions. A bot that automates closing spam PRs using those REST APIs could help, but I'm not sure how to detect spam.

Feb 07 '24 23:02 Pratik-Kumar-621
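The closing half of such a bot can be sketched with two REST calls: post a comment through the issues comments endpoint (PRs are issues for that API), then set the PR's `state` to `closed`. This is a hypothetical sketch that assumes a token allowed to write to pull requests and issues; the detection logic is left out, and only `send` touches the network.

```python
import json
import urllib.request
from typing import List

API = "https://api.github.com"


def _github_request(method: str, path: str, token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated GitHub REST API request."""
    return urllib.request.Request(
        API + path,
        data=json.dumps(payload).encode("utf-8"),
        method=method,
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )


def close_spam_pr_requests(owner: str, repo: str, number: int, token: str,
                           message: str) -> List[urllib.request.Request]:
    """Comment on the PR explaining why, then close it."""
    return [
        _github_request("POST", f"/repos/{owner}/{repo}/issues/{number}/comments",
                        token, {"body": message}),
        _github_request("PATCH", f"/repos/{owner}/{repo}/pulls/{number}",
                        token, {"state": "closed"}),
    ]


def send(requests: List[urllib.request.Request]) -> None:
    for req in requests:
        with urllib.request.urlopen(req) as resp:
            resp.read()
```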

Hello! While I may not possess your level of genius, I've got an idea. Can we utilize NLP in Python code to detect significant changes in a README file solely through a GitHub Action?

Feb 08 '24 02:02 JavidSumra

I think we should try my suggestion for now and then do more afterwards. We must take some form of action quickly.

Feb 08 '24 04:02 CBID2

Hello team! A quick action could be to limit who can submit a PR while there is no action to filter PR spamming.

GitHub provides a setting to temporarily limit PR submissions from external contributors for a set duration (maybe one week): https://twitter.com/github/status/1311772722234560517 https://github.com/orgs/community/discussions/22804 (in this case only Express team members could submit a PR).

I know it isn't aligned with the open source philosophy, but it could stop this PR spamming cycle quickly while we wait for a real solution. Maybe when they see that it no longer works, the video's followers will ask the YouTuber for an explanation :)
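The temporary limit suggested above corresponds to the repository interaction limits REST endpoint (`PUT /repos/{owner}/{repo}/interaction-limits`), which takes a `limit` and an `expiry`. This is a minimal sketch that only builds the request; the function name and token are placeholders, and `collaborators_only` for `one_week` mirrors the suggestion rather than a decision by the maintainers.

```python
import json
import urllib.request

API = "https://api.github.com"
VALID_LIMITS = {"existing_users", "contributors_only", "collaborators_only"}
VALID_EXPIRY = {"one_day", "three_days", "one_week", "one_month", "six_months"}


def interaction_limit_request(owner: str, repo: str, token: str,
                              limit: str = "collaborators_only",
                              expiry: str = "one_week") -> urllib.request.Request:
    """Build (but do not send) a request that temporarily restricts interactions."""
    if limit not in VALID_LIMITS:
        raise ValueError(f"unknown limit: {limit}")
    if expiry not in VALID_EXPIRY:
        raise ValueError(f"unknown expiry: {expiry}")
    return urllib.request.Request(
        f"{API}/repos/{owner}/{repo}/interaction-limits",
        data=json.dumps({"limit": limit, "expiry": expiry}).encode("utf-8"),
        method="PUT",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )
```

Note that interaction limits restrict comments and new issues as well as PRs, which is part of why this is a short-term measure.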

Another solution is to check whether a PR has a commit related to an open issue on the same repo. If not, the GitHub Action will close the PR automatically.

see you

Feb 08 '24 06:02 Romakita
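The "linked to an open issue" idea can be sketched as a scan of commit messages for `#123`-style references. This sketch assumes the messages come from the "list commits on a pull request" endpoint and that the set of open issue numbers is fetched separately; the function names are hypothetical, and the pattern ignores cross-repository references such as `owner/repo#123`.

```python
import re
from typing import Iterable, Optional, Set

# "#123" not preceded by a word character or "/" (skips owner/repo#123 forms).
ISSUE_REF = re.compile(r"(?<![\w/])#(\d+)\b")


def referenced_issues(texts: Iterable[Optional[str]]) -> Set[int]:
    refs = set()
    for text in texts:
        refs.update(int(num) for num in ISSUE_REF.findall(text or ""))
    return refs


def links_open_issue(commit_messages: Iterable[Optional[str]],
                     open_issue_numbers: Iterable[int]) -> bool:
    """True if any commit message references a currently open issue."""
    return bool(referenced_issues(commit_messages) & set(open_issue_numbers))
```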

I really feel pensive seeing all these PRs come from literal "tech YouTubers" who don't use basic common sense before uploading a video; I apologize on their behalf 🙏.

Feb 08 '24 16:02 Kishlay-notabot

spam detection with ai

This seems like a pretty cool project idea, someone should DEFINITELY put something like this into place to help with spam hell for open-source maintainers.

Feb 08 '24 17:02 CompeyDev