operationcode-pybot

Auto-Reply Functionality for Unformatted Code

Open togakangaroo opened this issue 5 years ago • 6 comments

Problem:

We often get people posting code who don't know the three main ways to format it on Slack. A bot that helps them do this properly would save everyone some time and annoyance. Plus, it should be fun to code.

The bot would activate when a post contains lines that look like code but are not formatted as such.
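A rough sketch of the "not formatted as such" half of that check, in Python since the bot is mostly Python. It assumes only backtick-based formatting (triple-backtick blocks and single-backtick inline spans) and ignores any other way of sharing code; the name `unformatted_lines` is made up for illustration.

```python
import re
from typing import List

# Assumption: only backtick-based formatting is handled here.
FENCED_BLOCK = re.compile(r"```.*?```", re.DOTALL)  # multi-line code blocks
INLINE_SPAN = re.compile(r"`[^`\n]+`")              # inline code spans


def unformatted_lines(message: str) -> List[str]:
    """Return the non-empty lines of a message that sit outside code formatting."""
    without_blocks = FENCED_BLOCK.sub("\n", message)
    without_inline = INLINE_SPAN.sub("", without_blocks)
    return [line.strip() for line in without_inline.splitlines() if line.strip()]
```

Only the lines this returns would then need to go through whatever "looks like code" test the bot ends up using.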

So how to determine if something looks like code? Two possible approaches.

  1. Heuristically. There are libraries out there that auto-detect language. Presumably some of the same processes can be used to detect whether something is code at all. Example of something that does this: highlight.js

  2. ML. This is actually a pretty decent use case for something like a TensorFlow classifier. And we could train it on the actual Operation Code logs!

togakangaroo avatar May 08 '20 17:05 togakangaroo

I bet if you do it as a standalone bot, other Slack workspaces would love something like that. Good OSS project overall.

togakangaroo avatar May 08 '20 17:05 togakangaroo

Pygments might be a good fit since we're already mostly Python:

https://pygments.org/docs/api/#pygments.lexers.guess_lexer
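For anyone who wants to try it, a minimal sketch of calling `guess_lexer` could look like the following. The wrapper name `guess_language` is hypothetical, the guarded import is only there because Pygments is a third-party package, and treating a plain-text guess as "not code" is an assumption about how the bot might use the result.

```python
from typing import Optional

try:
    from pygments.lexers import guess_lexer
    from pygments.lexers.special import TextLexer
    from pygments.util import ClassNotFound
except ImportError:  # Pygments may not be installed
    guess_lexer = None


def guess_language(text: str) -> Optional[str]:
    """Return the name of Pygments' best-guess lexer, or None if there is no usable guess."""
    if guess_lexer is None:
        return None
    try:
        lexer = guess_lexer(text)
    except ClassNotFound:
        return None
    # Assumption: a plain-text guess means "probably not code".
    if isinstance(lexer, TextLexer):
        return None
    return lexer.name
```

Note that `guess_lexer` is meant to pick a language for text that is already known to be code, so how well it separates code from prose would need testing on real messages.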

LivingInSyn avatar May 08 '20 17:05 LivingInSyn

I'd like to give this a shot!

jasonappah avatar Oct 07 '20 19:10 jasonappah

Go for it @jasonappah! Happy to see what you come up with.

aaron-junot avatar Oct 07 '20 21:10 aaron-junot

I think this could be an interesting project to add to the rewrite. Looking at this example for Discourse (code here), I think we could translate it to Python to fit into the bot. Alternatively, we could just use the JavaScript project as-is and have it run next to the bot.

This is really a great opportunity for a machine learning project, but unfortunately I don't think that's the route we want to go in our situation. Putting a model into production where it can be queried would take a good deal more infrastructure. We could train the model and run it on the same machine as the bot, but even that would need more processing power than the machine we currently use. I think the regex-based approach in the Discourse repository is the place to start.
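To give a feel for what a regex-based check might look like in Python, here is a small sketch. The patterns are hypothetical and chosen for illustration only; they are not taken from the Discourse project, and the names `CODE_HINTS`, `looks_like_code`, and `code_like_lines` are made up.

```python
import re
from typing import List

# Hypothetical patterns for illustration; not the Discourse project's rules.
CODE_HINTS = [
    re.compile(r"\b[\w.]+\([^()]*\)"),                # call-like: name(...)
    re.compile(r"^\s*[\w.\[\]]+\s*=\s*\S"),           # assignment at line start
    re.compile(r"[;{}]\s*$"),                         # line ends with ; { or }
    re.compile(r"^\s*(def|class)\s+\w+\s*[(:]"),      # Python definition
    re.compile(r"^\s*(import\s+[\w.]+\s*$|from\s+[\w.]+\s+import\s+\S)"),
]


def looks_like_code(line: str) -> bool:
    """Return True if any of the hint patterns matches the line."""
    return any(pattern.search(line) for pattern in CODE_HINTS)


def code_like_lines(message: str) -> List[str]:
    """Return the lines of a message that match at least one hint pattern."""
    return [line for line in message.splitlines() if looks_like_code(line)]
```

A check like this runs cheaply on the existing bot machine, but it will misfire on some prose (for example, "file(s)" matches the call-like pattern), so the pattern list and any threshold would need tuning against real messages.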

JudsonStevens avatar Jan 01 '22 15:01 JudsonStevens

Also, I tried Pygments and it failed on some of the simpler examples I gave it, detecting them as plain text. For example, this piece of text:

I am using these codes before it .Embed_length = 25 model = Sequential() model.add(Embedding(vocab_size, Embed_length, input_length=1000)) model.add(SpatialDropout1D(0.2)) model.add(LSTM(10, dropout=0.5, recurrent_dropout=0.5))

parsed as <pygments.lexers.TextLexer>. I only tried it a couple of times, but it definitely seemed to default to text quite often.

JudsonStevens avatar Jan 01 '22 15:01 JudsonStevens