SubredditSimulator
Many generated sentences contain unbalanced punctuation/markdown
By default, markovify throws out any sentence containing quotes, parentheses, or square brackets, because they tend to end up unbalanced in the generated sentences. I overrode that behavior because it was removing a huge number of sentences from the training data, including almost every title in /r/relationships and most comments from /r/scenesfromahat. But by doing that I've ended up with exactly the result it was trying to avoid: a lot of unmatched punctuation in the output.
Main things to try to fix with this:
- Quotes - both double-quotes and single-quotes (need to distinguish from apostrophes)
- Parentheses
- Square brackets (especially as markdown link text)
- Asterisks being used for bold and italic markdown
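
One possible direction is to keep these sentences in the training data and instead reject generated sentences that come out unbalanced. The sketch below is only an illustration of that idea, not the bot's actual code: `is_balanced` is a hypothetical helper, and the apostrophe rule (a single quote with a word character on both sides) is an assumption that will misclassify trailing possessives such as "dogs'".

```python
import re
from collections import Counter

# Assumption: a single quote with a word character on both sides is an
# apostrophe (e.g. "don't"), not a quotation mark.
APOSTROPHE = re.compile(r"(?<=\w)'(?=\w)")
CLOSERS = {")": "(", "]": "["}


def is_balanced(sentence: str) -> bool:
    """Return True if quotes, brackets and markdown asterisks pair up."""
    text = APOSTROPHE.sub("", sentence)

    # Parentheses and square brackets must nest correctly.
    stack = []
    for ch in text:
        if ch in "([":
            stack.append(ch)
        elif ch in CLOSERS:
            if not stack or stack.pop() != CLOSERS[ch]:
                return False
    if stack:
        return False

    # Double quotes and remaining single quotes must come in pairs.
    if text.count('"') % 2 or text.count("'") % 2:
        return False

    # Asterisk runs ("*" for italic, "**" for bold) must pair up per length.
    run_lengths = Counter(len(run) for run in re.findall(r"\*+", text))
    return all(count % 2 == 0 for count in run_lengths.values())
```

A generated sentence that fails this check could simply be discarded and regenerated. This only checks counts and nesting, so it does not verify that something like `[text](url)` forms a valid markdown link.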