PyRedactKit

[FR] Support for parsing text that is ANSI colored

Open • proditis opened this issue 3 years ago • 0 comments

Checklist

  • [x] There are no similar reports on existing issues (including closed ones).
  • [x] I was in the master branch of the latest code.

Is your feature request related to a problem? Please describe

Not a problem; this is a feature request/suggestion.

The idea is to be able to parse terminal output that uses ANSI escape codes. Such output is most often found in terminal captures, for example recordings made with asciinema.

In these captures, each letter is often prefixed with its own ANSI color code, which makes matching entire words difficult: the escape codes sit between the letters of the word you want to match.

Describe the solution you'd like

I'd like to be able to redact these files, which contain words whose letters are individually wrapped in ANSI color escape codes.

Describe alternatives you've considered

An alternative would be to strip the color codes altogether before processing. However, the result is not ideal, since people do like their colors :smile:
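For reference, the stripping alternative can be sketched in a few lines of Python. This is a minimal sketch, not PyRedactKit code: the name strip_ansi is hypothetical, and the pattern only handles SGR ("Select Graphic Rendition") sequences of the form ESC [ params m. Real terminal captures may also contain other control sequences (cursor movement, screen clearing), which this pattern leaves untouched.

```python
import re

# Hypothetical helper, not part of PyRedactKit.
# Matches SGR color/style sequences such as \x1b[1;31m or \x1b[0m.
# Assumption: only SGR codes need removing; other escape sequences are ignored.
SGR_RE = re.compile(r"\x1b\[[0-9;]*m")


def strip_ansi(text: str) -> str:
    """Return text with all SGR escape sequences removed (colors are lost)."""
    return SGR_RE.sub("", text)


print(strip_ansi("\x1b[1;31mA\x1b[0mB\x1b[3mC"))  # ABC
```

As noted above, the drawback is that the redacted output no longer carries any of the original styling.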

Additional context

A sample of this output can be seen in the attached screenshot.

Notice that each character is colored differently, which means each one has its own ANSI color escape code in front of it.

An example of this:

  • Consider the text ABC
  • We add a few ANSI escape sequences so that each letter is styled differently: \x1b[1;31mA\x1b[0mB\x1b[3mC (A is bold red, the reset code leaves B unstyled, and C is italic)
  • Displaying this in a terminal produces something like the attached screenshot
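The snippet below is a small Python illustration of why this breaks word matching: a plain regular expression for the visible text ABC finds nothing in the raw string, because escape codes sit between the letters. It assumes only SGR sequences (ESC [ params m) are present; the variable names are illustrative only.

```python
import re

# The example string from the issue: A bold red, B after a reset, C italic.
colored = "\x1b[1;31mA\x1b[0mB\x1b[3mC"

# A plain pattern for the visible word does not match the raw text.
print(re.search(r"ABC", colored))  # None

# Removing the SGR sequences (assumption: no other escape types) makes it match.
visible = re.sub(r"\x1b\[[0-9;]*m", "", colored)
print(re.search(r"ABC", visible) is not None)  # True
```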

This feature would allow PyRedactKit to look past the ANSI codes and match the actual text behind them (i.e. ABC in our example), while still keeping the colors in the redacted output.
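One possible approach, sketched here under stated assumptions rather than as PyRedactKit's actual design: build the visible text while remembering, for each visible character, its index in the raw string; run the redaction regex on the visible text; then overwrite only the matched visible characters in the raw string, leaving every escape code in place. The names visible_with_offsets and redact_ansi are hypothetical, only SGR sequences are recognised, and the input is assumed to be already-decoded terminal output (for example, the output data taken from an asciinema recording).

```python
import re

# Assumption: only SGR color/style sequences (ESC [ params m) need handling.
SGR_RE = re.compile(r"\x1b\[[0-9;]*m")


def visible_with_offsets(raw: str) -> tuple[str, list[int]]:
    """Return the visible text and, for each visible char, its index in raw."""
    chars: list[str] = []
    offsets: list[int] = []
    pos = 0
    for m in SGR_RE.finditer(raw):
        # Characters between the previous escape code and this one are visible.
        for i in range(pos, m.start()):
            chars.append(raw[i])
            offsets.append(i)
        pos = m.end()
    # Trailing visible characters after the last escape code.
    for i in range(pos, len(raw)):
        chars.append(raw[i])
        offsets.append(i)
    return "".join(chars), offsets


def redact_ansi(raw: str, pattern: str, mask: str = "*") -> str:
    """Redact matches of pattern in the visible text, keeping escape codes intact."""
    if len(mask) != 1:
        # A one-character mask keeps every escape code at the same position.
        raise ValueError("mask must be a single character")
    visible, offsets = visible_with_offsets(raw)
    out = list(raw)
    for m in re.finditer(pattern, visible):
        for j in range(m.start(), m.end()):
            out[offsets[j]] = mask
    return "".join(out)


colored = "\x1b[1;31mA\x1b[0mB\x1b[3mC"
print(repr(redact_ansi(colored, r"ABC")))  # '\x1b[1;31m*\x1b[0m*\x1b[3m*'
```

Because each masked character replaces exactly one visible character, it inherits whatever style was applied to the original letter, so the redacted capture keeps its colors.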

proditis, Dec 15 '22 11:12