Implemented text recognition (OCR)
I implemented another comparison method based on OCR.
This could be a useful addition in cases where modern game rendering and visual effects (clutter) make it difficult to find good comparison images.
It currently depends on pytesseract and Tesseract-OCR, but tests with EasyOCR have also been conducted. Both achieve similarly good recognition results. EasyOCR seems to cause a higher CPU load than Tesseract; Tesseract, on the other hand, is an external dependency that needs to be installed separately.
The text comparison of the expected and recognized strings has two modes: a perfect 1:1 match or the Levenshtein ratio.
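For illustration, the two modes could look roughly like this (a minimal sketch, not the PR's actual code; `difflib`'s ratio is used here as a stdlib stand-in for the Levenshtein ratio):

```python
from difflib import SequenceMatcher


def compare_text(expected: str, recognized: str, exact: bool = False) -> float:
    """Return a similarity score between 0.0 and 1.0.

    exact=True  -> perfect 1:1 match only (1.0 or 0.0)
    exact=False -> fuzzy ratio (stand-in for the Levenshtein ratio)
    """
    expected = expected.strip().lower()
    recognized = recognized.strip().lower()
    if exact:
        return 1.0 if expected == recognized else 0.0
    return SequenceMatcher(None, expected, recognized).ratio()
```

The fuzzy mode lets a slightly misread string (one wrong OCR character) still score close to 1.0 instead of failing outright.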
I also introduced two new file config options:
- Rectangle position (only used for text files)
- FPS limit per text or image file
Please let me know what you think of this feature.
I'll look into the code changes after I come back from GDQ, but I love the idea of a comparison method that specializes in text comparison/recognition.
Is the per-file FPS limit necessary to the implementation? Could you split it into a different PR?
Idk about the rectangle position option, but maybe it'll make sense once I give the implementation a proper look.
Hi @Avasam first of all have fun and good luck at GDQ.
To your question: yes, I find the FPS limit necessary to avoid maxing out CPU usage too much. I included a note in the README regarding this. A quick FYI:
Note: This method can cause high CPU usage at the standard comparison FPS. You should therefore limit the comparison FPS to 1 or 2 FPS when using this method, via the limit option in the file name. The size of the selected rectangle can also impact the CPU load (bigger = more CPU load).
Heya! Div2 Content Creator here. I may or may not have inspired this OCR-method implementation after finding and falling in love with this Autosplitter. :D
As for the user perspective regarding Div2: Div2 has a short mission description for most checkpoints in missions, and when doing activities and other content it shows a 5-8 second pop-up with text. The text is completely white, and using that with the current methods causes a false positive when blinded by a flashbang (white screen). If one uses the shadow of the text (or puts black pixels where the shadow is supposed to be), there is a somewhat working threshold difference that doesn't trigger on a flashbang, but depending on the weather in the open world it still gets false positives here and there (and lowering the threshold below 96 will sometimes not split within those 5-8 seconds).
With OCR this issue would probably be solved, and better yet: there are different activities, and I would love to split on "complete 5 activities", for example. With text pop-ups like "Broadcast Restored", "All Hostages Saved" and "Perimeter Secured", one could scan for the "ed" at the end and properly split on those, while keeping "Watch Level up" and other false positives away. :)
Hopefully info from the user perspective is helpful here as well; if not, ignore my comment :D
Enjoy GDQ, and I'm looking forward to testing the new method if it gets approved =)
Hello again - not meaning to stress, but I'm really, really looking forward to using this method for a variety of auto-split scenarios. Any update regarding the code review? :)
@Avasam the README.md file is missing from the allowed paths in the lint-and-build.yml action and is thus blocking the build. Could you please add it?
lint-and-build.yml
It's not an "allowed path", it's a "trigger the build on changing these files" list. The README doesn't need to trigger lint/type/build checks.
It's just that workflow requires approval.
You can run the checks locally using the scripts/lint.ps1 script (or by running the commands found inside individually).
Hello again :) Just wanted to drop in with a huge thank you for this! I went ahead and tested this version today, and it went great for the most part!
For Division 2 it was way easier to set up the splits for autosplitting - though I made some human errors along the way. Most notably, I could now autosplit events that can either succeed or fail (the only difference being in the text), as well as have one file to basically split on all the random activities that I run during the randomizer thing I'm doing :) (If you want to take a look at how the autosplitter did during a Countdown run with the new method, you can see that here )
Regarding the FPS limit and Tesseract: even if the FPS limit is high, the actual rate won't go very high due to the method. If the box I choose in the file is huge (like 1080p fullscreen), it takes way longer than 1 second until the next image is processed. Optimizing this was quite a bit easier as a user than getting optimal image data with paint.net, but it also raised a question for me:
During missions, a new objective pops up in the middle of the screen briefly, travels to the top (still centered), stays still shortly, then travels to the left where the other objectives are listed and stays there until it's done. To have it properly auto-split, one would need it to split when centered - I will test later whether it does that 100% of the time.
There is a trade-off, I think:
- Only check where it pops up in center: Best FPS-performance, best split-accuracy, highest risk of no split
- Only check the vertical part: FPS-sacrifice, long time-frame for low risk of no split, somewhat good split-accuracy still
- Check the whole rectangle where the text could be at any given time: Worst FPS, worst split-accuracy, almost impossible to not split
I tried to visualize it here:
Is there any interest or does it even make sense for me to test stuff like this and report on it here?
@realRammbob
Is there any interest or does it even make sense for me to test stuff like this and report on it here?
I value your input as one of the few known possible users of this comparison method.
For instance, what are your thoughts on this partial text matching? https://github.com/Toufool/AutoSplit/pull/272#discussion_r1477433664
And I've been thinking on it: I think I could use this for Pitfall hundo, where we split on what is the equivalent of a shop menu (buying abilities from the shaman). Image comparison was never quite able to hold up for that specific category. So maybe I too will use this feature :D (which is good, it'll mean I can catch issues more easily)
Regarding the FPS-limit and Tesseract: Even if the FPS-limit is high, it won't go very high due to the method.
The provided comparison per second (FPS) limiter is to help not max out the CPU if you have one powerful enough to do 2-3 comparisons per second. If you're already stuck at 1 or less, it indeed doesn't do much.
There is a trade-off i think [...]
Indeed, and it's exactly as you identified. You'll have to balance this yourself as the user. I have a few ideas to improve performance and accuracy, but these will have to come later:
- reducing the amount of possible characters by letting the user provide a list of possible characters to be found on screen
- let advanced users provide their own trained models for a game's custom fonts
- Asynchronous parallel comparisons (max out your CPU on multiple cores :P), dependent on #219, which has started in #271
Glad to hear! =) I will continue to test stuff and give my ideas here then :)
For instance, what are your thoughts on this partial text matching? https://github.com/Toufool/AutoSplit/pull/272#discussion_r1477433664
As a user I would try to identify text that reappears and want the autosplitter to recognize it. Let's say in RDO I want to start LiveSplit when a mission starts, and bounties show no indication except the mission objective text at the bottom of the screen.
There can be "go to the named area" or "Capture target name" descriptions, for example, so I would check for "go to the" and "capture". If we now decide that "Capture the Capturer's right hand" is no match, it'll have a very low Levenshtein ratio and not split.
Having it output a 1 when the string is found is very good here then.
Putting all possible areas and targets into different comparison strings to check for all of them would be overkill for the user. And if a name like "the Capturer" causes an exception like described above, a workaround to get a match would be to add that specific exception as another comparison string (which is inconvenient, but works).
I can't think of a case where it would be bad to have it output 1 when the string is found - since I can always lengthen the comparison string to get rid of false positives.
Disregarding that, it may be useful to have some regex or other form of variables in the string in the future?
For example, if the text was always like "Bring target name to named area.", one would need to rely on "bring" or "to", which isn't very safe regarding false positives (especially if "Kill Tofu" triggers on "to", or "Kill Harbringer" triggers on "bring").
However, if one could compare against something like "Bring * to *." (with * meaning it ignores all letters until it arrives at " to "), it should work properly. Regarding this idea, ston1th mentioned that regex can only do exact matches, no fuzzy matching.
So maybe if the user inputs a regular string, have it fuzzy match, and if the user puts in an advanced regex string or something, have it compare like that? Not sure if that's something that can be implemented 😅
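That dispatch idea could be sketched like this (hypothetical convention: patterns wrapped in slashes are treated as regex, everything else is fuzzy-matched; `difflib` stands in for the Levenshtein ratio):

```python
import re
from difflib import SequenceMatcher


def match_score(pattern: str, recognized: str) -> float:
    """Score a user pattern against OCR output.

    Patterns wrapped in slashes (a made-up convention for this sketch)
    are treated as regex and matched exactly (1.0 or 0.0).
    Plain strings return 1.0 on a substring hit, else a fuzzy ratio.
    """
    text = recognized.lower()
    if len(pattern) > 1 and pattern.startswith("/") and pattern.endswith("/"):
        return 1.0 if re.search(pattern[1:-1], text) else 0.0
    needle = pattern.lower()
    if needle in text:
        return 1.0
    return SequenceMatcher(None, needle, text).ratio()
```

With this, "Bring * to *." becomes `/bring .+ to .+/`, which matches "Bring Gustavo to Wallace Station" but not "Kill Harbringer", since the latter has no " to " after the "bring" fragment.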
- reducing the amount of possible characters by letting the user provide a list of possible characters to be found on screen
Regarding this... as of now, it tries to read all text in the rectangle, compares that string to the user-given string, and outputs 1 if the comparison string was found; if not, the Levenshtein ratio - correct? Would it be possible to improve performance by altering the comparison to only look for the characters given in the string? I'm pretty certain, though, that this will mess with the Levenshtein ratio being a somewhat useful output, and care needs to be taken that it won't cause unexpected false positives.
Example: looking for "bring"; the text on screen states "New objective: Secure the area." I assume it would read "n bi r r" or "n bi: r r." - no match.
Different example: looking for "bring"; the text on screen states "Bring the Harbringer to Final Destination." I assume it would read "bring rbringr in inin" - match.
Construed false-positive example: looking for "binge"; the text on screen states "The Harbringer appears". I assume it would read "e binge e" - match (false positive).
I assume it would be very hard for a user to foresee the resulting false positives, and I'm not sure how much performance would be saved, but I wanted to mention it regardless 😋
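Side note on restricting characters: Tesseract itself supports a character whitelist via its `tessedit_char_whitelist` variable, which pytesseract can pass through the `config` argument of `pytesseract.image_to_string()` (though with the default LSTM engine the whitelist reportedly only works again from Tesseract 4.1 onward). A sketch that only builds the config string, so it stays self-contained:

```python
def whitelist_config(expected_text: str) -> str:
    """Build a Tesseract config string restricting recognition to the
    characters that appear in the expected string.

    Hypothetical helper: the result would be passed as the `config`
    argument of pytesseract.image_to_string(image, config=...).
    """
    chars = "".join(sorted(set(expected_text.replace(" ", ""))))
    return f"-c tessedit_char_whitelist={chars}"
```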
The possibility of using multiple cores sounds like a very good thing though, if possible! 👍
So I did some further testing. AutoSplit runs on my laptop, mostly during streaming (which means the laptop gets video from the capture card, then encodes and uploads 2 streams). AutoSplit with OCR runs on top of that, and so far it has worked without issue.
Without the streaming stuff running, I tested the regions of the above picture. Here are the results:
- Green rectangle: mostly 4 FPS, range 4-5
- Yellow rectangle: mostly 3 and 4 FPS, range 3-5
- Red rectangle: mostly 2 and 3 FPS, range 2-4
I also tested the consistency in Div2 when risking it with the green rectangle (can be seen here): it never failed to split once.
Then I tested using the {b} thingy with OCR, splitting when below a threshold (checking for "Heroic" at the top left disappearing, here - that would actually be the cleanest moment to split the start of a mission): Tesseract sometimes seems to recognize some wrong letters or too few letters, resulting in singular drops of the score. While it should maintain 1.00, during several checks it drops down to 0.77 and 0.42. With a threshold low enough (0.3) I think nothing bad happened, but a special solution might be needed here. (I didn't test a threshold of 0.00 though, which should also work when expecting text to disappear.)
I suggest there should be some kind of approximation or tolerance, like "if it's below the threshold 3 times, then split" or "take the average of the last 5 checks". While that will probably fix the issue, it may also delay the split timing. Something like "if below threshold, then split; and if the next 4 of 6 checks are 1.00, then undo the split" would avoid the split-timing issue, but feels very janky (with the risk of inputs being too fast for LiveSplit, or falsely triggering some custom LiveSplit sound). Maybe you've got some ideas? :D
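The first idea ("if it's below the threshold 3 times, then split") could be sketched as a small debouncer (hypothetical helper, not part of the PR):

```python
class SplitDebouncer:
    """Only signal a split after the condition has held for `required`
    consecutive checks, to ride out single-frame OCR misreads."""

    def __init__(self, required: int = 3):
        self.required = required
        self.streak = 0

    def update(self, condition_met: bool) -> bool:
        """Feed one comparison result; return True when a split should fire."""
        self.streak = self.streak + 1 if condition_met else 0
        if self.streak >= self.required:
            self.streak = 0  # reset so the next split needs a fresh streak
            return True
        return False
```

The trade-off is exactly the delay mentioned above: at 2 comparisons per second and `required=3`, the split fires roughly 1.5 seconds after the condition first becomes true.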
Sounds like #120 but making sure it should work with the "below" flag as well.
Here's some more user-feedback and testing notes:
(1) False-positive issue with OCR on few characters
I did a Level 31-40 run trying to use OCR on the two level digits. The results:
- Duration: 2 hours 40 minutes (~9600 OCR checks)
- 8 out of 8 level-ups detected
- 9 false positives
Basically, when OCR checks again and again, in some images it misreads the number, and as there is no hold variable yet, it simply splits right then. If a user depends on very few characters with this method, the false-positive likelihood is pretty huge (even with a 100% threshold). (Side note: for the run I can use a "Level Up" pop-up at the center of the screen. It will split later, but won't have false positives. So there is a solution, but it would've been nice if OCR were able to, for example, split by just checking the last digit.) Here's a screen showing the digits at the top right that I tried OCR on (while it also had a false positive)
(2) Capture-device out-of-bounds crash on start-up
I configured my AutoSplit to use a capture device, so it now uses my capture card, which should work best as it contains just the footage of the game.
However, when starting it up and loading a settings.toml, it chooses the device via device ID, which is sometimes swapped with my Logitech Capture or OBS Virtual Camera device. When it tries to use Logitech Capture, it doesn't have 1080p coordinates, so if there's also a Start_Auto_Splitter image that tries to read out of bounds, it crashes immediately upon start-up.
To fix it, I can delete the settings.toml and create a new one, or edit the existing one with e.g. Notepad and either fix the device ID (by guessing) or choose a different split folder so it doesn't start OCR out of bounds.
(3) AutoSplit Integration can't split
I tried using the AutoSplit Integration to have it start automatically when starting LiveSplit with splits & layout containing the integration. While it starts up and works fine starting a run, when I tried it with OCR it never split, even though it got beyond the threshold and also showed going on pause. So basically, when AutoSplit was externally controlled, it wasn't able to progress splits in LiveSplit. (I didn't test further; not sure if it's an OCR bug or also not working in general.)
False-Positive issue with OCR on few characters
I think that's just gonna be a limitation of the technology. A hold flag (#120) is still the best solution I can think of.
out of bounds Crash
If I understand correctly, this should be easy to replicate by just setting the OCR crop outside of the capture area. Will need to be fixed first. Not certain if I wanna send an error popup (and reset, otherwise you'd be stuck in an error loop) or gracefully handle it.
I guess there's no valid reason to change the capture size mid-run, unless you're testing, so we could include that as part of the initial checks, and if it happens mid-run, then reset AutoSplit.
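The initial check could be as simple as validating the OCR rectangle against the capture size before comparing, instead of crashing on an out-of-bounds crop (hypothetical helper, not AutoSplit's actual code):

```python
def rect_in_bounds(
    rect: tuple[int, int, int, int],
    capture_size: tuple[int, int],
) -> bool:
    """Check an OCR rectangle (x, y, width, height) against the
    capture frame size (width, height)."""
    x, y, w, h = rect
    cap_w, cap_h = capture_size
    return 0 <= x and 0 <= y and x + w <= cap_w and y + h <= cap_h
```

A failed check at start-up could surface a readable error instead of a crash; mid-run, it could trigger the reset described above.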
AutoSplit Integration can't split
I can't immediately think of a reason why. From the main logic's PoV, there should be no difference between OCR and regular images (other than for displaying the current split). Will have to test.
Unsure if this is relevant or I made a mistake, but for testing I have used this version until now. It worked great, especially in RDO to figure out a mission start via
texts = ["go to", "search", "capture", "find", "deliver"]
It always had a 100% match with strings like "Go to the shack", "Kill or capture Gustavo", "Help deliver the goods to Wallace Station" etc.
Yesterday I tried going down this list and downloaded newer versions. Two versions (I didn't document which ones, sorry D:) crashed when trying to start them, and two other versions worked. However, those newer versions didn't work on the first two missions I tested, so I checked...
On the next mission the text "Go to the shack." only got a 50% match (and I use a 98% or 100% threshold, because it worked amazingly with the first version I tested). Did something crucial about the OCR matching change? I went back to the earliest version I used for all the tests and will keep it that way for now... :D
Probably related to the out-of-bounds coordinate crash, but when I'm using my capture card in the settings while also having a Start_Auto_Splitter image (i.e. AutoSplit running) and then open the settings, it also crashes.
Here's a video showcasing it:
https://github.com/Toufool/AutoSplit/assets/17615888/bcc72ba8-88ce-43a4-a8d7-602feac5c574
@ston1th There is now a merge conflict due to moving out the tutorial/user guide into its own file.
@Avasam Noted. I'll fix this along with the rest once I find some free time again.
Hey @Avasam when you have time, could you please review the latest changes?
Oh sorry I completely forgot about this!!
Thanks for the ping. I'll test the latest changes when I have time (not today), and I think as long as it doesn't break any existing feature, I'll get it in and publish a new release where it's clearly marked as experimental (so I'm allowed to introduce a breaking change for this feature if I wanna change something)
@Avasam I just looked at your changes, may I ask why you changed the rectangle format back to the old one?
I think the new one was more self-explanatory and easier to understand, using just the X/Y coordinates of two points in the image. I wanted to make this fix before people adopt this feature despite it being experimental.
@ston1th This had been stalling for too long since I forgot about it, and didn't want to keep you waiting any longer.
Brought the PR to a state I was happy merging, and it doesn't affect existing functionality, so I did.
Feel free to open a follow-up PR for any fix and improvement! Any follow-up should be much easier and faster to review at this point.
As for the coordinates, I found it really odd that this was the only place using two points, especially since we effectively just split it up again immediately in code.
If you still disagree, I can always put it up to a vote with the users on Discord to see what they think.
And once again, thanks a lot for implementing this awesome feature !