[BUG] GUI, Burned-in Subtitle Extraction not working
CCExtractor version (using the --version parameter preferably) : 0.85
In raising this issue, I confirm the following (please check boxes, eg [X]):
- [x] I have read and understood the contributors guide.
- [x] I have checked that the bug-fix I am reporting can be replicated, or that the feature I am suggesting isn't already present.
- [x] I have checked that the issue I'm posting isn't already reported.
- [x] I have checked that the issue I'm posting isn't already solved and no duplicates exist in closed issues and in opened issues
- [x] I have checked the pull requests tab for existing solutions/implementations to my issue/suggestion.
- [x] I have used the latest available version of CCExtractor to verify this issue exists.
My familiarity with the project is as follows (check one, eg [X]):
- [x] I have never used CCExtractor.
Necessary information
What platform did you use?
- [x] Windows
What were the used arguments?
C:\Program Files (x86)\CCExtractor\ccextractorwin.exe --gui_mode_reports -in=mp4 -autoprogram -out=srt -bom -latin1 -hardsubx -subcolor white -conf_thresh 60 [+input files]
Without OCR selected in the Input files tab, this gives the error "Parameter -hardsubx not understood."
C:\Program Files (x86)\CCExtractor\ccextractorwinfull.exe --gui_mode_reports -in=mp4 -autoprogram -out=srt -bom -latin1 -hardsubx -subcolor white -conf_thresh 60 [+input files]
With OCR selected in the Input files tab, it complains about not having enough memory to initialize Tesseract. I can run Tesseract from the command line using Python on this machine without any issues. Win10, 3.4GHz, 8GB RAM.
Video links: I have a Dropbox link I can share privately.
I think the problem can be solved by using the destroy() method, as it can free the memory used by an object.
I am new to CCExtractor (I attempted to use it as part of MCEBuddy on a WTV file, which resulted in garbled random text), so I found my way here, which led me to install the Windows GUI version of CCExtractor. On my first attempt, I am seeing this same error:
"Not Enough memory to initialize Tesseract!"
This is in the GUI, on a WTV file, Windows 10, 16 GB RAM.
I changed a few settings from the defaults initially, then after this error changed back to the defaults on the "About & Save" tab.
I'll be opening a separate issue for the garbled text in MCEBuddy.
Command line per GUI for modified settings: C:\Program Files (x86)\CCExtractor\ccextractorwinfull.exe --gui_mode_reports -in=wtv -autoprogram -out=srt -bom -cf C:\Users\sj\Desktop\CCExtractor_Elementary_Stream.txt -goppts -latin1 --endcreditstext "Generated by CCExtractor\nhttp://www.ccextractor.org" --endcreditsforatleast 6 --endcreditsforatmost 3 -hardsubx -subcolor white -conf_thresh 60 [+input files]
I'm having the same problem.
ccextractorwinfull.exe --gui_mode_reports -autoprogram -out=srt -bom -latin1 --nofontcolor -hardsubx -subcolor white -conf_thresh 60 [+input files]

16 GB not enough?...
I also tried on WSL.
But the -hardsubx flag is not even recognized.
~/ccextractor/linux$ ./ccextractor -autoprogram -out=srt -bom -latin1 --nofontcolor -hardsubx -subcolor white -conf_thresh 60 testfile.mkv
Error: Error: Parameter -hardsubx not understood.
CCExtractor 0.87, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
To build the program with hardsubx support, from the Linux directory run:
./configure --enable-hardsubx
make ENABLE_HARDSUBX=yes
This is weird, but it doesn't work. Many things from the website don't work either, like this tutorial from 2016: https://abhinavshukla95.wordpress.com/2016/08/18/google-summer-of-code-work-product-submission/
:~/ccextractor/linux$ ./configure --enable-hardsubx
-bash: ./configure: No such file or directory
~/ccextractor/linux$ ls -a
. autogen.sh builddebug build-static.sh configure.ac Makefile.am pre-build.sh
.. build build_hardsubx cleanup description-pak module_generator
ubuntu@computer:~/ccextractor/linux$ ./configure.ac --enable-hardsubx
-bash: ./configure.ac: Permission denied
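A note on the two errors above: configure.ac is the autoconf input file, not a script, so it cannot be executed, and ./configure does not exist until it is generated. The directory listing does contain autogen.sh, which a later comment in this thread runs before ./configure. The following is only a rough sketch of that order; the helper name ensure_configure is made up for illustration, and it assumes autogen.sh writes an executable ./configure into the same directory.

```shell
#!/bin/sh
# Sketch: make sure ./configure exists before calling it.
# ensure_configure is a hypothetical helper, not part of CCExtractor.
# Assumption: autogen.sh generates an executable ./configure in DIR.

# ensure_configure DIR: run autogen.sh in DIR if DIR/configure is missing.
ensure_configure() {
    dir=$1
    if [ -x "$dir/configure" ]; then
        echo "configure already present"
        return 0
    fi
    if [ ! -f "$dir/autogen.sh" ]; then
        echo "no autogen.sh in $dir" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    (cd "$dir" && sh ./autogen.sh) || return 1
    # Succeed only if the script was actually produced.
    [ -x "$dir/configure" ]
}

# Possible usage (not executed here):
#   ensure_configure ~/ccextractor/linux &&
#       (cd ~/ccextractor/linux && ./configure --enable-hardsubx && make)
```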
I compiled it and have all the necessary components, but hardsubx is still not recognized.
:~/ccextractor/linux$ ./configure --enable-hardsubx
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking whether gcc understands -c and -o together... yes
checking for style of include used by make... GNU
checking dependency style of gcc... gcc3
checking whether make sets $(MAKE)... (cached) yes
checking pkg-config m4 macros... yes
checking for sin in -lm... yes
checking for pkg-config... /usr/bin/pkg-config
checking pkg-config is at least version 0.9.0... yes
checking for getLeptonicaVersion in -llept... yes
checking for lept... yes
checking for TessVersion in -ltesseract... yes
checking for tesseract... yes
checking for avcodec_version in -lavcodec... yes
checking for libavcodec... yes
checking for avformat_version in -lavformat... yes
checking for libavformat... yes
checking for avutil_version in -lavutil... yes
checking for libavutil... yes
checking for swscale_version in -lswscale... yes
checking for libswscale... yes
checking how to run the C preprocessor... gcc -E
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking arpa/inet.h usability... yes
checking arpa/inet.h presence... yes
checking for arpa/inet.h... yes
checking fcntl.h usability... yes
checking fcntl.h presence... yes
checking for fcntl.h... yes
checking float.h usability... yes
checking float.h presence... yes
checking for float.h... yes
checking for inttypes.h... (cached) yes
checking limits.h usability... yes
checking limits.h presence... yes
checking for limits.h... yes
checking locale.h usability... yes
checking locale.h presence... yes
checking for locale.h... yes
checking malloc.h usability... yes
checking malloc.h presence... yes
checking for malloc.h... yes
checking netdb.h usability... yes
checking netdb.h presence... yes
checking for netdb.h... yes
checking netinet/in.h usability... yes
checking netinet/in.h presence... yes
checking for netinet/in.h... yes
checking stddef.h usability... yes
checking stddef.h presence... yes
checking for stddef.h... yes
checking for stdint.h... (cached) yes
checking for stdlib.h... (cached) yes
checking for string.h... (cached) yes
checking sys/socket.h usability... yes
checking sys/socket.h presence... yes
checking for sys/socket.h... yes
checking sys/time.h usability... yes
checking sys/time.h presence... yes
checking for sys/time.h... yes
checking sys/timeb.h usability... yes
checking sys/timeb.h presence... yes
checking for sys/timeb.h... yes
checking termios.h usability... yes
checking termios.h presence... yes
checking for termios.h... yes
checking for unistd.h... (cached) yes
checking wchar.h usability... yes
checking wchar.h presence... yes
checking for wchar.h... yes
checking for stdbool.h that conforms to C99... yes
checking for _Bool... yes
checking for inline... inline
checking for int16_t... yes
checking for int32_t... yes
checking for int64_t... yes
checking for int8_t... yes
checking for off_t... yes
checking for pid_t... yes
checking for size_t... yes
checking for ssize_t... yes
checking for uint16_t... yes
checking for uint32_t... yes
checking for uint64_t... yes
checking for uint8_t... yes
checking for ptrdiff_t... yes
checking for error_at_line... yes
checking for _LARGEFILE_SOURCE value needed for large files... no
checking for stdlib.h... (cached) yes
checking for GNU libc compatible malloc... yes
checking whether time.h and sys/time.h may both be included... yes
checking for sys/time.h... (cached) yes
checking for unistd.h... (cached) yes
checking for alarm... yes
checking for working mktime... yes
checking for stdlib.h... (cached) yes
checking for GNU libc compatible realloc... yes
checking whether strerror_r is declared... yes
checking for strerror_r... yes
checking whether strerror_r returns char *... no
checking for floor... yes
checking for ftruncate... yes
checking for gethostbyname... yes
checking for gettimeofday... yes
checking for inet_ntoa... yes
checking for mblen... yes
checking for memchr... yes
checking for memmove... yes
checking for memset... yes
checking for mkdir... yes
checking for modf... yes
checking for pow... yes
checking for realpath... yes
checking for rmdir... yes
checking for select... yes
checking for setlocale... yes
checking for socket... yes
checking for sqrt... yes
checking for strcasecmp... yes
checking for strchr... yes
checking for strdup... yes
checking for strerror... yes
checking for strndup... yes
checking for strrchr... yes
checking for strstr... yes
checking for strtol... yes
configure: avcodec library found
configure: avformat library found
configure: avutil library found
configure: swscale library found
configure: tesseract library found... tesseract 3.04.01
configure: leptonica library found... leptonica-1.73
checking that generated files are newer than configure... done
configure: creating ./config.status
config.status: creating Makefile
config.status: executing depfiles commands
~/ccextractor/linux$ make ENABLE_HARDSUBX=yes
make: Nothing to be done for 'all'.
./ccextractor testfile.mkv -hardsubx -whiteness_thresh 90 -conf_thresh 60
Error: Error: Parameter -hardsubx not understood.
CCExtractor 0.87, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
It's not actually compiling anything:
~/ccextractor/linux$ make ENABLE_HARDSUBX=yes
make: Nothing to be done for 'all'.
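"Nothing to be done for 'all'" means make considers the existing build up to date, so a binary built earlier without hardsubx is left as it is. One thing worth trying is a clean rebuild followed by a check for the error quoted in this thread. This is a sketch under assumptions, not a confirmed fix: the helper names are hypothetical, and it assumes the Makefile produced by ./configure has the usual "clean" target.

```shell
#!/bin/sh
# Sketch only: force a full rebuild, then test whether the flag is accepted.
# Both function names are hypothetical helpers, not part of CCExtractor.

# rebuild_with_hardsubx: discard stale objects so make has work to do.
# Assumption: the generated Makefile provides a "clean" target.
rebuild_with_hardsubx() {
    make clean && make ENABLE_HARDSUBX=yes
}

# has_hardsubx OUTPUT: fail if OUTPUT contains the error seen in this thread.
has_hardsubx() {
    case $1 in
        *"Parameter -hardsubx not understood"*) return 1 ;;
        *) return 0 ;;
    esac
}

# Possible usage (not executed here):
#   rebuild_with_hardsubx
#   out=$(./ccextractor testfile.mkv -hardsubx 2>&1)
#   has_hardsubx "$out" && echo "hardsubx flag accepted"
```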
I fixed it. But there is another problem:
Can it detect subtitles properly like these by using OCR / burned-in sub detection?

./ccextractor testfile.mp4 -hardsubx -whiteness_thresh 90 -conf_thresh 60
CCExtractor 0.87, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: testfile.mp4
[Extract: 1] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[Timing mode: Auto] [Debug: No] [Buffer input: No]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]
-----------------------------------------------------------------
Opening file: testfile.mp4
Detected MP4 box with name: ftyp
Detected MP4 box with name: free
Detected MP4 box with name: mdat
File seems to be a MP4
Analyzing data with GPAC (MP4 library)
Opening 'testfile.mp4': ok
Track 1, type=vide subtype=avc1
Track 2, type=soun subtype=MPEG
MP4: found 2 tracks: 1 avc and 0 cc
Processing track 1, type=vide subtype=avc1
100% | 34:12
Processing track 2, type=soun subtype=MPEG
Closing media: ok
Found 1 AVC track(s). Found no dedicated CC track(s).
Total frames time: 00:31:56:014 (57423 frames at 29.97fps)
Min PTS: 00:00:00:120
Max PTS: 00:34:12:060
Length: 00:34:14:940
Done, processing time = 1 seconds
No captions were found in input.
Does anyone have any more info on this? @rudolphos can you give some details on "I fixed it" please? Thanks!
Ditto to Anton's request. I'm having exactly the same problem in both Windows and Ubuntu. @rudolphos, could you please explain how you fixed this?
@RobJacobson, I also "fixed" it. The problem seems to be that Tesseract 4 is not properly supported yet (at least at the time). I was able to get it compiled and extracting subs using the latest Tesseract 3 installed from source. The extraction was not particularly accurate but I will definitely get back to trying to tweak that soon (hopefully).
Thanks for the info! Really appreciate it.
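For anyone checking which Tesseract their build will pick up: the configure output earlier in the thread reports the version as "tesseract 3.04.01", and pkg-config can print the same number. Below is a small sketch that flags a major version of 4 or higher, based only on the comment above that Tesseract 4 was not properly supported at the time. The function names are made up for illustration, and it assumes a plain dotted version string such as 3.05.02.

```shell
#!/bin/sh
# Sketch: decide whether an installed Tesseract is a 3.x release.
# Function names are hypothetical; input is a dotted version like 3.05.02.

# tess_major VERSION: print the part before the first dot.
tess_major() {
    printf '%s\n' "${1%%.*}"
}

# tess_is_v3_or_older VERSION: succeed if the major version is below 4.
# A non-numeric version makes the test fail, which is treated as "not ok".
tess_is_v3_or_older() {
    [ "$(tess_major "$1")" -lt 4 ] 2>/dev/null
}

# Possible usage (not executed here):
#   ver=$(pkg-config --modversion tesseract)
#   tess_is_v3_or_older "$ver" || echo "Tesseract $ver may not work with hardsubx"
```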
It's not actually compiling anything:
~/ccextractor/linux$ make ENABLE_HARDSUBX=yes
make: Nothing to be done for 'all'.
@cfsmp3 @RobJacobson @AntonOfTheWoods
On Ubuntu 18.04, I first followed the instructions here: https://github.com/CCExtractor/ccextractor/blob/master/docs/COMPILATION.MD
git clone https://github.com/CCExtractor/ccextractor.git
sudo apt-get install -y libglew-dev
sudo apt-get install -y libglfw3-dev
sudo apt-get install -y cmake
sudo apt-get install -y gcc
sudo apt-get install -y libcurl4-gnutls-dev
sudo apt-get install -y tesseract-ocr
sudo apt-get install -y tesseract-ocr-dev (note package doesn't exist)
sudo apt-get install -y libleptonica-dev
I skipped the lines with tesseract-ocr and tesseract-ocr-dev, since the commenters above explained that Tesseract 4 is unsupported, and used Tesseract 3.05.02 from here instead: https://github.com/tesseract-ocr/tesseract/archive/3.05.02.tar.gz
./autogen.sh
./configure
make
sudo make install
sudo ldconfig
Then I followed the instructions here as a model: https://github.com/CCExtractor/ccextractor/blob/master/docs/OCR.md
cd ccextractor/linux
./build
./autogen.sh
./configure --enable-ocr --enable-hardsubx (note typo in the instructions)
make
which produced "Error: avcodec library not found."
How should I proceed?
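One thing to check here: the successful configure run earlier in this thread looks for libavcodec, libavformat, libavutil and libswscale, and none of the apt-get lines listed above installs them. The sketch below asks pkg-config which of the four are missing and maps them to the Ubuntu development package names (libavcodec-dev and so on). The function names are hypothetical, and whether installing these packages clears the configure error is not confirmed anywhere in this thread.

```shell
#!/bin/sh
# Sketch: find which FFmpeg libraries pkg-config cannot see.
# Function names are hypothetical helpers, not part of CCExtractor.

# The four libraries the configure output in this thread checks for.
FFMPEG_LIBS="libavcodec libavformat libavutil libswscale"

# missing_libs: print one line per library that pkg-config does not find.
# If pkg-config itself is absent, every library is reported as missing.
missing_libs() {
    for lib in $FFMPEG_LIBS; do
        pkg-config --exists "$lib" 2>/dev/null || printf '%s\n' "$lib"
    done
}

# apt_packages_for LIB...: print the matching Ubuntu -dev package names.
apt_packages_for() {
    [ $# -gt 0 ] || return 0
    printf '%s-dev\n' "$@"
}

# Possible usage (not executed here):
#   sudo apt-get install -y $(apt_packages_for $(missing_libs))
```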
I get the same error. Couldn't find a solution yet.
@atulpatildbz @anonynamja @RobJacobson Is this still a problem in the current master?
Closing due to no answer