subaligner Perhaps a N00b mistake, but what am I doing wrong here? I cannot install ANY optional addons, such as translated, which I really want.


### Describe the bug
I cannot install any (optional) installations.

### To Reproduce
Typing what I shared in the image

### Expected behavior
The installation as per the guide

### Screenshots and logs

### Media files
N/A

### Desktop (please complete the following information):

Windows 11 - 22H2
Python 3.11 (as installed in the Conda environment I'm running this inside of)

### Additional context
Any help is appreciated, thank you.

Feb 26 '24 19:02 cleverestx

Thanks for your interest in subaligner. As noted in both the README and the Read the Docs documentation, running subaligner on Windows requires either WSL or Docker Desktop. That said, removing the surrounding single quotes from the commands can resolve "Invalid requirement" errors.

Feb 27 '24 10:02 baxtree
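
The "Invalid requirement" error mentioned in the reply above is consistent with how quoting differs between shells. The following is a minimal sketch, assuming the install command was typed in `cmd.exe` (which does not treat single quotes as quoting characters) and using `subaligner[stretch]` as the example extra. The plain whitespace split only approximates what `cmd.exe` hands to pip.

```python
import shlex

command = "pip install 'subaligner[stretch]'"

# POSIX shells (bash on Linux, macOS, WSL) strip the single quotes
# before the program sees its arguments.
posix_args = shlex.split(command, posix=True)

# Approximation of cmd.exe: single quotes are ordinary characters,
# so they stay attached to the argument that pip receives.
cmd_args = command.split()

print(posix_args[-1])  # subaligner[stretch]
print(cmd_args[-1])    # 'subaligner[stretch]'
```

With the quotes still attached, pip is asked to parse `'subaligner[stretch]'` as a requirement name, which is not a valid one. Dropping the quotes, or using double quotes, avoids this in `cmd.exe`.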

Thank you. I somehow missed that detail. I'll try it tonight and let you know.

Feb 27 '24 19:02 cleverestx

Thank you, I was able to install it!

I am interested in creating Subtitles for Japanese audio videos. Is this the right command?

subaligner -m transcribe -v japanesevideo.mp4 -ml jpn -mr whisper -mf large-v2 -tr helsinki-nlp -o subtitle_aligned.srt -t eng,tgt

Feb 29 '24 02:02 cleverestx

I guess that is wrong, I got:

Feb 29 '24 03:02 cleverestx

Or is THIS correct? It seems to be attempting it now...

subaligner -m transcribe -v japanesevideo.mp4 -ml jpn -mr whisper -mf large-v2 -tr helsinki-nlp -o subtitle_aligned.srt -t jpn,eng

Feb 29 '24 04:02 cleverestx
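
The two attempts above differ only in the value passed to `-t`. Below is a minimal sketch of assembling the same command from Python, assuming (inferred from this thread, not confirmed against the documentation) that `-t` takes a `source,target` pair of language codes. `build_transcribe_command` is a hypothetical helper, and every flag is copied from the commands above.

```python
import shlex

def build_transcribe_command(video, source_lang, target_lang,
                             output="subtitle_aligned.srt"):
    """Return the argument list for the transcribe-and-translate command."""
    return [
        "subaligner", "-m", "transcribe",
        "-v", video,
        "-ml", source_lang,          # language spoken in the media file
        "-mr", "whisper",
        "-mf", "large-v2",
        "-tr", "helsinki-nlp",
        "-o", output,
        # Assumption: "-t" expects "source,target".
        "-t", f"{source_lang},{target_lang}",
    ]

args = build_transcribe_command("japanesevideo.mp4", "jpn", "eng")
print(shlex.join(args))
```

The list can be passed to `subprocess.run(args, check=True)`; building the pair from `source_lang` keeps `-ml` and the first half of `-t` from drifting apart.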

Lastly, is there a way to leverage my high-end Nvidia (RTX) video card to speed this process? (like I can with Whisper-X for example)?

Feb 29 '24 04:02 cleverestx

> Or is THIS correct? It seems to be attempting it now...
>
> subaligner -m transcribe -v japanesevideo.mp4 -ml jpn -mr whisper -mf large-v2 -tr helsinki-nlp -o subtitle_aligned.srt -t jpn,eng

This just finished and the result was complete rambling nonsense, haha... I mean nothing was matched: in a video with very little speaking, over 1600 instances were documented in the subs. What am I doing wrong??

Feb 29 '24 04:02 cleverestx

Hi, it uses "standard" ML libraries, which should automatically hook up with CUDA if it is correctly installed. Let me know if you spot anything quirky or if further configuration for your high-end GPUs is needed.

That command will run transcription, subtitle generation and translation in one go. For debugging, can you check whether the generated subtitles are less nonsensical without translation? In your case, something like:

subaligner -m transcribe -v japanesevideo.mp4 -ml jpn -mr whisper -mf large-v2 -o subtitle_aligned.srt

Feb 29 '24 18:02 baxtree
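
One way to check whether the ML libraries can actually see the GPU is to ask the framework directly. This is a minimal sketch, assuming the Whisper backend runs on PyTorch in the same environment as subaligner; `cuda_status` is a hypothetical helper, not part of subaligner.

```python
import importlib.util

def cuda_status():
    """Report whether PyTorch is installed and can see a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment"
    import torch  # imported lazily so the check above can run without it
    if not torch.cuda.is_available():
        return f"PyTorch {torch.__version__} found, but no CUDA device is visible"
    return f"CUDA device visible: {torch.cuda.get_device_name(0)}"

print(cuda_status())
```

A CPU-only PyTorch build reports no CUDA device even when the CUDA toolkit is installed system-wide, so the result depends on which PyTorch wheel pip resolved in this particular environment.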

> Hi, it uses "standard" ML libraries, which should automatically hook up with CUDA if it is correctly installed. Let me know if you spot anything quirky or if further configuration for your high-end GPUs is needed.
>
> That command will run transcription, subtitle generation and translation in one go. For debugging, can you check whether the generated subtitles are less nonsensical without translation? In your case, something like:
>
> subaligner -m transcribe -v japanesevideo.mp4 -ml jpn -mr whisper -mf large-v2 -o subtitle_aligned.srt

Thanks. I'm now running the debugging test you requested, without translation. Something is NOT interfacing right with my GPU, I guess, because my 2hr:13min video took 50min:54sec to complete translation, at 17.97 s/iteration :-\

I have CUDA 12 installed (it is working with Whisper-X currently)

I'll post the results of the test I'm running later and let you know... I don't read Japanese myself at all, so I'll have to translate that manually with Google or something of course, lol.

Attached here are all the "errors" I got during the installations, if that helps.

Feb 29 '24 19:02 cleverestx
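
To put the reported timing in perspective, the figures from the post above can be turned into a real-time factor. This is a small worked calculation, assuming the 17.97 s/iteration rate held for the whole run.

```python
video_seconds = 2 * 3600 + 13 * 60        # 2 h 13 min of video
processing_seconds = 50 * 60 + 54         # 50 min 54 s of processing
seconds_per_iteration = 17.97             # rate shown by the progress bar

speed_factor = video_seconds / processing_seconds
iterations = processing_seconds / seconds_per_iteration

print(f"{speed_factor:.2f}x real time")           # 2.61x real time
print(f"about {round(iterations)} iterations")    # about 170 iterations
```

These numbers alone do not show whether the GPU was used; they only give a baseline to compare against a run of the same file in another tool.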

Here is part of the Japanese SUBS it gave for the film 13 Assassins...

`1
00:00:00,000 --> 00:00:01,960
おやすみなさい

2
00:00:01,960 --> 00:00:03,962
おやすみなさい

3
00:00:03,962 --> 00:00:05,964
おやすみなさい

4
00:00:05,964 --> 00:00:07,966
おやすみなさい

5
00:00:07,966 --> 00:00:09,968
おやすみなさい

6
00:00:09,968 --> 00:00:11,970
おやすみなさい

7
00:00:11,970 --> 00:00:13,972
おやすみなさい

8
00:00:13,972 --> 00:00:15,974
おやすみなさい

9
00:00:15,974 --> 00:00:17,976
おやすみなさい

10
00:00:17,976 --> 00:00:19,978
おやすみなさい

11
00:00:19,978 --> 00:00:21,980
おやすみなさい

12
00:00:21,980 --> 00:00:23,982
おやすみなさい

13
00:00:23,982 --> 00:00:25,984
おやすみなさい

14
00:00:25,984 --> 00:00:27,986
おやすみなさい

15
00:00:27,986 --> 00:00:29,988
おやすみなさい

16
00:00:57,974 --> 00:00:59,976
おやすみなさい

17
00:00:59,976 --> 00:01:01,978
おやすみなさい

18
00:01:01,978 --> 00:01:03,980
おやすみなさい

19
00:01:03,980 --> 00:01:05,982
おやすみなさい

20
00:01:05,982 --> 00:01:07,984
おやすみなさい

21
00:01:07,984 --> 00:01:09,986
おやすみなさい`

Feb 29 '24 20:02 cleverestx

Well, I just realized by looking at it that it appears to be all the same?? Lol

Google Translate says it means "Good night." ??

Feb 29 '24 23:02 cleverestx
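
A run of identical cues like this can be detected without reading Japanese. Here is a minimal sketch, assuming a standard SRT layout (index line, timing line, text, blank line); `cue_texts` and `repetition_ratio` are hypothetical helpers, and the sample holds the first three cues of the excerpt above.

```python
import re
from collections import Counter

# Matches an SRT timing line such as "00:00:01,960 --> 00:00:03,962".
TIMING = re.compile(r"\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")

def cue_texts(srt):
    """Return the text of each cue in an SRT document."""
    texts = []
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        for i, line in enumerate(lines):
            if TIMING.fullmatch(line):
                # Everything after the timing line is the cue text.
                texts.append(" ".join(lines[i + 1:]))
                break
    return texts

def repetition_ratio(srt):
    """Share of cues whose text equals the most common cue text."""
    texts = cue_texts(srt)
    if not texts:
        return 0.0
    _, count = Counter(texts).most_common(1)[0]
    return count / len(texts)

sample = """1
00:00:00,000 --> 00:00:01,960
おやすみなさい

2
00:00:01,960 --> 00:00:03,962
おやすみなさい

3
00:00:03,962 --> 00:00:05,964
おやすみなさい
"""

print(repetition_ratio(sample))  # 1.0
```

A ratio close to 1.0 over a long stretch suggests the model is looping on one phrase rather than transcribing speech.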

The actual subtitles for the film DO vary when run through it. The few I spot-checked seemed accurate (some were way, way off), and I get odd stuff like this here and there:

Or this, which takes place during maniacal laughter only, with no speaking at all, lol

Mar 01 '24 02:03 cleverestx

Thanks for sharing the installation logs. I will need to find a Windows machine to reproduce the errors. Looks like your use case is not about aligning existing out-of-sync subtitles so there is no need to install "subaligner[stretch]". By not doing that, you can get rid of some errors.

Mar 01 '24 09:03 baxtree

If the subtitle content is wrong, that is largely attributable to Whisper hallucinating and/or to the characteristics of your media file, because that command simply invokes Whisper's API to obtain timed words with a specific language code. The funny thing is that the transcription mixed Japanese and English, which is also surprising to me.

I can see you are using large-v2, which indeed performs better than large-v3 on audio in Japanese. You mentioned WhisperX as well, so you could compare its output with the subtitles listed above. Without the media file it is difficult for me to take a closer look. In any case, this is Whisper-related, so it is better to ask their community for insights.

Mar 01 '24 10:03 baxtree
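
To isolate whether the odd output comes from Whisper itself, the same media file can be transcribed with the `openai-whisper` package directly. This is a minimal sketch and not how subaligner calls Whisper internally. `whisper_options` and `transcribe_directly` are hypothetical helpers, the language code is Whisper's `ja` rather than subaligner's `jpn`, and disabling `condition_on_previous_text` is an assumption worth testing, since it is a commonly suggested mitigation for repetition loops.

```python
def whisper_options(language="ja", avoid_repetition=True):
    """Build keyword arguments for Whisper's transcribe() call."""
    options = {"language": language, "word_timestamps": True}
    if avoid_repetition:
        # Stops each window from being conditioned on the previous output,
        # which can reduce runs of the same repeated phrase.
        options["condition_on_previous_text"] = False
    return options

def transcribe_directly(media_path, model_name="large-v2"):
    """Transcribe a media file with openai-whisper, bypassing subaligner."""
    import whisper  # requires the openai-whisper package and ffmpeg
    model = whisper.load_model(model_name)
    return model.transcribe(media_path, **whisper_options())

# Example use (not run here, since it downloads the model):
# result = transcribe_directly("japanesevideo.mp4")
# for segment in result["segments"]:
#     print(segment["start"], segment["end"], segment["text"])
```

If the direct run shows the same repeated phrases, the behaviour comes from the model and the audio rather than from subaligner's subtitle generation.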

> Thanks for sharing the installation logs. I will need to find a Windows machine to reproduce the errors. Looks like your use case is not about aligning existing out-of-sync subtitles so there is no need to install "subaligner[stretch]". By not doing that, you can get rid of some errors.

Okay I did this:

Mar 02 '24 01:03 cleverestx

> If the subtitle content is wrong, that is largely attributable to Whisper hallucinating and/or to the characteristics of your media file, because that command simply invokes Whisper's API to obtain timed words with a specific language code. The funny thing is that the transcription mixed Japanese and English, which is also surprising to me.
>
> I can see you are using large-v2, which indeed performs better than large-v3 on audio in Japanese. You mentioned WhisperX as well, so you could compare its output with the subtitles listed above. Without the media file it is difficult for me to take a closer look. In any case, this is Whisper-related, so it is better to ask their community for insights.

The command I'm using is:

subaligner -m transcribe -v "13 Assassins (2010)-HDTV-1080p.mkv" -ml jpn -mr whisper -mf large-v2 -o subtitle_aligned.srt

Is there a better way to do this (not using Whisper, for example)? I don't need Whisper if you are aware of a more accurate solution to replace it with, or of a better way to do this using subaligner in general. Thanks.

Mar 02 '24 01:03 cleverestx

Sorry but at the time of writing whisper models are the sole transcription service. Let me know if you find any other solutions working better on your audio and we are happy to integrate them into subaligner.

Mar 02 '24 10:03 baxtree

The original problem seems resolved. Closing the issue, since the remaining transcription-quality concerns are specific to the third-party tool.

Apr 16 '24 13:04 baxtree