Perhaps a newbie mistake, but what am I doing wrong here? I cannot install ANY of the optional add-ons, such as translation, which I really want.
### Describe the bug
I cannot install any (optional) installations.

### To Reproduce
Typing what I shared in the image

### Expected behavior
The installation as per the guide

### Screenshots and logs

### Media files
N/A

### Desktop (please complete the following information):
- Windows 11 - 22H2
- Python 3.11 (as installed in the Conda environment I'm running this inside of)

### Additional context
Any help is appreciated, thank you.
Thanks for your interest in subaligner. As indicated in both the README and the Read the Docs pages, running subaligner on Windows requires either WSL or Docker Desktop. That said, removing the surrounding single quotes from the install commands can resolve the "Invalid requirement" errors.
Thank you. I somehow missed that detail. I'll try it tonight and let you know.
Thank you, I was able to install it!
I am interested in creating subtitles for videos with Japanese audio. Is this the right command?
subaligner -m transcribe -v japanesevideo.mp4 -ml jpn -mr whisper -mf large-v2 -tr helsinki-nlp -o subtitle_aligned.srt -t eng,tgt
I guess that is wrong, I got:
Or is THIS correct? It seems to be attempting it now...
subaligner -m transcribe -v japanesevideo.mp4 -ml jpn -mr whisper -mf large-v2 -tr helsinki-nlp -o subtitle_aligned.srt -t jpn,eng
Lastly, is there a way to leverage my high-end Nvidia (RTX) video card to speed up this process, like I can with Whisper-X for example?
This just finished and the result was complete rambling nonsense, haha. Nothing was matched: in a video with very little speaking, over 1600 entries were written to the subs. What am I doing wrong?
Hi, it uses "standard" ML libraries which should automatically hook up with cuda if correctly installed. Let me know if you spot anything quirky or if further configuration for your high-end GPUs is needed.
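A quick way to check the "automatically hook up with cuda" part is to ask the ML library directly, from inside the same Conda environment that runs subaligner. The sketch below assumes the Whisper backend runs on PyTorch; the helper name `describe_gpu_support` is made up for illustration and is not part of subaligner.

```python
import importlib.util


def describe_gpu_support():
    """Summarise whether PyTorch in this environment can see a CUDA device.

    Assumption: the Whisper transcription backend runs on PyTorch, so a
    CPU-only PyTorch build would explain slow transcription.
    """
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment"

    import torch

    if not torch.cuda.is_available():
        return (
            f"PyTorch {torch.__version__} is installed but sees no CUDA device "
            "(possibly a CPU-only build or a driver mismatch)"
        )
    return f"PyTorch {torch.__version__} sees CUDA device: {torch.cuda.get_device_name(0)}"


print(describe_gpu_support())
```

If this reports no CUDA device while another tool in a different environment uses the GPU fine, the two environments probably have different PyTorch builds installed.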
That command will run transcription, subtitle generation and translation in one go. For debugging, can you check whether the generated subtitles are less nonsensical without translation? In your case, something like
subaligner -m transcribe -v japanesevideo.mp4 -ml jpn -mr whisper -mf large-v2 -o subtitle_aligned.srt
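To make the two-stage debugging repeatable, the transcription-only and transcription-plus-translation runs can be built from one helper. This is only a sketch: it uses just the flags that appear in this thread, and the function name `build_transcribe_command` and its parameters are hypothetical. Passing the result as a list to `subprocess.run` avoids shell quoting issues with file names that contain spaces.

```python
def build_transcribe_command(video, language, output, flavour="large-v2", translate_to=None):
    """Build the subaligner argument list for a transcription run.

    Only flags quoted in this thread are used. When translate_to is None,
    the translation flags are left out so transcription can be checked alone.
    """
    args = [
        "subaligner", "-m", "transcribe",
        "-v", video,
        "-ml", language,
        "-mr", "whisper",
        "-mf", flavour,
    ]
    if translate_to is not None:
        # Source,target language pair, as in "-t jpn,eng" above.
        args += ["-tr", "helsinki-nlp", "-t", f"{language},{translate_to}"]
    args += ["-o", output]
    return args


# Step 1: transcription only, to see whether the raw output is sensible.
print(build_transcribe_command("japanesevideo.mp4", "jpn", "subtitle_aligned.srt"))
# Step 2: add translation once step 1 looks right.
print(build_transcribe_command("japanesevideo.mp4", "jpn", "subtitle_aligned.srt", translate_to="eng"))
```

To actually run a command, pass the list to `subprocess.run(args, check=True)` from the environment where subaligner is installed.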
Thanks. I'm now trying the debugging test you requested, without translation. Something is NOT interfacing right with my GPU I guess, because my 2hr:13min video took 50min:54sec to complete transcription, at 17.97 s/iteration :-\
I have CUDA 12 installed (it is working with Whisper-X currently)
I'll post the results of the test I'm running later and let you know. I don't read Japanese myself at all, so I'll have to translate that manually with Google or something of course, lol.
Attached here are all the "errors" I got during the installations, if that helps.
Here is part of the Japanese SUBS it gave for the film 13 Assassins...
`00:00:00,000 --> 00:00:01,960 おやすみなさい
2 00:00:01,960 --> 00:00:03,962 おやすみなさい
3 00:00:03,962 --> 00:00:05,964 おやすみなさい
4 00:00:05,964 --> 00:00:07,966 おやすみなさい
5 00:00:07,966 --> 00:00:09,968 おやすみなさい
6 00:00:09,968 --> 00:00:11,970 おやすみなさい
7 00:00:11,970 --> 00:00:13,972 おやすみなさい
8 00:00:13,972 --> 00:00:15,974 おやすみなさい
9 00:00:15,974 --> 00:00:17,976 おやすみなさい
10 00:00:17,976 --> 00:00:19,978 おやすみなさい
11 00:00:19,978 --> 00:00:21,980 おやすみなさい
12 00:00:21,980 --> 00:00:23,982 おやすみなさい
13 00:00:23,982 --> 00:00:25,984 おやすみなさい
14 00:00:25,984 --> 00:00:27,986 おやすみなさい
15 00:00:27,986 --> 00:00:29,988 おやすみなさい
16 00:00:57,974 --> 00:00:59,976 おやすみなさい
17 00:00:59,976 --> 00:01:01,978 おやすみなさい
18 00:01:01,978 --> 00:01:03,980 おやすみなさい
19 00:01:03,980 --> 00:01:05,982 おやすみなさい
20 00:01:05,982 --> 00:01:07,984 おやすみなさい
21 00:01:07,984 --> 00:01:09,986 おやすみなさい`
Well, I just realized by looking at it that it appears to be all the same?? Lol
Google Translate says it means "Good night." ??
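Long runs of the same line, like the one above, can be spotted without reading Japanese. The following is a rough sketch that assumes a standard SRT layout (index line, timing line, text, blank line); the function names are made up for illustration.

```python
import re
from itertools import groupby

# Matches an SRT timing line such as "00:00:01,960 --> 00:00:03,962".
TIMING = re.compile(r"\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def cue_texts(srt_content):
    """Yield the text of each cue, in order, from SRT-formatted content."""
    for block in re.split(r"\n\s*\n", srt_content.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        for i, line in enumerate(lines):
            if TIMING.match(line):
                text = " ".join(lines[i + 1:])
                if text:
                    yield text
                break


def longest_repeated_run(srt_content):
    """Return (text, count) for the longest run of consecutive identical cues."""
    best = ("", 0)
    for text, group in groupby(cue_texts(srt_content)):
        count = sum(1 for _ in group)
        if count > best[1]:
            best = (text, count)
    return best


sample = """1
00:00:00,000 --> 00:00:01,960
おやすみなさい

2
00:00:01,960 --> 00:00:03,962
おやすみなさい

3
00:00:03,962 --> 00:00:05,964
Hello
"""
print(longest_repeated_run(sample))
```

A long run of one phrase across stretches with no speech is a typical sign of the transcription model repeating itself rather than a subtitle timing problem.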
The actual film subtitles, when run through it, DO vary. The few I spot-checked seemed accurate (some were way, way off), and I get odd stuff like this here and there:
Or this that takes place during maniacal laughter only, no speaking here at all, lol
Thanks for sharing the installation logs. I will need to find a Windows machine to reproduce the errors. Looks like your use case is not about aligning existing out-of-sync subtitles so there is no need to install "subaligner[stretch]". By not doing that, you can get rid of some errors.
If the subtitle content is wrong, it is largely attributable to Whisper hallucination and/or the characteristics of your media file, because that command simply invokes Whisper's API to obtain timed words with a specific language code. The funny thing is that the transcription mixed Japanese and English, which is also surprising to me.
I can see you are using large-v2, which indeed performs better than large-v3 on audio in Japanese. You mentioned WhisperX as well, so you could compare its output to the subtitles listed above. Without the media file it is difficult for me to take a closer look. In any case, this is Whisper-related, so it is better to ask their community for insights.
Okay I did this:
The command I'm using is:
subaligner -m transcribe -v "13 Assassins (2010)-HDTV-1080p.mkv" -ml jpn -mr whisper -mf large-v2 -o subtitle_aligned.srt
Is there a better way to do this, not using Whisper for example? I don't need Whisper if you are aware of a more accurate solution to replace it with, or of a better way to do this using subaligner in general. Thanks.
Sorry, but at the time of writing Whisper models are the only transcription option. Let me know if you find any other solutions that work better on your audio and we would be happy to integrate them into subaligner.
The original problem seems resolved. Closing this issue, since the remaining questions concern transcription quality that is specific to the third-party tool.