My Fedora computer is incredibly slow.
I'm sorry if this is a dumb question, but I have Fedora installed on my computer and it is SO SLOW when running whisper.cpp. I need speech-to-text for my dyslexia. Here is the output when I speak after running:
./stream -m ./models/ggml-small.en.bin -t 16 --step 500 --length 5000 -c 0
init: found 1 capture devices:
init: - Capture device #0: 'Family 17h/19h HD Audio Controller Analog Stereo'
init: attempt to open capture device 0 : 'Family 17h/19h HD Audio Controller Analog Stereo' ...
init: obtained spec for input device (SDL Id = 2):
init: - sample rate: 16000
init: - format: 33056 (required: 33056)
init: - channels: 1 (required: 1)
init: - samples per frame: 1024
whisper_init_from_file_no_state: loading model from './models/ggml-small.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 768
whisper_model_load: n_audio_head = 12
whisper_model_load: n_audio_layer = 12
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 768
whisper_model_load: n_text_head = 12
whisper_model_load: n_text_layer = 12
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 3
whisper_model_load: mem required = 743.00 MB (+ 16.00 MB per decoder)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx = 464.68 MB
whisper_model_load: model size = 464.44 MB
whisper_init_state: kv self size = 15.75 MB
whisper_init_state: kv cross size = 52.73 MB
main: processing 8000 samples (step = 0.5 sec / len = 5.0 sec / keep = 0.2 sec), 16 threads, lang = en, task = transcribe, timestamps = 0 ...
main: n_new_line = 9, no_context = 1
whisper_print_timings: load time = 567.76 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 534.70 ms
whisper_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: encode time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: decode time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: total time = 1508.83 ms
I hope we can figure out what is wrong, but I'm starting a summer class on the 30th, so hopefully we can find the cause soon.
What CPU do you have?
lscpu
Vendor ID: AuthenticAMD
Model name: AMD Ryzen 7 3700U with Radeon Vega Mobile Gfx
Don't pass -t 16. If I am not mistaken, that tells it to run 16 threads, but the 3700U only has 4 cores / 8 threads, so the extra threads just contend with each other. Try -t 8 and adjust up or down depending on how your PC behaves; 8 threads should easily be enough. How long does the slowness last for, longer than a few seconds?
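If you're not sure how many threads your CPU actually exposes, a quick check from Python (just for illustration; the halve-for-SMT rule of thumb below is my assumption as a starting point for -t, not official whisper.cpp guidance):

```python
import os

# os.cpu_count() reports logical CPUs (cores x SMT threads).
# A Ryzen 7 3700U has 4 cores / 8 threads, so it reports 8 there.
logical = os.cpu_count()
print(logical)

# whisper.cpp worker threads are compute-bound, so the physical core
# count (assuming 2-way SMT) is a reasonable first value to try for -t:
suggested = max(1, logical // 2)
print(suggested)
```

From there, benchmark -t at the suggested value and one step up and down, and keep whichever is fastest.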
I am also using Fedora with an AMD CPU (Model name: AMD Ryzen 7 5800H with Radeon Graphics), but I am guessing the computation is just naturally slow. I am not using stream, but for an audio clip of a few seconds the small model takes a few seconds to process, whereas the tiny model is reasonably fast and I don't feel the wait.
Before this, I think I used https://github.com/Uberi/speech_recognition with the Google cloud model, which is free with usage limits. I believe I followed https://www.geeksforgeeks.org/speech-recognition-in-python-using-google-speech-api/, but I am not sure; you can probably find other guides online. All of this is vague in my memory, but I think the limits are around 1 minute per audio clip and around 50 API calls per day, though I saw a Stack Exchange question where the user claimed to be able to use it past those limits, so I do not know.
Did you build with CUDA enabled? I don't see any mention of CUDA in your output log.
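For reference, GPU and BLAS acceleration in whisper.cpp are opt-in at build time; at the time of this thread it was roughly the following (treat the exact variable names as an assumption and check the project README for your checkout):

```shell
# Rebuild with NVIDIA cuBLAS support. NVIDIA GPUs only: the 3700U's
# Vega iGPU cannot use CUDA, so this would not apply to the original poster.
make clean
WHISPER_CUBLAS=1 make -j

# CPU-only builds can still be sped up somewhat with OpenBLAS:
# make clean && WHISPER_OPENBLAS=1 make -j
```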