My Fedora computer is incredibly slow.
I'm sorry if this is a dumb question, but I have Fedora installed on my computer and it is SO SLOW when running whisper.cpp. I need speech-to-text for my dyslexia. Here is the output when I speak after running:
./stream -m ./models/ggml-small.en.bin -t 16 --step 500 --length 5000 -c 0
init: found 1 capture devices:
init: - Capture device #0: 'Family 17h/19h HD Audio Controller Analog Stereo'
init: attempt to open capture device 0 : 'Family 17h/19h HD Audio Controller Analog Stereo' ...
init: obtained spec for input device (SDL Id = 2):
init: - sample rate: 16000
init: - format: 33056 (required: 33056)
init: - channels: 1 (required: 1)
init: - samples per frame: 1024
whisper_init_from_file_no_state: loading model from './models/ggml-small.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 768
whisper_model_load: n_audio_head = 12
whisper_model_load: n_audio_layer = 12
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 768
whisper_model_load: n_text_head = 12
whisper_model_load: n_text_layer = 12
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 3
whisper_model_load: mem required = 743.00 MB (+ 16.00 MB per decoder)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx = 464.68 MB
whisper_model_load: model size = 464.44 MB
whisper_init_state: kv self size = 15.75 MB
whisper_init_state: kv cross size = 52.73 MB
main: processing 8000 samples (step = 0.5 sec / len = 5.0 sec / keep = 0.2 sec), 16 threads, lang = en, task = transcribe, timestamps = 0 ...
main: n_new_line = 9, no_context = 1
whisper_print_timings: load time = 567.76 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 534.70 ms
whisper_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: encode time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: decode time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: total time = 1508.83 ms
I hope we can figure out what is wrong, but I'm starting a summer class on the 30th, so hopefully we can find the cause soon.
What CPU do you have?
lscpu
Vendor ID: AuthenticAMD
Model name: AMD Ryzen 7 3700U with Radeon Vega Mobile Gfx
Don't pass -t 16. If I am not mistaken, that tells it to run 16 threads, but the 3700U only has 4 cores / 8 threads, so the extra threads just contend with each other. Try -t 8 and adjust up or down depending on how your PC behaves; 8 threads should easily be enough. How long does the slowness last for, longer than a few seconds?
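If you're not sure how many threads your CPU actually exposes, a quick check from Python (just for illustration; the halve-for-SMT rule of thumb below is my assumption as a starting point for -t, not official whisper.cpp guidance):

```python
import os

# os.cpu_count() reports logical CPUs (cores x SMT threads).
# A Ryzen 7 3700U has 4 cores / 8 threads, so it reports 8 there.
logical = os.cpu_count()
print(logical)

# whisper.cpp worker threads are compute-bound, so the physical core
# count (assuming 2-way SMT) is a reasonable first value to try for -t:
suggested = max(1, logical // 2)
print(suggested)
```

From there, benchmark -t at the suggested value and one step up and down, and keep whichever is fastest.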
I am also using Fedora with an AMD CPU (Model name: AMD Ryzen 7 5800H with Radeon Graphics), but I am guessing the computation is just naturally slow. I am not using stream, but for an audio clip of a few seconds the small model takes a few seconds to process, whereas the tiny model is reasonably fast and I don't feel the wait.
Before this, I think I used https://github.com/Uberi/speech_recognition with the Google cloud model, which is free with usage limits. I believe I followed https://www.geeksforgeeks.org/speech-recognition-in-python-using-google-speech-api/, but I am not sure; you can probably find other guides online. All of this is vague in my memory, but I think the limits are around 1 minute per audio clip and around 50 API calls per day, though I saw a Stack Exchange question where the user claimed to be able to use it past those limits, so I do not know.
Did you build with CUDA enabled? I don't see any mention of CUDA in your output log.
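For reference, GPU and BLAS acceleration in whisper.cpp are opt-in at build time; at the time of this thread it was roughly the following (treat the exact variable names as an assumption and check the project README for your checkout):

```shell
# Rebuild with NVIDIA cuBLAS support. NVIDIA GPUs only: the 3700U's
# Vega iGPU cannot use CUDA, so this would not apply to the original poster.
make clean
WHISPER_CUBLAS=1 make -j

# CPU-only builds can still be sped up somewhat with OpenBLAS:
# make clean && WHISPER_OPENBLAS=1 make -j
```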