Bartłomiej Różański

Results: 10 comments by Bartłomiej Różański

That would be super cool. I thought about a word counter, but a timestamp could also be easily consumed by external tools.

@gkucsko any chance of bringing this up? I wonder how to accurately map seconds to the resulting samples. Apologies for a probably naive question, but could it be calculated from the generated audio...
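To make the question concrete, here is a minimal sketch of deriving timestamps from the generated audio itself, assuming the audio comes back as a sequence of samples at a known, fixed sample rate. The function names and the idea of generating text in separate chunks are hypothetical, and the 24 kHz default is an assumption that should be checked against the model's own sample rate constant.

```python
from typing import List, Sequence, Tuple


def samples_to_seconds(num_samples: int, sample_rate: int = 24_000) -> float:
    """Duration of an audio buffer: number of samples divided by samples per second."""
    return num_samples / sample_rate


def chunk_timestamps(
    chunk_lengths: Sequence[int], sample_rate: int = 24_000
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds for audio chunks played back to back.

    chunk_lengths holds the sample count of each generated chunk
    (hypothetical: for example, one chunk per sentence).
    """
    timestamps = []
    offset = 0  # running position in samples
    for length in chunk_lengths:
        start = samples_to_seconds(offset, sample_rate)
        end = samples_to_seconds(offset + length, sample_rate)
        timestamps.append((start, end))
        offset += length
    return timestamps
```

This only gives boundaries per generated chunk. Timestamps for individual words inside a chunk would need some form of alignment, which the sketch does not attempt.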

It would be great to run Bark in Elixir. Also, this TTS model has attracted a lot of attention recently: https://github.com/collabora/WhisperSpeech

I tried to port Bark and later WhisperSpeech. They use multiple models to convert text to semantics, semantics to audio and encode... Anyway, more promising models have appeared recently...

@michelson not yet, but I'm working on it. These models don't use standard layers, or if they do, they are in pickle format. I needed to step back to understand simpler...

I'm currently playing around with Tacotron 2 text-to-speech, and since it's the simplest TTS I've found, I'm trying to reproduce it in Elixir. I used `nx_signal` to process audio files and generate...

I was thinking it might be one of the torchaudio vocoders, such as Griffin-Lim (the output sounds robotic), WaveRNN (most likely this), or Nvidia WaveGlow, to turn mel spectrograms into audio, but I just...

Thank you for the update! It looks like some binary conversion functions for the desired 4-bit type were merged in https://github.com/elixir-nx/nx/pull/1528, but from my quick research, native support in XLA/PyTorch is...

Initially I was looking at GGUF, but many quantized models on Hugging Face (like unsloth's optimized versions) actually use the bitsandbytes library instead of the GGUF format, which seems to be more...

It makes sense, more or less. The missing part for me was the current Axon quantization implementation, so that it would integrate well with plugins that might support, e.g., 4-bit...
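For readers unfamiliar with what 4-bit storage involves, here is a generic sketch of absmax quantization with two 4-bit codes packed per byte. It is an illustration only: it is not the bitsandbytes or GGUF format, nor how Axon or Nx implement quantization, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def quantize_4bit(values: Sequence[float]) -> Tuple[bytes, float]:
    """Quantize floats to signed codes in [-7, 7] and pack two codes per byte.

    Returns the packed bytes and the scale needed to dequantize.
    """
    absmax = max((abs(v) for v in values), default=0.0)
    scale = absmax / 7 if absmax > 0 else 1.0
    codes = [max(-7, min(7, round(v / scale))) for v in values]
    # Shift codes into the unsigned nibble range 1..15 (8 encodes zero).
    nibbles = [c + 8 for c in codes]
    if len(nibbles) % 2:
        nibbles.append(8)  # pad with a zero code so the nibbles pair up
    packed = bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))
    return packed, scale


def dequantize_4bit(packed: bytes, scale: float, count: int) -> List[float]:
    """Unpack two codes per byte and rescale; count drops any padding."""
    out = []
    for byte in packed:
        out.append(((byte >> 4) - 8) * scale)
        out.append(((byte & 0x0F) - 8) * scale)
    return out[:count]
```

Real formats typically quantize in small blocks with one scale per block, and some use non-uniform code tables, so this shows only the packing idea.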