Music Spectrogram diffusion pipeline
For issues #320 and #544
@patil-suraj @anton-l maybe you could already take a look here if you find some time :-)
There's no pipeline.__call__ yet, so I'll wait for @kashif's ping when it's ready :smile:
yup getting there...
This looks like a great start! @kashif could you add a code snippet showing how to run the model for inference? Similar to https://github.com/huggingface/diffusers/pull/658#issue-1388250440 maybe? :-)
Once we can reproduce some results locally, I think we'll have a much easier time getting this PR merged :-)
@kashif, let me know as soon as you have a working example of the model and then we can take it from this example :-) Maybe a google colab using this branch of diffusers would be great!
@patrickvonplaten sure I am working on a colab and will share!
Hey,
I looked into the notebook: https://colab.research.google.com/drive/1ntgPTgR6tQ-PJ14GOmfQM7tVloot8z3S?usp=sharing
but I don't seem to be able to load it - any ideas what could be the reason here @kashif ?
Would be amazing if we could get it to run in a public google colab notebook
@kashif thanks a lot for making all this progress here!
If possible it would be extremely helpful to add a link to a google colab that runs this pipeline in inference and works with all the required dependencies :-) Think this would really help to gauge how long inference takes, what installs are required etc...
@patrickvonplaten thank you so much! Kindly have a look here: https://colab.research.google.com/drive/1AWBX2UNQcbROPMu9tNICbIKk5A2zDqS1?usp=sharing
Thanks a lot @kashif, that's super useful. I was able to run the whole notebook and the results sound very nice!
Just two things I noticed:
- The first 5-6 seconds seemed to be just noise for me (think this corresponds exactly to the first generated segment). See generated audio here: https://colab.research.google.com/drive/1W0lBX_PUIcwdp7fCRVWQmTASfRHWo2pi?usp=sharing. Any idea what could be the problem here?
- The pipeline is really slow. It takes almost 30 mins to generate the audio on a google colab. That's roughly a 15x real-time factor. Do you think there are ways to speed it up?
Apart from this, the PR is in a super nice state. If ok for you I could take it over at the end of next week to do some final changes :-)
thanks, @patrickvonplaten
- Regarding the noise in the first segment: it is like this because at the start there are no conditioning latents, so it generates zeros. Let me double-check the original sampling code to see if they throw this away rather than concatenating it.
- Regarding the speed: yes indeed it is slow, because we use 1000 inference steps and then loop over many 5 sec segments... perhaps another scheduler and fewer inference steps could help?
Thank you for having a look. I will try to investigate issue 1.
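To make the cost discussed above concrete, here is a rough sketch of how the number of denoising passes scales with segment count and inference steps. The helper names are hypothetical and not part of the pipeline, the clip length of about 120 seconds is only inferred from the 30 min / 15x figures quoted in this thread, and the 100-step setting is an illustrative assumption, not a tested configuration.

```python
import math


def denoiser_calls(audio_seconds, segment_seconds=5.0, num_inference_steps=1000):
    """Estimate forward passes of the denoiser for a full clip.

    Hypothetical helper: assumes the clip is split into fixed-length
    segments and every segment runs the full number of inference steps.
    """
    num_segments = math.ceil(audio_seconds / segment_seconds)
    return num_segments * num_inference_steps


def real_time_factor(generation_seconds, audio_seconds):
    """Generation time divided by the duration of the generated audio."""
    return generation_seconds / audio_seconds


# 30 minutes of compute at a 15x real-time factor implies roughly 120 s of audio.
audio_seconds = 30 * 60 / 15

print(denoiser_calls(audio_seconds))                           # 24 segments * 1000 steps
print(denoiser_calls(audio_seconds, num_inference_steps=100))  # 24 segments * 100 steps
print(real_time_factor(30 * 60, audio_seconds))
```

Under these assumptions the step count is a direct multiplier on runtime, which is why a scheduler that works with fewer steps looks like the most promising lever. Whether audio quality holds up at fewer steps would still need to be checked.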
@patrickvonplaten so I believe I fixed issue 1 above (the 5 sec of empty noise for the first segment).
Regarding issue 2, I have now reduced the amount of CPU/GPU transfer as well as the repeated creation of ones and zeros tensors, and also fixed the fp16/bf16 inference pipeline... I do not see much improvement in speed though...
Awesome! Thanks a lot :-) I will try to take a deeper look into the PR early next week :crossed_fingers:
@kashif should we try to finish the PR next week? Think the community would be very excited about this :-)
ok @patrickvonplaten let me refresh it and check again
@patrickvonplaten so the failing docs are because `note_seq` installs the latest `protobuf`, which causes issues with `tensorboard` etc. One solution is to also pin `protobuf` to `3.20.x`, which seems to make everyone happy?
Ahh I see - yes let's pin it :-)
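For anyone debugging the same docs failure locally, a small sketch of how one might check whether the installed `protobuf` falls in the `3.20.x` series mentioned above. The helper name `is_protobuf_3_20` is hypothetical, and the exact version specifier used in the PR is not shown in this thread, so this only checks the major and minor version.

```python
from importlib.metadata import PackageNotFoundError, version


def is_protobuf_3_20(version_string):
    """Return True if the version string belongs to the 3.20.x series.

    Hypothetical helper: compares only the major and minor components.
    """
    return version_string.split(".")[:2] == ["3", "20"]


try:
    installed = version("protobuf")
except PackageNotFoundError:
    installed = None  # protobuf is not installed in this environment

if installed is None:
    print("protobuf is not installed")
elif is_protobuf_3_20(installed):
    print(f"protobuf {installed} is in the 3.20.x series")
else:
    print(f"protobuf {installed} is outside the 3.20.x series")
```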
Think we need a "make style" here and then we should be good to go :-)
@patrickvonplaten I could not reproduce the one failed mps test...
Think the MPS failure is unrelated - let's merge once the tests are more or less green :-)