
Music Spectrogram diffusion pipeline

Open kashif opened this issue 3 years ago • 14 comments

For issues #320 and #544

kashif avatar Oct 28 '22 16:10 kashif

The documentation is not available anymore as the PR was closed or merged.

@patil-suraj @anton-l maybe you could already take a look here if you find some time :-)

patrickvonplaten avatar Oct 31 '22 18:10 patrickvonplaten

There's no `pipeline.__call__` yet, so I'll wait for @kashif's ping when it's ready :smile:

anton-l avatar Nov 02 '22 13:11 anton-l

yup getting there...

kashif avatar Nov 02 '22 14:11 kashif

This looks like a great start! @kashif could you add a code snippet showing how to run the model for inference? Similar to https://github.com/huggingface/diffusers/pull/658#issue-1388250440 maybe? :-)

Once we can reproduce some results locally, I think we'll have a much easier time getting this PR merged :-)

patrickvonplaten avatar Nov 04 '22 17:11 patrickvonplaten

@kashif, let me know as soon as you have a working example of the model and then we can take it from there :-) Maybe a Google Colab using this branch of diffusers would be great!

patrickvonplaten avatar Nov 28 '22 09:11 patrickvonplaten

@patrickvonplaten sure I am working on a colab and will share!

kashif avatar Nov 28 '22 10:11 kashif

Hey,

I looked into the notebook: https://colab.research.google.com/drive/1ntgPTgR6tQ-PJ14GOmfQM7tVloot8z3S?usp=sharing

but I don't seem to be able to load it - any idea what the reason could be here, @kashif?

It would be amazing if we could get it to run in a public Google Colab notebook.

patrickvonplaten avatar Dec 20 '22 01:12 patrickvonplaten

@kashif thanks a lot for making all this progress here!

If possible, it would be extremely helpful to add a link to a Google Colab that runs this pipeline for inference and works with all the required dependencies :-) Think this would really help to gauge how long inference takes, what installs are required, etc.

patrickvonplaten avatar Jan 19 '23 01:01 patrickvonplaten

@patrickvonplaten thank you so much! Kindly have a look here: https://colab.research.google.com/drive/1AWBX2UNQcbROPMu9tNICbIKk5A2zDqS1?usp=sharing

kashif avatar Jan 19 '23 12:01 kashif

Thanks a lot @kashif, that's super useful. I was able to run the whole notebook and the results sound very nice!

Just two things I noticed:

    1. The first 5-6 seconds seemed to be just noise for me (I think this corresponds exactly to the first generated segment). See the generated audio here: https://colab.research.google.com/drive/1W0lBX_PUIcwdp7fCRVWQmTASfRHWo2pi?usp=sharing -> any idea what the problem could be here?
    2. The pipeline is really slow: it takes almost 30 minutes to generate the audio on a Google Colab, which is roughly a 15x real-time factor. Do you think there are ways to speed it up?

Apart from this PR is in a super nice state. If ok for you I could take it over at the end of next week to do some final changes :-)

patrickvonplaten avatar Jan 22 '23 21:01 patrickvonplaten
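As a sanity check on the 15x figure above, the real-time factor is just generation time divided by audio duration (this assumes the generated clip was roughly two minutes long):

```python
def realtime_factor(generation_seconds: float, audio_seconds: float) -> float:
    """Seconds of compute spent per second of generated audio."""
    return generation_seconds / audio_seconds

# ~30 minutes of generation for a ~2 minute clip:
rtf = realtime_factor(30 * 60, 2 * 60)  # 15.0
```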

thanks, @patrickvonplaten

  1. Regarding this: at the start there are no conditioning latents, so the model generates zeros. Let me double-check the original sampling code to see whether they throw this first segment away rather than concatenating it.

  2. Yes, it is indeed slow, because we use 1000 inference steps and then loop over many 5-second segments... perhaps another scheduler and fewer inference steps could help?

Thank you for having a look. I will try to investigate issue 1.

kashif avatar Jan 23 '23 08:01 kashif
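The segment-wise conditioning described in point 1 can be sketched as follows. This is a toy illustration, not the pipeline's actual code: `sample_segment` stands in for a full diffusion sampling loop over one ~5-second segment, and `SEGMENT_FRAMES` is an arbitrary toy size.

```python
SEGMENT_FRAMES = 8  # toy size; the real pipeline uses ~5 s of spectrogram frames

def sample_segment(conditioning, value):
    # Stand-in for a full diffusion sampling run conditioned on the
    # previous segment's spectrogram.
    return [c + value for c in conditioning]

def generate(num_segments, drop_first=False):
    prev = [0.0] * SEGMENT_FRAMES  # first segment has no predecessor: zeros
    out = []
    for i in range(num_segments):
        seg = sample_segment(prev, float(i + 1))
        out.append(seg)
        prev = seg  # next segment is conditioned on this one
    # Optionally discard the first (zero-conditioned) segment
    return out[1:] if drop_first else out
```

Dropping the first, zero-conditioned segment is one way to avoid the noisy opening seconds discussed above.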

@patrickvonplaten so I believe I have fixed issue 1 above, the 5 seconds of empty noise for the first segment.

Regarding issue 2, I reduced the amount of CPU/GPU transfer, cached the repeated ones and zeros tensors, and also fixed the fp16/bf16 inference pipeline... I do not see much improvement in speed, though...

kashif avatar Jan 30 '23 20:01 kashif

Awesome! Thanks a lot :-) I will try to take a deeper look into the PR early next week :crossed_fingers:

patrickvonplaten avatar Feb 03 '23 15:02 patrickvonplaten

@kashif should we try to finish the PR next week? Think the community would be very excited about this :-)

patrickvonplaten avatar Mar 02 '23 16:03 patrickvonplaten

ok @patrickvonplaten let me refresh it and check again

kashif avatar Mar 02 '23 16:03 kashif

@patrickvonplaten so the failing doc builds are because note_seq installs the latest protobuf, which causes issues with tensorboard etc. One solution would be to also pin protobuf to 3.20.x, which seems to make everyone happy?

kashif avatar Mar 21 '23 12:03 kashif
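A requirements-style pin along those lines could look like this (the exact lower bound is an assumption; any constraint keeping protobuf on 3.20.x would do):

```
protobuf>=3.20,<3.21
```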

Ahh I see - yes let's pin it :-)

patrickvonplaten avatar Mar 21 '23 13:03 patrickvonplaten

Think we need a `make style` here and then we should be good to go :-)

patrickvonplaten avatar Mar 21 '23 14:03 patrickvonplaten

@patrickvonplaten I could not reproduce the one failing MPS test...

kashif avatar Mar 21 '23 16:03 kashif

Think the MPS failure is unrelated - let's merge once the tests are more or less green :-)

patrickvonplaten avatar Mar 23 '23 12:03 patrickvonplaten