Liu Zihan
I am currently exploring the MusicGen model and have some questions about how audio prompts are applied within the model's architecture, particularly in relation to the cross_attention layers: 1. **Role...
I am using the Parler_TTS model with reference audio (`input_values`) during inference, similar to MusicGen, to perform continuation tasks: `model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids, input_values=input_values)` While the model continues in the style...
Hello, I noticed that the recent architecture improvements include modules for RoPE positional encoding and for adding prompts in cross-attention. However, it seems that the two newly released...
I observed a significant discrepancy in CLAP scores when using different pretrained CLAP models to evaluate MusicGen. Specifically, I used two distinct pretrained CLAP checkpoints to assess MusicGen's performance...
Great work! I would like to ask whether any results are available on how semantic richness and acoustic fidelity vary as `n_q` changes in XCodec....
Hi, I would like to ask about performing batch inference with XCodec. Specifically, what is the expected shape of the `wav` input in the following code snippet? Should the...
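To frame the batch-shape question above: many neural audio codecs (e.g. EnCodec) expect mono waveforms batched as `(batch, channels, samples)`. Below is a minimal sketch of right-padding variable-length waveforms into that layout. The `batch_waveforms` helper and the `(B, 1, T)` layout are assumptions for illustration, not a confirmed XCodec API:

```python
import numpy as np

def batch_waveforms(waveforms, pad_value=0.0):
    """Right-pad mono waveforms to a common length and stack them
    into a (batch, channels=1, samples) array -- the EnCodec-style
    layout assumed here for XCodec (hypothetical, not confirmed)."""
    max_len = max(len(w) for w in waveforms)
    batch = np.full((len(waveforms), 1, max_len), pad_value, dtype=np.float32)
    for i, w in enumerate(waveforms):
        batch[i, 0, :len(w)] = w
    return batch

# Two clips of different lengths (1 s and 1.5 s at 16 kHz)
wavs = [np.random.randn(16000).astype(np.float32),
        np.random.randn(24000).astype(np.float32)]
batched = batch_waveforms(wavs)
print(batched.shape)  # (2, 1, 24000)
```

If XCodec instead expects `(batch, samples)`, the channel axis can be dropped with `batched.squeeze(1)`; padding to the longest clip is still needed either way.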