Daisuke Niizumi

45 comments

Hi, thank you for writing the requirements. I wanted to merge it, but I'd like to avoid hard limits on library versions. I'm actually using PyTorch 1.x, though it is specified as...
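One common way to avoid hard version limits is to use bounded or lower-bound version specifiers instead of exact pins. A hypothetical fragment (the actual package list is not visible in the truncated comment):

```
# Hypothetical requirements.txt sketch: a lower bound keeps PyTorch 1.x
# users working while still allowing newer releases; an exact pin
# (torch==2.0.1) would exclude them.
torch>=1.8
```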

Hi, thanks for your interest. I'm glad to hear that the pre-trained weights have been fairly suitable for your tasks so far. It looks like an environment issue, such as the...

Hi, I have summarized a guideline based on my experience: https://github.com/nttcslab/m2d/blob/master/Guide_app.md Based on it, quick comments for your use case are: - Pre-training from scratch using two 8GB GPUs...

> A small addendum for others: to set up distributed mode I had to adapt the command line to `CUDA_VISIBLE_DEVICES=0,1 python3 -m torch.distributed.launch --nproc_per_node=2 train_audio.py ...` > > Without adding...
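The quoted launch command relates to how the training script receives its process rank. As a generic sketch (this is not the repository's actual `train_audio.py` code): `torch.distributed.launch` appends a `--local_rank` flag to the script's arguments, while the newer `torchrun` launcher sets the `LOCAL_RANK` environment variable instead, so a script supporting both launchers might read the rank like this:

```python
import argparse
import os

def get_local_rank(argv=None):
    """Return the local rank under either distributed launcher.

    torch.distributed.launch passes --local_rank=N on the command line;
    torchrun exports LOCAL_RANK instead. Defaults to 0 (single process).
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", type=int, default=None)
    # parse_known_args ignores the script's other training flags
    args, _ = parser.parse_known_args(argv)
    if args.local_rank is not None:
        return args.local_rank
    return int(os.environ.get("LOCAL_RANK", 0))
```

With `--nproc_per_node=2`, each of the two processes would see its own rank (0 or 1) and bind to the matching GPU.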

Regarding your questions about the four options: yes, those are your options. It was nice that you figured out the 4th option, and I would also like to recommend a 5th option. First, the...

Regarding the loss values, I have included a log of ICBHI 2017 fur-PT here: [examples/logs/log_m2d_x_vit_base-80x200p16x4-230814-Ddffsd50ks5blr0003bs128a2nr.3-e600.out](examples/logs/log_m2d_x_vit_base-80x200p16x4-230814-Ddffsd50ks5blr0003bs128a2nr.3-e600.out) -> Fixed: now available in [example_logs.zip](https://github.com/nttcslab/m2d/releases/download/v0.1.0/example_logs.zip). As you can see, the loss would be around 0.4 in...

Please find the logs here: [example_logs.zip](https://github.com/nttcslab/m2d/releases/download/v0.1.0/example_logs.zip) I added M2D and M2D-S logs in addition to the M2D-X for ICBHI 2017 log. (I also updated the guide document.) It's a great question....

Hi, thank you for your interest. Quick answer: config/m2d.yaml is in another repository called EVAR. https://github.com/nttcslab/eval-audio-repr/blob/main/config/m2d.yaml EVAR is an evaluation package for audio representations, mainly used in our...

Thank you for your question. We'd like to close this issue now. Please feel free to reopen it whenever you need!

Hi, a quick answer is no. We provide foundation models for general sounds.