change_detection
change_detection copied to clipboard
Code for Speaker Change Detection in Broadcast TV using Bidirectional Long Short-Term Memory Networks
:warning: This repository is no longer maintained: it has been integrated into pyannote.audio.
Speaker Change Detection using Bi-LSTM
Code for Speaker Change Detection in Broadcast TV using Bidirectional Long Short-Term Memory Networks
Citation
@inproceedings{Yin2017,
Author = {Ruiqing Yin and Herv\'e Bredin and Claude Barras},
Title = {{Speaker Change Detection in Broadcast TV using Bidirectional Long Short-Term Memory Networks}},
Booktitle = {{18th Annual Conference of the International Speech Communication Association, Interspeech 2017}},
Year = {2017},
Month = {August},
Address = {Stockholm, Sweden},
Url = {https://github.com/yinruiqing/change_detection}
}
Installation
Foreword: The code is based on pyannote. You can also find a similar Readme file for TristouNet .
$ conda create --name change-detection python=2.7 anaconda
$ source activate change-detection
$ conda install gcc
$ conda install -c yaafe yaafe=0.65
$ pip install "pyannote.audio==0.2.1"
$ pip install pyannote.db.etape
What did I just install?
keras(and itstheanobackend) is used for all things deep. If you want to useGPU, please downgradeKerasversion to1.2.0yaafeis used for MFCC feature extraction inpyannote.audio.pyannote.audiois the core for this project. (Model architecture, optimizer, sequence generator)pyannote.db.etapeis the ETAPE plugin forpyannote.database, a common API for multimedia databases and experimental protocols (e.g.train/dev/testsets definition).
Then, edit ~/.keras/keras.json to configure keras with theano backend.
$ cat ~/.keras/keras.json
{
"image_dim_ordering": "th",
"epsilon": 1e-07,
"floatx": "float32",
"backend": "theano"
}
About the ETAPE database
To reproduce the experiment, you obviously need to have access to the ETAPE corpus.
It can be obtained from ELRA catalogue.
However, if you own another corpus with "who speaks when" annotations, you can fork pyannote.db.etape and adapt the code to your own database.
Training and evaluation
You can use:
$ pyannote-change-detection -h
to find the usage information.
change detection
Usage:
pyannote-change-detection train [--database=<db.yml> --subset=<subset>] <experiment_dir> <database.task.protocol>
pyannote-change-detection evaluate [--database=<db.yml> --subset=<subset> --epoch=<epoch> --min_duration=<min_duration>] <train_dir> <database.task.protocol>
pyannote-change-detection apply [--database=<db.yml> --subset=<subset> --threshold=<threshold> --epoch=<epoch> --min_duration=<min_duration>] <train_dir> <database.task.protocol>
pyannote-change-detection -h | --help
pyannote-change-detection --version
...
Example of config file can be found in change_detection/config/. Before doing the training and evaluation, you can clone this project to local directory.
git clone https://github.com/yinruiqing/change_detection.git
Training
$ pyannote-change-detection train --database change_detection/config/db.yml --subset train change_detection/config Etape.SpeakerDiarization.TV
This is the expected output:
Epoch 1/100
62464/62464 [==============================] - 171s - loss: 0.1543 - acc: 0.9669
Epoch 2/100
62464/62464 [==============================] - 117s - loss: 0.1375 - acc: 0.9692
Epoch 3/100
62464/62464 [==============================] - 115s - loss: 0.1376 - acc: 0.9691
...
Epoch 50/100
62464/62464 [==============================] - 112s - loss: 0.0903 - acc: 0.9724
...
Epoch 98/100
62464/62464 [==============================] - 115s - loss: 0.0837 - acc: 0.9732
Epoch 99/100
62464/62464 [==============================] - 112s - loss: 0.0839 - acc: 0.9732
Epoch 100/100
62464/62464 [==============================] - 112s - loss: 0.0840 - acc: 0.9731
Evaluation
$ pyannote-change-detection evaluate --database change_detection/config/db.yml --subset development change_detection/config/train/Etape.SpeakerDiarization.TV Etape.SpeakerDiarization.TV
This is the expected output:
0 95.720% 36.603%
0.0526316 95.526% 49.018%
0.105263 95.213% 57.660%
...
0.526316 92.396% 83.756%
...
0.894737 88.155% 91.139%
0.947368 87.468% 92.046%
1 86.929% 92.672%
The first column is threshold, the second column is purity and the third column is coverage.