Data for the system described in the paper submitted to Interspeech 2021
On this page I provide the input features used for the experiments described in the paper accepted at Interspeech 2021 and in the NeurIPS 2021 submission, together with the wav2vec 2.0 models fine-tuned on MEDIA and the features extracted with those models. For a description of the system, please see our git repository for Interspeech 2021.
Wav2Vec 2.0 models fine-tuned on MEDIA
Model description | Link |
---|---|
Self-supervised fine-tuned models | |
W2V2-Fr-3K-large | Download |
W2V2-Fr-7K-large | Download |
XLSR53-large | Download |
Supervised fine-tuned models (for ASR) | |
W2V2-Fr-3K-large | Download |
W2V2-Fr-7K-large | Download |
XLSR53-large | Download |
Features
The features must be passed to the system with the option --serialized-corpus data-prefix, where data-prefix is the common prefix of all filenames (train, dev, test and dict). For example, to use the spectrogram features of the MEDIA corpus (the only ones currently provided here), the option is --serialized-corpus MEDIA.user+machine.spectro-Fr-Normalized.data.
All splits plus the dictionary must be downloaded for the system to work.
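As a sanity check before launching the system, the sketch below reports which of the four parts (train, dev, test, dict) have no downloaded file starting with a given data-prefix. The helper name missing_parts is hypothetical, and the check assumes each filename contains the part name somewhere after the prefix; the actual file naming may differ.

```python
from pathlib import Path

# Parts that must all be present, as listed in the text above.
REQUIRED_PARTS = ("train", "dev", "test", "dict")


def missing_parts(data_prefix, directory="."):
    """Return the required parts with no matching file in `directory`.

    Assumption (not confirmed by the page): every file is named
    data_prefix + <something containing the part name>.
    """
    suffixes = [
        p.name[len(data_prefix):]
        for p in Path(directory).iterdir()
        if p.is_file() and p.name.startswith(data_prefix)
    ]
    return [part for part in REQUIRED_PARTS
            if not any(part in suffix for suffix in suffixes)]


if __name__ == "__main__":
    prefix = "MEDIA.user+machine.spectro-Fr-Normalized.data"
    missing = missing_parts(prefix)
    if missing:
        print("Missing parts for prefix", prefix, ":", ", ".join(missing))
    else:
        print("All parts found; use --serialized-corpus", prefix)
```

Plain string matching is used instead of glob patterns so that characters such as + in the prefix are never interpreted as wildcards.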
Results
The following tables report the results obtained on the MEDIA corpus with the system described in the Interspeech 2021 paper and in the repository.
Token decoding (Word Error Rate)

Model | Input Features | DEV ER | TEST ER |
---|---|---|---|
Comparison to our previous work | |||
ICASSP 2020 Seq | Spectrogram | 29.42 | 28.71 |
Interspeech 2021 | |||
Kheops+Basic | Spectrogram | 36.25 | 37.16 |
Kheops+Basic | W2V2-En-base | 19.80 | 21.78 |
Kheops+Basic | W2V2-En-large | 24.44 | 26.96 |
Kheops+Basic | W2V2-Fr-S-base | 23.11 | 25.22 |
Kheops+Basic | W2V2-Fr-S-large | 18.48 | 19.92 |
Kheops+Basic | W2V2-Fr-M-base | 14.97 | 16.37 |
Kheops+Basic | W2V2-Fr-M-large | 11.77 | 12.85 |
Kheops+Basic | XLSR53-large | 14.98 | 15.74 |

Concept decoding (Concept Error Rate)

Model | Input Features | DEV ER | TEST ER |
---|---|---|---|
Comparison to our previous work | |||
ICASSP 2020 Seq | Spectrogram | 28.11 | 27.52 |
ICASSP 2020 XT | Spectrogram | 23.39 | 24.02 |
Interspeech 2021 | |||
Kheops+Basic | Spectrogram | 39.66 | 40.76 |
Kheops+Basic +token | Spectrogram | 34.38 | 34.74 |
Kheops+LSTM +SLU | Spectrogram | 33.63 | 34.76 |
Kheops+Basic +token | W2V2-En-base | 26.79 | 26.57 |
Kheops+LSTM +SLU | W2V2-En-base | 26.31 | 26.11 |
Kheops+Basic +token | W2V2-En-large | 29.31 | 30.39 |
Kheops+LSTM +SLU | W2V2-En-large | 28.38 | 28.57 |
Kheops+Basic +token | W2V2-Fr-S-base | 27.18 | 28.27 |
Kheops+LSTM +SLU | W2V2-Fr-S-base | 26.16 | 26.69 |
Kheops+Basic +token | W2V2-Fr-S-large | 23.34 | 23.75 |
Kheops+LSTM +SLU | W2V2-Fr-S-large | 22.53 | 23.03 |
Kheops+Basic +token | W2V2-Fr-M-base | 22.11 | 21.30 |
Kheops+LSTM +SLU | W2V2-Fr-M-base | 22.56 | 22.24 |
Kheops+Basic +token | W2V2-Fr-M-large | 21.72 | 21.35 |
Kheops+LSTM +SLU | W2V2-Fr-M-large | 18.54 | 18.62 |
Kheops+Basic +token | XLSR53-large | 21.00 | 20.67 |
Kheops+LSTM +SLU | XLSR53-large | 20.34 | 19.73 |
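
For readers unfamiliar with the metrics, the sketch below assumes the standard edit-distance definition of an error rate: the minimum number of substitutions, deletions and insertions needed to turn the hypothesis into the reference, divided by the reference length and expressed as a percentage. The same computation applies to word sequences and to concept sequences. It is a minimal illustration only, not the scoring tool used to produce the numbers above, and the function name error_rate is hypothetical.

```python
def error_rate(reference, hypothesis):
    """Edit-distance error rate (in %) between two token sequences."""
    ref, hyp = list(reference), list(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one token")

    # prev[j] = edit distance between the reference prefix seen so far
    # and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1       # reference token missing from hypothesis
            insertion = curr[j - 1] + 1  # extra token in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (b -> x) and one deletion (d) over 4 reference tokens.
    print(error_rate("a b c d".split(), "a x c".split()))
```

Because insertions are counted, an error rate computed this way can exceed 100 when the hypothesis is much longer than the reference.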