Data for the system described in the Interspeech 2021 paper

On this page I provide the input features used for the experiments described in the paper accepted at Interspeech 2021 and in the submission to NeurIPS 2021. I also provide wav2vec 2.0 models fine-tuned on MEDIA, together with the features extracted with those models. For a description of the system, please see our git repository for Interspeech 2021.

Wav2Vec 2.0 models fine-tuned on MEDIA

Model description	Link
Self-supervised fine-tuned models
W2V2-Fr-3K-large	Download
W2V2-Fr-7K-large	Download
XLSR53-large	Download
Supervised fine-tuned models (for ASR)
W2V2-Fr-3K-large	Download
W2V2-Fr-7K-large	Download
XLSR53-large	Download

Features

The features must be passed to the system with the option --serialized-corpus data-prefix, where data-prefix is the common prefix of all the filenames (train, dev, test and dict). For example, to use the spectrogram features of the MEDIA corpus (the only corpus currently provided here), the option for the system is --serialized-corpus MEDIA.user+machine.spectro-Fr-Normalized.data.
All splits plus the dictionary must be downloaded for the system to work.
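
Since the system needs every split plus the dictionary under one common prefix, it can help to check the download before launching a run. The following is a minimal sketch, not part of the system itself. The function name missing_parts is hypothetical, and it assumes that each downloaded file name starts with the data prefix and contains the part name (train, dev, test or dict) somewhere after it. The actual file naming may differ, so adapt the check to the names of the files you downloaded.

```python
from pathlib import Path
from typing import List

# Parts that must all be present for a given data prefix.
REQUIRED_PARTS = ("train", "dev", "test", "dict")


def missing_parts(data_prefix: str) -> List[str]:
    """Return the required parts for which no file was found.

    Assumption (not confirmed by the page): every file name starts with
    the data prefix and contains the part name after that prefix.
    """
    prefix = Path(data_prefix)
    directory = prefix.parent
    if not directory.is_dir():
        return list(REQUIRED_PARTS)

    # Keep only what follows the prefix, so that words inside the
    # prefix itself can never be mistaken for a part name.
    suffixes = [
        entry.name[len(prefix.name):]
        for entry in directory.iterdir()
        if entry.is_file() and entry.name.startswith(prefix.name)
    ]
    return [
        part for part in REQUIRED_PARTS
        if not any(part in suffix for suffix in suffixes)
    ]


if __name__ == "__main__":
    missing = missing_parts("MEDIA.user+machine.spectro-Fr-Normalized.data")
    if missing:
        print("Missing files for:", ", ".join(missing))
    else:
        print("All splits and the dictionary were found.")
```

An empty list means that a file was found for each of the four parts, and the same prefix can then be given to --serialized-corpus.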

Feature description
Type	Train	Dev	Test	Dict	SLU Model
Spectrogram	Download	Download	Download	Download	Download
W2V2-En-base	Download	Download	Download	Download	Download
W2V2-En-large	Download	Download	Download	Download	Download
W2V2-Fr-1K-base	Download	Download	Download	Download	Download
W2V2-Fr-1K-large	Download	Download	Download	Download	Download
W2V2-Fr-2.6K-base	Download	Download	Download	Download	Download
W2V2-Fr-3K-base	Download	Download	Download	Download	Download
W2V2-Fr-3K-large	Download	Download	Download	Download	Download
W2V2-Fr-7K-base	Download	Download	Download	Download	Download
W2V2-Fr-7K-large	Download	Download	Download	Download	Download
XLSR53	Download	Download	Download	Download	Download
Features from self-supervised fine-tuned models
W2V2-Fr-3K-large	Download	Download	Download	Download	Download
W2V2-Fr-7K-large	Download	Download	Download	Download	Download
XLSR53	Download	Download	Download	Download	Download
Features from supervised fine-tuned models (for ASR)
W2V2-Fr-3K-large	Download	Download	Download	Download	Download
W2V2-Fr-7K-large	Download	Download	Download	Download	Download
XLSR53	Download	Download	Download	Download	Download

Results

In the following tables we report the results obtained on the MEDIA corpus with the system described in the Interspeech 2021 paper and in the repository.

Token decoding (Word Error Rate)
Model	Input Features	DEV ER	TEST ER
Comparison to our previous work
ICASSP 2020 Seq	Spectrogram	29.42	28.71
Interspeech 2021
Kheops+Basic	Spectrogram	36.25	37.16

Kheops+Basic	W2V2-En-base	19.80	21.78
Kheops+Basic	W2V2-En-large	24.44	26.96

Kheops+Basic	W2V2-Fr-S-base	23.11	25.22
Kheops+Basic	W2V2-Fr-S-large	18.48	19.92
Kheops+Basic	W2V2-Fr-M-base	14.97	16.37
Kheops+Basic	W2V2-Fr-M-large	11.77	12.85

Kheops+Basic	XLSR53-large	14.98	15.74
Concept decoding (Concept Error Rate)
Model	Input Features	DEV ER	TEST ER
Comparison to our previous work
ICASSP 2020 Seq	Spectrogram	28.11	27.52
ICASSP 2020 XT	Spectrogram	23.39	24.02
Interspeech 2021
Kheops+Basic	Spectrogram	39.66	40.76
Kheops+Basic +token	Spectrogram	34.38	34.74
Kheops+LSTM +SLU	Spectrogram	33.63	34.76

Kheops+Basic +token	W2V2-En-base	26.79	26.57
Kheops+LSTM +SLU	W2V2-En-base	26.31	26.11
Kheops+Basic +token	W2V2-En-large	29.31	30.39
Kheops+LSTM +SLU	W2V2-En-large	28.38	28.57

Kheops+Basic +token	W2V2-Fr-S-base	27.18	28.27
Kheops+LSTM +SLU	W2V2-Fr-S-base	26.16	26.69
Kheops+Basic +token	W2V2-Fr-S-large	23.34	23.75
Kheops+LSTM +SLU	W2V2-Fr-S-large	22.53	23.03
Kheops+Basic +token	W2V2-Fr-M-base	22.11	21.30
Kheops+LSTM +SLU	W2V2-Fr-M-base	22.56	22.24
Kheops+Basic +token	W2V2-Fr-M-large	21.72	21.35
Kheops+LSTM +SLU	W2V2-Fr-M-large	18.54	18.62

Kheops+Basic +token	XLSR53-large	21.00	20.67
Kheops+LSTM +SLU	XLSR53-large	20.34	19.73
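
The error rates in the tables above are percentages. As a reminder of how such a rate is usually defined, the sketch below computes it as the number of substitutions, deletions and insertions (the edit distance between reference and hypothesis) divided by the number of reference items. The items are words for the Word Error Rate and concept labels for the Concept Error Rate. This is the standard textbook definition and not the scoring code used for the paper. The function names and the toy sequences are made up for illustration, and the text normalization applied before scoring is not covered.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two sequences of items."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j items of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def error_rate(refs: List[Sequence[str]], hyps: List[Sequence[str]]) -> float:
    """Corpus-level error rate in percent: total errors / total reference items."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of sequences")
    total_items = sum(len(ref) for ref in refs)
    if total_items == 0:
        raise ValueError("the references contain no items")
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 100.0 * total_errors / total_items


if __name__ == "__main__":
    # Toy example: one substitution and one deletion over four reference items.
    reference = ["a", "b", "c", "d"]
    hypothesis = ["a", "x", "c"]
    print(error_rate([reference], [hypothesis]))
```

Because the rate is computed over the whole corpus, longer utterances weigh more than shorter ones, and a hypothesis with many insertions can push the rate above 100.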