Data for the system described in the paper submitted to Interspeech 2021
On this page you can download the features used in the experiments described in the paper accepted at Interspeech 2021 and in the NeurIPS submission, including the wav2vec 2.0 models fine-tuned on the MEDIA corpus and the features extracted with those models. For a description of how to use the system described in the paper, please refer to our git repository for Interspeech 2021.
Wav2Vec 2.0 models fine-tuned on MEDIA
Model description | Link |
---|---|
Self-supervised fine-tuned models | |
W2V2-Fr-3K-large | Download |
W2V2-Fr-7K-large | Download |
XLSR53-large | Download |
Supervised fine-tuned models (for ASR) | |
W2V2-Fr-3K-large | Download |
W2V2-Fr-7K-large | Download |
XLSR53-large | Download |
Features
The features must be passed to the system with the option --serialized-corpus data-prefix, where data-prefix is the common prefix of all filenames (train, dev, test and dict). For example, to use the spectrogram features of the MEDIA corpus (the only ones currently provided here), the option is --serialized-corpus MEDIA.user+machine.spectro-Fr-Normalized.data.
All splits plus the dictionary must be downloaded for the system to work.
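Before launching the system, it can help to check that every file sharing the data prefix has actually been downloaded. The sketch below is only an illustration: the helper name missing_parts is hypothetical, and it assumes that each file name starts with the data prefix and contains the split name (train, dev, test or dict) somewhere in the remainder. The exact file suffixes are not documented here, so adapt the check to the names of the files you downloaded.

```python
from pathlib import Path

# Parts that must be present for one data prefix (taken from the text above).
REQUIRED_PARTS = ("train", "dev", "test", "dict")


def missing_parts(data_prefix, required=REQUIRED_PARTS):
    """Return the required parts for which no file is found.

    Assumption (not confirmed by the system's documentation): every file
    name starts with the data prefix, and the rest of the name contains
    the part name, e.g. "<prefix>.train" (hypothetical suffix).
    """
    prefix = Path(data_prefix)
    directory = prefix.parent
    if not directory.is_dir():
        return list(required)

    # Keep only what follows the prefix, so that words appearing inside
    # the prefix itself cannot be mistaken for a split name.
    suffixes = [
        entry.name[len(prefix.name):]
        for entry in directory.iterdir()
        if entry.is_file() and entry.name.startswith(prefix.name)
    ]
    return [part for part in required
            if not any(part in suffix for suffix in suffixes)]


if __name__ == "__main__":
    missing = missing_parts("MEDIA.user+machine.spectro-Fr-Normalized.data")
    if missing:
        print("Missing files for:", ", ".join(missing))
    else:
        print("All splits and the dictionary were found.")
```

If the list returned is empty, the same prefix can be given to --serialized-corpus.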
Results
The following table reports the results obtained on the MEDIA corpus with the system described in the Interspeech 2021 paper and in the git repository.
Token decoding (Word Error Rate) | |||
---|---|---|---|
Model | Input Features | DEV ER | TEST ER |
Comparison to our previous work | |||
ICASSP 2020 Seq | Spectrogram | 29.42 | 28.71 |
Interspeech 2021 | |||
Kheops+Basic | Spectrogram | 36.25 | 37.16 |
Kheops+Basic | W2V2-En-base | 19.80 | 21.78 |
Kheops+Basic | W2V2-En-large | 24.44 | 26.96 |
Kheops+Basic | W2V2-Fr-S-base | 23.11 | 25.22 |
Kheops+Basic | W2V2-Fr-S-large | 18.48 | 19.92 |
Kheops+Basic | W2V2-Fr-M-base | 14.97 | 16.37 |
Kheops+Basic | W2V2-Fr-M-large | 11.77 | 12.85 |
Kheops+Basic | XLSR53-large | 14.98 | 15.74 |
Concept decoding (Concept Error Rate) | |||
Model | Input Features | DEV ER | TEST ER |
Comparison to our previous work | |||
ICASSP 2020 Seq | Spectrogram | 28.11 | 27.52 |
ICASSP 2020 XT | Spectrogram | 23.39 | 24.02 |
Interspeech 2021 | |||
Kheops+Basic | Spectrogram | 39.66 | 40.76 |
Kheops+Basic +token | Spectrogram | 34.38 | 34.74 |
Kheops+LSTM +SLU | Spectrogram | 33.63 | 34.76 |
Kheops+Basic +token | W2V2-En-base | 26.79 | 26.57 |
Kheops+LSTM +SLU | W2V2-En-base | 26.31 | 26.11 |
Kheops+Basic +token | W2V2-En-large | 29.31 | 30.39 |
Kheops+LSTM +SLU | W2V2-En-large | 28.38 | 28.57 |
Kheops+Basic +token | W2V2-Fr-S-base | 27.18 | 28.27 |
Kheops+LSTM +SLU | W2V2-Fr-S-base | 26.16 | 26.69 |
Kheops+Basic +token | W2V2-Fr-S-large | 23.34 | 23.75 |
Kheops+LSTM +SLU | W2V2-Fr-S-large | 22.53 | 23.03 |
Kheops+Basic +token | W2V2-Fr-M-base | 22.11 | 21.30 |
Kheops+LSTM +SLU | W2V2-Fr-M-base | 22.56 | 22.24 |
Kheops+Basic +token | W2V2-Fr-M-large | 21.72 | 21.35 |
Kheops+LSTM +SLU | W2V2-Fr-M-large | 18.54 | 18.62 |
Kheops+Basic +token | XLSR53-large | 21.00 | 20.67 |
Kheops+LSTM +SLU | XLSR53-large | 20.34 | 19.73 |
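For readers unfamiliar with the metrics, the sketch below shows how an error rate of this kind is commonly computed: the Levenshtein distance (substitutions, deletions and insertions) between the reference and the hypothesis sequence, divided by the reference length and expressed as a percentage. It assumes the standard definition only; it is not the scoring tool used to produce the numbers above, and the function names and example sentences are hypothetical. The same function applies to words (Word Error Rate) and to concept labels (Concept Error Rate).

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences of tokens."""
    # previous[j] is the distance between the reference prefix processed
    # so far and the first j tokens of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_token in enumerate(reference, start=1):
        current = [i]
        for j, hyp_token in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (0 if ref_token == hyp_token else 1)
            deletion = previous[j] + 1      # reference token left unmatched
            insertion = current[j - 1] + 1  # extra hypothesis token
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def error_rate(reference, hypothesis):
    """Error rate in percent: edit distance divided by reference length."""
    if not reference:
        raise ValueError("the reference must contain at least one token")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical example: one word of the reference is missing.
    reference = "je veux une chambre".split()
    hypothesis = "je veux chambre".split()
    print(error_rate(reference, hypothesis))
```

Because insertions are counted, an error rate computed this way can exceed 100 when the hypothesis is much longer than the reference.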