Files in This Item:
File: j.csl.2017.11.001.pdf
Size: 758.4 kB
Format: Adobe PDF
Title: Exploiting Automatic Speech Recognition Errors to Enhance Partial and Synchronized Caption for Facilitating Second Language Listening
Authors: Mirzaei, Maryam Sadat
Meshgi, Kourosh
Kawahara, Tatsuya
Author's alias: 河原, 達也
Keywords: Computer-assisted language learning
Second language listening skill
Automatic speech recognition
Partial and synchronized caption
Issue Date: May-2018
Publisher: Elsevier BV
Journal title: Computer Speech and Language
Volume: 49
Start page: 17
End page: 36
Abstract: This paper addresses the viability of using Automatic Speech Recognition (ASR) errors as a predictor of difficulty in speech segments, and exploits them to improve Partial and Synchronized Caption (PSC), which we have proposed to train second language (L2) listening skills by encouraging listening over reading. The system uses ASR technology to perform word-level text-to-speech synchronization and to generate a partial caption. The baseline system identifies difficult words based on three features: speech rate, word frequency, and specificity. While this covers most difficult words, it does not capture the wide range of factors that hinder L2 listening. We therefore propose using ASR systems as a model of L2 listeners and hypothesize that ASR errors can predict speech segments that are challenging for these learners. Among the different cases of ASR errors, annotation results suggest that four categories (homophones, minimal pairs, negatives, and breached boundaries) are useful for L2 listeners. A preliminary experiment with L2 learners focusing on these four categories of ASR errors revealed that they highlight problematic speech regions for L2 listeners. Based on these findings, the PSC system is enhanced to incorporate these useful types of ASR errors. An experiment with L2 learners demonstrated that the enhanced version of PSC is not only preferred, but also more helpful in facilitating the L2 listening process.
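The abstract describes the baseline PSC rule as showing a word in the caption when it is judged difficult by speech rate, word frequency, or specificity. A minimal sketch of that selection logic is below; all thresholds, field names, and example values are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of the baseline PSC word-selection rule: a word appears
# in the partial caption if its local speech rate is high, its corpus
# frequency is low (high rank), or it is a domain-specific term.
# Thresholds and data are illustrative only.
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    speech_rate: float   # e.g. syllables per second in this word's segment
    frequency_rank: int  # corpus frequency rank (1 = most frequent)
    is_specific: bool    # domain-specific / technical term

def show_in_caption(w: Word,
                    rate_threshold: float = 6.0,
                    rank_threshold: int = 3000) -> bool:
    """Return True if the word should be shown in the partial caption."""
    return (w.speech_rate > rate_threshold
            or w.frequency_rank > rank_threshold
            or w.is_specific)

words = [
    Word("the", 4.0, 1, False),        # easy: slow, frequent, generic
    Word("phoneme", 5.0, 12000, True), # rare and domain-specific
    Word("quickly", 7.5, 800, False),  # spoken fast
]
caption = [w.text for w in words if show_in_caption(w)]
print(caption)  # -> ['phoneme', 'quickly']
```

The enhanced system described in the abstract would add a further condition: show the word if the ASR misrecognized it in one of the four useful error categories.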
Rights: © 2018. This manuscript version is made available under the CC-BY-NC-ND 4.0 license
The full-text file will be made open to the public on 1 May 2020 in accordance with publisher's 'Terms and Conditions for Self-Archiving'.
This is not the published version. Please cite only the published version.
DOI(Published Version): 10.1016/j.csl.2017.11.001
Appears in Collections: Journal Articles
