End-to-End Generation of Written-style Transcript of Speech from Parliamentary Meetings

Mimura, Masato; Kawahara, Tatsuya

http://hdl.handle.net/2433/284724

Files in this item:

File	Description	Size	Format
jnlp.30.88.pdf		1.32 MB	Adobe PDF

Full metadata record

DC Field	Value	Language
dc.contributor.author	三村, 正人	ja
dc.contributor.author	河原, 達也	ja
dc.contributor.alternative	Mimura, Masato	en
dc.contributor.alternative	Kawahara, Tatsuya	en
dc.date.accessioned	2023-08-21T10:53:37Z	-
dc.date.available	2023-08-21T10:53:37Z	-
dc.date.issued	2023	-
dc.identifier.uri	http://hdl.handle.net/2433/284724	-
dc.description.abstract	従来の音声認識システムは，入力音声に現れるすべての単語を忠実に再現するように設計されているため，認識精度が高いときでも，人間にとって読みやすい文を出力するとは限らない．これに対して，本研究では，フィラーや言い誤りの削除，句読点や脱落した助詞の挿入，また口語的な表現の修正など，適宜必要な編集を行いながら，音声から直接可読性の高い書き言葉スタイルの文を出力する新しい音声認識のアプローチについて述べる．我々はこのアプローチを単一のニューラルネットワークを用いた音声から書き言葉への end-to-end 変換として定式化する．また，音声に忠実な書き起こしを疑似的に復元し，end-to-end モデルの学習を補助する手法と，句読点位置を手がかりとした新しい音声区分化手法も併せて提案する．700 時間の衆議院審議音声を用いた評価実験により，提案手法は音声認識とテキストベースの話し言葉スタイル変換を組み合わせたカスケード型のアプローチより高精度かつ高速に書き言葉を生成できることを示す．さらに，国会会議録作成時に編集者が行う修正作業を分類・整理し，これらについて提案システムの達成度と誤り傾向の分析を行う．	ja
dc.description.abstract	Because conventional automatic speech recognition (ASR) systems are designed to faithfully reproduce utterances word-by-word, their outputs are not necessarily easy to read even when they have few speech recognition errors. To address this issue, we propose a novel ASR approach that outputs readable and clean text directly from speech by removing fillers and disfluent regions, substituting colloquial expressions with formal ones, inserting punctuation and recovering omitted particles, and performing other types of appropriate corrections. We formalize this approach as an end-to-end generation of written-style text from speech using a single neural network. We also propose a method to guide the training of this end-to-end model using automatically generated faithful transcripts, as well as a novel speech segmentation strategy based on online punctuation detection. An evaluation using 700 hours of Japanese Parliamentary speech data demonstrates that the proposed direct approach successfully generates clean transcripts suitable for human consumption more accurately and at a faster decoding speed than the conventional cascade approach. We also provide an in-depth analysis of the types of edits performed by professional human editors to create the official written records of Japanese Parliamentary meetings, and evaluate the level of achievement of the proposed system in terms of each of the edit types.	en
dc.language.iso	jpn	-
dc.publisher	言語処理学会	ja
dc.publisher.alternative	Association for Natural Language Processing	en
dc.rights	© 2023 一般社団法人言語処理学会	en
dc.rights	Licensed under CC BY 4.0	en
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/	-
dc.subject	end-to-end 音声認識	ja
dc.subject	話し言葉スタイル変換	ja
dc.subject	整形	ja
dc.subject	国会会議録	ja
dc.subject	End-to-End Speech Recognition	en
dc.subject	Speaking Style Transformation	en
dc.subject	Parliamentary Report	en
dc.title	国会会議録のための音声から書き言葉への end-to-end 変換	ja
dc.title.alternative	End-to-End Generation of Written-style Transcript of Speech from Parliamentary Meetings	en
dc.type	journal article	-
dc.type.niitype	Journal Article	-
dc.identifier.jtitle	自然言語処理	ja
dc.identifier.volume	30	-
dc.identifier.issue	1	-
dc.identifier.spage	88	-
dc.identifier.epage	124	-
dc.relation.doi	10.5715/jnlp.30.88	-
dc.textversion	publisher	-
dcterms.accessRights	open access	-
dc.identifier.pissn	1340-7619	-
dc.identifier.eissn	2185-8314	-
dc.identifier.jtitle-alternative	Journal of Natural Language Processing	en
Appears in Collections:	Journal Articles

This item is licensed under a Creative Commons License