Access count for this item: 139
Files in this item:
File | Description | Size | Format | |
---|---|---|---|---|
bioinformatics_btaa837.pdf | | 517.97 kB | Adobe PDF | View/Open |
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | You, Ronghui | en |
dc.contributor.author | Liu, Yuxuan | en |
dc.contributor.author | Mamitsuka, Hiroshi | en |
dc.contributor.author | Zhu, Shanfeng | en |
dc.contributor.alternative | 馬見塚, 拓 | ja |
dc.date.accessioned | 2022-07-25T06:23:09Z | - |
dc.date.available | 2022-07-25T06:23:09Z | - |
dc.date.issued | 2021-03-01 | - |
dc.identifier.uri | http://hdl.handle.net/2433/275589 | - |
dc.description.abstract | [Motivation] With the rapid increase of biomedical articles, large-scale automatic Medical Subject Headings (MeSH) indexing has become increasingly important. FullMeSH, the only method for large-scale MeSH indexing with full text, suffers from three major drawbacks: it (i) uses Learning To Rank, which is time-consuming, (ii) can capture only certain pre-defined sections in full text and (iii) ignores the whole MEDLINE database. [Results] We propose BERTMeSH, a computationally lighter, deep-learning-based MeSH indexing method for full text that is flexible with respect to section organization. BERTMeSH combines two technologies: (i) the state-of-the-art pre-trained deep contextual representation, Bidirectional Encoder Representations from Transformers (BERT), which allows BERTMeSH to capture the deep semantics of full text, and (ii) a transfer learning strategy that uses both full text in PubMed Central (PMC) and title and abstract (without full text) in MEDLINE, to take advantage of both. In our experiments, BERTMeSH was pre-trained with 3 million MEDLINE citations and trained on ∼1.5 million full texts in PMC. BERTMeSH outperformed various cutting-edge baselines: for 20 K test articles from PMC, it achieved a Micro F-measure of 69.2%, 6.3% higher than FullMeSH, with the difference being statistically significant. Moreover, predicting the 20 K test articles took 5 min with BERTMeSH versus more than 10 h with FullMeSH, demonstrating its computational efficiency. | en |
dc.language.iso | eng | - |
dc.publisher | Oxford University Press (OUP) | en |
dc.rights | This is a pre-copyedited, author-produced PDF of an article accepted for publication in 'Bioinformatics' following peer review. The version of record [Bioinformatics, Volume 37, Issue 5, 1 March 2021, Pages 684–692] is available online at: https://doi.org/10.1093/bioinformatics/btaa837 | en |
dc.rights | The full-text file will be made open to the public on 25 September 2021 in accordance with the publisher's 'Terms and Conditions for Self-Archiving'. | en |
dc.rights | This is not the published version. Please cite only the published version. | en |
dc.title | BERTMeSH: deep contextual representation learning for large-scale high-performance MeSH indexing with full text | en |
dc.type | journal article | - |
dc.type.niitype | Journal Article | - |
dc.identifier.jtitle | Bioinformatics | en |
dc.identifier.volume | 37 | - |
dc.identifier.issue | 5 | - |
dc.identifier.spage | 684 | - |
dc.identifier.epage | 692 | - |
dc.relation.doi | 10.1093/bioinformatics/btaa837 | - |
dc.textversion | author | - |
dc.identifier.pmid | 32976559 | - |
dcterms.accessRights | open access | - |
datacite.date.available | 2021-09-25 | - |
datacite.awardNumber | JPMJAC1503 | - |
datacite.awardNumber | 19H04169 | - |
datacite.awardNumber.uri | https://projectdb.jst.go.jp/grant/JST-PROJECT-15666456/ | - |
datacite.awardNumber.uri | https://kaken.nii.ac.jp/ja/grant/KAKENHI-PROJECT-19H04169/ | - |
dc.identifier.pissn | 1367-4803 | - |
dc.identifier.eissn | 1460-2059 | - |
jpcoar.funderName | Japan Science and Technology Agency (JST) | ja |
jpcoar.funderName | Japan Society for the Promotion of Science (JSPS) | ja |
jpcoar.awardTitle | Enhancing the resiliency of concentrated polymer brushes and its tribological applications | ja |
jpcoar.awardTitle | Efficient estimation of data structures from multiple tensors | ja |
Appears in Collections: | Journal Articles
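
The abstract above reports a Micro F-measure of 69.2% for BERT-based multi-label MeSH indexing. As a rough illustration of the two ingredients it describes, the sketch below pairs a generic BERT encoder with a sigmoid multi-label head and a Micro F-measure helper. This is a minimal sketch, not the authors' implementation: the model name `bert-base-uncased`, the single linear head, the 0.5 decision threshold and the tiny label space are assumptions made here for demonstration only.

```python
# Hypothetical sketch of BERT-based multi-label MeSH scoring and Micro F-measure.
# Not the BERTMeSH code; model name, head and threshold are illustrative choices.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class BertMultiLabelClassifier(nn.Module):
    """BERT encoder + one linear layer scoring every MeSH heading independently."""

    def __init__(self, num_mesh_terms: int, model_name: str = "bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        self.head = nn.Linear(self.encoder.config.hidden_size, num_mesh_terms)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]   # [CLS] token representation
        return self.head(cls)               # raw logits, one per MeSH heading


def micro_f_measure(pred: torch.Tensor, gold: torch.Tensor) -> float:
    """Micro F-measure over 0/1 multi-label predictions, pooled across all labels."""
    tp = (pred * gold).sum().item()
    fp = (pred * (1 - gold)).sum().item()
    fn = ((1 - pred) * gold).sum().item()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = BertMultiLabelClassifier(num_mesh_terms=8)   # tiny label space for the demo
    enc = tokenizer(["Transfer learning for large-scale MeSH indexing."],
                    padding=True, truncation=True, return_tensors="pt")
    logits = model(enc["input_ids"], enc["attention_mask"])
    pred = (torch.sigmoid(logits) > 0.5).long()          # threshold the sigmoid scores
    gold = torch.zeros_like(pred)                        # placeholder gold labels
    print("Micro F:", micro_f_measure(pred, gold))
```

Training such a head would typically minimize a binary cross-entropy loss (e.g. `BCEWithLogitsLoss`) over all MeSH labels; the paper's transfer learning across MEDLINE title/abstract citations and PMC full texts is not reproduced in this sketch.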

All items stored in this repository are protected by copyright.