Improving imbalanced classification using near-miss instances

Tanimoto, Akira; Yamada, So; Takenouchi, Takashi; Sugiyama, Masashi; Kashima, Hisashi


http://hdl.handle.net/2433/279254

Files in This Item:

File	Description	Size	Format
j.eswa.2022.117130.pdf		1.96 MB	Adobe PDF

Full metadata record

DC Field	Value	Language
dc.contributor.author	Tanimoto, Akira	en
dc.contributor.author	Yamada, So	en
dc.contributor.author	Takenouchi, Takashi	en
dc.contributor.author	Sugiyama, Masashi	en
dc.contributor.author	Kashima, Hisashi	en
dc.contributor.alternative	谷本, 啓	ja
dc.contributor.alternative	鹿島, 久嗣	ja
dc.date.accessioned	2023-02-13T09:37:42Z	-
dc.date.available	2023-02-13T09:37:42Z	-
dc.date.issued	2022-09-01	-
dc.identifier.uri	http://hdl.handle.net/2433/279254	-
dc.description.abstract	Class imbalance is a major issue in classification: the sample size of the rare (positive) class is often a performance bottleneck. In real-world situations, however, “near-miss” positive instances, i.e., negative but nearly positive instances, are sometimes plentiful. For example, natural disasters such as floods are rare, while there are relatively plentiful near-miss cases in which an actual flood did not occur but the water level approached the bank height. We show that even when true positive cases are quite limited, as in disaster forecasting, accuracy can be improved by obtaining refined, label-like side information called “positivity” (e.g., the water level of the river) to distinguish near-miss cases from other negatives. Conventional cost-sensitive classification cannot utilize such side information, and the small size of the positive sample causes high estimation variance. Our approach is in line with learning using privileged information (LUPI), which exploits side information for training without predicting the side information itself. We theoretically prove that, provided near-miss positive instances are plentiful, our method reduces the estimation variance in exchange for additional bias. Extensive experiments demonstrate that our method tends to outperform, or compare favorably with, existing approaches.	en
dc.language.iso	eng	-
dc.publisher	Elsevier BV	en
dc.rights	© 2022 The Authors. Published by Elsevier Ltd.	en
dc.rights	This is an open access article under the CC BY license.	en
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/	-
dc.subject	Imbalanced classification	en
dc.subject	Learning using privileged information	en
dc.subject	Generalized distillation	en
dc.title	Improving imbalanced classification using near-miss instances	en
dc.type	journal article	-
dc.type.niitype	Journal Article	-
dc.identifier.jtitle	Expert Systems with Applications	en
dc.identifier.volume	201	-
dc.relation.doi	10.1016/j.eswa.2022.117130	-
dc.textversion	publisher	-
dc.identifier.artnum	117130	-
dcterms.accessRights	open access	-
datacite.awardNumber	20K03753	-
datacite.awardNumber	19H04071	-
datacite.awardNumber	20H04244	-
datacite.awardNumber.uri	https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-20K03753/	-
datacite.awardNumber.uri	https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-19H04071/	-
datacite.awardNumber.uri	https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-20H04244/	-
dc.identifier.pissn	0957-4174	-
dc.identifier.eissn	1873-6793	-
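jpcoar.funderName	日本学術振興会 (Japan Society for the Promotion of Science)	ja
jpcoar.funderName	日本学術振興会 (Japan Society for the Promotion of Science)	ja
jpcoar.funderName	日本学術振興会 (Japan Society for the Promotion of Science)	ja
jpcoar.awardTitle	非確率モデルを用いた統計的推定の枠組みの構築とヘテロな構造を持つデータへの応用 (Construction of a framework for statistical estimation using non-probabilistic models and its application to data with heterogeneous structure)	ja
jpcoar.awardTitle	高次元・大規模・多ドメインデータの特徴抽出と情報統合による統計的学習 (Statistical learning through feature extraction and information integration for high-dimensional, large-scale, multi-domain data)	ja
jpcoar.awardTitle	複雑な関係データに基づく意思決定のための機械学習研究 (Machine learning research for decision making based on complex relational data)	ja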
Appears in Collections:	Journal articles, etc.
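The abstract above describes training a classifier with side information ("positivity", e.g., a river's water level) that is available only at training time, in the LUPI setting, and the record's keywords name generalized distillation. The sketch below is not the authors' method and does not reproduce their estimator or variance analysis; it is a hypothetical, plain-Python illustration of one generalized-distillation-style idea, assuming a logistic model, a sigmoid map from positivity to soft labels, and a blend of hard and soft targets. The names `soft_label`, `fit_logistic`, `predict_proba`, `threshold`, `temperature`, and `lam`, and the synthetic data, are all our own assumptions.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_label(s: float, threshold: float, temperature: float) -> float:
    """Hypothetical map from side information s to a soft target in (0, 1).
    Near-miss negatives (s just below threshold) get soft targets near 0.5."""
    return sigmoid((s - threshold) / temperature)


def fit_logistic(X, targets, lr=0.5, n_iter=500, l2=1e-3):
    """Logistic regression by batch gradient descent on cross-entropy.
    Targets may be soft (any value in [0, 1]). The last weight is the bias."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(n_iter):
        grad = [0.0] * (d + 1)
        for x, t in zip(X, targets):
            r = predict_proba(w, x) - t  # derivative of cross-entropy w.r.t. logit
            for j in range(d):
                grad[j] += r * x[j]
            grad[d] += r
        for j in range(d + 1):
            penalty = l2 * w[j] if j < d else 0.0  # bias is not penalized
            w[j] -= lr * (grad[j] / n + penalty)
    return w


def predict_proba(w, x):
    """P(y = 1 | x); uses only the features x, never the side information."""
    return sigmoid(sum(w[j] * x[j] for j in range(len(x))) + w[-1])


# Synthetic data (hypothetical): x are features available at prediction time,
# s is a "positivity" score (think water level) observed only for training.
rng = random.Random(0)
n = 400
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
s = [1.5 * x[0] - 1.0 * x[1] + 0.3 * rng.gauss(0, 1) for x in X]
threshold = 2.5  # hypothetical "bank height": s > threshold is a positive
y = [1.0 if si > threshold else 0.0 for si in s]

# Generalized-distillation-style targets: blend hard labels with soft ones.
lam = 0.5
mixed = [(1 - lam) * yi + lam * soft_label(si, threshold, 1.0)
         for yi, si in zip(y, s)]

w_hard = fit_logistic(X, y)       # uses the rare hard labels only
w_mixed = fit_logistic(X, mixed)  # also uses near-miss information via s
```

At prediction time only `predict_proba(w_mixed, x)` is needed; `s` shapes the training targets and is never an input, which mirrors the "exploit side information without predicting it" setting in the abstract. The weight `lam` trades hard-label signal against side information, loosely echoing the bias-for-variance exchange the abstract describes; that is an analogy, not the paper's theorem.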


This item is licensed under the following license: Creative Commons License