Files in This Item:
File: 27660400.2022.2109447.pdf
Size: 6.28 MB
Format: Adobe PDF
Title: Effects of data bias on machine-learning–based material discovery using experimental property data
Authors: Kumagai, Masaya
Ando, Yuki
Tanaka, Atsumi
Tsuda, Koji
Katsura, Yukari
Kurosaki, Ken (ORCID: https://orcid.org/0000-0002-3015-3206, unconfirmed)
Author's alias: 熊谷, 将也
黒﨑, 健
Keywords: machine learning
material informatics
large-scale material data
data bias
Issue Date: Dec-2022
Publisher: Taylor & Francis
Journal title: Science and Technology of Advanced Materials: Methods
Volume: 2
Issue: 1
Start page: 302
End page: 309
Abstract: Materials informatics (MI) research, which is the discovery of new materials through machine learning (ML) using large-scale material data, has attracted considerable attention in recent years. However, in general, the large-scale material data used in MI are biased owing to differences in the targeted material domains. Moreover, most studies on MI have not clearly demonstrated the influence of data bias on ML models. In this study, we clarify the influence of data bias on ML models by combining the concept of the applicability domain and clustering for large-scale experimental property data in the Starrydata2 material database previously developed by our group. The results show that data bias influences the error and reliability of the predictions made by the ML model. The predictions of the ML model within the applicability domain are highly reliable compared to those made outside the domain. This indicates that the material space that can be reliably discovered by the constructed ML model is limited. Nonetheless, we apply the ML model to a large dataset comprising various material classes and find that new materials similar to known materials can be proposed within a limited space. Thus, our findings demonstrate the importance of considering data bias when constructing and evaluating ML models in MI.
Rights: © 2022 The Author(s). Published by National Institute for Materials Science in partnership with Taylor & Francis Group
This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
URI: http://hdl.handle.net/2433/285724
DOI(Published Version): 10.1080/27660400.2022.2109447
Appears in Collections: Journal Articles

This item is licensed under a Creative Commons License.