
Files in This Item:
File: ACCESS.2022.3156073.pdf
Size: 2.79 MB
Format: Adobe PDF
Title: Fine Grain Synthetic Educational Data: Challenges and Limitations of Collaborative Learning Analytics
Authors: Flanagan, Brendan (ORCID: https://orcid.org/0000-0001-7644-997X, unconfirmed)
Majumdar, Rwitajit (ORCID: https://orcid.org/0000-0003-4671-0238, unconfirmed)
Ogata, Hiroaki (ORCID: https://orcid.org/0000-0001-5216-1576, unconfirmed)
Author's alias: 緒方, 広明
Keywords: Synthetic learner data
student modeling
data sharing
data challenge
Issue Date: 2022
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Journal title: IEEE Access
Volume: 10
Start page: 26230
End page: 26241
Abstract: While data privacy is a key aspect of Learning Analytics (LA), it often creates difficulty when promoting research into underexplored contexts as it limits data sharing. To overcome this problem, the generation of synthetic data has been proposed and discussed within the LA community. However, there has been little work that has explored the use of synthetic data in real-world situations. This research examines the effectiveness of using synthetic data for training academic performance prediction models, and the challenges and limitations of using the proposed data sharing method. To evaluate the effectiveness of the method, we generate synthetic data from a private dataset, and distribute it to the participants of a data challenge to train prediction models. Participants submitted their models as Docker containers for evaluation and ranking on holdout synthetic data. A post-hoc analysis was conducted on the top 10 participants' models by comparing the evaluation of their performance on synthetic and private validation datasets. Several models trained on synthetic data were found to perform significantly worse when applied to the non-synthetic private dataset. The main contribution of this research is to understand the challenges and limitations of applying predictive models trained on synthetic data in real-world situations. Due to these challenges, the paper recommends model designs that can inform future successful adoption of synthetic data in real-world educational data systems.
Rights: This work is licensed under a Creative Commons Attribution 4.0 License.
URI: http://hdl.handle.net/2433/279306
DOI(Published Version): 10.1109/ACCESS.2022.3156073
Appears in Collections: Journal Articles
