Sunday, June 20, 2021

Testing AI fairness in predicting college dropout rate

To help college students in trouble before it’s too late, more universities are using machine learning models to identify students at risk of dropping out.

The information that goes into these models can have a huge impact on how accurate and fair they are, especially when it comes to protected student characteristics such as gender, race, and family income. But in a new study, the largest audit of a college AI system to date, researchers find no evidence that removing protected student characteristics from a model improves the accuracy or fairness of its predictions.

This result surprised René Kizilcec, Assistant Professor of Information Science and Director of the Future of Learning Lab.

“We expected that removing socio-demographic characteristics would make the model less accurate, because those characteristics are so well established in studies of academic achievement,” he said. “Although we find that including these attributes offers no empirical advantage, we recommend keeping them in the model, because doing so at least acknowledges the existence of the educational inequalities still associated with them.”

Kizilcec is the senior author of “Should College Dropout Prediction Models Include Protected Attributes?”, presented at the Association for Computing Machinery’s virtual Conference on Learning at Scale, held June 22-25. The work was nominated for the conference’s Best Paper Award.

Co-authors are Future of Learning Lab members Hannah Lee, a graduate student in computer science, and lead author Renzhe Yu, a graduate student at the University of California, Irvine.

For this work, Kizilcec and his team examined data from students in both a residential college setting and a fully online program. The institution, a large public university in the American Southwest, is not named in the paper.

By systematically comparing predictive models with and without protected attributes, the researchers wanted to determine both how including protected attributes affects the accuracy of dropout predictions and whether it affects the fairness of those predictions.

The researchers’ data set was huge: a total of 564,104 classroom course records for 93,457 unique students and 2,877 unique courses; and 81,858 online course records for 24,198 unique students and 874 unique courses.

From the data set, Kizilcec’s team created 58 identifying features in four categories, including four protected attributes: student gender; first-generation college status; membership in an under-represented minority group (defined as neither Asian nor white); and high financial need. To determine the consequences of using protected attributes to predict dropout, the researchers generated two feature sets – one with the protected attributes and one without.
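As a rough illustration of that setup, the sketch below builds two otherwise identical feature sets, one with and one without the protected attributes, and fits a classifier on each. This is a minimal sketch under stated assumptions: the column names, the 0/1 encoding of the protected attributes, and the use of scikit-learn’s logistic regression are placeholders for illustration, not the paper’s actual 58 features or modeling choices.

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical column names for the four protected attributes.
PROTECTED = ["gender", "first_generation", "urm", "high_financial_need"]

def train_dropout_model(df: pd.DataFrame, include_protected: bool):
    """Fit a dropout classifier with or without the protected attributes.

    Assumes `df` holds one row per student record, with numeric features
    and a binary `dropout` label.
    """
    features = [c for c in df.columns if c != "dropout"]
    if not include_protected:
        features = [c for c in features if c not in PROTECTED]

    X_train, X_test, y_train, y_test = train_test_split(
        df[features], df["dropout"], test_size=0.2, random_state=0
    )
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return model, X_test, y_test

# Two otherwise identical models, differing only in the protected attributes:
# model_with, X_test_w, y_test_w = train_dropout_model(records, include_protected=True)
# model_without, X_test_wo, y_test_wo = train_dropout_model(records, include_protected=False)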

Their main finding: The inclusion of four key protected attributes does not significantly affect three general measures of overall predictive performance when commonly used characteristics, including academic records, are already in the model.

“What matters for identifying students at risk is already explained by other attributes,” said Kizilcec. “Protected attributes don’t add much. There may be a gender gap or a race gap, but their association with dropout is negligible compared with characteristics like prior GPA.”
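A comparison along those lines could look like the sketch below, which scores each model on a few aggregate metrics using the hypothetical models and test split from the earlier sketch. The article does not name the three performance measures the researchers used, so accuracy, AUC, and recall stand in here purely as placeholders.

from sklearn.metrics import accuracy_score, recall_score, roc_auc_score

def overall_performance(model, X_test, y_test) -> dict:
    """Score a fitted classifier on a held-out set with a few aggregate metrics."""
    proba = model.predict_proba(X_test)[:, 1]
    preds = (proba >= 0.5).astype(int)
    return {
        "accuracy": accuracy_score(y_test, preds),
        "auc": roc_auc_score(y_test, proba),
        "recall": recall_score(y_test, preds),
    }

# The finding described above corresponds to these numbers coming out
# nearly identical for the two models:
# overall_performance(model_with, X_test_w, y_test_w)
# overall_performance(model_without, X_test_wo, y_test_wo)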

Even so, Kizilcec and his team still advocate including protected attributes in predictive modeling. They point out that college data reflects long-standing injustices, and they cite recent work in the broader machine learning community that supports the notion of “fairness through awareness.”

“There is research showing that the way certain attributes, such as academic performance, affect the likelihood of staying in college can vary across protected-attribute groups,” he said. “By including student characteristics in the model, we can account for this variation between different student groups.”
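In that spirit, one common way to put such group-level variation on the table is to evaluate the same model separately within each level of a protected attribute and compare the results. The sketch below does this with the hypothetical names and the overall_performance helper from the earlier sketches; it is one possible operationalization, not necessarily the procedure used in the paper.

def performance_by_group(model, X_test, y_test, group_col: str) -> dict:
    """Compute the same aggregate metrics within each level of one protected attribute."""
    results = {}
    for level, idx in X_test.groupby(group_col).groups.items():
        results[level] = overall_performance(model, X_test.loc[idx], y_test.loc[idx])
    return results

# Works on the model whose test set still carries the protected columns, e.g.:
# performance_by_group(model_with, X_test_w, y_test_w, "first_generation")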

The authors concluded: “We hope this study inspires more researchers in the learning analytics and educational data mining communities to engage with issues of algorithmic bias and fairness in the models and systems they develop and evaluate.”

Kizilcec’s lab has done extensive work on algorithmic fairness in education, a subject he feels has been under-explored.

“That’s partly because the algorithms [in education] are not as visible, and they often work differently than in criminal justice or medicine,” he said. “Education is not about sending someone to jail or being mistakenly diagnosed with cancer. But it can be a big deal for an individual student to be flagged as at risk.”



source https://collegeeducationnewsllc.com/testing-ai-fairness-in-predicting-college-dropout-rate/
