A Class Density-Weighted Gain Ratio Feature Selection for Multiclass Student Engagement Classification
Keywords:
Student Engagement, Feature Selection, Class Imbalance, Machine Learning, Classification
Abstract
Educational Data Mining (EDM) analyzes large educational datasets to discover meaningful patterns in student participation and academic achievement. Developing accurate multiclass classification models remains challenging because of class imbalance and irrelevant or redundant attributes. Filter-based feature selection methods are efficient but do not address these problems, so the resulting models are biased toward the majority classes. This study introduces Equitable Gain Ratio Feature Selection (EquiGR), which uses k-nearest neighbors to weight class density so that minority classes are better represented. It also uses the Spearman correlation coefficient to detect and remove strongly correlated redundant features along with low-ranking ones. EquiGR was evaluated with four machine learning algorithms representing different learning paradigms: Random Forest (RF), Naïve Bayes (NB), Support Vector Machine (SVM), and Logistic Regression (LR). On an imbalanced dataset with a class distribution of AE:PE:NE = 3624:4264:1183, EquiGR outperformed baseline feature selection techniques in accuracy, precision, recall, and F1-score. The combination of RF with EquiGR reached 92.23% accuracy and an NE-class F1-score of 92.48%. The proposed method improves classification results overall, with notable gains for minority-class predictions in educational predictive modeling.
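The pipeline summarized above (k-nearest-neighbor class density weighting, a gain ratio ranking, and Spearman-based redundancy removal) can be illustrated with a minimal Python sketch. This is not the EquiGR implementation: the abstract does not give the exact weighting formula, discretization scheme, or thresholds, so everything below is an assumption made for illustration. In particular, the function names (`knn_class_weights`, `weighted_gain_ratio`, `select_features`), the inverse-density weight, the quantile binning, and the parameters `k`, `bins`, `top_n`, and `rho_max` are hypothetical.

```python
import numpy as np

def knn_class_weights(X, y, k=5):
    """Assumed weighting: classes whose samples have few same-class
    neighbors (low local density) receive larger sample weights."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)  # brute-force squared distances
    np.fill_diagonal(d2, np.inf)                # a sample is not its own neighbor
    nn = np.argsort(d2, axis=1)[:, :k]          # indices of the k nearest neighbors
    same = (y[nn] == y[:, None]).mean(axis=1)   # fraction of same-class neighbors
    w = np.empty(len(y), dtype=float)
    for c in np.unique(y):
        mask = y == c
        w[mask] = 1.0 / (same[mask].mean() + 1e-9)  # inverse of mean class density
    return w / w.mean()                         # normalize to mean weight 1

def _entropy(p):
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def weighted_gain_ratio(x, y, w, bins=5):
    """Gain ratio of one feature, with sample weights replacing raw counts."""
    edges = np.quantile(x, np.linspace(0, 1, bins + 1)[1:-1])
    xb = np.digitize(x, edges)                  # assumed quantile discretization
    classes, total = np.unique(y), w.sum()
    h_y = _entropy(np.array([w[y == c].sum() for c in classes]) / total)
    h_y_given_x, split = 0.0, []
    for b in np.unique(xb):
        m = xb == b
        wb = w[m].sum()
        split.append(wb / total)
        p = np.array([w[m & (y == c)].sum() for c in classes]) / wb
        h_y_given_x += (wb / total) * _entropy(p)
    h_x = _entropy(np.array(split))             # split information
    return (h_y - h_y_given_x) / h_x if h_x > 0 else 0.0

def _ranks(a):
    """Average ranks, so ties are handled as in the usual Spearman definition."""
    order = np.argsort(a, kind="mergesort")
    r = np.empty(len(a), dtype=float)
    r[order] = np.arange(len(a))
    _, inv, counts = np.unique(a, return_inverse=True, return_counts=True)
    return np.bincount(inv, weights=r)[inv] / counts[inv]

def spearman(a, b):
    return float(np.corrcoef(_ranks(a), _ranks(b))[0, 1])

def select_features(X, y, k=5, top_n=2, rho_max=0.9):
    """Rank by weighted gain ratio, then greedily skip features that are
    strongly rank-correlated with an already kept feature."""
    w = knn_class_weights(X, y, k)
    scores = np.array([weighted_gain_ratio(X[:, j], y, w) for j in range(X.shape[1])])
    kept = []
    for j in np.argsort(-scores):
        if all(abs(spearman(X[:, j], X[:, i])) < rho_max for i in kept):
            kept.append(int(j))
        if len(kept) == top_n:
            break
    return kept, scores
```

Calling `select_features(X, y)` on a NumPy feature matrix `X` and label vector `y` returns the indices of the kept features and the per-feature scores. The brute-force distance matrix is quadratic in the number of samples, so a dataset of the size reported above would call for a tree-based or approximate neighbor search instead.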