Identifying Patterns For Short Answer Scoring
Using Graph-based Lexico-Semantic Text Matching
Ramachandran et al. 2015
Features
rubric text
top-scoring student responses, prompt and stimulus text
used to generate text patterns containing content tokens
used to generate text patterns containing sentence structure information
Model (supervised)
Tandalla’s Approach: two Random Forests and two Gradient Boosting Machines; regression problem on Kaggle Short Answer Dataset
Generators of regular expressions, which are then used as features. See Sections 3.1 and 3.2.
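The paper's patterns are generated automatically from the rubric and top-scoring responses, so the regexes below are purely illustrative stand-ins. A minimal sketch of turning regex matches into binary features for a learner:

```python
import re

# Hypothetical patterns standing in for the two pattern types the paper
# generates: content-token patterns and sentence-structure patterns.
patterns = [
    r"photosynthes\w*",                  # content token with morphological variants
    r"light\s+(?:\w+\s+){0,5}energy",    # two content tokens within a short window
    r"plants?\s+(?:absorb|take\s+in)",   # subject followed by alternative verbs
]

def regex_features(response):
    """Binary feature vector: 1 if the pattern matches the response, else 0."""
    text = response.lower()
    return [1 if re.search(p, text) else 0 for p in patterns]

print(regex_features("Plants absorb light and convert it to chemical energy."))
# → [0, 1, 1]
```

The resulting 0/1 vectors are what feed the supervised models; the paper's actual generation of the patterns (graph-based lexico-semantic matching) is not reproduced here.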
Results
Kaggle Short Answer Dataset
On 8 out of the 10 sets our patterns perform better than the manual regular expressions.
The mean QW Kappa achieved by our patterns is 0.78, versus 0.77 for Tandalla's manual regular expressions.
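For reference, quadratic weighted (QW) Kappa penalizes disagreements by the squared distance between ratings. A small self-contained implementation:

```python
def quadratic_weighted_kappa(a, b, n_ratings):
    """QW Kappa between two integer rating lists with ratings in [0, n_ratings)."""
    n = len(a)
    # Observed confusion matrix.
    O = [[0] * n_ratings for _ in range(n_ratings)]
    for x, y in zip(a, b):
        O[x][y] += 1
    # Quadratic disagreement weights.
    W = [[(i - j) ** 2 / (n_ratings - 1) ** 2 for j in range(n_ratings)]
         for i in range(n_ratings)]
    # Expected matrix from the raters' marginal distributions.
    hist_a = [sum(row) for row in O]
    hist_b = [sum(O[i][j] for i in range(n_ratings)) for j in range(n_ratings)]
    num = sum(W[i][j] * O[i][j] for i in range(n_ratings) for j in range(n_ratings))
    den = sum(W[i][j] * hist_a[i] * hist_b[j] / n
              for i in range(n_ratings) for j in range(n_ratings))
    return 1 - num / den

print(round(quadratic_weighted_kappa([0, 1, 2, 1, 2], [0, 2, 2, 1, 1], 3), 3))
# → 0.643
```

Perfect agreement gives 1.0; chance-level agreement gives 0, so the 0.78 vs 0.77 gap is a small but consistent improvement on this scale.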
Mohler et al. (2011)’s Short Answer Dataset (Pearson = 0.52, RMSE = 0.98, Md(RMSE) = 0.86)
We use a Random Forest regressor as the learner to build models. The learner is trained on the average of the human grades. We stack results from models created with each type of pattern to compute final results. (Mohler 2011 CS dataset)
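The stacking idea above can be sketched as follows. This is a toy illustration, not the paper's pipeline: the data is synthetic, the feature-group names are invented, and a least-squares linear model stands in for the Random Forest regressor.

```python
import numpy as np

# Toy data: two feature groups standing in for the two pattern types
# (content-token patterns and sentence-structure patterns).
rng = np.random.default_rng(0)
n = 100
X_content = rng.random((n, 4))
X_structure = rng.random((n, 3))
# Target: the average of the human grades (simulated here).
y = X_content[:, 0] + 0.5 * X_structure[:, 1] + 0.1 * rng.random(n)

def fit_linear(X, y):
    """Least-squares fit; a stand-in for the Random Forest learner."""
    Xb = np.c_[X, np.ones(len(X))]  # add intercept column
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def predict(w, X):
    return np.c_[X, np.ones(len(X))] @ w

# Level 0: one model per pattern type.
p_content = predict(fit_linear(X_content, y), X_content)
p_structure = predict(fit_linear(X_structure, y), X_structure)

# Level 1: stack the per-type predictions to produce the final grade.
X_stack = np.c_[p_content, p_structure]
final = predict(fit_linear(X_stack, y), X_stack)
```

A real stacking setup would fit the level-1 model on out-of-fold level-0 predictions to avoid leakage; the in-sample version here only shows the data flow.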
On questions: Pearson = 0.61, RMSE = 0.88, Md(RMSE) = 0.02
On assignments: Pearson = 0.61, RMSE = 0.86, Md(RMSE) = 0.77
Room for improvement
Use an RNN model trained on each question, with domain-corpus word vectors as inputs?
Achieve better performance without regexes?