Identifying Patterns For Short Answer Scoring
Using Graph-based Lexico-Semantic Text Matching
Ramachandran et al., 2015

Features

rubric text

top-scoring student responses, prompt and stimulus text

both sources are used to generate two types of text patterns: patterns containing content tokens, and patterns containing sentence structure information

Model (supervised)

Tandalla’s approach: two Random Forests and two Gradient Boosting Machines, treating scoring as a regression problem on the Kaggle Short Answer Dataset

The generated patterns are compiled into regular expressions, which are used as features. See sections 3.1 and 3.2.
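As an illustration of regex-match features, the sketch below uses hypothetical stand-in patterns (not the paper's actual generated ones), each encoding rubric content words plus synonyms; a response's match indicators become its feature vector:

```python
import re

# Hypothetical example patterns: content words from a rubric plus synonyms,
# in order, with arbitrary text allowed between them.
patterns = [
    r"\b(stretch|extend|expand)\w*\b.*\b(rubber|elastic)\b",
    r"\b(temperature|heat)\w*\b.*\b(increase|rise|rose)\w*\b",
]

def regex_features(response):
    """Binary feature vector: 1 if the response matches a pattern, else 0."""
    text = response.lower()
    return [1 if re.search(p, text) else 0 for p in patterns]
```

Each response then yields a fixed-length indicator vector for the downstream learner.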

Results

Kaggle Short Answer Dataset

On 8 out of the 10 sets our patterns perform better than the manual regular expressions.

The mean QW Kappa achieved by our patterns is 0.78, vs. 0.77 for Tandalla’s manual regular expressions.
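QW Kappa here is quadratic weighted kappa, which penalizes rater disagreements by the square of their distance. A minimal numpy sketch (the 0–3 score range is an assumption; some Kaggle sets use a narrower range):

```python
import numpy as np

def quadratic_weighted_kappa(rater_a, rater_b, min_rating=0, max_rating=3):
    """Quadratic weighted kappa between two integer score vectors."""
    n = max_rating - min_rating + 1
    # Observed confusion matrix
    O = np.zeros((n, n))
    for a, b in zip(rater_a, rater_b):
        O[a - min_rating, b - min_rating] += 1
    # Quadratic weights: (i - j)^2, normalized
    w = np.array([[(i - j) ** 2 / (n - 1) ** 2 for j in range(n)]
                  for i in range(n)])
    # Expected matrix from the marginal score histograms
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()
    return 1 - (w * O).sum() / (w * E).sum()
```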

Mohler et al.’s (2011) Short Answer Dataset (Pearson = 0.52, RMSE = 0.98, Md(RMSE) = 0.86)

We use a Random Forest regressor as the learner to build models. The learner is trained on the average of the human grades. We stack results from models created with each type of pattern to compute final results. (Mohler 2011 CS dataset)
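A rough scikit-learn sketch of this setup, with synthetic placeholder data: binary pattern-match features, grades on the Mohler dataset's 0–5 scale, and simple averaging of the per-pattern-type predictions as a stand-in for the paper's stacking step:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic placeholders: one binary feature matrix per pattern type,
# and gold scores taken as the average of the two human grades (0-5).
rng = np.random.default_rng(0)
X_content = rng.integers(0, 2, size=(100, 20)).astype(float)
X_structure = rng.integers(0, 2, size=(100, 15)).astype(float)
y = rng.uniform(0, 5, size=100)

# One Random Forest model per pattern type
preds = []
for X in (X_content, X_structure):
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X, y)
    preds.append(model.predict(X))

# Combine the per-pattern-type predictions (averaging, for illustration)
final = np.mean(preds, axis=0)
```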

On questions: Pearson = 0.61, RMSE = 0.88, Md(RMSE) = 0.02

On assignments: Pearson = 0.61, RMSE = 0.86, Md(RMSE) = 0.77
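For reference, the reported metrics can be computed as below; the grouping for Md(RMSE) (per question or per assignment) is an assumption:

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation between predicted and gold scores."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])

def rmse(x, y):
    """Root mean squared error between predicted and gold scores."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.sqrt(np.mean((x - y) ** 2)))

def median_rmse(groups):
    """Md(RMSE): median of per-group RMSEs.

    groups: iterable of (predicted, gold) pairs, one per question/assignment.
    """
    return float(np.median([rmse(p, g) for p, g in groups]))
```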

Room for improvement

Train an RNN per question, using word vectors learned from a domain corpus as input?

Could better performance be achieved without regexes?