Learning to Grade Short Answer Questions using Semantic Similarity Measures and Dependency Graph Alignments
Mohler et al. 2011.
Features
Of the 68 node-matching features, 36 are based upon the semantic similarity of four subgraphs defined by Nx[0..3]. All eight WordNet-based similarity measures listed in Section 3.3, plus the LSA model, are used to produce these features.
In the first stage (Section 3.1), the system is provided with the dependency graphs for each pair of instructor (Ai) and student (As) answers. For each node in the instructor’s dependency graph, we compute a similarity score for each node in the student’s dependency graph, based upon a set of lexical, semantic, and syntactic features applied to both the pair of nodes and their corresponding subgraphs.
The remaining 32 features are lexico-syntactic features defined only for Nx[3].
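As a rough illustration (not the paper's exact 68-feature set), here is a minimal sketch of per-node feature extraction, assuming spaCy-style dependency tokens and NLTK's WordNet interface; the single similarity measure and the three indicator features are simplified stand-ins:

```python
from nltk.corpus import wordnet as wn

def subtree_words(token, depth):
    """Words in the dependency subtree rooted at `token`, down to `depth`."""
    words = [token.lemma_]
    if depth > 0:
        for child in token.children:  # spaCy-style dependency token
            words.extend(subtree_words(child, depth - 1))
    return words

def wn_similarity(w1, w2):
    """Best WordNet path similarity between any synsets of the two words."""
    s1, s2 = wn.synsets(w1), wn.synsets(w2)
    if not s1 or not s2:
        return 0.0
    return max((a.path_similarity(b) or 0.0) for a in s1 for b in s2)

def node_features(x_i, x_s):
    """Feature vector for an instructor/student node pair (x_i, x_s)."""
    feats = []
    # Semantic similarity of the subgraphs Nx[0..3]
    for depth in range(4):
        wi, ws = subtree_words(x_i, depth), subtree_words(x_s, depth)
        sims = [wn_similarity(a, b) for a in wi for b in ws]
        feats.append(sum(sims) / len(sims))
    # A few lexico-syntactic indicator features (illustrative only)
    feats.append(float(x_i.lemma_ == x_s.lemma_))
    feats.append(float(x_i.pos_ == x_s.pos_))
    feats.append(float(x_i.dep_ == x_s.dep_))
    return feats
```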
For a given answer pair (Ai, As), we assemble the eight graph alignment scores into a feature vector ψG(Ai, As). We then combine the alignment scores ψG(Ai, As) with the scores ψB(Ai, As) from the lexical semantic similarity measures into a single feature vector ψ(Ai, As) = [ψG(Ai, As) | ψB(Ai, As)].
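The combination is plain vector concatenation; a sketch with made-up scores (the dimensions follow the eight alignment scores and the ten BOW measures listed below):

```python
import numpy as np

# psi_G: the eight graph-alignment scores; psi_B: scores from the BOW
# similarity measures (WordNet measures plus LSA/ESA). Values are made up
# for illustration.
psi_G = np.array([0.41, 0.38, 0.44, 0.40, 0.36, 0.39, 0.42, 0.37])
psi_B = np.array([0.55, 0.61, 0.48, 0.52, 0.59, 0.50, 0.46, 0.57, 0.62, 0.44])

# psi(Ai, As) = [psi_G | psi_B] is simple concatenation
psi = np.concatenate([psi_G, psi_B])
assert psi.shape == (18,)
```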
Model (mostly supervised)
1) Node matching model: an averaged version of the perceptron algorithm
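A textbook sketch of the averaged perceptron, assuming node matching is cast as binary classification over node-pair feature vectors (hyperparameters are illustrative, not the paper's):

```python
import numpy as np

def averaged_perceptron(X, y, epochs=10, seed=0):
    """Averaged perceptron for the node-matching decision.

    X: (n, d) node-pair feature vectors; y: labels in {-1, +1} (pair aligned
    or not). Returns the averaged weight vector, whose dot product with a
    feature vector serves as a matching score.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    w_sum = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            if y[i] * (w @ X[i]) <= 0:  # mistake-driven update
                w += y[i] * X[i]
            w_sum += w                  # accumulate for averaging
    return w_sum / (epochs * n)
```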
2a) Graph-to-graph alignment: the Hungarian algorithm (the result is ψG(Ai, As))
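scipy's `linear_sum_assignment` implements the Hungarian algorithm; a sketch with an illustrative similarity matrix:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# sim[i, s]: perceptron matching score between instructor node i and student
# node s (values made up for illustration).
sim = np.array([[0.9, 0.1, 0.3],
                [0.2, 0.8, 0.4],
                [0.1, 0.5, 0.7]])

# linear_sum_assignment minimizes total cost, so negate the similarities to
# obtain the maximum-score one-to-one node alignment.
rows, cols = linear_sum_assignment(-sim)
alignment_score = sim[rows, cols].sum()  # the basic (unnormalized) score
print(list(zip(rows, cols)), alignment_score)
```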
2b) Text similarity models: shortest path [PATH], [LCH], [WUP], [RES], [LIN], [JCN], [HSO], [LESK], and two corpus-based measures: Latent Semantic Analysis [LSA] and Explicit Semantic Analysis [ESA]. The result is ψB(Ai, As).
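NLTK exposes six of the eight WordNet measures ([HSO] and [LESK] are not available through its wordnet interface). Assuming the `wordnet` and `wordnet_ic` corpora are downloaded:

```python
from nltk.corpus import wordnet as wn, wordnet_ic

cat, dog = wn.synset('cat.n.01'), wn.synset('dog.n.01')
print(cat.path_similarity(dog))  # [PATH]
print(cat.lch_similarity(dog))   # [LCH] (requires same part of speech)
print(cat.wup_similarity(dog))   # [WUP]

# [RES], [LIN], [JCN] additionally need an information-content corpus.
brown_ic = wordnet_ic.ic('ic-brown.dat')
print(cat.res_similarity(dog, brown_ic))  # [RES]
print(cat.lin_similarity(dog, brown_ic))  # [LIN]
print(cat.jcn_similarity(dog, brown_ic))  # [JCN]
```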
3) Regression: an SVM model for regression (SVR). Ranking: an SVM model for ranking (SVMRank). Both use a linear kernel (quadratic and RBF kernels didn't improve performance significantly).
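For the regression model, a minimal sklearn sketch with a linear kernel (SVMRank is a standalone ranking tool with no direct sklearn equivalent; the data below is random, for illustration only):

```python
import numpy as np
from sklearn.svm import SVR

# X: one row per answer pair (Ai, As) holding the combined feature vector psi;
# y: the assigned grades on [0..5]. Random data for illustration.
rng = np.random.default_rng(0)
X = rng.random((200, 18))
y = rng.uniform(0.0, 5.0, size=200)

svr = SVR(kernel='linear', C=1.0)  # linear kernel, as in the paper
svr.fit(X, y)
raw_scores = svr.predict(X)
```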
We use isotonic regression to convert the system scores onto the same [0..5] scale used by the annotators.
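A minimal sklearn sketch of that calibration step, with illustrative score/grade pairs:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Illustrative raw SVM outputs and gold grades for a handful of answers.
raw_scores = np.array([-0.3, 0.1, 0.4, 0.9, 1.3, 1.8])
grades     = np.array([ 0.5, 1.0, 2.5, 3.0, 4.5, 5.0])

# Fit a monotonic map from system scores onto the annotators' [0..5] scale.
iso = IsotonicRegression(y_min=0.0, y_max=5.0, out_of_bounds='clip')
calibrated = iso.fit_transform(raw_scores, grades)
print(calibrated)
```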
Room for improvement
Future work will concentrate on improving the quality of the answer alignments by training a model to directly output graph-to-graph alignments.
The goal: achieve better results :smiley:
Results
Question Demoting
One surprise while building this system was the consistency with which the novel technique of question demoting improved scores for the BOW similarity measures. With this relatively minor change, the average correlation between the BOW methods’ similarity scores and the student grades improved by up to 0.046, with an average improvement of 0.019 across all eleven semantic features.
For the tf*idf measure, the improvement is 0.063, which brings its RMSE score close to the lowest of all BOW metrics; the reasons are not entirely clear.
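Question demoting itself is simple: remove from the answer any word that also appears in the question before computing BOW similarity. A minimal sketch (tokenization and example strings are illustrative):

```python
def demote_question(answer_tokens, question_tokens):
    """Drop answer words that also occur in the question, so that merely
    echoing the question does not inflate bag-of-words similarity."""
    question_words = set(question_tokens)
    return [t for t in answer_tokens if t not in question_words]

question = "what is the time complexity of binary search".split()
student = "the time complexity of binary search is o ( log n )".split()
print(demote_question(student, question))  # ['o', '(', 'log', 'n', ')']
```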
Perceptron Alignment
We find an F-measure of 0.72, with precision (P) = 0.85 and recall (R) = 0.62. By manually varying the threshold, we find a maximum F-measure of 0.76, with P = 0.79 and R = 0.74.
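The threshold sweep can be reproduced with a precision-recall curve; the labels and scores below are random stand-ins for the gold alignments and the perceptron's matching scores:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# labels: 1 if a candidate node pair is aligned in the gold annotation;
# scores: the perceptron's matching scores. Random stand-in data.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=500)
scores = labels * 0.6 + rng.random(500) * 0.7

precision, recall, thresholds = precision_recall_curve(labels, scores)
f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
best = int(np.argmax(f1))
best_t = thresholds[min(best, len(thresholds) - 1)]
print(f"max F1 = {f1[best]:.2f} at threshold = {best_t:.2f}")
```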
Alignment Score Grading
We first test the quality of the eight graph alignment features ψG(Ai, As) independently. The basic alignment score (Hungarian algorithm) performs comparably to most BOW approaches. Introducing idf weighting seems to degrade performance somewhat, while introducing question demoting increases the correlation with the grader at the cost of a somewhat higher RMSE.
SVM Score Grading
The SVM models are trained on the BOW features ψB(Ai, As) only, on the alignment features ψG(Ai, As) only, and on the full feature vector ψ(Ai, As).
In SVMRank, the full feature vector yields the highest Pearson’s ρ (0.518); the lowest RMSE (0.998) is obtained with normalized alignment features + BOW features, and the lowest median RMSE (0.865) with normalized alignment features alone.
In SVR, the best result is achieved with the full feature vector (normalized + unnormalized alignment features + BOW): ρ = 0.464, RMSE = 0.978, median RMSE = 0.862.