TOP-DOWN SPEARMAN CORRELATION

Use Cases

Compare the feature importances of two models with emphasis on the most important features

Useful for SKEWED ranked values such as CTR, clicks, ...

Compare two rank lists with emphasis on the top or bottom items

How

Weighted Pearson correlation with flexible weight choices

E.g. the reciprocal of the index, or the discount factor of NDCG
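
A minimal sketch of a weighted Pearson correlation with the two weight choices mentioned above. The function names, the sample importances, and the convention that position 1 is the most important item are assumptions made for illustration; the inputs could equally be ranks instead of raw values.

```python
import math

def weighted_pearson(x, y, w):
    """Pearson correlation of x and y where observation i counts with weight w[i]."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(var_x * var_y)

def reciprocal_weights(n):
    """Weight 1/i for position i = 1..n (position 1 = most important)."""
    return [1.0 / i for i in range(1, n + 1)]

def ndcg_discount_weights(n):
    """NDCG-style discount 1/log2(i + 1) for position i = 1..n."""
    return [1.0 / math.log2(i + 1) for i in range(1, n + 1)]

# Hypothetical feature importances, listed in model A's order of importance.
importance_a = [0.40, 0.25, 0.15, 0.10, 0.06, 0.04]
importance_b = [0.35, 0.30, 0.10, 0.12, 0.08, 0.05]

w = reciprocal_weights(len(importance_a))
r = weighted_pearson(importance_a, importance_b, w)
print(r)
```

Here the weights follow the order of the first list, so disagreements near the top of that list move the result more than disagreements near the bottom. Using the second list's order, or an average of both, is an equally plausible choice.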

Savage scores were used in the original paper

With the convention that a lower index means a more IMPORTANT rank, S[k] = sum_{i=k}^{n} (1/i)

These scores are now used as the weights of the Pearson correlation
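
A short sketch of the Savage scores defined above. As one possible use, it also computes a plain Pearson correlation on the score-transformed ranks, which is my reading of the original-paper form and not necessarily the weighting scheme these notes use. The function names and the convention rank 1 = most important are assumptions.

```python
import math

def savage_scores(n):
    """Return [S[1], ..., S[n]] with S[k] = sum_{i=k}^{n} 1/i."""
    scores = [0.0] * n
    running = 0.0
    for k in range(n, 0, -1):  # accumulate 1/n, 1/(n-1), ..., 1/1
        running += 1.0 / k
        scores[k - 1] = running
    return scores

def top_down_corr(ranks_a, ranks_b):
    """Pearson correlation of the Savage scores of two rank vectors (ranks are 1..n)."""
    n = len(ranks_a)
    s = savage_scores(n)
    a = [s[r - 1] for r in ranks_a]
    b = [s[r - 1] for r in ranks_b]
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

base = [1, 2, 3, 4, 5]
swap_top = [2, 1, 3, 4, 5]     # the two most important items trade places
swap_bottom = [1, 2, 3, 5, 4]  # the two least important items trade places
print(top_down_corr(base, swap_top), top_down_corr(base, swap_bottom))
```

Because the gap between S[1] and S[2] is much larger than the gap between S[n-1] and S[n], a swap at the top lowers the correlation more than the same swap at the bottom.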

Hypothesis Testing

H0: Two ranking lists are independent

If the marginal distribution is skewed, this statistic (top-down Spearman correlation) is powerful

If the marginal distribution is normal, Pearson correlation is a better choice

If the marginal distribution is uniform, ordinary Spearman or Kendall's tau is a better choice

If the marginal distribution's mass is at the two extremes, not the center, the Van der Waerden statistic is a better choice

Distribution of top-down...

Under H0, top-down Spearman is approximately Normal(0, 1/(n-1)) for large n, where 1/(n-1) is the variance
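
A minimal sketch of how the normal approximation above could be turned into a test of H0. The function name and the `two_sided` parameter are hypothetical, and the approximation is only expected to be reasonable for large n.

```python
import math

def top_down_z_test(r, n, two_sided=True):
    """Return (z, p) for a top-down correlation r over n items.

    Under H0, r is treated as Normal(0, 1/(n-1)), so z = r * sqrt(n - 1)
    is treated as standard normal.
    """
    z = r * math.sqrt(n - 1)
    if two_sided:
        p = math.erfc(abs(z) / math.sqrt(2.0))   # P(|Z| >= |z|)
    else:
        p = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z >= z), upper tail only
    return z, p

z, p = top_down_z_test(0.5, 17)
print(z, p)
```

The upper-tail version fits the question "do the two lists agree more than chance would explain?", while the two-sided version also flags strong disagreement.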

Order statistic

Let X_1, ..., X_n be a sample from any continuous distribution, sorted in ASCENDING order

U_i = F(X_i), where F is the cumulative distribution function of X

Then U_1, U_2, ..., U_n are the order statistics of Uniform(0,1)

U_k ~ Beta(k, n+1-k) distribution

E[U_k] = k/(n+1) and Var[U_k] = k (n+1-k) / ((n+1)^2 (n+2))

E[U_(k+1) - U_k] = 1/(n+1) for all k, so X_k is approximately the (100 * k/(n+1))-th percentile of the underlying distribution
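
A small simulation sketch to sanity-check the order-statistic moments, taking the variance of Beta(k, n+1-k) as k (n+1-k) / ((n+1)^2 (n+2)). The function names, the seed, and the choice k = 5, n = 9 are assumptions made for illustration.

```python
import random

def uniform_order_stat_moments(k, n):
    """Theoretical mean and variance of U_k, the k-th smallest of n Uniform(0,1) draws."""
    mean = k / (n + 1)
    var = k * (n + 1 - k) / ((n + 1) ** 2 * (n + 2))
    return mean, var

def simulate_order_stat(k, n, trials=20000, seed=0):
    """Empirical mean and variance of U_k over repeated samples of size n."""
    rng = random.Random(seed)
    draws = []
    for _ in range(trials):
        sample = sorted(rng.random() for _ in range(n))
        draws.append(sample[k - 1])  # k-th smallest value
    mean = sum(draws) / trials
    var = sum((d - mean) ** 2 for d in draws) / (trials - 1)
    return mean, var

theory = uniform_order_stat_moments(5, 9)
empirical = simulate_order_stat(5, 9)
print(theory, empirical)
```

With n = 9 the expected positions k/(n+1) are 0.1, 0.2, ..., 0.9, which is the sense in which the k-th sorted observation sits near the 100 * k/(n+1) percentile.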