One value to predict
SMAPE is 0 when the predicted value is equal to the true value.
An underestimate is penalized more than an estimate the same distance above the true value. This may lead to many local minima.
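A minimal SMAPE sketch (using the common Kaggle convention that a 0/0 term counts as 0 — an assumption, not stated above) makes the asymmetry easy to check:

```python
import numpy as np

def smape(y_true, y_pred):
    """SMAPE in [0, 200]; a 0/0 term is counted as 0 (common Kaggle convention)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = np.abs(y_true) + np.abs(y_pred)
    # Divide by 1 where denom is 0 to avoid a warning; those terms are set to 0
    terms = np.where(denom == 0, 0.0,
                     2.0 * np.abs(y_true - y_pred) / np.where(denom == 0, 1.0, denom))
    return 100.0 * np.mean(terms)

print(smape([100], [100]))  # 0.0: exact prediction
print(smape([100], [90]))   # ~10.53: under-prediction by 10
print(smape([100], [110]))  # ~9.52: over-prediction by 10 is penalized less
```

The same absolute error of 10 scores worse below the true value than above it, because the denominator is smaller there.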
Two values to predict
In this case there are two global minima, with SMAPE = 80, at the two points where our constant prediction is equal to one of the values in the true series.
The function reaches a local maximum, with SMAPE = 100, at y_pred = 3.
And the value at the mean (y_pred = 5) is about 95.24, i.e. it is worse than the global minima.
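The numbers quoted above are consistent with, e.g., a true series of [1, 9] — that series is my reconstruction, not stated in the notes. A quick check with a straightforward SMAPE sketch:

```python
import numpy as np

def smape(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    return 100.0 * np.mean(2.0 * np.abs(y_true - y_pred)
                           / (np.abs(y_true) + np.abs(y_pred)))

y_true = [1, 9]  # a two-value series consistent with the figures quoted above
for p in (1, 3, 5, 9):
    print(p, round(smape(y_true, p), 2))
# 1 -> 80.0  (global minimum)
# 3 -> 100.0 (local maximum)
# 5 -> 95.24 (the mean, worse than the minima)
# 9 -> 80.0  (global minimum)
```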
More points, sampled from a uniform distribution
The minimum of the SMAPE function is reached near the median, which is good. It could explain why the public kernels do well: the median does well.
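A quick simulation (a sketch; the sample size, range, and grid are arbitrary choices of mine) shows the best constant sitting close to the sample median for uniformly distributed positive data:

```python
import numpy as np

def smape(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    return 100.0 * np.mean(2.0 * np.abs(y_true - y_pred)
                           / (np.abs(y_true) + np.abs(y_pred)))

rng = np.random.default_rng(0)
y = rng.uniform(1, 100, size=10_000)   # strictly positive, so no 0/0 terms

grid = np.linspace(1, 100, 1000)       # candidate constant predictions
scores = np.array([smape(y, c) for c in grid])
best = grid[np.argmin(scores)]
print(f"median = {np.median(y):.1f}, best constant = {best:.1f}")
```

The optimum lands near (in fact slightly above) the median, so a constant-median baseline is close to the best possible constant here.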
For a true value of 0, the function is discontinuous at 0: it is equal to 200 everywhere except at 0, where it equals 0. When zeros are mixed into the series, there are local minima when y_pred equals one of the nonzero values in the y_true series (SMAPE = 100 for a series of one zero and one nonzero value).
There is a discontinuity at 0: the limit of SMAPE(0, x) as x tends to 0 is 200. It means that there is no local maximum near 0, since the value 200 cannot be reached, but we can get values as close to 200 as we want.
median is really not a good choice here; 0 would be way better.
Moreover, a gradient descent started anywhere except 0 will miss the global minimum by a large margin.
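A sketch of the zero-heavy case (the 70%-zeros series is made up for illustration, and 0/0 is counted as 0 as on Kaggle): every zero term contributes its maximum of 200 for any positive prediction, so predicting exactly 0 wins by a wide margin, and the jump at 0 is exactly what defeats gradient descent:

```python
import numpy as np

def smape(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    denom = np.abs(y_true) + np.abs(y_pred)
    terms = np.where(denom == 0, 0.0,
                     2.0 * np.abs(y_true - y_pred) / np.where(denom == 0, 1.0, denom))
    return 100.0 * np.mean(terms)

y = np.array([0, 0, 0, 0, 0, 0, 0, 1, 5, 9], dtype=float)  # 70% zeros

print(smape(y, 0.0))    # 60.0: only the three nonzero terms contribute
print(smape(y, 0.01))   # ~199.5: an arbitrarily small positive step jumps near 200
print(smape(y, 1.0))    # ~169.3: each zero term is stuck at its maximum of 200
```

Any positive constant scores at least 140 here (the seven zero terms alone), so the global minimum at exactly 0 is unreachable by sliding down the curve.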
The discontinuity at 0 makes it hard to optimize SMAPE with constant predictions.
10k random pages for train/validation
Log-transform the target and transform predictions back
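"Log and back" presumably means training on a log-transformed target and inverting the predictions; with page views that can be 0, the log1p/expm1 pair is the usual safe choice (a sketch, not the author's exact code):

```python
import numpy as np

views = np.array([0.0, 3.0, 120.0, 4500.0])  # made-up traffic values
z = np.log1p(views)    # fit the model on log1p of the target (handles zeros)
back = np.expm1(z)     # invert model predictions with expm1
print(np.allclose(back, views))  # True
```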
64GB RAM server
Check how Olympic games influenced data
Problem with the day of week of lags: it has to be consistent
One-hot encoding: rank by mean
LR result (after 50)
page ID constant (rank by mean)
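"Rank by mean" likely means replacing each category (e.g. the page ID) with the rank of its mean target value instead of a one-hot column. A hypothetical pandas sketch (the data and column names are my own):

```python
import pandas as pd

df = pd.DataFrame({
    "page": ["a", "a", "b", "b", "c", "c"],          # hypothetical page IDs
    "visits": [10.0, 14.0, 100.0, 120.0, 1.0, 3.0],  # hypothetical target
})

# Rank each page by its mean traffic, then map the rank back onto the rows
page_rank = df.groupby("page")["visits"].mean().rank()
df["page_rank"] = df["page"].map(page_rank)
print(df["page_rank"].tolist())  # [2.0, 2.0, 3.0, 3.0, 1.0, 1.0]
```

This turns thousands of page IDs into a single ordinal feature, which is much cheaper than one-hot columns.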
Project/access/agent - didn't help
Rolling statistics (mean, std, max, median)
Rolling aggregates (peaks, counts)
with different time periods always give a good improvement
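A sketch of such rolling features with pandas (the window sizes 7 and 30 are illustrative choices of mine, not from the notes):

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(60, dtype=float))  # stand-in for one page's traffic

feats = {}
for w in (7, 30):  # different time periods
    r = s.rolling(w)
    feats[f"mean_{w}"] = r.mean()
    feats[f"std_{w}"] = r.std()
    feats[f"max_{w}"] = r.max()
    feats[f"median_{w}"] = r.median()
feats = pd.DataFrame(feats)
print(feats.iloc[-1].to_dict())  # the features available on the last day
```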
Day lags for previous month
Day lags for previous year
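Keeping lags in multiples of 7 days keeps the day of week consistent, which addresses the dayofweek problem noted above; for the previous year, a 364-day lag (52 full weeks) preserves the weekday while 365 does not. A sketch with made-up dates:

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2015-07-01", periods=800, freq="D")
s = pd.Series(np.arange(800, dtype=float), index=idx)

# Previous-month-style lags (7, 14, 21, 28) and a previous-year lag (364)
lags = pd.DataFrame({f"lag_{k}": s.shift(k) for k in (7, 14, 21, 28, 364)})

# A shift by a multiple of 7 days always lands on the same day of week
for k in (7, 364):
    assert (idx[k:].dayofweek == idx[:-k].dayofweek).all()
print(lags.iloc[-1].to_dict())
```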