Feature Selection using R caret package
benefits
rank the features in a dataset according to their importance
know how to select features from a dataset using the recursive feature elimination method
know how to remove redundant features
redundant features
features that are highly correlated with each other are undesirable; many methods perform better without them
the caret function findCorrelation analyzes a correlation matrix and suggests which attributes to remove
generally, remove attributes with an absolute correlation above 0.75
code example
set.seed(7)
library(mlbench)
library(caret)
data(PimaIndiansDiabetes)
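# correlation matrix of the eight predictor columns
correlationMatrix <- cor(PimaIndiansDiabetes[,1:8])
print(correlationMatrix)
# indices of attributes to remove; note this cutoff (0.5) is stricter than the 0.75 rule of thumb
highlyCorrelated <- findCorrelation(correlationMatrix, cutoff=0.5)
print(highlyCorrelated)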
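Once findCorrelation has flagged columns, a natural next step is to drop them from the predictors. The following is a minimal sketch under that assumption; the names `predictors_df` and `reduced` are illustrative and do not come from the original notes.

```r
library(mlbench)
library(caret)
data(PimaIndiansDiabetes)

# Predictor columns only (column 9 is the outcome)
predictors_df <- PimaIndiansDiabetes[, 1:8]
correlationMatrix <- cor(predictors_df)

# Column indices that findCorrelation suggests removing at this cutoff
highlyCorrelated <- findCorrelation(correlationMatrix, cutoff = 0.5)

# Guard against the empty case: negative indexing with integer(0)
# would return zero columns instead of keeping all of them
if (length(highlyCorrelated) > 0) {
  reduced <- predictors_df[, -highlyCorrelated, drop = FALSE]
} else {
  reduced <- predictors_df
}

print(names(predictors_df)[highlyCorrelated])  # names of the flagged columns
print(dim(reduced))                            # rows and remaining columns
```

The explicit length check matters because a stricter or looser cutoff may flag nothing at all.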
rank features by importance
methods such as decision trees have a built-in mechanism that reports variable importance
for other algorithms, importance can be estimated with a ROC curve analysis of each attribute
the example constructs a Learning Vector Quantization (LVQ) model
use varImp to estimate variable importance, then print and plot the result
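The notes above describe the LVQ and varImp steps without code. The following is a hedged sketch of how they might look with caret; the resampling settings (`repeatedcv`, 10 folds, 3 repeats) and `preProcess="scale"` are assumptions chosen for illustration, not taken from the notes.

```r
set.seed(7)
library(mlbench)
library(caret)
data(PimaIndiansDiabetes)

# Assumed resampling scheme: 10-fold cross-validation repeated 3 times
control <- trainControl(method = "repeatedcv", number = 10, repeats = 3)

# Train a Learning Vector Quantization model on all eight predictors
model <- train(diabetes ~ ., data = PimaIndiansDiabetes,
               method = "lvq", preProcess = "scale", trControl = control)

# LVQ has no built-in importance measure, so varImp falls back to a
# per-attribute ROC curve analysis
importance <- varImp(model, scale = FALSE)

print(importance)  # importance table, one row per attribute
plot(importance)   # the same values as a chart
```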
feature selection
code example
set.seed(7)
library(mlbench)
library(caret)
data(PimaIndiansDiabetes)
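# random forest helper functions, evaluated with 10-fold cross-validation
control <- rfeControl(functions=rfFuncs, method="cv", number=10)
# run RFE over subset sizes 1 through 8; column 9 is the outcome
results <- rfe(PimaIndiansDiabetes[,1:8], PimaIndiansDiabetes[,9], sizes=c(1:8), rfeControl=control)
print(results)
# names of the selected features
predictors(results)
# plot accuracy against the number of features
plot(results, type=c("g", "o"))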
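recursive feature elimination (RFE) is a popular automatic feature selection method provided by caret
a random forest algorithm (rfFuncs) is used to evaluate the model on each iteration
automatic feature selection can be used to build many models with different subsets of features and identify which features are needed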
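To see the "many models with different subsets" idea in practice, the object returned by rfe can be inspected per subset size. This is a minimal sketch that reruns the same RFE call as the notes and then reads the `results`, `optsize` and `optVariables` components of the returned object.

```r
set.seed(7)
library(mlbench)
library(caret)
data(PimaIndiansDiabetes)

control <- rfeControl(functions = rfFuncs, method = "cv", number = 10)
results <- rfe(PimaIndiansDiabetes[, 1:8], PimaIndiansDiabetes[, 9],
               sizes = 1:8, rfeControl = control)

# One row per subset size, with the resampled performance of each
print(results$results[, c("Variables", "Accuracy", "Kappa")])

# Size and members of the best-performing subset
print(results$optsize)
print(results$optVariables)
```

Comparing the Accuracy column across rows shows how much each additional feature contributes, which can justify choosing a smaller subset than the one rfe selects.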