Titanic Data Science Solutions
use
speedml
Python package to speed start machine learning projects
work flow stages
question or problem definition
acquire training and testing data
wrangle, prepare, cleanse the data
analyse, identify patterns & explore the data
model, predict and solve the problem
visualize, report and present the problem solving steps and final solutions
supply or submit the results
question and problem definition
Given a training set of samples listing passengers who survived or did not survive the Titanic disaster, can our model determine, based on a test dataset that does not contain the survival information, whether the passengers in the test dataset survived?
work flow goals
classifying
correlating
converting
completing
correcting
creating
charting
best practice
performing feature correlation analysis early in the project
using multiple plots instead of overlays for readability
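One simple way to run an early feature correlation check is to pivot each feature against the target and compare survival rates. The sketch below assumes pandas and a frame named train_df with Survived, Pclass and Sex columns; the rows are invented for illustration, and survival_rate_by is a hypothetical helper, not part of any library.

```python
import pandas as pd

# Toy stand-in for the Titanic training data; values are invented for illustration.
train_df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0, 0, 1, 0],
    "Pclass":   [3, 1, 3, 1, 3, 3, 2, 3],
    "Sex":      ["male", "female", "female", "female",
                 "male", "male", "female", "male"],
})


def survival_rate_by(df, feature):
    """Hypothetical helper: mean of Survived per value of `feature`, highest first."""
    return (
        df[[feature, "Survived"]]
        .groupby(feature, as_index=False)
        .mean()
        .sort_values(by="Survived", ascending=False)
        .reset_index(drop=True)
    )


by_sex = survival_rate_by(train_df, "Sex")
by_class = survival_rate_by(train_df, "Pclass")
print(by_sex)
print(by_class)
```

A feature whose groups show clearly different survival rates is a candidate to keep in the model; one whose groups look alike is a candidate to drop or re-engineer.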
prepare data steps
acquire data
analyse by
describing data
pivoting features
visualizing data
correct by
dropping features
creating new feature
by extracting from existing
converting a categorical feature
we can convert features that contain strings to numerical values
completing a numerical continuous feature
a simple way is to generate random numbers within one standard deviation of the mean
a more accurate way of guessing missing values is to use other correlated features
combine both methods
create new feature combining existing features
completing a categorical feature
simply fill this with the most common occurrence
convert categorical feature to numerical
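The correcting, creating, converting and completing steps above can be sketched with pandas as follows. This is a minimal illustration, not the original workflow: the rows are invented, the column names (Name, Sex, Pclass, Age, SibSp, Parch, Embarked) and the new Title and FamilySize features are assumptions, and the choice of Sex and Pclass as the features correlated with Age is also an assumption.

```python
import numpy as np
import pandas as pd

# Toy rows shaped like the Titanic data; all values are invented for illustration.
df = pd.DataFrame({
    "Name": ["Braund, Mr. Owen", "Cumings, Mrs. John", "Heikkinen, Miss. Laina",
             "Allen, Mr. William", "Moran, Mr. James", "Palsson, Miss. Torborg"],
    "Sex": ["male", "female", "female", "male", "male", "female"],
    "Pclass": [3, 1, 3, 3, 3, 3],
    "Age": [22.0, 38.0, 26.0, 35.0, np.nan, np.nan],
    "SibSp": [1, 1, 0, 0, 0, 3],
    "Parch": [0, 0, 0, 0, 0, 1],
    "Embarked": ["S", "C", "S", "S", None, "S"],
})

# Create a new feature by extracting from an existing one: the title inside Name.
df["Title"] = df["Name"].str.extract(r" ([A-Za-z]+)\.", expand=False)

# Convert a categorical string feature to numerical values.
df["Sex"] = df["Sex"].map({"female": 1, "male": 0}).astype(int)

# Complete a numerical continuous feature using correlated features (assumed here
# to be Sex and Pclass): fill missing Age with the median Age of the same group.
group_median = df.groupby(["Sex", "Pclass"])["Age"].transform("median")
df["Age"] = df["Age"].fillna(group_median)

# Create a new feature combining existing features.
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1

# Complete a categorical feature with its most common occurrence, then convert it.
most_common_port = df["Embarked"].mode()[0]
df["Embarked"] = df["Embarked"].fillna(most_common_port)
df["Embarked"] = df["Embarked"].map({"S": 0, "C": 1, "Q": 2}).astype(int)

print(df.drop(columns=["Name"]))
```

The group median is used here instead of random draws around the mean so that the result is deterministic; the random approach would add noise that changes from run to run.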
model predict & solve
machine learning methods
Logistic regression
KNN or k-nearest neighbors
support vector machines
Naive Bayes classifier
decision tree
random forest
perceptron
artificial neural network
RVM (relevance vector machine)
take a close look and do some research on each method
prepare data
fit model
train.df -> X_train
take a closer look at the data preparation part
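The "prepare data" and "fit model" steps can be sketched with scikit-learn as below, using logistic regression, the first method in the list. The frame names train_df and test_df, the Survived target column and the toy rows are assumptions for illustration; the features are taken to be already numeric.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy, already-numeric frames; values are invented for illustration.
train_df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0, 0, 1, 0],
    "Pclass":   [3, 1, 3, 1, 3, 3, 2, 3],
    "Sex":      [0, 1, 1, 1, 0, 0, 1, 0],
})
test_df = pd.DataFrame({"Pclass": [3, 1], "Sex": [0, 1]})

# Prepare data: split the training frame into features and target.
X_train = train_df.drop("Survived", axis=1)
Y_train = train_df["Survived"]
X_test = test_df.copy()

# Fit the model, then predict survival for the test passengers.
model = LogisticRegression()
model.fit(X_train, Y_train)
Y_pred = model.predict(X_test)

# Accuracy on the training data only; it says little about unseen passengers.
train_accuracy = model.score(X_train, Y_train)
print(Y_pred, train_accuracy)
```

Other scikit-learn classifiers from the list, such as KNeighborsClassifier or RandomForestClassifier, expose the same fit and predict methods, so they can be swapped in for comparison.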