FEATURE ENGINEERING
Feature Engineering Techniques
Imputation
Drop
Mean
Median
For time-series data, use DeepAR
For non-time-series data, use KNN
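A minimal Python sketch of the non-DeepAR options above, assuming the data is a list of rows (lists) in which a missing value is `None`. The helper names (`drop_missing`, `impute_column`, `knn_impute`) are hypothetical. DeepAR is a SageMaker built-in forecasting algorithm and is not sketched here. The KNN version fills a gap with the average of that column over the k nearest complete rows, with distance measured on the other columns.

```python
import math
from statistics import mean, median


def drop_missing(rows):
    """Drop strategy: keep only rows that contain no missing (None) values."""
    return [row for row in rows if None not in row]


def impute_column(values, strategy="mean"):
    """Fill None entries of one column with the column mean or median."""
    present = [v for v in values if v is not None]
    fill = mean(present) if strategy == "mean" else median(present)
    return [fill if v is None else v for v in values]


def knn_impute(rows, target_col, k=2):
    """Fill None in target_col with the mean of the k nearest complete rows.

    Distance is Euclidean over the other columns, which this sketch
    assumes have no missing values.
    """
    complete = [row for row in rows if row[target_col] is not None]
    result = []
    for row in rows:
        if row[target_col] is not None:
            result.append(list(row))
            continue

        def distance(other):
            return math.sqrt(sum(
                (a - b) ** 2
                for i, (a, b) in enumerate(zip(row, other))
                if i != target_col
            ))

        neighbours = sorted(complete, key=distance)[:k]
        filled = list(row)
        filled[target_col] = mean(n[target_col] for n in neighbours)
        result.append(filled)
    return result


rows = [[1.0, 10.0], [2.0, 20.0], [10.0, 100.0], [1.5, None]]
print(drop_missing(rows))            # the row containing None is removed
print(impute_column([1, None, 3]))   # None becomes the mean of 1 and 3
print(knn_impute(rows, target_col=1))
```

Median imputation is often preferred over the mean when the column is skewed, because a few extreme values pull the mean but not the median.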
Handling Outliers
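Detect outliers
Percentile: flag values outside a chosen percentile range
Standard deviation: flag values that lie too many standard deviations from the mean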
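Both detection rules can be sketched with Python's standard `statistics` module. The cut-offs used here (the 1st/99th percentiles and 3 standard deviations) are illustrative assumptions rather than fixed rules, and the function names are hypothetical.

```python
from statistics import mean, pstdev, quantiles


def percentile_outliers(values, lower=1, upper=99):
    """Flag values below the lower or above the upper percentile."""
    cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points
    low, high = cuts[lower - 1], cuts[upper - 1]
    return [v for v in values if v < low or v > high]


def std_outliers(values, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    return [v for v in values if abs(v - mu) > threshold * sigma]


data = list(range(1, 101)) + [1000]
print(percentile_outliers(data))
print(std_outliers([10] * 20 + [100]))
```

A percentile rule always flags something in each tail (here the smallest value as well as 1000), so its output usually needs a second look before rows are removed.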
Binning
Normal Binning: split the value range into fixed-width intervals
Quantile Binning: put similar values in the same bucket so that every bucket holds roughly the same number of data points
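The difference between the two binning styles shows up on skewed data. A small sketch, with hypothetical function names; in `quantile_bins` tied values may land in different bins, which a production implementation would usually handle explicitly.

```python
def equal_width_bins(values, n_bins):
    """Normal (fixed-width) binning: split [min, max] into equal intervals."""
    low, high = min(values), max(values)
    if high == low:
        return [0] * len(values)
    width = (high - low) / n_bins
    # min() keeps the maximum value in the last bin instead of bin n_bins
    return [min(int((v - low) / width), n_bins - 1) for v in values]


def quantile_bins(values, n_bins):
    """Quantile binning: each bin receives (roughly) the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


incomes = [1, 2, 3, 4, 100, 200]
print(equal_width_bins(incomes, 2))  # the small values all crowd into bin 0
print(quantile_bins(incomes, 2))     # three values per bin
```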
Log Transform
Log-based scaling: compresses a wide, skewed range of values
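A one-function sketch of a log transform. It uses `math.log1p` (that is, ln(1 + x)) so that zero stays valid; this choice, and the assumption that values are non-negative, are illustrative rather than taken from the notes.

```python
import math


def log_transform(values):
    """Compress a wide, right-skewed range; log1p(x) = ln(1 + x) keeps 0 valid."""
    return [math.log1p(v) for v in values]


print(log_transform([0, 9, 99, 999]))  # roughly 0, 2.3, 4.6, 6.9
```

Each tenfold increase in the input adds only about 2.3 to the output, which is what pulls long tails in.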
One-Hot Encoding
Ordinal: order is important, e.g. ratings
Nominal: order is not important, e.g. category
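A minimal sketch of both cases, with hypothetical function names: ordinal values are mapped to integers that keep their order, while nominal values get one 0/1 column per category so the model does not infer an order that is not there.

```python
def ordinal_encode(values, order):
    """Map ordered categories to integers that preserve their order."""
    rank = {category: i for i, category in enumerate(order)}
    return [rank[v] for v in values]


def one_hot_encode(values):
    """Give each nominal category its own 0/1 column."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


print(ordinal_encode(["low", "high", "medium"], ["low", "medium", "high"]))
print(one_hot_encode(["book", "toy", "book"]))
```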
Feature Split
For example, split a date into year, month and day to extract more information
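A sketch of a date split using the standard `datetime` module, assuming dates arrive as ISO strings (`YYYY-MM-DD`). The extra `weekday` feature is an illustrative addition, not something the notes specify.

```python
from datetime import date


def split_date(iso_string):
    """Split an ISO date (YYYY-MM-DD) into separate numeric features."""
    d = date.fromisoformat(iso_string)
    return {
        "year": d.year,
        "month": d.month,
        "day": d.day,
        "weekday": d.weekday(),  # 0 = Monday ... 6 = Sunday
    }


print(split_date("2021-03-15"))
```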
Scaling
Bring different columns onto the same scale
Normalization: scale values to the range 0 to 1
Standardization: rescale to mean 0 and standard deviation 1; use this when outliers become a problem
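Both scalers in plain Python, as a sketch with hypothetical names. It assumes a column is not constant, since both formulas divide by a spread that would then be zero. With min-max scaling a single outlier sets the maximum and squeezes every other value toward 0, which is the situation the note above has in mind.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Normalization: rescale values linearly into the range [0, 1]."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


def standardize(values):
    """Standardization (z-score): mean 0, standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


ages = [20, 30, 40, 50, 200]  # 200 is an outlier
print(min_max_normalize(ages))  # the outlier squeezes the rest near 0
print(standardize(ages))
```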
SageMaker Ground Truth
Feature Selection
Drop columns
Use PCA to reduce the dimensionality of the dataset for faster computation
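The notes do not say which PCA implementation is meant (SageMaker also offers PCA as a built-in algorithm). As a library-free illustration only, this sketch reduces the data to its first principal component using power iteration on the sample covariance matrix. It assumes the data is not constant and that the all-ones start vector is not orthogonal to the top component; the function name is hypothetical.

```python
import math


def pca_top_component(rows, iterations=200):
    """Project rows onto their first principal component (1-D reduction).

    Uses power iteration on the sample covariance matrix.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    # Repeatedly multiply by the covariance matrix and renormalize;
    # the vector converges to the direction of greatest variance.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    projected = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, projected


component, reduced = pca_top_component([[1, 1.1], [2, 1.9], [3, 3.2], [4, 3.9]])
print(component)  # direction of greatest variance
print(reduced)    # one number per row instead of two
```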
Data Shuffling
Unbalanced Data
Text-Based Feature Engineering
TF-IDF
Pending: TF-IDF example from acloud.guru
Down-weights words that appear in many documents, which helps filter out less important words
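This is not the acloud.guru example, just a minimal sketch of the unsmoothed textbook formula: TF is a word's share of the document and IDF is log(N / documents containing the word). Libraries often use smoothed variants, so their numbers differ. Documents are assumed to be already tokenized, and the function name is hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Score each word of each tokenized document by TF * IDF.

    TF  = count of the word in the document / number of words in it
    IDF = log(number of documents / number of documents containing the word)
    """
    n_docs = len(documents)
    doc_freq = Counter(word for doc in documents for word in set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append({
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


docs = [["the", "cat", "sat"], ["the", "dog", "ran"]]
for row in tf_idf(docs):
    print(row)  # "the" appears in every document, so it scores 0
```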
N-Gram Transformation
Used to find phrases (groups of consecutive words)
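A minimal sketch of word n-grams as a sliding window of size n. Tokenization here is a plain whitespace split, which is an assumption; real pipelines usually also lowercase and strip punctuation.

```python
def word_ngrams(text, n):
    """Return every run of n consecutive words (a sliding window of size n)."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


print(word_ngrams("mantra is bad boy", 2))  # ['mantra is', 'is bad', 'bad boy']
```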
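OSB (Orthogonal Sparse Bigram)
Slides a window over the text and outputs every 2-word pair that includes the first word in the window
"_" joins the two words; each extra "_" marks one skipped word
mantra is bad boy ->
mantra_is, mantra__bad, mantra___boy
Captures common word combinations, similar to a bigram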
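A sketch of OSB generation following the description above. The window size of 4 is an assumption chosen to reproduce the `mantra___boy` example, and the exact behaviour near the end of the text may differ from a given library's implementation; the function name is hypothetical.

```python
def orthogonal_sparse_bigrams(text, window=4):
    """Pair each word with each of the next (window - 1) words.

    Words are joined with "_", and each skipped word adds one more "_".
    """
    tokens = text.split()
    osbs = []
    for i, first in enumerate(tokens):
        for gap, second in enumerate(tokens[i + 1:i + window], start=1):
            osbs.append(first + "_" * gap + second)
    return osbs


print(orthogonal_sparse_bigrams("mantra is bad boy"))
# ['mantra_is', 'mantra__bad', 'mantra___boy', 'is_bad', 'is__boy', 'bad_boy']
```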
Cartesian Product Transformation
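A minimal sketch, assuming two text or categorical features (the example values below are hypothetical): the Cartesian product transformation combines every token of one feature with every token of the other, so the model can learn from the combination rather than from each feature alone.

```python
from itertools import product


def cartesian_features(feature_a, feature_b):
    """Combine every token of feature_a with every token of feature_b."""
    return [f"{a}_{b}" for a, b in product(feature_a.split(), feature_b.split())]


print(cartesian_features("romance drama", "hardcover"))
# ['romance_hardcover', 'drama_hardcover']
```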
Bag of Words
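A bag-of-words sketch: each document becomes a vector of word counts over a shared vocabulary, and word order is thrown away. Lowercasing plus a whitespace split stands in for real tokenization here, and the function name is hypothetical.

```python
from collections import Counter


def bag_of_words(documents):
    """Turn each document into word counts over a shared, sorted vocabulary."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


vocab, vectors = bag_of_words(["The cat sat", "the dog saw the cat"])
print(vocab)    # ['cat', 'dog', 'sat', 'saw', 'the']
print(vectors)  # [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```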
Tools
AWS Glue
Managed Apache Spark environment
Notebooks can be hosted in SageMaker
EMR
Athena
SageMaker
AWS Data Pipeline