Data mining (Data and data type (Data : (Categorical (Ordinal, Nominal),…
Data mining
-
Data is a collection of objects, each described by attributes.
- Other names for an attribute: variable, field, feature, predictor...
-
- Similarity and Dissimilarity
Similarity
- Numerical measure of how alike two data objects are
- Larger when objects are more alike
- Often falls in the range [0,1]
-
-
Dissimilarity
- Numerical measure of how different two data objects are
- Smaller when objects are more alike
- Minimum dissimilarity is often 0; upper limit varies
- Euclidean distance
dist((x, y), (a, b)) = √((x - a)² + (y - b)²)
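The Euclidean distance can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names `euclidean_distance` and `distance_to_similarity` are hypothetical, and the mapping 1 / (1 + d) is only one common way to turn a dissimilarity into a similarity in the range (0, 1], assumed here for illustration.

```python
import math

def euclidean_distance(p, q):
    """Euclidean distance between two points with the same number of attributes."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of attributes")
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def distance_to_similarity(d):
    """Map a dissimilarity d >= 0 into (0, 1].

    Assumption: 1 / (1 + d) is one common convention, not the only one.
    """
    return 1.0 / (1.0 + d)

# (x, y) = (1, 2) and (a, b) = (4, 6): sqrt(3^2 + 4^2)
d = euclidean_distance((1, 2), (4, 6))
s = distance_to_similarity(d)
```

Identical objects give a distance of 0 and, under this mapping, a similarity of 1, which matches the notes above: dissimilarity is smallest and similarity largest when objects are most alike.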
-
Questions to ask:
-
-
Missing values
Reasons:
- information was not collected or was lost
- attributes may not be applicable to all cases
-
-
-
- how to detect these problems?
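For missing values specifically, one simple way to detect the problem is to count, per attribute, how many objects lack a value. The sketch below is an assumption-laden illustration: the helper names `is_missing` and `missing_counts`, the placeholder strings treated as missing, and the sample records are all hypothetical.

```python
import math

def is_missing(value):
    """Treat None, NaN, and a few placeholder strings as missing (assumed convention)."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() in {"", "NA", "N/A", "?"}
    if isinstance(value, float):
        return math.isnan(value)
    return False

def missing_counts(records):
    """Count missing values per attribute across a list of dict records."""
    counts = {}
    for record in records:
        for attribute, value in record.items():
            counts[attribute] = counts.get(attribute, 0) + (1 if is_missing(value) else 0)
    return counts

# Hypothetical data: "spouse_name" may not be applicable to every case.
records = [
    {"age": 34, "income": 52000.0, "spouse_name": "N/A"},
    {"age": None, "income": float("nan"), "spouse_name": "Kim"},
    {"age": 41, "income": 61000.0, "spouse_name": ""},
]
counts = missing_counts(records)
```

An attribute with a high count relative to the number of records is a candidate for either of the reasons listed above: the value was never collected, or the attribute does not apply to those cases.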
-
-
Techniques:
Sampling
-
Sample size: take care not to draw too small a sample
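A minimal sketch of simple random sampling without replacement, using Python's standard `random` module. The function name `simple_random_sample`, the population, and the sample sizes are hypothetical and chosen only to illustrate why sample size matters.

```python
import random

def simple_random_sample(items, n, seed=None):
    """Draw n items without replacement; seed is optional, for reproducibility."""
    if n > len(items):
        raise ValueError("sample size cannot exceed population size")
    rng = random.Random(seed)
    return rng.sample(items, n)

# Hypothetical population: 90 objects of class "A" and 10 of the rarer class "B".
population = ["A"] * 90 + ["B"] * 10

small = simple_random_sample(population, 5, seed=0)
large = simple_random_sample(population, 50, seed=0)

# Share of the rare class in each sample; the population share is 0.10.
small_share = small.count("B") / len(small)
large_share = large.count("B") / len(large)
```

With only 5 objects, the rare class can easily be absent or overrepresented, so `small_share` tends to sit further from 0.10 than `large_share` does. This is the risk of a sample that is too small.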
- Data Exploration and visualization
-