Chapter 10: Data Transformations
Creating new features from data
Text
Extract new columns from columns w/ text
Categorical
Combining many categories into fewer
Multi-categorical into binary columns
Numerical
Apply +, -, *, etc. to two or more columns to create new columns
Helps ML better predict target
Splitting and Extracting New Columns
IF-THEN statements
Examine a value in a column and make changes to it or to other values in the dataset
Lets you create content in a new column depending on what exists in one or more other columns
EmployeeID example
If EmployeeID = 1, THEN Emp_1 = 1, ELSE Emp_1 = 0, ENDIF
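The IF-THEN rule above can be sketched in pandas with `numpy.where`; the sample EmployeeID values are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical data: an EmployeeID column as in the example above
df = pd.DataFrame({"EmployeeID": [1, 2, 3, 1]})

# IF EmployeeID = 1 THEN Emp_1 = 1 ELSE Emp_1 = 0
df["Emp_1"] = np.where(df["EmployeeID"] == 1, 1, 0)
```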
One-hot encoding
Conversion of a categorical column containing two or more possible values into discrete columns representing each value
For each unique category in column, a new column is created with binary values of 1 or 0 (TRUE or FALSE)
"Dummy" encoding in statistics
Improves predictive ability
Likelihood of successful sale example (EmployeeID)
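One-hot encoding can be sketched with pandas `get_dummies`; the EmployeeID values are hypothetical, and `dtype=int` keeps the output as 1/0 rather than True/False:

```python
import pandas as pd

# Hypothetical EmployeeID column; one new binary column per unique value
df = pd.DataFrame({"EmployeeID": [1, 2, 3]})
dummies = pd.get_dummies(df["EmployeeID"], prefix="Emp", dtype=int)
```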
Transformations
Various transformations to create new features
Comparison of two columns can create a new column that provides additional predictive ability for ML algorithms
Addition (+)
Adding different columns can increase predictive signals
Ex: Adding family size to understand behavior in certain situations like air travel
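A minimal sketch of the family-size example, assuming hypothetical `adults` and `children` columns:

```python
import pandas as pd

# Hypothetical columns: adults and children per household
df = pd.DataFrame({"adults": [2, 1], "children": [3, 0]})
df["family_size"] = df["adults"] + df["children"]
```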
Subtraction (-)
Subtracting one column from another to make the similarity or difference between them more apparent
Closer to 0 = the more similar
Ex: Predicting whether a person is likely to engage in outside activities
Absolute (Abs())
Similar to subtraction, but used in cases where the actual distance between two numbers is of importance
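Subtraction and absolute difference side by side, using hypothetical hours columns for the outside-activities idea:

```python
import pandas as pd

# Hypothetical columns: weekly work hours vs. free hours
df = pd.DataFrame({"work_hours": [40, 55], "free_hours": [45, 20]})
df["hours_diff"] = df["work_hours"] - df["free_hours"]         # sign shows direction
df["hours_gap"] = (df["work_hours"] - df["free_hours"]).abs()  # distance only
```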
Multiplication (*)
When two related columns interact with a target
Interaction effect is often called a moderated relationship between a column and the target
Ex: bad interaction with customer service vs. bad tempered customer & bad interaction in relation to churn
Churn = cancel customer relationship
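The churn interaction can be sketched by multiplying two hypothetical binary flags, so the new feature is 1 only when both conditions hold:

```python
import pandas as pd

# Hypothetical flags from the churn example
df = pd.DataFrame({"bad_service": [1, 0, 1], "bad_tempered": [1, 1, 0]})
# Interaction term: 1 only when both flags are 1
df["service_x_temper"] = df["bad_service"] * df["bad_tempered"]
```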
Division (/)
Makes otherwise hidden information available
Ex: Dividing income by number of kids might reveal available funds
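The income-per-kid example as a sketch; the column names are hypothetical, and treating 0 kids as 1 to avoid division by zero is an assumption:

```python
import pandas as pd

# Hypothetical columns from the income/kids example
df = pd.DataFrame({"income": [90000, 60000], "num_kids": [3, 0]})
# Guard against division by zero by treating 0 kids as 1 (an assumption)
df["income_per_kid"] = df["income"] / df["num_kids"].replace(0, 1)
```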
Less than (<)
If # of seats in car is smaller than family size, might be predictive of purchasing a new car
Less than or equal (<=)
If # of bedrooms is smaller than or equal to the family size one year after a new child, may be predictive of buying bunk beds
Greater than (>)
If family is larger than # of seats in car, camping vacations become less likely
Greater than or equal (>=)
If # of seats is greater than or equal to family size, the likelihood of purchasing a new van may be lower
Not equal (!=)
When two data points are not the same, it can affect the prediction
If vibration of machine is different during operation than day before
Equal (==)
If two data points are the same, they may cancel each other out or indicate a higher likelihood of a target phenomenon occurring
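The comparison operators above produce boolean columns that can be cast to 1/0 features; a sketch using the hypothetical car-seat columns from the examples:

```python
import pandas as pd

# Hypothetical columns from the car-seat examples
df = pd.DataFrame({"family_size": [5, 3], "car_seats": [4, 5]})
df["seats_too_few"] = (df["car_seats"] < df["family_size"]).astype(int)
df["seats_sufficient"] = (df["car_seats"] >= df["family_size"]).astype(int)
```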
Exponentiation (**)
Ex: current interest of a continuously compounded loan
P = C * e**(r*t) can be used to capture the exponential relationship of e and (r*t)
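The continuous-compounding formula P = C * e**(r*t) in code; the principal, rate, and time values are hypothetical:

```python
import math

# Hypothetical values: principal C, annual rate r, time t in years
C, r, t = 1000.0, 0.05, 2.0
P = C * math.exp(r * t)  # P = C * e**(r*t)
```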
Transformations applied to one column at a time
Natural logarithm (Log())
Used to linearize exponential data
Ex: The higher a family's total income, the less likely they are to visit national parks, since they can afford other experiences; however, love of national parks would trump doubling income at some point (ex: going from $500,000 to $1,000,000 is not likely to negatively impact desire to visit NPs)
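Log linearization in a sketch: after taking the log, each doubling of (hypothetical) income adds the same constant amount rather than an ever-larger one:

```python
import numpy as np

# Hypothetical incomes; log turns each doubling into the same additive step
incomes = np.array([50_000.0, 100_000.0, 200_000.0])
log_income = np.log(incomes)
```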
Square root (Sqrt())
Similar to log transformation, but works for a different distribution of data
Square (Square())
Makes large values even larger
Ex: Square St.dev to find variance
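The st. dev / variance example as a sketch with a hypothetical sample (NumPy's `std`/`var` default to the population versions):

```python
import numpy as np

# Hypothetical sample; squaring the standard deviation gives the variance
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
variance = data.std() ** 2
```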