Data Cleaning, Data Operation - Coggle Diagram

- - - - FuzzyWuzzy
      - Manually Mapping : dataframe['gender'].map({'m': 'male', 'fem.': 'female', ...})
      - Pattern Matching : re.sub(r"^m$", 'Male', 'm', flags=re.IGNORECASE)
    - - Pandas : df['column'].unique()
  - - - Pandas : df['column'].unique()
    - - Everything lower : Pandas : df['column'] = df['column'].str.lower()
      - Capitalized : Pandas : df['column'] = df['column'].str.title()
  - - - Pandas : df['column'].unique()
    - - Pandas : df['column'] = df['column'].str.strip()
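The steps above (inspect the distinct values, strip whitespace, unify the case, then map variants to one label) can be chained as in the sketch below. The sample values and the `mapping` dictionary are hypothetical. `Series.replace` is used here instead of `map` so that values missing from the dictionary are kept rather than turned into NaN.

```python
import re
import pandas as pd

# Hypothetical sample data: two logical categories written in several ways.
df = pd.DataFrame({"gender": [" M", "male ", "Fem.", "FEMALE", "m"]})

# Inspect the raw distinct values first.
raw_values = df["gender"].unique()

# Remove surrounding whitespace, then lower-case everything.
df["gender"] = df["gender"].str.strip().str.lower()

# Pattern matching: a value that is exactly "m" becomes "male".
df["gender"] = df["gender"].apply(
    lambda s: re.sub(r"^m$", "male", s, flags=re.IGNORECASE)
)

# Manual mapping for the remaining variants (hypothetical dictionary).
mapping = {"fem.": "female"}
df["gender"] = df["gender"].replace(mapping)

cleaned_values = sorted(df["gender"].unique())
```

Checking `unique()` again after each step is a cheap way to see which variants are still left.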
- - - - Pair-Wise Deletion
      - Drop the whole column
        
        Pandas : df.drop(['column'], axis=1)
        (axis = 0 >> index (rows) ; axis = 1 >> columns ; default = 0)
      - List-Wise Deletion
        
        Pandas : df.dropna()
        
        Pandas : df.dropna(how='all')
        
        Pandas : df.dropna(subset=['column'])
    - - Statistical Value
        
        Pandas : df['column'] = df['column'].fillna(df['column'].mean())
      - Linear Regression
      - Observation
        
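        LOCF (last observation carried forward)
        
        Pandas : df.ffill()   (df.fillna(method="ffill") is deprecated since pandas 2.1)
        
        NOCB (next observation carried backward)
        
        Pandas : df.bfill()   (df.fillna(method="bfill") is deprecated since pandas 2.1)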
      - Hot-deck
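A minimal sketch of the deletion and imputation options listed above, run on a small hypothetical frame (columns `a` and `b` are made up for illustration). Linear regression and hot-deck imputation are not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in both columns.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [np.nan, 2.0, np.nan, np.nan],
})

# Deletion
no_b = df.drop(["b"], axis=1)            # drop the whole column
complete_rows = df.dropna()              # drop rows with any missing value
not_all_missing = df.dropna(how="all")   # drop rows where every value is missing
a_present = df.dropna(subset=["a"])      # drop rows where "a" is missing

# Imputation on column "a"
mean_filled = df["a"].fillna(df["a"].mean())  # statistical value (mean)
locf = df["a"].ffill()                        # last observation carried forward
nocb = df["a"].bfill()                        # next observation carried backward
```

Note that `bfill` leaves a trailing gap unfilled when no later observation exists, and `ffill` does the same for a leading gap.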
- - - - Matplotlib : df.plot(kind='scatter', x='Sales', y='Buyers', rot=70)
  - - - Pandas : df = df[df.column<X]
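Once the scatter plot has suggested a cutoff, the filter step might look like the sketch below. The data and the `threshold` value are hypothetical. The plot itself is left out so the example runs without Matplotlib.

```python
import pandas as pd

# Hypothetical data: one Sales value is far away from the rest.
df = pd.DataFrame({"Sales": [10, 12, 11, 500], "Buyers": [3, 4, 3, 5]})

# Hypothetical cutoff, chosen by eye from the scatter plot.
threshold = 100

# Keep only the rows below the cutoff.
filtered = df[df["Sales"] < threshold]
```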
- - - - Pandas : df.drop_duplicates()
      - Pandas : df.drop_duplicates('column', keep='first')   (keep = 'first', 'last' or False)
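A short sketch of how the `keep` argument changes the result, assuming a hypothetical frame with a repeated `id` column:

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 are identical, rows 2 and 3 share an id.
df = pd.DataFrame({"id": [1, 1, 2, 2], "value": ["a", "a", "b", "c"]})

exact = df.drop_duplicates()                           # compare whole rows
first = df.drop_duplicates(subset="id", keep="first")  # keep first row per id
last = df.drop_duplicates(subset="id", keep="last")    # keep last row per id
none = df.drop_duplicates(subset="id", keep=False)     # drop every duplicated id
```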
- - - - df[df['column'] > 5]
    - - (df['column'] > 5).astype('int').value_counts(normalize=True)
        ---- astype('int') converts the boolean values to integers (0/1); value_counts(normalize=True) then returns the share of each value
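The two expressions above can be tried on a hypothetical column as follows. The mean of the boolean mask is included as an alternative way to get the share of matching rows.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"column": [2, 7, 9, 4]})

# Rows where the condition holds.
above = df[df["column"] > 5]

# Share of rows where the condition is true (1) or false (0).
share = (df["column"] > 5).astype("int").value_counts(normalize=True)

# Alternative: the mean of a boolean mask is the share of True values.
share_true = (df["column"] > 5).mean()
```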
- - - - Add the new col at a specific position
        
        df.insert(loc=numofcolumns_before_newcolumn, column='new_column_name', value=new_column_value)
      - Add the new col at the end
        
        df.insert(loc=len(df.columns), column='new_column', value=(df['column1'] + df['column2']))
        
        df['new_column_name'] = df['column1'] - df['column2']
    - - df.drop(['column_to_drop1', 'column_to_drop2'], axis=1, inplace=True)
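The column operations above, sketched on a hypothetical two-column frame. The new column names (`total`, `total_again`, `difference`) are made up for illustration.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"column1": [5, 6], "column2": [1, 2]})

# Insert at a specific position: loc=1 places the new column second.
df.insert(loc=1, column="total", value=df["column1"] + df["column2"])

# Insert at the end: loc equals the current number of columns.
df.insert(loc=len(df.columns), column="total_again",
          value=df["column1"] + df["column2"])

# Plain assignment also appends at the end.
df["difference"] = df["column1"] - df["column2"]

# Drop a column; reassigning avoids inplace=True.
df = df.drop(["total_again"], axis=1)
```

`df.insert` modifies the frame in place and raises an error if the column name already exists, unless `allow_duplicates=True` is passed.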