Pandas(py) - Coggle Diagram

- - - - df = pd.read_csv('my-csv-file.csv')
    - - df.to_csv('new-csv-file.csv')
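A minimal round-trip sketch for the two calls above. The sample data and the temporary directory are assumptions made for illustration; note that read_csv is called on the pandas module, while to_csv is called on a DataFrame.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data, not taken from the original notes.
df = pd.DataFrame({'Product': ['Hammer', 'Nails'], 'Price': [10.0, 0.75]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'new-csv-file.csv')
    # index=False keeps the row index out of the file.
    df.to_csv(path, index=False)
    # read_csv is a top-level pandas function and returns a new DataFrame.
    loaded = pd.read_csv(path)

print(loaded)
```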
- - - - df['Quantity'] = [100, 150, 50, 35]
  - - - df['In Stock?'] = True
  - - - df['Sales Tax'] = df.Price * 0.075
        ! "Price" is a column name !
  - - - df['Name'] = df.Name.apply(str.upper)
  - - - The output:
        "oh hi mark!"
  - - - myfunction = lambda x: 40 + (x - 40) * 1.50 if x > 40 else x
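A small sketch of how the lambda above behaves on its own and when applied to a column. Reading it as an overtime rule (amounts above 40 count 1.5 times) is an assumption, as are the column names hours_worked and paid_hours.

```python
import pandas as pd

# Values above 40 are scaled by 1.5 for the part beyond 40; others pass through.
myfunction = lambda x: 40 + (x - 40) * 1.50 if x > 40 else x

print(myfunction(30))  # 30
print(myfunction(50))  # 55.0

# Hypothetical column names, used only for illustration.
df = pd.DataFrame({'hours_worked': [30, 40, 50]})
df['paid_hours'] = df.hours_worked.apply(myfunction)
print(df)
```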
  - - - df['Price with Tax'] = df.apply(lambda row:
        row['Price'] * 1.075
        if row['Is taxed?'] == 'Yes'
        else row['Price'],
        axis=1
        )
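The column operations above can be run end to end as in the following sketch. The inventory rows are made up for illustration; the column names follow the notes.

```python
import pandas as pd

# Hypothetical inventory data.
df = pd.DataFrame({
    'Name': ['hammer', 'nails'],
    'Price': [10.0, 2.0],
    'Is taxed?': ['Yes', 'No'],
})

df['In Stock?'] = True                  # one value repeated for every row
df['Sales Tax'] = df.Price * 0.075      # column arithmetic
df['Name'] = df.Name.apply(str.upper)   # function applied to each value

# axis=1 passes each row to the lambda, so it can read several columns.
df['Price with Tax'] = df.apply(
    lambda row: row['Price'] * 1.075 if row['Is taxed?'] == 'Yes' else row['Price'],
    axis=1,
)
print(df)
```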
  - - - df = pd.DataFrame({
        'name': ['John', 'Jane', 'Sue', 'Fred'],
        'age': [23, 29, 21, 18]
        })
        df.columns = ['First Name', 'Age']
  - - - Using inplace=True lets us edit the original DataFrame.
        
        df = pd.DataFrame({
        'name': ['John', 'Jane', 'Sue', 'Fred'],
        'age': [23, 29, 21, 18]
        })
        df.rename(columns={
        'name': 'First Name',
        'age': 'Age'},
        inplace=True)
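As a sketch of the difference between the two renaming approaches: without inplace=True, rename returns a new DataFrame and leaves the original alone, while assigning to df.columns replaces every label at once, in order. The variable name renamed is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jane', 'Sue', 'Fred'],
    'age': [23, 29, 21, 18],
})

# rename can change a single column; without inplace it returns a copy.
renamed = df.rename(columns={'name': 'First Name'})
print(list(df.columns))       # ['name', 'age']
print(list(renamed.columns))  # ['First Name', 'age']

# Assigning to df.columns must supply a label for every column.
df.columns = ['First Name', 'Age']
print(list(df.columns))       # ['First Name', 'Age']
```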
- - - - new_df = pd.merge(orders, customers)
    - - new_df = orders.merge(customers)
        
        This produces the same DataFrame as if we had called pd.merge(orders, customers)
        
        sample:
        results = all_data[(all_data.revenue > all_data.target) & (all_data.women > all_data.men)]
    - - pd.merge(orders, customers.rename(columns={'id': 'customer_id'}))
    - - pd.merge(
        orders,
        customers,
        left_on='customer_id',
        right_on='id')
        
        The merged result can't keep two columns that are both named id, so pandas renames them to id_x and id_y.
        
        The names id_x and id_y are hard to read, so use the suffixes keyword to choose clearer ones:
        
        pd.merge(
        orders,
        customers,
        left_on='customer_id',
        right_on='id',
        suffixes=['_order', '_customer']
        )
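A runnable sketch of the merge above. The contents of orders and customers are invented for illustration; only the column names id and customer_id come from the notes.

```python
import pandas as pd

# Hypothetical tables: both have an 'id' column with different meanings.
orders = pd.DataFrame({'id': [1, 2, 3], 'customer_id': [10, 11, 10]})
customers = pd.DataFrame({'id': [10, 11], 'name': ['Ann', 'Bo']})

merged = pd.merge(
    orders,
    customers,
    left_on='customer_id',
    right_on='id',
    suffixes=['_order', '_customer'],
)
# The clashing 'id' columns are kept under the chosen suffixes.
print(list(merged.columns))
print(merged)
```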
    - - An outer join keeps all rows from both tables, even those with no match in the other table. Missing values are filled in with None or NaN
        
        pd.merge(company_a, company_b, how='outer')
    - - A left merge includes all rows from the first (left) table, plus only those rows from the second (right) table that match a row in the first
        
        pd.merge(company_a, company_b, how='left')
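To compare the outer and left merges side by side, here is a small sketch. The contents of company_a and company_b are assumptions; the tables share only the name column, which pandas uses as the merge key by default.

```python
import pandas as pd

# Hypothetical contact lists that overlap on one person (Bo).
company_a = pd.DataFrame({'name': ['Ann', 'Bo'],
                          'email': ['ann@a.example', 'bo@a.example']})
company_b = pd.DataFrame({'name': ['Bo', 'Cy'],
                          'phone': ['555-0101', '555-0102']})

# Outer: every name from either table; gaps become missing values.
outer = pd.merge(company_a, company_b, how='outer')
# Left: every name from company_a; Cy is dropped.
left = pd.merge(company_a, company_b, how='left')

print(outer)
print(left)
```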
    - - Sometimes a dataset is split into multiple pieces, often several CSV files. To combine them into one DataFrame, use:
        pd.concat([df1, df2, df3, ...])
        It is really important that the DataFrames have the same columns
        
        Example:
        pd.concat([df1, df2])
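A short sketch of concat, including what happens when the columns do not match. The data and the names df3 and mismatched are made up; ignore_index=True is an optional extra that renumbers the rows from 0.

```python
import pandas as pd

# Hypothetical pieces of one dataset with identical columns.
df1 = pd.DataFrame({'item': ['a', 'b'], 'price': [1, 2]})
df2 = pd.DataFrame({'item': ['c'], 'price': [3]})

combined = pd.concat([df1, df2], ignore_index=True)
print(combined)

# With different column names, concat keeps the union of the columns
# and fills the gaps with NaN, which is rarely what you want.
df3 = pd.DataFrame({'item': ['d'], 'cost': [4]})
mismatched = pd.concat([df1, df3], ignore_index=True)
print(mismatched)
```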
  - - - The general syntax for these calculations is:
        df.column_name.command()
        
        examples:
        
        print(customers.age)
        >> [23, 25, 31, 35, 35, 46, 62]
        print(customers.age.median())
        >> 35
        
        print(inventory.color)
        >> ['blue', 'blue', 'blue', 'blue', 'blue', 'green', 'green', 'orange', 'orange', 'orange']
        print(inventory.color.unique())
        >> ['blue', 'green', 'orange']
        
        common commands:
        
        Command | Description
        mean | Average of all values in column
        std | Standard deviation
        median | Median
        max | Maximum value in column
        min | Minimum value in column
        count | Number of values in column
        nunique | Number of unique values in column
        unique | List of unique values in column
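The commands above can be tried on the values from the examples, as in this sketch. Building the customers and inventory DataFrames this way is an assumption; the notes only show the column contents.

```python
import pandas as pd

# Column values copied from the examples above.
customers = pd.DataFrame({'age': [23, 25, 31, 35, 35, 46, 62]})
inventory = pd.DataFrame(
    {'color': ['blue'] * 5 + ['green'] * 2 + ['orange'] * 3}
)

print(customers.age.median())          # 35.0
print(customers.age.max())             # 62
print(customers.age.min())             # 23
print(customers.age.count())           # 7
print(inventory.color.nunique())       # 3
print(list(inventory.color.unique()))  # ['blue', 'green', 'orange']
```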
    - - df.groupby('column1').column2.measurement()
        
        example:
        grades = df.groupby('student').grade.mean()
        
        output:
        student | grade
        Amy | 80
        Bob | 90
        Chris | 75
    - - Generally, a groupby statement is followed by reset_index(), which turns the result back into a regular DataFrame
        
        df.groupby('column1').column2.measurement().reset_index()
        
        Example:
        teas_counts = teas.groupby('category').id.count().reset_index()
        
        output:
        category | id
        0 black 3
        1 green 4
        2 herbal 8
        3 white 2
      - Rename column:
        teas_counts = teas_counts.rename(columns={"id": "counts"})
        
        output:
        category | counts
        0 black 3
        1 green 4
        2 herbal 8
        3 white 2
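The groupby, reset_index and rename steps above can be reproduced with the sketch below. The teas table is reconstructed as an assumption so that the category counts match the output shown (3, 4, 8, 2).

```python
import pandas as pd

# Hypothetical teas table built to match the counts in the notes.
categories = ['black'] * 3 + ['green'] * 4 + ['herbal'] * 8 + ['white'] * 2
teas = pd.DataFrame({'id': range(len(categories)), 'category': categories})

# groupby alone returns a Series indexed by category;
# reset_index turns it back into a DataFrame with a 0..n-1 index.
teas_counts = teas.groupby('category').id.count().reset_index()

# The counted column is still called 'id', so give it a clearer name.
teas_counts = teas_counts.rename(columns={'id': 'counts'})
print(teas_counts)
```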
    - - np.percentile can calculate any percentile over an array of values
        
        high_earners = (
            df.groupby('category').wage
            .apply(lambda x: np.percentile(x, 75))
            .reset_index()
        )
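A self-contained sketch of the percentile calculation. The wage data is invented; the parentheses around the chained calls are needed so Python accepts a statement that spans several lines.

```python
import numpy as np
import pandas as pd

# Hypothetical wages for two categories.
df = pd.DataFrame({
    'category': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B'],
    'wage': [10, 20, 30, 40, 50, 5, 15, 25],
})

# apply runs the lambda once per group, passing that group's wages.
high_earners = (
    df.groupby('category').wage
    .apply(lambda x: np.percentile(x, 75))
    .reset_index()
)
print(high_earners)
```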
    - - we can group by more than one column
        
        df.groupby(['Location', 'Day of Week'])['Total Sales'].mean().reset_index()
    - - In Pandas, the command for pivot is:
        df.pivot(columns='ColumnToPivot',
        index='ColumnToBeRows',
        values='ColumnToBeValues')
        
        Example
        
        First use the groupby statement:
        unpivoted = df.groupby(['Location', 'Day of Week'])['Total Sales'].mean().reset_index()
        Now pivot the table:
        pivoted = unpivoted.pivot(
        columns='Day of Week',
        index='Location',
        values='Total Sales')
        
        Remember to use reset_index() at the end of your code!
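The groupby and pivot steps can be run together as in this sketch, with reset_index() added at the end as the reminder suggests. The sales figures are made up for illustration.

```python
import pandas as pd

# Hypothetical sales records.
df = pd.DataFrame({
    'Location': ['West', 'West', 'West', 'East', 'East'],
    'Day of Week': ['Mon', 'Mon', 'Tue', 'Mon', 'Tue'],
    'Total Sales': [100, 200, 50, 80, 120],
})

# One row per (Location, Day of Week) pair with the mean sales.
unpivoted = df.groupby(['Location', 'Day of Week'])['Total Sales'].mean().reset_index()

# One row per Location, one column per Day of Week.
pivoted = unpivoted.pivot(
    columns='Day of Week',
    index='Location',
    values='Total Sales',
).reset_index()
print(pivoted)
```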