Data Science datascience - Coggle Diagram

- - - - x[0,1]
      - x[-1]
      - x[1, 1:2]
      - Slicing: x[start:stop:step] returns a subset of the values
      - https://numpy.org/doc/stable/reference/arrays.indexing.html
      - x[0][1]
      - Fancy indexing: passing arrays (or lists) of indices instead of single integers
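A minimal sketch of the indexing forms listed above. The 3x4 array `x` is a hypothetical example chosen for illustration; it is not from the notes.

```python
import numpy as np

# Hypothetical example array:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]
x = np.arange(12).reshape(3, 4)

single = x[0, 1]           # row 0, column 1
chained = x[0][1]          # same element, reached in two indexing steps
last_row = x[-1]           # negative index counts from the end
row_slice = x[1, 1:2]      # row 1, columns 1 up to (not including) 2
stepped = x[0, ::2]        # start:stop:step on row 0, every second value
fancy = x[[0, 2], [1, 3]]  # fancy indexing: elements (0, 1) and (2, 3)
```

Note that `x[1, 1:2]` still returns an array (of length 1), while `x[0, 1]` returns a single element.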
    - - .reshape()
      - the new shape is passed as a tuple
      - x[..., 0] selects index 0 along the last axis, which removes that dimension
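As a small illustration of the reshape notes above (the array contents are an assumption made up for the example):

```python
import numpy as np

x = np.arange(6)           # hypothetical 1-D input: [0 1 2 3 4 5]

grid = x.reshape((2, 3))   # the new shape is given as a tuple
first_col = grid[..., 0]   # integer index on the last axis drops that axis
```

Here `grid` has shape `(2, 3)` and `first_col` has shape `(2,)`.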
    - - .concatenate([a,b])
      - takes a sequence (list or tuple) of arrays as its first argument
      - .concatenate([a,b], axis =1)
      - valid axis values run from 0 to ndim - 1, so the number of axes equals the number of dimensions
      - axis = 0 is default behaviour
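A short sketch of concatenation along each axis, assuming two hypothetical 2x2 arrays `a` and `b`:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

rows = np.concatenate([a, b])          # axis=0 by default: b is placed below a
cols = np.concatenate([a, b], axis=1)  # axis=1: b is placed to the right of a
```

For 2-D inputs the only valid axes are 0 and 1 (or their negative equivalents).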
    - - .vstack([a,b]) vertical stack
      - .hstack([a,b])
      - .T will transpose the matrix
      - np.newaxis inserts a new axis of length 1, e.g. x[np.newaxis, :] wraps a 1-D array in an outer array (a single row)
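The stacking helpers above could be tried out as follows; `a` and `b` are hypothetical 1-D arrays used only for illustration.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

v = np.vstack([a, b])    # vertical stack: each input becomes a row
h = np.hstack([a, b])    # horizontal stack: inputs joined end to end
t = v.T                  # transpose swaps rows and columns

row = a[np.newaxis, :]   # new leading axis: shape (1, 3)
col = a[:, np.newaxis]   # new trailing axis: shape (3, 1)
```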
    - - the split method takes as input the split points
      - x, y, z = np.split(a, [2,4])
      - .vsplit(x, [2]) and .hsplit(x, [2])
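A minimal sketch of splitting at given split points. The input arrays are assumptions for the example; the piece names `upper`, `lower`, `left`, and `right` are hypothetical.

```python
import numpy as np

a = np.arange(6)                     # [0 1 2 3 4 5]
x, y, z = np.split(a, [2, 4])        # split before index 2 and before index 4

grid = np.arange(16).reshape(4, 4)   # hypothetical 4x4 array
upper, lower = np.vsplit(grid, [2])  # rows 0-1 and rows 2-3
left, right = np.hsplit(grid, [2])   # columns 0-1 and columns 2-3
```

N split points always produce N + 1 pieces.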
    - - np.sort(x)
      - np.argsort(x) gives index of sorted elements
      - np.sort(x, axis = 0)
      - Partial sorting: np.partition(x, 3) moves the 3 smallest values to the front, in no particular order
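The sorting functions above can be compared on a small example. The values in `x` and `m` are made up for illustration, and `kth=3` is an assumed choice.

```python
import numpy as np

x = np.array([3, 1, 4, 1, 5, 9, 2, 6])

sorted_x = np.sort(x)            # returns a sorted copy; x is unchanged
order = np.argsort(x)            # indices that would sort x

m = np.array([[3, 1], [2, 4]])
col_sorted = np.sort(m, axis=0)  # sort each column independently

# Partial sort: the element at position 3 lands where a full sort would
# put it; smaller values come before it, larger ones after, in no
# guaranteed order within each side.
part = np.partition(x, 3)
```

Indexing with the result of `argsort`, as in `x[order]`, gives the same values as `np.sort(x)`.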
    - - zeros, full, arange, random, identity, diagonal
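A sketch of the array-creation helpers named above. The notes only list keywords, so mapping "random" to `np.random.default_rng`, "identity" to `np.identity`, and "diagonal" to `np.diag` is an assumption; shapes and values are made up.

```python
import numpy as np

z = np.zeros((2, 3))             # 2x3 array of 0.0
f = np.full((2, 2), 7)           # 2x2 array filled with 7
r = np.arange(0, 10, 2)          # start, stop (exclusive), step

rng = np.random.default_rng(0)   # seeded generator (assumed choice)
rand = rng.random((2, 2))        # uniform floats in [0, 1)

eye = np.identity(3)             # 3x3 identity matrix
d = np.diag([1, 2, 3])           # 3x3 matrix with 1, 2, 3 on the diagonal
```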
- - - - table with heterogeneous elements, column labels, and indexed rows.
      - states = pd.DataFrame({'population': population, 'area': area})
      - pd.DataFrame([[1,2,3],[3,4,5]], columns = ['A', 'B', 'C'], index = ['1','2'])
      - Creating a DataFrame with Hierarchical Columns: df = pd.DataFrame(d, index=[])
      - Operations on DataFrames: add, A.stack().mean()
      - stack method: .stack() moves the column labels into the row index and gives back a Series. Called with a level parameter, it tells which column level of a hierarchical df should disappear (it moves into the index)
      - Broadcasting: .describe() to get summary statistics about x
      - np.nan to specify a missing value. Data cleaning: dropna() keeps only the rows without missing values and creates a new DataFrame. data.fillna() to fill NaN values
      - .set_index() to move a column to be the index
      - joining dataframes
        
        .join: joins on a column or on the index. The join keys need to have the same type
        
        .merge: more general than join
        
        how="" could be, among other options, "inner" (only keys present in both) or "outer" (all keys from both)
      - analyzing dfs
        
        groupby(): aggregate with count(); .groups maps each group key to its row labels (for lookups)
        
        .nunique(): number of unique values
        
        .value_counts(): number of rows for each unique value
      - Pivot Tables
        
        numeric values
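The DataFrame notes above could be exercised with a sketch like the following. All frames, column names (`key`, `x`, `y`, `city`, `year`, `units`), and values are hypothetical, invented only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frames that share a "key" column.
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1.0, np.nan, 3.0]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [10, 20, 30]})

# Missing values
dropped = left.dropna()          # new DataFrame without the NaN row
filled = left.fillna(0)          # new DataFrame with NaN replaced by 0
indexed = left.set_index("key")  # "key" column becomes the index

# Joining
inner = left.merge(right, on="key", how="inner")  # keys in both frames
outer = left.merge(right, on="key", how="outer")  # keys from either frame

# Analyzing
sales = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Rome", "Rome"],
    "year": [2020, 2021, 2020, 2021],
    "units": [5, 7, 3, 4],
})
counts = sales.groupby("city")["units"].count()  # rows per group
n_cities = sales["city"].nunique()               # number of distinct cities
per_city = sales["city"].value_counts()          # rows per distinct value

# Pivot table over a numeric column
pivot = sales.pivot_table(values="units", index="city",
                          columns="year", aggfunc="sum")
```

None of these calls modify `left`, `right`, or `sales`; each returns a new object.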
    - - data.loc[0] to access elements in a Series by label. .loc slices include the end label, while .iloc is position-based and excludes the end, like regular Python
      - data frames with a single column
      - using custom indexes: data = pd.Series(np.arange(5) + 5, index=['a', ..])
      - mySeries = pd.Series(pythonDictionary)
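A minimal sketch of the Series notes above. The index labels `"a"` through `"e"` and the dictionary contents are assumptions filled in for the example; the notes leave them unspecified.

```python
import numpy as np
import pandas as pd

# Custom index; the labels are hypothetical.
data = pd.Series(np.arange(5) + 5, index=["a", "b", "c", "d", "e"])

by_label = data.loc["a":"c"]   # label slice: end label "c" is included
by_pos = data.iloc[0:2]        # position slice: end position 2 is excluded

# Dictionary keys become the index.
from_dict = pd.Series({"x": 1, "y": 2})
```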