Advanced Data Warehousing (tips (cardinality (you have 3 years of…

- - - - Characteristics
        
        Smaller tables
        
        Only the immediate parent ID stored in child tables
        
        more joins in SQL
        
        Many tables
    - - The only reason to introduce redundancy is to improve query performance.
      - Characteristics
        
        Many Tables
        
        Slightly larger tables
        
        All higher-level IDs stored in child tables
        
        That is, the grandparent, great-grandparent, and so on.
        
        Fewer joins in SQL.
    - - Characteristics
        
        Many tables
        
        Much larger tables
        
        All higher-level IDs/Desc stored in child tables.
        
        Fewest joins in SQL
  - - - Very few tables.
      - Very large tables
      - All IDs/Desc for all attributes in a hierarchy stored in a single table.
      - Single join in SQL.
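The trade-off between "only the immediate parent ID stored" and "all higher-level IDs stored in child tables" can be sketched with a small in-memory SQLite database. The Item > Class > Category hierarchy, the table names (`lu_item`, `lu_class`, `lu_category`, `lu_item_md`, `fact_sales`) and the data are all hypothetical, chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Each child table stores only its immediate parent ID.
CREATE TABLE lu_category (category_id INTEGER PRIMARY KEY, category_desc TEXT);
CREATE TABLE lu_class    (class_id INTEGER PRIMARY KEY, category_id INTEGER);
CREATE TABLE lu_item     (item_id INTEGER PRIMARY KEY, class_id INTEGER);
-- Hypothetical variant: the item table also carries the grandparent ID.
CREATE TABLE lu_item_md  (item_id INTEGER PRIMARY KEY, class_id INTEGER, category_id INTEGER);
CREATE TABLE fact_sales  (item_id INTEGER, revenue REAL);

INSERT INTO lu_category VALUES (1, 'Books'), (2, 'Music');
INSERT INTO lu_class    VALUES (10, 1), (20, 2);
INSERT INTO lu_item     VALUES (100, 10), (101, 10), (200, 20);
INSERT INTO lu_item_md  VALUES (100, 10, 1), (101, 10, 1), (200, 20, 2);
INSERT INTO fact_sales  VALUES (100, 5.0), (101, 7.0), (200, 3.0);
""")

# Immediate-parent-only design: three joins to reach the category description.
normalized_sql = """
SELECT c.category_desc, SUM(f.revenue)
FROM fact_sales f
JOIN lu_item i     ON i.item_id = f.item_id
JOIN lu_class k    ON k.class_id = i.class_id
JOIN lu_category c ON c.category_id = k.category_id
GROUP BY c.category_desc ORDER BY c.category_desc
"""

# Higher-level IDs stored in the child table: the class table is skipped.
denormalized_sql = """
SELECT c.category_desc, SUM(f.revenue)
FROM fact_sales f
JOIN lu_item_md i  ON i.item_id = f.item_id
JOIN lu_category c ON c.category_id = i.category_id
GROUP BY c.category_desc ORDER BY c.category_desc
"""

normalized_rows = conn.execute(normalized_sql).fetchall()
denormalized_rows = conn.execute(denormalized_sql).fetchall()
print(normalized_rows)
print(denormalized_rows)
```

Both queries return the same totals; the second simply pays for one fewer join with a redundant `category_id` column in the item table.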
- - - - Very good for high-cardinality attributes
      - But it requires a lot of space to store the index itself
      - Because it is based on the primary key, it is easier to maintain (the key does not need to be recomputed)
    - - requires the database to generate a binary string for each row
      - So this is better suited to attributes with lower cardinality
      - For example, using the row number as the index
    - - The table itself is index-organized: the rows are physically reordered so that the table data is stored in index order.
      - Nested sorting: for example, if the index is defined on Store_ID, Date_ID, Cust_ID, the rows in the table are sorted by Store_ID, then Date_ID, then Cust_ID, from outermost to innermost.
      - The index needs no additional storage space. It is built into the table when the table is generated.
      - The index is built (that is, the data is put into this order) when the data is moved from the data source to the data warehouse.
      - Excellent solution for large tables.
    - - Denormalized tables store foreign keys. Columns that are often used in joins are good candidates for an index.
      - frequently filtered elements
        
        For example, if you always filter on state, you can build an index on the state_ID column of the fact table
      - Columns that are often used to join with other tables are excellent candidates for an index.
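As a rough illustration of indexing a frequently filtered column, the sketch below builds an index on a fact table's `state_id` column and compares the SQLite query plan before and after. The table `fact_sales`, the index name `idx_fact_state` and the data are hypothetical. SQLite only provides B-tree indexes, so this does not demonstrate the bitmap or index-organized variants described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (state_id INTEGER, item_id INTEGER, revenue REAL)")
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [(s, i, 1.0) for s in range(50) for i in range(20)],
)

def plan(sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT SUM(revenue) FROM fact_sales WHERE state_id = 5"

plan_before = plan(query)  # no index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_fact_state ON fact_sales (state_id)")
plan_after = plan(query)   # the planner can now search via the index

print(plan_before)
print(plan_after)
```

The exact wording of the plan text varies between SQLite versions, so the sketch only prints it rather than relying on a fixed string.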
- - - - parent-child relationship
      - defined by lookup table
      - three types of direct attribute relationships
        
        One-to-One
        
        you can keep information for both attributes in the same table.
        
        One-to-Many
        
        Many-to-Many
        
        You need to have a separate relationship table.
        
        scenario 1, analytical capability
        
        how many items in a certain color were sold?
        
        keep analytical capability in your data model
        
        The fact table does not cover all possible combinations; for that you need the relationship table.
        
        scenario 2, total sales for red items
        
        three ways to resolve this
        
        a separate relationship table
        
        a compound child attribute
        
        use "Item_ID" and "Color_ID" together as the key for Item
        
        a common child attribute
        
        using "SKU"
    - - relationship through a fact
      - defined in a fact table
      - e.g., Item and Date related through a Revenue fact table: the Item table joins this fact table, and the Date table joins this fact table.
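The two many-to-many scenarios for Item and Color can be sketched as follows. The tables `rel_item_color`, `fact_item_sales` and `fact_item_color_sales` and all values are hypothetical. The sketch assumes that facts stored only at the item level get double counted when filtered by color, and that a compound (Item_ID, Color_ID) key in the fact table avoids this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Separate relationship table: every valid item/color combination.
CREATE TABLE rel_item_color (item_id INTEGER, color_id TEXT,
                             PRIMARY KEY (item_id, color_id));
-- Fact stored at item level only (no color on the fact row).
CREATE TABLE fact_item_sales (item_id INTEGER, units INTEGER);
-- Fact keyed by the compound child (item_id, color_id).
CREATE TABLE fact_item_color_sales (item_id INTEGER, color_id TEXT, units INTEGER);

INSERT INTO rel_item_color VALUES (1,'red'), (1,'blue'), (2,'red'), (2,'blue');
INSERT INTO fact_item_sales VALUES (1, 10), (2, 5);
INSERT INTO fact_item_color_sales VALUES (1,'red',4), (1,'blue',6), (2,'red',5);
""")

# Scenario 2, naive version: item-level facts joined through the relationship
# table count all units of item 1 as red, although 6 of them were blue.
overcounted_red = conn.execute("""
    SELECT SUM(f.units) FROM fact_item_sales f
    JOIN rel_item_color r ON r.item_id = f.item_id
    WHERE r.color_id = 'red'""").fetchone()[0]

# Scenario 2 with the compound key: color is part of the fact row.
correct_red = conn.execute(
    "SELECT SUM(units) FROM fact_item_color_sales WHERE color_id = 'red'"
).fetchone()[0]

# Scenario 1: the relationship table knows combinations that never sold.
valid_blue = conn.execute(
    "SELECT COUNT(*) FROM rel_item_color WHERE color_id = 'blue'"
).fetchone()[0]
sold_blue = conn.execute(
    "SELECT COUNT(DISTINCT item_id) FROM fact_item_color_sales WHERE color_id = 'blue'"
).fetchone()[0]

print(overcounted_red, correct_red, valid_blue, sold_blue)
```

Item 2 exists in blue but was never sold in blue, so only the relationship table can answer "which items come in blue?", while only the compound-key fact table gives the right total for red.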
  - - - Cons: multiple views have to be created and maintained.
    - - Query optimizations -> Engine Attribute Role Options
      - Limitations: the two attribute roles cannot be in the same hierarchy or have a common child; the alias is created in memory, so there is a limit of 100 roles per attribute.
- - - - The Region_ID column is added to the LU_ACCT_EXEC table to directly relate regions to account executives.
    - - populate with
        
        Parent Value
        
        Child Value
        
        Generated Value
- - - - <start_date, end_date> is combined with the employee to form a record in the "LU_EMPLOYEE" table
      - Cons: tedious and time consuming
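One way to picture the <start_date, end_date> approach is a lookup table with one row per employee per validity period, joined to the fact table on both the employee and the date range. This is only a sketch under assumptions: the columns `emp_id`, `region`, `start_date`, `end_date`, the `fact_sales` table, the open-ended date `9999-12-31` and the data are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One row per employee per validity period (ISO dates compare correctly as text).
CREATE TABLE LU_EMPLOYEE (emp_id INTEGER, region TEXT,
                          start_date TEXT, end_date TEXT);
CREATE TABLE fact_sales (emp_id INTEGER, sale_date TEXT, revenue INTEGER);

INSERT INTO LU_EMPLOYEE VALUES
    (1, 'East', '2020-01-01', '2020-06-30'),
    (1, 'West', '2020-07-01', '9999-12-31');
INSERT INTO fact_sales VALUES
    (1, '2020-03-15', 100),
    (1, '2020-09-01', 50);
""")

# Each sale is attributed to the region the employee belonged to on the sale date.
rows = conn.execute("""
    SELECT e.region, SUM(f.revenue)
    FROM fact_sales f
    JOIN LU_EMPLOYEE e
      ON e.emp_id = f.emp_id
     AND f.sale_date BETWEEN e.start_date AND e.end_date
    GROUP BY e.region ORDER BY e.region
""").fetchall()
print(rows)
```

Every change of region means closing the old row and inserting a new one, which is a plausible reading of why the note calls this approach tedious and time consuming.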