Advanced Project Design
Managing project schema
Using MSTR multisource
fact level extensions
transformations
partitioning
data warehouse catalog
Primary and secondary database instances
Warehouse Catalog
Aggregation-awareness
Data marts
Multisouce report execution
Intro
schema objects
basic
2.fact
3.table
1.attribute
4.hierarchy
additional
5.transformations
6.partition mapping
7.functions and operators
e.g. Last Year's
divide large table into smaller tables in the data warehouse
the reason is that sometimes the table is too large to query
A database instance is the logical object in MSTR metadata that typically represents a connection to a data warehouse.
A standard or a non-standard report always executes SQL against a single data source(database instance).
combine data from multiple data sources:
create a Report Services document and include reports in the same project that connect to different data sources as its datasets.
You cannot access two distinct data sources within a single standard or non-standard report unless you use MultiSource Option.
base fact table
lowest level attribute
aggregate fact table
higher level of aggregation
created in the data warehouse, and has to be updated, by the ETL team, in the ETL process.
two actions that you need to perform to integrate an aggregate fact table into an existing project
- Add the table to the project using the Warehouse Catalog.
- If necessary, map the existing attributes and facts to the aggregate table.
logical table size
table editor
logical size editor
lock the size: so the size does't get updated when update the schema
A data mart is a relational table containing results of a report.
You create the data mart report in MSTR Desktop and you save the data mart table in a warehouse of your choice.
After you create a data mart table, you can add it to a project and then use it as a source table from which to execute reports.
Some common applications:
Creating aggregate fact tables
Creating tables for very large result sets and then using other applications such as MS Excel or MS Access to access the data;
Creating tables for off-line analysis.
Create
- data mart report
- data mart table
new or existing report
to create, execute the data mart report
MSTR Multiple Source Option
An add-on component to MSTR Intelligence Server (需要另外付钱)
- enables you to define a single project schema across multiple data sources;
- enables you to create standard reports that execute SQL against multiple data sources;
(1) You can define primary and secondary database at table level;
(2) You can connect to any data source that you access using an ODBC driver.
Defining a Project Schema with MultiSource Option
- You select a primary database instance for the project.
- You can add tables to the project from different database instances
- you can associate a single project table with multiple database instances.
e.g. the LU_REGION table
This is called "Duplicate Tables".
- You create a duplicate table when you map the same project table to more than one database instance.
- you can have any type of table (lookup, relationship, or fact) duplicated across multiple data sources.
- Duplicate tables are beneficial only for lookup and relationship tables.
Duplicate tables must have identical table name, column names, and same/compatible data types for corresponding columns.
The table associated to the primary database instance cannot have extra columns.
You can have extra columns in the tables associated to secondary database instance.
SQL Generation
the SQL Engine performs two additional tasks for multisource reports
- determines the optimal database instance for each pass of SQL
- Identifies when joins need to occur across database instances.
Selecting the Optimal Data Source for Fact Tables
- The Engine uses the primary database instance for the project if the fact is available in this database instance.
- The Engine uses a secondary database instance if the fact is not available in the primary database instance for the project. The Engine selects the secondary database instance with the smallest GUID first.
Select the Optimal Data Source for Lookup Tables
- The Engine uses the primary database instance for the lookup table if the attribute is not available in the fact data source.
- The Engine uses the fact data source if the attribute is available in this database instance.
Joining Data from Different Sources
- The Engine selects data from the first data source to the memory of the Intelligence Server.
- The Engine creates temporary table in the second data source and inserts the data into this table.
- The Engine joins columns using data type compatibility rules.
- Different data source are often optimal for different passes of SQL.
Uses Engine data type definition, not physical database definition.
The Engine moves the attribute data to the fact source.
Split Lookup Tables
scenario: need to use lookup data from another data source.
Fact
1.You can report on the fact level, or aggregate to a higher level. But not available for lower level.
- there might be a scenario, where a metric uses two metrics which defined on different levels. (You need day-level data, but only have week-level).
Fact Degradation
three types
1.Degration
2.Extension
3.Dsiallow
Enables you to extend a fact's level to a lower level within the same hierarchy. This concept is the opposite of aggregation.
Enables you to extend a fact's level to include a level from a different hierarchy not currently related to the fact.
Enables you to prevent unnecessary cross joins to lookup tables when reporting on fact.
Creating a Fact Degration
- Choose the attribute level to which you want to lower the fact.
- Select an attribute the SQL Engine can use to join the fact table with the attribute to which you want to lower the fact.
- Determine how to perform the join.
- If necessary, define an allocation expression (how to do the mapping).
2(1)only to the attribute you choose (2)also join to the children of the attribute you choose.
three ways to create extensions
- Through a table relation
- Through a fact relation
- Through a cross product
You specify a particular table to use for the join. You should select this option if you always want the Engine to use the same table for the join.
Instead of a table, you specify a fact. This option enables the Engine to choose any table containing that fact for the join. You should select this option if you want to allow the Engine choose the optimal table for a particular query.
You tell the Engine to perform a Cartesian join between the lookup and fact tables in question.
this is not recommended.
you can do this if there is no other option.But this causes a large overhead and may not give you the result you want.
Extension through (a Table Relation) or (a Fact Relation)
- Choose the attribute level to which you want to extend the fact.
- Choose the table you want the SQL Engine to use to perform the join.
- Select the attribute of set of attributes the SQL Engine can use for the join.
- Determine how to perform the join.
- If necessary, define an allocation expression.
Doesn't resolve a problem, but it prevents the SQL Engine from spending a lot of energy trying to run a report which doesn't bring back appropriate data.
Transformations are schema objects most often used to compare values at different times.
e.g. This year versus last year.
e.g. Today versus "Month to date"
types
- table based
- expression based
can be used for very complex tasks
You implement these transformations using a transformation table in the warehouse.
Transformation Components
all transformations have these.
- Member attributes
- Member expressions
- Member tables
- Mapping type
The member attributes are attributes to which the transformation applies, which is the lowest-level attribute on the report to which you want to apply the transformation.
Each member attribute has a corresponding expression that needs to be resolved in the report SQL to obtain valid data.
The member tables store the data for the member attributes.
The mapping type determines the way the transformation is created based on the nature of data.
one-to-one mapping type
many-to-many mapping type
one day corresponding to many days.
在table-based的情况下,expression就是column name
The division of larger table into smaller tables.
Used to improve query performance
Because you are going to look into smaller number of records.
can also reduce loading time for data.
types
- Server-Level Parttitioning
- Application-Level Partitioning
Physical tables into logical partitions, not actual separate tables.
the DB software handles this completely. Just write SQL in MSTR
This goes in the DB server itself.
Warehouse and Metadata Partition Mapping
MicroStrategy supports application-level partitioning for fact tables through one of the two methods.
- Warehouse partitioning mapping
This method is based on a physical warehouse partition mapping table(PMT) that resides in the data warehouse and describes the partitioned base tables that are part of a conceptual whole.
- Metadata partition mapping
This method is based on rules logically defined in MicroStrategy Architect and stored in the metadata.
将partition base tables放在data warehouse里而不是分割前的table
The partition mapping schema object is generated by MSTR automatically, when the Partition Mapping Table in the data warehouse is included into the project.
对比:只添加Partition Mapping Table到project中, 通过pre-query确定应该run SQL against data warehouse的哪一张表
对比:将所有Partition Base Table都添加到data warehouse中,不需要在data warehouse中定义PMT, 但是在添加这些表到MSTR project中之后,需要在MSTR中定义PMT
在这里定义PMT的时候实际上是对每个PBT,选取了attribute element的一部分