Advanced Project Design

Managing project schema

Using MSTR multisource

fact level extensions

transformations

partitioning

data warehouse catalog

Primary and secondary database instances

Warehouse Catalog

Aggregation-awareness

Data marts

Multisouce report execution

Intro

schema objects

basic

2.fact

3.table

1.attribute

4.hierarchy

additional

5.transformations

6.partition mapping

7.functions and operators

e.g. Last Year's

divide large table into smaller tables in the data warehouse

the reason is that sometimes the table is too large to query

A database instance is the logical object in MSTR metadata that typically represents a connection to a data warehouse.

A standard or a non-standard report always executes SQL against a single data source(database instance).

combine data from multiple data sources:

create a Report Services document and include reports in the same project that connect to different data sources as its datasets.

You cannot access two distinct data sources within a single standard or non-standard report unless you use MultiSource Option.

base fact table

lowest level attribute

aggregate fact table

higher level of aggregation

created in the data warehouse, and has to be updated, by the ETL team, in the ETL process.

two actions that you need to perform to integrate an aggregate fact table into an existing project

Add the table to the project using the Warehouse Catalog.

If necessary, map the existing attributes and facts to the aggregate table.

logical table size

table editor

logical size editor

lock the size: so the size does't get updated when update the schema

A data mart is a relational table containing results of a report.

You create the data mart report in MSTR Desktop and you save the data mart table in a warehouse of your choice.

After you create a data mart table, you can add it to a project and then use it as a source table from which to execute reports.

Some common applications:

Creating aggregate fact tables

Creating tables for very large result sets and then using other applications such as MS Excel or MS Access to access the data;

Creating tables for off-line analysis.

Create

data mart report

data mart table

new or existing report

to create, execute the data mart report

MSTR Multiple Source Option

An add-on component to MSTR Intelligence Server (需要另外付钱)

enables you to define a single project schema across multiple data sources;

enables you to create standard reports that execute SQL against multiple data sources;

(1) You can define primary and secondary database at table level;

(2) You can connect to any data source that you access using an ODBC driver.

Defining a Project Schema with MultiSource Option

You select a primary database instance for the project.

You can add tables to the project from different database instances

you can associate a single project table with multiple database instances.

e.g. the LU_REGION table

This is called "Duplicate Tables".

You create a duplicate table when you map the same project table to more than one database instance.

you can have any type of table (lookup, relationship, or fact) duplicated across multiple data sources.

Duplicate tables are beneficial only for lookup and relationship tables.

Duplicate tables must have identical table name, column names, and same/compatible data types for corresponding columns.

The table associated to the primary database instance cannot have extra columns.

You can have extra columns in the tables associated to secondary database instance.

SQL Generation

the SQL Engine performs two additional tasks for multisource reports

determines the optimal database instance for each pass of SQL

Identifies when joins need to occur across database instances.

Selecting the Optimal Data Source for Fact Tables

The Engine uses the primary database instance for the project if the fact is available in this database instance.

The Engine uses a secondary database instance if the fact is not available in the primary database instance for the project. The Engine selects the secondary database instance with the smallest GUID first.

Select the Optimal Data Source for Lookup Tables

The Engine uses the primary database instance for the lookup table if the attribute is not available in the fact data source.

The Engine uses the fact data source if the attribute is available in this database instance.

Joining Data from Different Sources

The Engine selects data from the first data source to the memory of the Intelligence Server.

The Engine creates temporary table in the second data source and inserts the data into this table.

The Engine joins columns using data type compatibility rules.

Different data source are often optimal for different passes of SQL.

Uses Engine data type definition, not physical database definition.

The Engine moves the attribute data to the fact source.

Split Lookup Tables

scenario: need to use lookup data from another data source.

Fact

1.You can report on the fact level, or aggregate to a higher level. But not available for lower level.

there might be a scenario, where a metric uses two metrics which defined on different levels. (You need day-level data, but only have week-level).

Fact Degradation

three types

1.Degration

2.Extension

3.Dsiallow

Enables you to extend a fact's level to a lower level within the same hierarchy. This concept is the opposite of aggregation.

Enables you to extend a fact's level to include a level from a different hierarchy not currently related to the fact.

Enables you to prevent unnecessary cross joins to lookup tables when reporting on fact.

Creating a Fact Degration

Choose the attribute level to which you want to lower the fact.

Select an attribute the SQL Engine can use to join the fact table with the attribute to which you want to lower the fact.

Determine how to perform the join.

If necessary, define an allocation expression (how to do the mapping).

2(1)only to the attribute you choose (2)also join to the children of the attribute you choose.

three ways to create extensions

Through a table relation

Through a fact relation

Through a cross product

You specify a particular table to use for the join. You should select this option if you always want the Engine to use the same table for the join.

Instead of a table, you specify a fact. This option enables the Engine to choose any table containing that fact for the join. You should select this option if you want to allow the Engine choose the optimal table for a particular query.

You tell the Engine to perform a Cartesian join between the lookup and fact tables in question.

this is not recommended.

you can do this if there is no other option.But this causes a large overhead and may not give you the result you want.

Extension through (a Table Relation) or (a Fact Relation)

Choose the attribute level to which you want to extend the fact.

Choose the table you want the SQL Engine to use to perform the join.

Select the attribute of set of attributes the SQL Engine can use for the join.

Determine how to perform the join.

If necessary, define an allocation expression.

Doesn't resolve a problem, but it prevents the SQL Engine from spending a lot of energy trying to run a report which doesn't bring back appropriate data.

Transformations are schema objects most often used to compare values at different times.

e.g. This year versus last year.

e.g. Today versus "Month to date"

types

table based

expression based

can be used for very complex tasks

You implement these transformations using a transformation table in the warehouse.

Transformation Components

all transformations have these.

Member attributes

Member expressions

Member tables

Mapping type

The member attributes are attributes to which the transformation applies, which is the lowest-level attribute on the report to which you want to apply the transformation.

Each member attribute has a corresponding expression that needs to be resolved in the report SQL to obtain valid data.

The member tables store the data for the member attributes.

The mapping type determines the way the transformation is created based on the nature of data.

one-to-one mapping type

many-to-many mapping type

one day corresponding to many days.

在table-based的情况下，expression就是column name

The division of larger table into smaller tables.

Used to improve query performance

Because you are going to look into smaller number of records.

can also reduce loading time for data.

types

Server-Level Parttitioning

Application-Level Partitioning

Physical tables into logical partitions, not actual separate tables.

the DB software handles this completely. Just write SQL in MSTR

This goes in the DB server itself.

Warehouse and Metadata Partition Mapping

MicroStrategy supports application-level partitioning for fact tables through one of the two methods.

Warehouse partitioning mapping

This method is based on a physical warehouse partition mapping table(PMT) that resides in the data warehouse and describes the partitioned base tables that are part of a conceptual whole.

Metadata partition mapping

This method is based on rules logically defined in MicroStrategy Architect and stored in the metadata.

将partition base tables放在data warehouse里而不是分割前的table

The partition mapping schema object is generated by MSTR automatically, when the Partition Mapping Table in the data warehouse is included into the project.

对比：只添加Partition Mapping Table到project中，通过pre-query确定应该run SQL against data warehouse的哪一张表

对比：将所有Partition Base Table都添加到data warehouse中，不需要在data warehouse中定义PMT，但是在添加这些表到MSTR project中之后，需要在MSTR中定义PMT

在这里定义PMT的时候实际上是对每个PBT，选取了attribute element的一部分