Minimize
11
Purpose
The purpose of this document is to provide some “best practices” guidance for designers and architects ODSs and Data Warehouses . This document will grow over time as we add your ideas and comments. A basic requirement for this ODS/Data Warehouse is that we expect to include data from other sources/applications and companies. We do not know currently who they are or what their requirements will be. We do know that their data types may not match our source systems, and that we should be able to tag each row with the source system which contributed the data.
Strong Surrogate Key Policy
Each table should have a surrogate key with the following attributes:
  • Single column
  • Unique
  • Integer
  • Different than the source system key
  • No Smart Keys
 
A surrogate key is simply a single, non-intelligent key that provides uniqueness for each row in a given table. The surrogate key should never be the source key from the system of record. Source keys are carried forward as attributes.
 
This surrogate key requirement is true for stand-alone tables, many-many relationship tables, and code-lookup tables.
 
The reason for this is that the table rows within the ODS and DW need to be independent of the Source of Record. If the Source of Record changes a key value, the ODS and DW would be able to handle the change with little impact because this key would simply be an attribute and not the key of the table itself.
 
There will be many references to most tables, including auditing, replication, and inclusion as a foreign key reference in the DW fact tables. Multi-column keys increase the size, and complexity of the database. Since the model for the DW will be flattened, carrying forward multi-column keys is not a best practice.
 
Although the use of integer keys is the default, a larger number of rows may require the use of a bigint, instead of an integer.

This is called a Strong Surrogate Key Policy, pictured below.

Exception to Surrogate Key Policy
The Primary Key for Date Dimensions in Data Warehouses and Data Marts should be an integer formatted as yyyymmdd for daily grain.
The Primary Key for Time Dimensions should be an integer formatted as hhmmss for second grain.
 
Source System Keys
The Source Primary Key(s) should be kept as attributes in the destination database. For the ODS this would be the Source of Record Primary Key. For the DW this would be ODS PK, if the row comes from the ODS, or the Source of Record PK if the row originates outside the ODS. The ETL process which loads the tables will have to be able to match incoming source rows with the appropriate row.
 
The source system keys should be concatenated into a single key field name as “Source_System_Operational_ID” and should be VARCHAR(max). “|” (pipe) character should be used as concatenation character. When ETL is looking up for the matching columns between source and target, it should first concatenate the source columns and then can be used for lookup.
 
Nullable Columns
 
Columns in the ODS and DW other than the primary key should be nullable. Though this is not typically a best practice in general data modeling, but it does serve a purpose when constructing an Enterprise Model that is flexible and built to integrate data from multiple sources over time. Note: The exception to this is that Fact Table Foreign Keys to the dimensions should not allow nulls. They are required to allow inclusion of all fact rows in an SSAS cube. SSAS requires all dimension members be known.
Required Columns
Each row must be tagged with Create_Date, Last_Updated_Date
Each table should have the following columns:
            Create_Date: The date the row was inserted into the table, Datetime data type with a default of GetUTCdate()
            Last_Updated_Date: The date the row was more recently updated, Datetime data type with a default of GetUTCdate(). When a row is newly inserted both the Create_Date and Last_Updated_Date will have the same date and time.
 
These dates are all UTC.
 
This is used by downstream consumers to determine which rows are new, and which have changed. The downstream consumer can do this easily using these columns.
 
Each row must be tagged with the Source System ID
The Source of Record ID should be included in every table. Each row should be tagged with the ID from the Source Table in Common. This will allow auditing to trace data back to its true source, and make future maintenance easier.
            Source_ID – Integer ID from the Source Table
 
Use Not For Replication
It is very likely that replication will be used to make copies of data for performance reasons. Copied data does not have to have its quality checked first at the origination point, then again during replication. Therefore the Not For Replication Clause should be used on ALL FKs, Check Constraints, Identity Columns and Triggers which are built for referential integrity.
 
Codes and Code Tables
Codes used in the ODS and DW must be in Code Tables. The table(s) should have a complete list of all valid values. In the ODS the PK will be used as the identifier for the code in related tables, and not any smart code value. In the DW, since the tables may be flattened, actual code values may be used instead of the key value.
 
Handling Date/Times
All Date/Times in the ODS and DW must be stored as UTC(GMT). If you need to store a local time, that may be added in addition to the UTC version.
Naming convention for these is
Business_Name_DateTime_UTC for UTC and
Business_Name_DateTime _Local for Local times and
Business_Name_TimeZone if you wish to store the TimeZone information for the local time
 
If the field represents date only or time only values, then no suffix (UTC or Local) is required. You can name as <field name>_Date or <field name>_Time e.g Effective_Date or Efffective_Time
 
Type 2 Slowly Changing Dimensions
Type 2 slowly changing dimensions need the effective dates of the row. These date/times should be stored in UTC. The naming convention for these is
 
Effective_Start_DateTime_UTC DateTime NOT NULL
Effective_End_DateTime_UTC DateTime NOT NULL
 
The current row should have a non-null maximum date, instead of a null value.
Posted in: SQL Server 2008

Comments

There are currently no comments, be the first to post one.

Post Comment

Only registered users may post comments.

LatestArticles Minimize
Changing Granularity of Leaf Level Calculations in SSAS Tabular by Jason Thomas

Finding Nearest Stores using SSRS Map Reports by Jason Thomas

Heat Maps for SSRS using Map Control by Jason Thomas

Parameters for Analysis Services Reporting: Introduction, Pt. 3 by William Pearson
BI Architect Bill Pearson continues an extended examination of parameterization within Analysis Services reports. In this, Part 3 of the article, we continue to get hands-on practice cr...

Parameters for Analysis Services Reporting: Introduction, Pt. 2 by William Pearson
BI Architect Bill Pearson continues an extended examination of parameterization within Analysis Services reports. In this, Part 2 of the article, we continue to get hands-on practice cr...

Using Annotation Tables in SSAS to Show Last Processed Time and Latest Data Updates as a Measure in a Cube by John Hall

Using Color in SSRS Charts by Melissa Coates
Effective data presentation techniques help users  interpret information quickly and reliably.  Layout, formatting, sizing, labeling, and other report elements may all be used to facilitate ...

Is TOAD faster than BIDS Query Builder? by John Hall

SSRS: Unexplained Warning “This field is missing from the returned result set from the data source” And Checking a Field for Null Exception by John Hall
If you ever see the warning message “This field is missing from the returned result set form the data source” and get unexplained #Errors in columns on your report it may be from SSAS not returning a ...

Beyond Excel pivot tables: Leveraging cube formulas with MDX by Javier Guillen
PowerPivot and DAX are a powerful technologies, but there is still a good amount of work that can be done with traditional cube functions. In its current version, PowerPivot lacks the concept of ...


Advertisement Minimize

Advertisement Minimize

Copyright 2004-2012 MSBICentral.com Terms Of Use Privacy Statement