Minimize
05

This article introduces attribute discretization in SQL Server Analysis Services 2008 / 2008 R2, and extends the examination of the dimensional model that we began in  Learning SQL Server Analysis Services: Analysis Services 2008 R2 Dimensional Model: Dimensions, Parts I  and IIHere, we will introduce attribute discretization, focusing upon the purpose and benefits of this capability, which, as we shall see, affords us a means of creating a manageable number of groups of attribute values, which are separated by distinct boundaries. In this way, discretization (or “bucketization”) allows us to group contiguous values into sets of discrete values.

Introduction
 
In Learning SQL Server Analysis Services: Analysis Services 2008 R2 Dimensional Model: Dimensions, Parts I  and II, we undertook a general introduction to the dimensional model, noting its wide acceptance as the preferred structure for presenting quantitative and other organizational data to information consumers. We then began our examination of dimensions, the analytical “perspectives” upon which the dimensional model relies in meeting the primary objectives of business intelligence, including its capacity to support:
 
  • the presentation of relevant and accurate information representing business operations and events;
  • the rapid and accurate return of query results;
  • “slice and dice” query creation and modification;
  • an environment wherein information consumers can pose questions quickly and easily, and achieve rapid results datasets.
To extend the metaphor we used earlier in describing dimensions as nouns and measures as verbs, we might consider attributes as somewhat similar to adjectives. That is, attributes help us to define with specificity what dimensions cannot define by themselves. Dimensions alone are like lines in geometry: they don't define “area” within multidimensional space, nor do they themselves even define the hierarchies that they contain. A database dimension is a collection of related objects called attributes, which we use to specify the coordinates required to define cube space.
 
Dimensions and dimension attributes should support the way that management and information consumers of a given organization describe the events and results of the business operations of the entity. Because we maintain dimension and related attribute information within the database underlying our Analysis Services implementation, we can support business intelligence for our clients and employers even when these details are not captured within the system where transaction processing takes place. Within the analysis and reporting capabilities we supply in this manner, dimensions and attributes are useful for aggregation, filtering, labeling, and other purposes.
 
In this article, we will introduce a capability in Analysis Services – to which we refer as attribute discretization - that allows us to group members of an attribute into a number of member groups. We will discuss design, and other, considerations involved in the discretization of attributes, and touch upon best practices surrounding the use of this capability. Our focus will be upon the general exploitation of discretization in providing support for the selection and delivery of enterprise data. (In other articles designed specifically for the purpose, we will examine attribute discretization in a manner similar to previous articles within this subseries, gaining hand-on exposure to the use of discretization in a practice scenario.) 
 
Our examination will include:
  • An overview of attribute discretization in Analysis Services, potential benefits that accrue from discretization in our UDMs,and how the process can help us to meet the primary objectives of business intelligence.
·         A discussion of general considerations and best practices surrounding attribute discretization.
  • A look forward to subsequent articles that within our series, where we will perform detailed examinations of the properties underlying attribute discretization, along with a review of the respective settings associated with each property, based upon a representative dimension attribute within our sample UDM.
 
Attribute Discretization
 
As we have learned, attributes serve as the foundation for our dimensions and cubes. Whenever we work with attributes, we can expect to encounter two general types of values:
 
·         Discrete attributesDiscrete values stand apart distinctly, and have clearly defined logical “boundaries” between themselves. The possible values are naturally discrete for the lion’s share of attributes occurring in the business world.
 
Example: The Gender attribute, within the Customer dimension of the Adventure Works sample UDM, is (at least for purposes of the sample cube) considered to have only one of two discrete values, female or male
 
·         Contiguous attributes: Contiguous values do not stand apart distinctly, but flow along, as if in a continuous line. Contiguous values, especially within large populations, can have very large numbers of possible values. Information consumers can find it difficult to work effectively and efficiently within such wide ranges of values.
 
Example: the Vacation Hours attribute, within the Employee dimension of the Adventure Works sample UDM, can have a wide range of possible values, depending upon how many employees are involved, whether there are limits on how many vacation days they can accumulate, and considerations of this nature. The member values are based directly upon the unique values contained within the VacationHours column of the DimEmployee table (with many of the unique values shared among multiple employees). The sheer number of values might make working with them cumbersome for information consumers, if they are simply made available in their existing state.
 
 
Overview
 
Discretization can help us to make it easier for information consumers to work with large numbers of possible attribute member values. As we have learned, discretization is the process of creating a manageable number of groups of attribute values that are clearly separated by boundaries. We can thereby use discretization as a means of group contiguous values into sets of discrete values, via a system-generated collection of consecutive dimension members known as member groups.
 
The discretization process groups the attribute members into the member groups, once they are generated; the member groups are then housed within a level within the dimensional hierarchy. (A given level within a dimensional hierarchy can contain either members or member groups, but not both.) When information consumers browse a level that contains member groups, they see the names and cell values of the member groups. The members generated by Analysis Services to support member groups are called grouping members, and they look like ordinary members.
 
Analysis Services affords us several variations of attribute discretization, based upon algorithms of varying complexity. The different methods of discretization all have the same function – to group contiguous values into sets of discrete values. They simply manage grouping via different approaches. Beyond the Dimension Editor, Analysis Services also supports user-defined discretization, via data definition language (DDL), should the “out – of – the – box” approaches not meet the business needs of our local environments. Moreover, we can alternatively implement custom discretization via the underlying data warehouse, using views at the relational level; named calculations in the data source view; calculated members in Analysis Services; or via other approaches.
 
As we have noted throughout my articles on the subject here and elsewhere, one of the most important objectives in building a high performance Analysis Services solution is an efficient and effective dimension design. The identification of opportunities wherewe can effectively use attribute discretization, and the effective design and placement of the member groups that we generate thereby, can mean the provision of a much more consumer-friendly interface for our clients and employers.
 
Best Practices and Other Considerations Surrounding Attribute Discretization
 
Best practices dictate that, in creating attribute / member groups, we give those groups names that are intuitive for information consumers – names that best represent the semantics of the business. Analysis Services provides a naming template that we can use – or modify and use as a custom template or templates - to make this easier for us. Member group names are generated automatically, via the template, when the member groups are created. Unless we specify a naming template, the default naming template is used. (We can change this method of naming by specifying a naming template in the Format option for the NameColumn property of an attribute). Different naming templates can be defined for every language specified in the Translations collection of the column binding that has been used for the NameColumn property of the attribute.
 
Another best practice is to make the sort order of members meet the requirements of information consumers. Sort order is controlled via the OrderBy property of the attribute. Based on this sort order, the members in a member group are ordered consecutively. Finally, another best practice, from the perspective of “consumer-friendly” member group generation, is the consideration of intuitive, logical drill down paths. For example, one common use for member groups is to support drill down from a level with few members to a level with many members. (We will get some hands-on exposure to providing this sort of support in subsequent articles of this series).
 
We need to keep in mind that, when we process a dimension within Analysis Services, a discretized attribute is rediscretized only with a full update (ProcessFull). To rediscretize an attribute, we must perform a full update of the dimension – processing via an incremental update (ProcessAdd), will not rediscretize a discretized attribute, meaning that the names and children within the new “buckets” remain the same.
 
NOTE: We will examine in detail the properties (and their settings) that support attribute discretization within the sample Adventure Works cube in the hands-on practice sections of subsequent articles in this series.
 
Other significant considerations in working with attribute discretization include usage limitations. First, we need to be aware that member groups (and, therefore, attribute discretization) are not supported for dimensions that use the ROLAP storage mode. Moreover, we cannot create member groups in the topmost or bottommost level of a hierarchy. (We can approach the need to accomplish this by simply adding a level in such a way that the level in which we want to create member groups is no longer the top or bottom level. We can hide the added level by setting its Visible property to False, as appropriate). An additional consideration is that we cannot create member groups in two consecutive levels of a hierarchy. 
 
Finally, we need to keep in mind that, if the dimension table of a dimension that contains member groups is updated, and the dimension is subsequently fully processed, a new set of member groups is generated. Under such a set of circumstances, the names and children of the new member groups may be different from the old member groups.
 
We will examine the many properties, and their settings, that we use in performing and maintaining attribute discretization in subsequent articles of this column, where we will gain hands-on exposure to these in a working environment.
 
Conclusion
 
In this article we introduced attribute discretization in Analysis Services, extending the examination of the dimensional model that we began in Learning SQL Server Analysis Services: Analysis Services 2008 R2 Dimensional Model: Dimensions, Parts I and IIIn a manner similar to previous articles within this subseries, we overviewed the general concepts involved and looked ahead to subsequent articles, where we will perform hands-on, detailed examination of the properties that support attribute discretization within a working sample environment. Our focus, as we stated, was upon the appropriate use of attribute discretization in providing support for the selection and delivery of enterprise data in a more focused and consumer-friendly manner.
 
Our introduction included an overview of attribute discretization, a capability in Analysis Services that allows us to group members of an attribute into a number of member groups. We noted potential benefits that accrue from discretization in our UDMs,and how the process can help us to meet the primary objectives of business intelligence. We discussed design, and other, considerations involved in the discretization of attributes, and touched upon best practices surrounding the use of this capability, in providing support for the selection and delivery of enterprise data. Finally, we looked ahead to subsequent articles of this series, where we will examine the individual properties underlying attribute discretization, and conduct a review of the respective settings associated with each property, through hands-on exposure to these in a working environment.

Comments

There are currently no comments, be the first to post one.

Post Comment

Only registered users may post comments.

LatestArticles Minimize
Changing Granularity of Leaf Level Calculations in SSAS Tabular by Jason Thomas

Finding Nearest Stores using SSRS Map Reports by Jason Thomas

Heat Maps for SSRS using Map Control by Jason Thomas

Parameters for Analysis Services Reporting: Introduction, Pt. 3 by William Pearson
BI Architect Bill Pearson continues an extended examination of parameterization within Analysis Services reports. In this, Part 3 of the article, we continue to get hands-on practice cr...

Parameters for Analysis Services Reporting: Introduction, Pt. 2 by William Pearson
BI Architect Bill Pearson continues an extended examination of parameterization within Analysis Services reports. In this, Part 2 of the article, we continue to get hands-on practice cr...

Using Annotation Tables in SSAS to Show Last Processed Time and Latest Data Updates as a Measure in a Cube by John Hall

Using Color in SSRS Charts by Melissa Coates
Effective data presentation techniques help users  interpret information quickly and reliably.  Layout, formatting, sizing, labeling, and other report elements may all be used to facilitate ...

Is TOAD faster than BIDS Query Builder? by John Hall

SSRS: Unexplained Warning “This field is missing from the returned result set from the data source” And Checking a Field for Null Exception by John Hall
If you ever see the warning message “This field is missing from the returned result set form the data source” and get unexplained #Errors in columns on your report it may be from SSAS not returning a ...

Beyond Excel pivot tables: Leveraging cube formulas with MDX by Javier Guillen
PowerPivot and DAX are a powerful technologies, but there is still a good amount of work that can be done with traditional cube functions. In its current version, PowerPivot lacks the concept of ...


Advertisement Minimize

Advertisement Minimize

Copyright 2004-2012 MSBICentral.com Terms Of Use Privacy Statement