08

Lets examine what the options are for calculating Quartile / Percentiles in PowerPivot.  On my last post, I explained one way to calculate Median using pure DAX.   Though the calculation works, don’t expect it to be speedy when dealing with large datasets;  The reason for this is that it relies on row context iterations to rank values prior to selecting the median.  

As similar technique can be used to calculate Quartile or Percentile, following this blog entry.  However, the result is not always optimal:  not only the calculation can be slow, be the results do not match Excel’s native functions.    The reason for this is that Excel uses interpolation to calculate both Quartile and Percentile, whereas the DAX expression above picks one value out of the set (in other words, its just an approximation).

Can we use a technique to match Excel’s results within a PowerPivot model?   The answer is yes, though the technique discussed here does not use DAX.   What I want to share is another way of using the inherent capabilities of PowerPivot: being itself a version of SQL Server Analysis Services, it is actually able to understand MDX.  It is quite interesting to notice that PowerPivot can behave in ways that reveal its SSAS nature:  from the ability to respond to Excel cube functions to generating a trace file which can be read in SQL Profiler.

So what does this have to do with calculating Percentile?   Well, even though DAX does not include any of those functions you can leverage PowerPivot’s multidimensional interface to create an MDX calculated member that will encapsulate each of those functions.  Lets see how.

In order to see PowerPivot queries to the Vertipaq engine in MDX format, we can use a free codeplex add-in called ‘OLAP PivotTable Extensions’ (download it here, and be sure to select the correct bitness version).  This add-in was created with SSAS in mind, but – without any further effort from the add-in itself – PowerPivot responds to it. Once installed, create a PowerPivot table and then use the add-in to view the MDX query (select the ‘MDX’ tab on the window that appears when you click on the OLAP Pivot Table Extensions menu item)

image

Sample PowerPivot table using AdventureWorks data

The MDX query we get is:

SELECT
    NON EMPTY Hierarchize(
        {DrilldownLevel({[DimDate].[EnglishMonthName].[All]},,,INCLUDE_CALC_MEMBERS)})
        DIMENSION PROPERTIES PARENT_UNIQUE_NAME,HIERARCHY_UNIQUE_NAME ON COLUMNS 
FROM [Sandbox]
    WHERE (
        [DimDate].[CalendarYear].&[2004],
        [Measures].[Sum of SalesAmount]
        )

So lets create a calculated member to compute Median.

Select the ‘Calculations’ tab on the add-in menu.  This is used to submit calculated members along with the rest of the pivot table query.  Type the following query:

Median(                                  
        (
         [DimDate].[CalendarYear].CurrentMember *
         [DimDate].[EnglishMonthName].[All].Children
        ) 
        , CoalesceEmpty ( [Measures].[Sum of SalesAmount] , 0 )                          
      )

This should look in the add-in window as follows:

image

The result matches Excel regular ‘Median’ function:

image

Lets examine the actual query.   MDX has a built-in ‘Median’ function ( see definition here ).   As you can see, the expression uses parameters that look like regular SSAS dimensions.  This is the PowerPivot data model though, and we didn’t have to use Visual Studio to create it: the PowerPivot engine automatically creates dimensions based on the tabular model loaded into Vertipaq.

The measure [Sum of SalesAmount] is the aggregate PowerPivot created based on the Sales Amount column of the in-memory fact table.  Once it became an actual measure in the PowerPivot model, we can use it as the measure expression of the MDX function.

Lets try now creating Quartile and Percentiles.  It gets more interesting here:  Unlike the ‘Median’ function, MDX has no built-in Percentile or Quartile function.   So here we have it:  neither DAX nor MDX will help us here.  However, we can leverage a another feature of SSAS: the ability to perform COM interop calls through the ExcelMDX assembly.   You can see this library on a regular SSAS installation by browsing the assemblies folder:

image

It is quite interesting when you think about it.   We are asking an Excel pivot table to communicate with the PowerPivot engine through a calculated member which in turn invokes Excel functionality not present in the MDX language. Pretty cool indeed. So lets try it:  Using the following expression we can compute Quartiles:

Excel!Quartile(
                                SetToArray ( 
                                                
                                                                ( [DimDate].[CalendarYear].CurrentMember * [DimDate].[EnglishMonthName].[All].Children 
                                                ) 
                                , CoalesceEmpty ( [Measures].[Sum of SalesAmount]  , 0 )
                                ) 
                ,1)

The output is the following:

image

As you can see, the output matches Excel’s native quartile function. The calculation is now built into the PowerPivot model, and hence is dynamic and can respond to slicers and filters, as well as cell context by using MDX properties like ‘CurrentMember’.  And though the PowerPivot is performing a COM interop call to Excel (marshalling calls from one environment to the other), it should still perform better than a DAX calculation based heavily on row context.

If you examine the MDX that is sent to the Vertipaq engine, you will notice the calculated member is declared as a query scoped expression:

WITH
MEMBER [Measures].[Quartile] as Excel!Quartile(
                                SetToArray ( 
                                                
                                                                ( [DimDate].[CalendarYear].CurrentMember * [DimDate].[EnglishMonthName].[All].Children 
                                                ) 
                                , CoalesceEmpty ( [Measures].[Sum of SalesAmount]  , 0 )
                                ) 
                ,1)

SELECT
    {
    [Measures].[Sum of SalesAmount],
    [Measures].[Quartile]
    }
    DIMENSION PROPERTIES PARENT_UNIQUE_NAME,HIERARCHY_UNIQUE_NAME ON COLUMNS ,
NON EMPTY
    Hierarchize(
        DrilldownMember(
            CrossJoin(
                {
                [DimDate].[CalendarYear].[All],
                [DimDate].[CalendarYear].[CalendarYear].AllMembers
                },
                {([DimDate].[EnglishMonthName].[All])}
                ),
                [DimDate].[CalendarYear].[CalendarYear].AllMembers,
                [DimDate].[EnglishMonthName]))
    DIMENSION PROPERTIES PARENT_UNIQUE_NAME,HIERARCHY_UNIQUE_NAME ON ROWS 
FROM [Sandbox] CELL PROPERTIES VALUE, FORMAT_STRING, LANGUAGE, BACK_COLOR, FORE_COLOR, FONT_FLAGS

The dimension used (DimDate) as well as the measure (Sum of SalesAmount) were automatically created by the PowerPivot engine.  This is what provides compatibility with OLAP interfaces:  for each table in the PowerPivot model, a default ‘count’ measure has been created as well as a dimension, in which each column is an attribute.  In other words, dimensions and measures are somewhat interchangeable in PowerPivot, following a more flexible model than with the traditional SSAS engine.  Also, notice the query is executed against a cube called ‘Sandbox’.  This is the default cube-interface name for Vertipaq PowerPivot databases.  It is the same name you will see in the cubes generated in the dedicated SSAS instance used on PowerPivot for SharePoint (This is an instance of SQL Server Analysis Services 2008 R2 in Vertipaq mode).

One downside of this method is that the PowerPivot field list is not aware of the calculation.  However, if you open the standard pivot table field list you will see it there. In other words, and as I said before, this calculation is now part of the data model.  Which means you can access it outside of the constraints of the pivot table: through Excel cube functions.   As an example, using the following formula expression on a worksheet cell we can dynamically invoke the MDX calculated member against PowerPivot, with the ability to specify a query context to it:

=CUBEVALUE("PowerPivot Data",CUBESET("PowerPivot Data","Head([DimDate].[CalendarYear].[All].Children)"),"[Measures].[Quartile]")

The result will be 925,015 again.  In other words, we were able to dynamically invoke the first member of the CalendarYear attribute on the DimDate dimension and leverage the MDX calculated member, all in one cell – no need of pivot tables.  Pretty powerful right?

You can verify that our Quartile calculation is now part of the intrinstic model by browsing through measures members when constructing the cube formula:

image

What I find pretty interesting is that the list of available measures on the PowerPivot model includes now regular DAX measures ([Measures].[Sum of Sales]) as well as MDX measures ([Measures].[Quartile]).

How about Percentile?  Using the same method, it is pretty simple to calculate it.  Just replace Quartile in the MDX expression… that’s it!

Excel!Percentile(
                                SetToArray ( 
                                                
                                                                ( [DimDate].[CalendarYear].CurrentMember * [DimDate].[EnglishMonthName].[All].Children 
                                                ) 
                                , CoalesceEmpty ( [Measures].[Sum of SalesAmount]  , 0 )
                                ) 
                ,0.30)

image

Be careful to not define your underlying measure as ‘currency’ type.  Though I always thought data types in the PowerPivot window were purely decorative (as they don’t carry over the pivot table), the MDX calculated member will not work on currency types.  Instead, try decimal type and you will be fine.

And one last thing… as you may have noticed, this technique opens the possiblity to a huge range of calculated expressions.  An though it is not the standard PowerPivot/DAX implementation, it still leverages PowerPivot technology by making use of its inherent data model. In other words, PowerPivot applications can not only benefit from the amazing power of DAX , but also that of MDX (which in turn opens the door to further functionality through interfaces with Excel and VBA).  Things like Internal Rate of Return (IRR) are also possible using this method Smile

Excel!IRR( SetToArray (  [Table3].[b].[All].Children, [Measures].[Sum of a]  ) )

image

 

powered by metaPost
01
These are the slides for the presentation I gave on Friday April 29, 2011 For the Professional Association for SQL Server (PASS) Business Intelligence Virtual Chapter.

http://cid-3f1bb9738e005da8.office.live.com/self.aspx/.Public/Dynamic%20Formulas%20in%20PowerPivot.pdf

If you have any questions or feedback about them, please send me a message through my email: javiguillen@hotmail.com

Thank you!

28

PowerPivot is generally marketed as a ‘Self-Service BI’ tool.  However, it can alternatively be used as a way to enhance the design phases of traditional Business Intelligence projects, even if the final implementation does not use PowerPivot technology at all.

Why is that?  Because it more closely aligns business experts with BI developers during architectural discussions. In order to explain this better, I think it’s relevant to first describe the process by which data analysts can provide relevant input to BI designers:

A lot of intermediate Excel users will probably never use VBA macros and will barely feel comfortable with Pivot Tables.  For advanced Excel users, however, there is a thrill associated with getting a cool formula to work, or knowing the most interesting formatting shortcuts or even knowing how to add dropdowns to a spreadsheet to achieve dynamic interactivity.

These folks do not normally belong to the technology team; however, in addition to being savvy Excel users they are in the front lines of the decision making process which many times makes them subject matter experts (SMEs). It is likely they will eventually be motivated to start playing with the data import facility in PowerPivot and start developing DAX calculations.

Fast forward a little into the future, and imagine these users developing meaningful and insightful PowerPivot reports.  These contain connections to a multitude of data sources and aggregate data in a way that provides enlightening information to decision makers.

Though not always aware of it, these users will have accomplished three things of tremendous value for Business Intelligence professionals:

·         Validated one or more sources of quality data

·         Validated a useful data model

·         Validated one or more business calculations

In other words, as they benefit and make use of ‘Self-Service BI’ technology they will unknowingly serve as ‘data explorers’ to the Business Intelligence and Data Warehousing team that benefits from their discoveries.

In the past, these ‘spreadsheet-based discoveries’ went unnoticed by members of the technology team (and became spreadmarts).  However, the Business Intelligence team members can now gain insight into popular spreadsheet reports across the enterprise (PowerPivot for SharePoint). In addition to that, they will be able to take these PowerPivot applications and open them in Visual Studio as a SQL Server Analysis Services Business Intelligence Semantic Models (BISM) in SQL Server 2011.  This is the server equivalent of a PowerPivot for Excel workbook: it is built using the same technology (the columnar based storage engine "Vertipaq") but with a number of robust features proper of an enterprise solution.

One of the great things of this new approach to BI that Microsoft is proposing is that it will most likely help close the gap between Excel power users and Business Intelligence professionals by creating an analytical environment built on the same technical foundation on both ends: DAX.  This will most certainly help avoid communication challenges that normally occur when both sides speak in totally different “languages”:  Excel formulas \ business calculations on one side, and T-SQL/MDX and OLAP on the other.

So what about BI prototyping? And how can we benefit from it even before SQL Server 2011 is released? Well, imagine having the ability to engage subject matter experts in their most familiar environment (Excel) and with their help be able to rapidly test assumptions about BI project deliverables, before starting development.  Rather than purely describing a calculation in words or spreadsheets disconnected from a database, we can instead quickly and iteratively build a sample of it tied to a data source and observe the calculation in action as its being sliced by multiple dimensions.  This can generate the necessary early insights that can help correct potential misunderstandings quickly and generate a more complete and accurate data model that can be used to guide actual development. And these insights have been already validated by the SMEs (in collaboration with the BI team).

As an example, imagine a dimension called “Orders”.  During the initial design phase, it becomes clear it must be added to the cube in order to generate the expected reports. Fast forward a few months (and many hours of development).  While building the final reports, the BI team becomes aware something that is missing:  Even though DimOrders is used in many reports (as business users started at the beginning of the project) it was built at a different grain that what reports require;  conditional logic is discovered – yes, at the last stages of the project – that treats some Order Details differently from others.  What to do? Different approaches are possible, from redesigning parts of the ETL to creating new measure groups based on named queries in order include the data that is missing. Architectural re-arrangements are going to have to be made in order to complete it on time; and any ‘quick’ solution will risk compromising the ad hoc nature of the final product (which will in turn limit the ability to use it for analysis).

How would this change if PowerPivot is used as a prototyping tool?  Using an Agile approach, during  early scrum meetings it will be possible to ‘sample’ the Order dimension against a copy of all relevant reports.  By loading a relatively small set of data into PowerPivot and designing ad hoc reports to mirror actual views across any functional existing reports, one is able to quickly uncover the fact that different grain is required for the dimension in question.  This work would probably be done in very small amount of time compared to what it will require to re-desing and re-develop parts of the ETL and OLAP solution.  Time – and money – can be saved while increase accuracy of the design.

How would it be possible to ‘mimic’ the dimension ‘Orders’ without building into a real SSAS cube? This is the most interesting part:  PowerPivot was built on SQL Server Analysis Services code. In other words, it is a version of SSAS.  All rules expected by SSAS when building cubes apply to PowerPivot as well; for example, dimensional tables in a star schema translate to ‘lookup’ tables in the PowerPivot window.  PowerPivot data is consumed by Excel using ‘cube’ interfaces  (MS OLAP provider, Excel CUBE functions, etc); However, PowerPivot was built for fast development which makes it an ideal tool for ‘mimicking’ an SSAS cube without the complexity of advanced server features.

I want to emphasize, we are not trying to build the entire BI project or corporate data warehouse in PowerPivot.  We are just ‘sampling’ the data model and ‘testing’ the different calculations in order to allow subject matter experts to help us gain a much clarity regarding the quality of the data, the relationships among the different data entities and the validity and relevance of the business calculations.   It is often the case that business users will start ‘refining’ their idea of what they want once they see a working model. 

This final result of a BI project that used PowerPivot for prototyping can very well be a traditional UDM/OLAP environment using MDX calculations. The discoveries gained during the prototyping phase have no direct relationship to the specific technology that will be implemented; they only offer pointers to a better architecture that will increase the chances of a successful implementation, one that will have a profound impact on the business.

A lot of Business Intelligence projects fail for two major reasons: lack of Business Sponsorship and poor Data Quality.  There is a good chance that using PowerPivot as a prototyping tool can help tackle these issues as subject matter experts use their familiar Excel environment while collaborating with BI professionals under the iterative Agile framework. Rapid dimension/measure sampling can quickly uncover data quality issues while establishing a solid data model that analysts can understand which gradually builds support for the BI vision (in contrast to solutions they perceive as overly complex and many times inaccurate, which is the mean reason why ‘spreadmarts’ exist in the first place).

04

Recently, I encountered a situation that required the use of SUMX as part of a DAX measure.  Interestingly, SUMX seemed to only be taking in the integer part of the decimal used as part of the expression.

DAX expression # 1:

SUMX(FactResellerSales, FactResellerSales[SalesAmount] * 0.9 )

The result is 0 for all the cells on the pivot table.





DAX expression # 2:

SUMX(FactResellerSales, FactResellerSales[SalesAmount] * 2.9 )

The result reflects the fact that DAX only used 2 in the multiplication (instead of 2.9)




Hopefully this will be fixed in the next release.

As a workaround, you can create a DAX calculated column (instead of a measure), as the decimal is used properly there:


 

 

 

 

 

 

 

24

One of the interesting DAX Time Intelligence functions is the TOTALYTD, which can define a measure calculation over a year in a cumulative way.

The same output can be achieved in alternate ways using the CALCULATE function.  Nevertheless, as it is a very common computation in Excel reports, the PowerPivot team has created a dedicated function in order to help developing YTD calculations much faster.

A TOTALYTD behavior that is worth nothing however, is the fact that if a filter context defines multiple years (perhaps accidentally), this function will then use the last year available in order to perform the aggregation.

The screenshots below show a simple pivot table using Adventure Works data.  The DAX measure defined for YTD aggregate is:

TOTALYTD(SUM([SalesAmount]), DimTime[FullDateAlternateKey], ALL(DimTime))

When the slicer selection specifies only one year, the regular aggregate of [Sales Amount] over each month as well as the DAX [YTD] measure act on that sole year.

 However when two years are selected, the regular aggregate of [Sales Amount] acts over the correct filter context (defined over two years), whereas the YTD measure uses only the last year (2002). 

 

At first it can be confusing, but if you select different sets of multiple years in the slicers it becomes evident pretty quick that the YTD is only showing the latest year of the set. 

 

Although it can be argued that it is a good assumption given that the TOTALYTD function is not meant to work over multiple years, it is still worth documenting this behavior and keeping it in mind in case a complex report produces unexpected results due to a YTD formula that is picking for you what year should go into the calculation.

07

 

PowerPivot and DAX are a powerful technologies, but there is still a good amount of work that can be done with traditional cube functions. In its current version, PowerPivot lacks the concept of dimension hierarchies; however very customized Excel reports that leverage cube hierarchies can still be built using MDX enabled cube functions. This approach goes beyond the limiting nature of traditional pivot tables by enabling power users to transverse dimensions using custom formulas that are a mix of MDX with traditional Excel functionality.
This approach can be used, for example, to determine the top city for each of the displayed countries in a traditional SSAS connected pivot table, as shown below (using Adventure Works 2008 SSAS cube)
 
By converting the pivot table to formulas, we free ourselves from any of the inherent pivot table limitations -
 
Notice that filters are still kept as dropdowns, which is a powerful feature as it allows the construction of very dynamic formula based reports.
 
Selecting the member Germany (cell A8 in screenshot above) shows us how Excel navigates the cube in order to retrieve the member name -
 
 
We can use a similar approach with the CUBESET Excel function in order to create an MDX set of the top cities by reseller sales for each of the countries displayed in the spreadsheet;  In this example, I leave the set name empty which will make it invisible to people viewing the spreadsheet cell values. However, even though there no value is displayed, the cell contains a reference to the set object now, and we can use reference it via Excel formulas in order to complete our report:
 
 
Using the CUBERANKEDMEMBER function, we retrieve the first member of the set and show the city name (In this example the set only contained one member)
 
Just as easily, a power user with understanding of hierarchies could slightly modify the MDX descendant formula to retrieve a different level of the hierarchy, like the top postal code (using the value of 3 for the descendants formula, in effect changing the level of granularity retrieved from the dimension)
 
Even though is a fairly simple technique, I hope this small article shows the potential of using MDX enabled cube formulas. More complicated calculations like weighted averages or trends can be also derived using this technique, enabling power users to quickly satisfy questions from decision makers and build highly customized reports. In turn, the BI team can leverage the successful and popular calculations and ‘promote’ them to the cube (which will most likely happen with DAX measures as well).

 

 My Blog Friends Minimize


    
Copyright 2004-2012 MSBICentral.com Terms Of Use Privacy Statement