NewsArticles Minimize
03

Although PowerPivot – in its currently implementation – lacks a dedicated ‘median’ function, it is possible to recreate it using existing functionality.   There is a great blog entry here by Marco Russo where he describes how to calculate median over the values of a PowerPivot column. 

His calculation is as follows:

MINX( FILTER( VALUES( People[Age] ),
              CALCULATE( COUNTROWS( People ),
                         People[Age] <= EARLIER( People[Age] ) ) 
              > COUNTROWS( People ) / 2 ),
      People[Age] )

Recently I tried to adapt this calculation to a PowerPivot application I was working on.  I ran into two issues:

1) The calculation can be applied over a regular column on a table in the PowerPivot window or a DAX calculated column. However, what if you must calculated the median over the output of a DAX measure instead of a regular or calculated column?  In this case, you won’t be able to use the DAX functions EARLIER and VALUES, as these only take columns as parameters.  Due to this, the calculation above won’t work

2) I realized there should be a separate logic depending on whether the set of values is even or odd.   For even sets, there is no discrete middle value and one must calculate an average of both values in the middle.

Fortunately, it is fairly easy to deal with both issues.  With Alberto Ferrari’s help, I implemented the calculation below.  There are three steps for it, the first one which implies the creation of an index, which we will use to select the values in the middle.  I then created an alternate execution path for even sets, which simply averages the two values in the middle.

To see how it works, copy and past the data below and import it into PowerPivot (as a linked table):

Month

Amount

January

1,317,542

February

2,384,847

March

1,563,955

April

1,865,278

May

2,880,753

June

1,987,873

July

2,665,651

August

4,212,972

September

4,047,574

October

2,282,116

November

3,483,161

December

3,510,949

Step 1

Create a measure.  This is to test that we can calculate a median over a column or measure using the same method.

[Test Measure] =SUM([Amount]) / 2

image

Step 2

Create another measure which will serve to rank the months by the value in [Test Measure]. We will use this ranking value to determine which value is the median (in case of odd sets), or to average the two values in the middle (in case of even sets)

[Index] =IF (
    COUNTROWS ( VALUES ( Table1[Month] ) ) = 1,
    COUNTROWS (
        FILTER (
            ALL ( Table1[Month] ),
            [Test Measure] <= CALCULATE (
               [Test Measure], Table1[Month] = VALUES ( Table1[Month] )
            )
        )
    )
)
image

Step 3

Create median expression, reflecting the logic discussed above:

[Median] =IF( COUNTROWS( VALUES( Table1[Month] ) ) > 1,
     IF( MOD( COUNTROWS( ALL( Table1[Month] ) ) ,2 ) = 0,
          (
            CALCULATE( MAXX( Table1, [Test Measure] ),
                FILTER(
                 ALL(Table1[Month]),
                 [Index] <= ROUND( COUNTROWS(ALL(Table1[Month])) / 2, 0)
                )
            ) +
            CALCULATE( MINX( Table1, [Test Measure] ),
                FILTER(
                 ALL(Table1[Month]),
                 [Index] > ROUND( COUNTROWS(ALL(Table1[Month])) / 2, 0)
                )
            )
          ) / 2
  ,
          CALCULATE(
           [Test Measure],
            FILTER(
             ALL(Table1[Month]),
             [Index] = ROUND( COUNTROWS(ALL(Table1[Month])) / 2, 0)
            )
          )
        )
    )

The calculation first uses the MOD function to determine if the set is even (which is the case in our example).  It then gets the two values in the middle by:

1) Filtering the months for all that have an index of less or equal to the count of months divided by two.  Take the maximum value of [Test Measure] for this set.

2) Filter the months for all that have an index greater than the count of months divided by two. Take the minimum value of [Test Measure] for this set

3) add these values and divide by two (simple average)

image

The calculation is correct and matches Excel’s regular median function output. One word of caution: this expression relies heavily on row context which can greatly impact performance over big datasets.    In the following blog post, I will talk about a potential way around this issue in which we can arrive to correct the Median output by leveraging PowerPivot’s intrinsic compatibility with MDX.

powered by metaPost
Posted in: Blog

Post Comment

Only registered users may post comments.

LatestArticles Minimize
Computed column 'mydate' in table 'test2' Cannot be Persisted because the Column is Non-deterministic. by Wayne Snyder
How to create a calculated column which converts a string to a date and persist it. The problem is that the calculated column must be deterministic.

SSAS Process Full from SSIS with Parameter for DatabaseName by Wayne Snyder
We have recently had a need to process tabular models programmatically, but we needed it to be easily done by an end user – self service is the key – right? To solve this problem we have a...

Using Linked Tables in PowerPivot by Wayne Snyder
PowerPivot allows you to create a table in an Excel worksheet, and import it into your PowerPivot model. You can import data from Excel spreadsheets, but this differs in that the Excel data we wish to...

Errors Creating and Maintaining Relationships with PowerPivot Tables by Wayne Snyder
In this article you will learn how to create relationships in PowerPivot. More Importantly, you will learn about many-to-many relationships. Since PowerPivot only supports 1-to-many relationships, you...

Importing Flat Files into PowerPivot– Overcoming Problems by Wayne Snyder
You may run into issues when trying to import flat files into PowerPivot. This article documents some of the issues, and includes workarounds when available. As I bump into new problems, I will try to...

Importing Flat Files Into PowerPivot by Wayne Snyder
Bringing flat files into your Powerpivot model is easy. We will go through the process step by step, so that you can begin mashing up data to get your job done more quickly, more effectively, and with...

Changing Granularity of Leaf Level Calculations in SSAS Tabular by Jason Thomas

Finding Nearest Stores using SSRS Map Reports by Jason Thomas

Heat Maps for SSRS using Map Control by Jason Thomas

Parameters for Analysis Services Reporting: Introduction, Pt. 3 by William Pearson
BI Architect Bill Pearson continues an extended examination of parameterization within Analysis Services reports. In this, Part 3 of the article, we continue to get hands-on practice cr...


Advertisement Minimize

Advertisement Minimize

Copyright 2004-2013 MSBICentral.com Terms Of Use Privacy Statement