Minimize
03

Although PowerPivot – in its currently implementation – lacks a dedicated ‘median’ function, it is possible to recreate it using existing functionality.   There is a great blog entry here by Marco Russo where he describes how to calculate median over the values of a PowerPivot column. 

His calculation is as follows:

MINX( FILTER( VALUES( People[Age] ),
              CALCULATE( COUNTROWS( People ),
                         People[Age] <= EARLIER( People[Age] ) ) 
              > COUNTROWS( People ) / 2 ),
      People[Age] )

Recently I tried to adapt this calculation to a PowerPivot application I was working on.  I ran into two issues:

1) The calculation can be applied over a regular column on a table in the PowerPivot window or a DAX calculated column. However, what if you must calculated the median over the output of a DAX measure instead of a regular or calculated column?  In this case, you won’t be able to use the DAX functions EARLIER and VALUES, as these only take columns as parameters.  Due to this, the calculation above won’t work

2) I realized there should be a separate logic depending on whether the set of values is even or odd.   For even sets, there is no discrete middle value and one must calculate an average of both values in the middle.

Fortunately, it is fairly easy to deal with both issues.  With Alberto Ferrari’s help, I implemented the calculation below.  There are three steps for it, the first one which implies the creation of an index, which we will use to select the values in the middle.  I then created an alternate execution path for even sets, which simply averages the two values in the middle.

To see how it works, copy and past the data below and import it into PowerPivot (as a linked table):

Month

Amount

January

1,317,542

February

2,384,847

March

1,563,955

April

1,865,278

May

2,880,753

June

1,987,873

July

2,665,651

August

4,212,972

September

4,047,574

October

2,282,116

November

3,483,161

December

3,510,949

Step 1

Create a measure.  This is to test that we can calculate a median over a column or measure using the same method.

[Test Measure] =SUM([Amount]) / 2

image

Step 2

Create another measure which will serve to rank the months by the value in [Test Measure]. We will use this ranking value to determine which value is the median (in case of odd sets), or to average the two values in the middle (in case of even sets)

[Index] =IF (
    COUNTROWS ( VALUES ( Table1[Month] ) ) = 1,
    COUNTROWS (
        FILTER (
            ALL ( Table1[Month] ),
            [Test Measure] <= CALCULATE (
               [Test Measure], Table1[Month] = VALUES ( Table1[Month] )
            )
        )
    )
)
image

Step 3

Create median expression, reflecting the logic discussed above:

[Median] =IF( COUNTROWS( VALUES( Table1[Month] ) ) > 1,
     IF( MOD( COUNTROWS( ALL( Table1[Month] ) ) ,2 ) = 0,
          (
            CALCULATE( MAXX( Table1, [Test Measure] ),
                FILTER(
                 ALL(Table1[Month]),
                 [Index] <= ROUND( COUNTROWS(ALL(Table1[Month])) / 2, 0)
                )
            ) +
            CALCULATE( MINX( Table1, [Test Measure] ),
                FILTER(
                 ALL(Table1[Month]),
                 [Index] > ROUND( COUNTROWS(ALL(Table1[Month])) / 2, 0)
                )
            )
          ) / 2
  ,
          CALCULATE(
           [Test Measure],
            FILTER(
             ALL(Table1[Month]),
             [Index] = ROUND( COUNTROWS(ALL(Table1[Month])) / 2, 0)
            )
          )
        )
    )

The calculation first uses the MOD function to determine if the set is even (which is the case in our example).  It then gets the two values in the middle by:

1) Filtering the months for all that have an index of less or equal to the count of months divided by two.  Take the maximum value of [Test Measure] for this set.

2) Filter the months for all that have an index greater than the count of months divided by two. Take the minimum value of [Test Measure] for this set

3) add these values and divide by two (simple average)

image

The calculation is correct and matches Excel’s regular median function output. One word of caution: this expression relies heavily on row context which can greatly impact performance over big datasets.    In the following blog post, I will talk about a potential way around this issue in which we can arrive to correct the Median output by leveraging PowerPivot’s intrinsic compatibility with MDX.

powered by metaPost
Posted in: Blog

Post Comment

Only registered users may post comments.

MSBiCentral Bloggers Minimize

Latest Entries Minimize
Using VBA to enhance interactivity of PowerPivot reports by JavierGuillen
Presenting for PowerPivot Analytics Expo (hosted by SSWUG on July 15) by JavierGuillen
PowerPivot Analytics Expo
Quartile, Percentile and Median in PowerPivot by JavierGuillen
Using MDX in a PowerPivot model
Median calculation in PowerPivot / DAX by JavierGuillen
How to calculate median over a PowerPivot measure
Calculating moving averages in DAX by JavierGuillen
After SSRS 2008 R2 Upgrade: Some Reports Hang by wsnyder
You just upgraded to SSRS 2008 R2 and some reports are hanging up, while others seem to work fine. How do you debug for this, and what is the fix?
SSRS SystemOverflowexception: Value was either too large or too small for an Int32 by wsnyder
You install or upgrade to SSRS 2008R2 Gold, and get the following error "SSRS SystemOverflowexception: Value was either too large or too small for an Int32". Here is what is going on....
PASS Virtual Chapter Presentation Slides on PowerPivot / DAX by JavierGuillen
Presentation slides for PASS VC webinar on PowerPivot and DAX
Business Intelligence Prototyping with PowerPivot by JavierGuillen
PowerPivot can be a really powerful tool for prototyping Business Intelligence projects.
Speaking for PASS Business Intelligence Virtual Chapter by JavierGuillen
Page 1 of 3First   Previous   [1]  2  3  Next   Last   

Copyright 2004-2012 MSBICentral.com Terms Of Use Privacy Statement