Gautham KAMATH

BI Developer specializing in SSAS and MDX

Archive for the ‘MDX’ Category

Examining MDX Query performance using Block Computation

leave a comment »

In 2007 I read a very interesting blog by Mosha outlining how to Optimize a Count(Filter(…)) expression in MDX. Since then I have been particularly careful in writing MDX queries and also make sure to run it past the MDX Studio to check it. But quite frequently I come across queries on forums and blogs that take the machine and its resources for a ride and still question why aren’t my queries faster when I have enough resources (disk space, RAM, paging, processors etc.). Its quite simple at times to change the query but in most occasions it is simply not possible to change the way the query returns data. Lets first get a sense of how the Analysis Services processes MDX queries. There are two major parts of the Analysis Services engine.

Formula Engine: The Formula Engine (FE) processes and parses the incoming MDX query and figures out what data is required to satisfy them. This information is then forwarded to the Storage Engine.

Storage Engine: Apart from reading and writing of data the Storage Engine (SE) also retrieves and aggregates the data from the Analysis Services that the Formula Engine requests.

In a nut-shell, when an MDX query is executed, the query is passed to the formula engine where it is parsed then passed to the storage engine. The raw data comes back from the storage engine (with necessary aggregations) and the formula engine performs any calculations required on that data before publishing it.

The Formula Engine uses a mechanism called “SONAR” to create subcubes requests to the storage engine (In MDX Studio they appear as Sonar subcubes). The Storage Engine, with the help of subcube query events (in the profiler trace they appear as Query Subcube event class and in MDX Studio you can see them as SE Queries), uses the aggregation data cache or the aggregation data registry to retrieve the aggregated data. The data then comes back to the Formula Engine where the necessary calculations are applied before publishing it.

Now let me take up a simple query that returns the number of products where Product Line = Accessory and Product Color = Black that were sold by Date.

    WITH MEMBER [Measures].[Count Of Products] AS
        IIF({NonEmpty(
                    Exists([Product].[Product].[Product].Members,
                        ([Product].[Product Line].[Accessory], [Product].[Color].&[Black]))
                    , [Measures].[Internet Sales Amount]) AS [ProductSet]}.Count = 0
            , NULL, [ProductSet].Count)
    SELECT {[Measures].[Count Of Products]} ON 0,
    NON EMPTY [Date].[Date].[Date].Members ON 1
    FROM [Adventure Works]

The Performance Monitor counters on the MDX Studio show the following information.

On cold cache…..

        Time              : 1 sec 453 ms
        Calc covers       : 0
        Cells calculated  : 427
        Sonar subcubes    : 2393
        NON EMPTYs        : 1189
        Autoexists        : 1194
        EXISTINGs         : 0
        SE queries        : 1198
        Flat cache insert : 0
        Cache hits        : 2378
        Cache misses      : 27
        Cache inserts     : 11
        Cache lookups     : 2405
        Memory Usage KB   : 160

On warm cache…..

	Time              : 1 sec 78 ms
	Calc covers       : 0
	Cells calculated  : 390
	Sonar subcubes    : 2377
	NON EMPTYs        : 1189
	Autoexists        : 1188
	EXISTINGs         : 0
	SE queries        : 1188
	Flat cache insert : 0
	Cache hits        : 2379
	Cache misses      : 0
	Cache inserts     : 0
	Cache lookups     : 2379
	Memory Usage KB   : 4

As you can see there is no significant improvement in the query performance even in warm cache. The number of Sonar subcubes and Subcube Event queries (SE queries) fired does not change. The explanation is that having NonEmpty or Exists inside an MDX calculation is not a good idea. In our case the set inside the Exists does not change and hence context remains fixed. Almost all the Formula Engine requests (SONAR) are having cache hits and a significant number of Subcube Event queries (SE queries) are being fired. Now lets change the query and derive the calculated member using SUM function.

    WITH MEMBER [Measures].[Count Of Products] AS
        SUM([Product].[Product].[Product].Members,
            IIF(([Product].[Product Line].[Accessory],
                [Product].[Color].&[Black],
                [Measures].[Internet Sales Amount]) > 0, 1, NULL)
            )
    SELECT {[Measures].[Count Of Products]} ON 0,
    NON EMPTY [Date].[Date].[Date].Members ON 1
    FROM [Adventure Works]

The Performance Monitor counters show the following information.

On cold cache…..

	Time              : 875 ms
	Calc covers       : 0
	Cells calculated  : 427
	Sonar subcubes    : 14
	NON EMPTYs        : 1
	Autoexists        : 6
	EXISTINGs         : 0
	SE queries        : 10
	Flat cache insert : 0
	Cache hits        : 4
	Cache misses      : 29
	Cache inserts     : 12
	Cache lookups     : 33
	Memory Usage KB   : 848

As you can see there is significant reduction in number of formula engine requests (SONAR subcubes) and subcube event queries (SE queries) even though the cells calculated remains same. This is because now the query is executing under block-computation mode rather than the cell-by-cell computation.

Block-computation allows MDX queries to calculate a cell value once and use it numerous times. Please note that this does not work in all situations, but wherever possible it should be used.

Now if you u look at the same query in the warm cache scenario there is only one SONAR subcube request and SE queries and cache hits fills up the rest.

On warm cache…..

	Time              : 578 ms
	Calc covers       : 0
	Cells calculated  : 390
	Sonar subcubes    : 1
	NON EMPTYs        : 1
	Autoexists        : 0
	EXISTINGs         : 0
	SE queries        : 1
	Flat cache insert : 0
	Cache hits        : 7
	Cache misses      : 0
	Cache inserts     : 0
	Cache lookups     : 7
	Memory Usage KB   : 0

EXISTING and Block-Computation mode

EXISTING is an operator that can also be best avoided but certain situations may force you to use it. Block-computation with EXISTING is also possible that will give a performance boost to the query. Consider the following query

    WITH MEMBER [Measures].[Count Of Products] AS
        COUNT(EXISTING NONEMPTY([Product].[Product].[Product].Members
                                , [Measures].[Internet Sales Amount]))
    SELECT { [Measures].[Count Of Products], [Measures].[Internet Sales Amount]} ON 0,
    NONEMPTY(
        {[Product].[Product Line].&[S]}
        * {[Product].[Color].&[Black]}
        * {[Date].[Date].[Date].Members}
    ,[Measures].[Internet Sales Amount]) ON 1
    FROM [Adventure Works]

The Performance Monitor counters on the MDX Studio show the following information.

	Time              : 1 sec 328 ms
	Calc covers       : 0
	Cells calculated  : 817
	Sonar subcubes    : 798
	NON EMPTYs        : 391
	Autoexists        : 6
	EXISTINGs         : 390
	SE queries        : 403
	Flat cache insert : 390
	Cache hits        : 422
	Cache misses      : 746
	Cache inserts     : 370
	Cache lookups     : 1168
	Memory Usage KB   : 80

The warm cache for this query does not trouble the numbers much. Now lets change the query to use the block-computation mode.

    WITH MEMBER [Measures].[Count Of Products] AS
        SUM(EXISTING [Product].[Product].[Product].Members ,
            IIF([Measures].[Internet Sales Amount] > 0, 1, NULL))
    SELECT { [Measures].[Count Of Products], [Measures].[Internet Sales Amount]} ON 0,
    NONEMPTY(
        {[Product].[Product Line].&[S]}
        * {[Product].[Color].&[Black]}
        * {[Date].[Date].[Date].Members}
    ,[Measures].[Internet Sales Amount]) ON 1
    FROM [Adventure Works]

On cold cache the following are the perfmon numbers from MDX Studio.

	Time              : 421 ms
	Calc covers       : 0
	Cells calculated  : 817
	Sonar subcubes    : 19
	NON EMPTYs        : 1
	Autoexists        : 6
	EXISTINGs         : 2
	SE queries        : 13
	Flat cache insert : 0
	Cache hits        : 8
	Cache misses      : 32
	Cache inserts     : 12
	Cache lookups     : 40
	Memory Usage KB   : 80

Check how the number of EXISTING usage has changed from 390 to 2. Now on the warm cache, even though the EXISTING usage remains the same, most of the query data is drawn from the cache.

	Time              : 62 ms
	Calc covers       : 0
	Cells calculated  : 780
	Sonar subcubes    : 5
	NON EMPTYs        : 1
	Autoexists        : 0
	EXISTINGs         : 2
	SE queries        : 2
	Flat cache insert : 0
	Cache hits        : 10
	Cache misses      : 2
	Cache inserts     : 0
	Cache lookups     : 12
	Memory Usage KB   : 0

To summarize, it is important to understand how the Analysis Services engine analyses MDX queries and very important to find a way to use block-computation wherever possible.

Written by Gautham KAMATH

08/02/2012 at 10:07

Too many Attributes in the SSAS dimension can spoil the MDX query

leave a comment »

All of us have heard about the importance of having the correct and appropriate attribute relationships and their impact on MDX query performance. There are a few instances when having too many attributes and attribute relationships can backfire. One such instance is with the Explicit and Implicit conversions. Let me explain.

Attribute relationships:

Attribute relationships create dependencies between attributes. In other words, if City has a RELATED attribute State, then City has a many to one relation with State and conversely State has a one to many relation with City. More specifically, if the current city is Seattle, then we know the State must be Washington.

Related and Relating relationship:

If State is RELATED to City, then City is said to be a RELATING to State.

Because RELATING attribute has a many-to-one relation with RELATED attribute the coordinates for RELATING attributes (City in this case) move to [All] member. For RELATED attribute, that has a one-to-many relation with the RELATING attribute, the coordinates move to one specific member.

Explicit and Implicit Overwrite:

An Explicit overwrite takes place in an expression. Related and Relating attributes undergo an Implicit overwrite. Consider the following query

     WITH MEMBER [Measures].[x] AS ([Measures].[Internet Sales Amount]),
                                    FORMAT_STRING = "currency"
     SELECT {[Measures].[x], [Measures].[Internet Sales Amount]} on 0
     FROM [Adventure Works]
     WHERE [Customer].[Customer Geography].[City].&[Seattle]&[WA]

It is a simple MDX query that returns the Internet Sales Amount for the City of Seattle in Washington.

x Internet Sales Amount
$75,164.86 $75,164.86

Now lets play with it and add [Customer].[Stare-Province].CurrentMemberto the calculated member expression

     WITH MEMBER [Measures].[x] AS ([Measures].[Internet Sales Amount],
                                    [Customer].[State-Province].CurrentMember),
                                    FORMAT_STRING = "currency"
     SELECT {[Measures].[x], [Measures].[Internet Sales Amount]} on 0
     FROM [Adventure Works]
     WHERE [Customer].[Customer Geography].[City].&[Seattle]&[WA]

The query returns the following results

x Internet Sales Amount
$2,467,248.34 $75,164.86

As we know WHERE clause reduces the dimensionality of cube space and hence changes the cube context for the query. What has happened here is

  • [City].[Seattle] is current context (WHERE Clause reduces the dimensionality of cube space and performs an implicit EXISTS with the AXES).
  • Explicit conversion of [State-Province].CURRENTMEMBER to [State-Province].[Washington] (one member from State)
  • Because [State-Province] is related to [City], [State-Province].[Washington] remains as it is (one member from State).
  • Now because [City] is relating to [State-Province], [City].[Seattle] gets implicitly converted to [City].[ALL Customers]
  • Hence the tuple expression for [Measures].[x] becomes ([State-Province].[Washington], [City].[ALL Customers], [Measures].[Internet Sales Amount])

Out of curiosity, let us see what happens when we have subselects instead of WHERE clause?

     WITH MEMBER [Measures].[x] AS ([Measures].[Internet Sales Amount],
                                    [Customer].[State-Province].CurrentMember),
                                    FORMAT_STRING = "currency"
     SELECT {[Measures].[x], [Measures].[Internet Sales Amount]} on 0
     FROM  (SELECT [Customer].[Customer Geography].[City].&[Seattle]&[WA] ON 0
           FROM [Adventure Works]) 
x Internet Sales Amount
$29,358,677.22 $75,164.86

Subselects do not reduce dimensionality of cube space and therefore do not change the cube context in which the MDX expressions are to be evaluated. In this case there is no Implicit EXISTS between the Subselect and the AXES and hence the [State-Province].CurrentMember gets Explicitly converted to All member [State-Province].[ALL Customers]. The tuple expression for [Measures].[x] in this query becomes ([State-Province].[ALL Customers], [Measures].[Internet Sales Amount]).

So far so good. This is as per design and how it should behave. Now let us look at the above mentioned scenario but with different dimension ([Product]) and members. Consider the following query

     WITH MEMBER [Measures].[x] AS ([Measures].[Internet Sales Amount]),
                                    FORMAT_STRING = "currency"
     SELECT {[Measures].[x], [Measures].[Internet Sales Amount]} on 0
     FROM [Adventure Works]
     WHERE [Product].[Product Categories].[Product].[Mountain-100 Silver, 38]

Again a plain and simple MDX query that returns the Internet Sales Amount for the Product “Mountain-100 Silver, 38”.

x Internet Sales Amount
$197,199.42 $197,199.42

If we tweek the expression

     WITH MEMBER [Measures].[x] AS ([Measures].[Internet Sales Amount],
                                    [Product].[Subcategory].CurrentMember),
                                    FORMAT_STRING = "currency"
     SELECT {[Measures].[x], [Measures].[Internet Sales Amount]} on 0
     FROM [Adventure Works]
     WHERE [Product].[Product Categories].[Product].[Mountain-100 Silver, 38]

We get the same result as the previous expression.

x Internet Sales Amount
$197,199.42 $197,199.42

Result does not seem to be correct as we were expecting $9,952,759.56 but we are getting $197,199.42. It is supposed to be doing the same thing as I mentioned before

  • [Product].[Mountain-100 Silver, 38] is current context (WHERE Clause performs an implicit EXISTS with the AXES).
  • Explicit conversion of [Subcategory].CURRENTMEMBER to [Subcategory].[Mountain Bikes] (one member from Subcategory)
  • Because [Subcategory] is related to [Product], [Subcategory].[Mountain Bikes] remains as it is (one member from Subcategory).
  • Also because [Product] is relating to [Subcategory], [Product].[Mountain-100 Silver, 38] gets implicitly converted to [Product].[ALL Products]
  • Hence the tuple for [Measures].[x] becomes ([Subcategory].[Mountain Bikes], [Product].[ALL Products]).

Yet the result seems to be incorrect. Why? This is because there are TOO MANY ATTRIBUTES in [Product] dimenion and most of the attributes are RELATED to the Product level (leaf level). All those attributes get set by auto-exists when filtering to a specific Product. The following is the subcube information from profiler when the query is fired.

00000000, 000, 000, 010111111111111111111, 000000000000000000000, 00, 100, 000000000000000000000, 000000000000000000000, 000000000000000000000, 00

It equates to the following.

     Dimension:   Product (21)
          Attribute:    Standard Cost
          Attribute:    Color
          Attribute:    Safety Stock Level
          Attribute:    Reorder Point
          Attribute:    List Price
          Attribute:    Size
          Attribute:    Size Range
          Attribute:    Weight
          Attribute:    Days to Manufacture
          Attribute:    Dealer Price
          Attribute:    Class
          Attribute:    Style
          Attribute:    Model Name
          Attribute:    Product Line
          Attribute:    Subcategory
          Attribute:    Status
          Attribute:    Start Date
          Attribute:    End Date
          Attribute:    Large Photo
     Dimension:   Destination Currency (3)
          Attribute:    Destination Currency

There are 21 attributes in the Product dimension of which 15 attributes are related to the Product level. What it means is that the above 15 attributes perform auto exist at their respective all level with the Product Level (leaf level) because of the attribute relationship. In context of the query featuring City and State there was only one related attribute (State) at the City level and hence the issue did not arise there.

ROOT function:

The desired behaviour is got back by application of ROOT function. As defined by mdsn, the ROOT function returns a tuple that consists of the All members from each attribute hierarchy within the current scope in a cube, dimension, or tuple. So by modifying the above query in the following way

     WITH MEMBER [Measures].[x] AS ([Measures].[Internet Sales Amount],
                                    ROOT([Product].[Subcategory].CurrentMember)),
                                    FORMAT_STRING = "currency"
     SELECT {[Measures].[x], [Measures].[Internet Sales Amount]} on 0
     FROM [Adventure Works]
     WHERE [Product].[Product Categories].[Product].[Mountain-100 Silver, 38]

we get the desired results

x Internet Sales Amount
$9,952,759.56 $75,164.86

The profiler subcube trace reads the following

00000000, 000, 000, 000000000000000010000, 000000000000000000000, 00, 100, 000000000000000000000, 000000000000000000000, 000000000000000000000, 00

and it equates to

     Dimension:   Product (21)
          Attribute:    Subcategory
     Dimension:   Destination Currency (3)
          Attribute:    Destination Currency

P/S To read the profiler trace I have used a tool created by James Snape

To summarize:

  • Have the correct attribute relationships.
  • Be particularly careful with CurrentMember and know where the Explicit and Implicit overwrite takes place.
  • As far as possible keep the attributes to the minimum.

Written by Gautham KAMATH

04/12/2011 at 17:27

Always use {} and ()

leave a comment »

When SSAS 2005 was released, MDX has a very nice change (at least it looked nice that time). The change was that it was no longer necessary to be very strict about i.e curly bracing {} or bracketing () the MDX syntax. This leniency sounded fun at that time but now after years relying in the intrinsic conversion between members, tuples and sets, it does not seem all that good now. Let me explain.

Consider the following simple query

     SELECT {[Measures].[Internet Sales Amount]} ON 0,
     {[Customer].[Customer Geography].[City].&[Berkshire]&[ENG]
     * [Product].[Product Categories].[Category].Members
     * [Date].[Calendar].[Calendar Year].[CY 2008]} ON 1
     FROM [Adventure Works]

The results are good.

But if we swap a couple of members ([Date] and [Product]) in the ROWS axis, it bangs.

     SELECT {[Measures].[Internet Sales Amount]} ON 0,
     {[Customer].[Customer Geography].[City].&[Berkshire]&[ENG]
     * [Date].[Calendar].[Calendar Year].[CY 2008]
     * [Product].[Product Categories].[Category].Members} ON 1
     FROM [Adventure Works]

The error reads…

Executing the query …
Query (2, 2) The * function expects a tuple set expression for the 1 argument. A string or numeric expression was used.
Execution complete

There is so much internal conversion going on between members, tuples, and sets that interpretation and construction of MDX statements have become more complex.
The row and column axes require sets. CrossJoin statement also require sets and neither of the above queries properly declares it.
The lesson: Always declare your sets with {} and tuples with ().

So if the above query is written as

     SELECT {[Measures].[Internet Sales Amount]} ON 0,
     {[Customer].[Customer Geography].[City].&[Berkshire]&[ENG]}
     * {[Date].[Calendar].[Calendar Year].[CY 2008]}
     * {[Product].[Product Categories].[Category].Members} ON 1
     FROM [Adventure Works]

…everyone is happy

Please note that by contrast the WHERE clause is not as lenient as the rest of the query. A single hierarchy in the WHERE clause without round brackets is acceptable as the member will be explicitly converted to a tuple. Modify it to reference a second hierarchy and add round brackets, it will not like it as it expects curly braces {}.

Written by Gautham KAMATH

03/11/2011 at 18:36

Posted in MDX

Tagged with , , ,