Examining MDX Query performance using Block Computation

In 2007 I read a very interesting blog by Mosha outlining how to Optimize a Count(Filter(…)) expression in MDX. Since then I have been particularly careful in writing MDX queries and also make sure to run it past the MDX Studio to check it. But quite frequently I come across queries on forums and blogs that take the machine and its resources for a ride and still question why aren’t my queries faster when I have enough resources (disk space, RAM, paging, processors etc.). Its quite simple at times to change the query but in most occasions it is simply not possible to change the way the query returns data. Lets first get a sense of how the Analysis Services processes MDX queries. There are two major parts of the Analysis Services engine.

Formula Engine: The Formula Engine (FE) processes and parses the incoming MDX query and figures out what data is required to satisfy them. This information is then forwarded to the Storage Engine.

Storage Engine: Apart from reading and writing of data the Storage Engine (SE) also retrieves and aggregates the data from the Analysis Services that the Formula Engine requests.

In a nut-shell, when an MDX query is executed, the query is passed to the formula engine where it is parsed then passed to the storage engine. The raw data comes back from the storage engine (with necessary aggregations) and the formula engine performs any calculations required on that data before publishing it.

The Formula Engine uses a mechanism called “SONAR” to create subcubes requests to the storage engine (In MDX Studio they appear as Sonar subcubes). The Storage Engine, with the help of subcube query events (in the profiler trace they appear as Query Subcube event class and in MDX Studio you can see them as SE Queries), uses the aggregation data cache or the aggregation data registry to retrieve the aggregated data. The data then comes back to the Formula Engine where the necessary calculations are applied before publishing it.

Now let me take up a simple query that returns the number of products where Product Line = Accessory and Product Color = Black that were sold by Date.

    WITH MEMBER [Measures].[Count Of Products] AS
        IIF({NonEmpty(
                    Exists([Product].[Product].[Product].Members,
                        ([Product].[Product Line].[Accessory], [Product].[Color].&[Black]))
                    , [Measures].[Internet Sales Amount]) AS [ProductSet]}.Count = 0
            , NULL, [ProductSet].Count)
    SELECT {[Measures].[Count Of Products]} ON 0,
    NON EMPTY [Date].[Date].[Date].Members ON 1
    FROM [Adventure Works]

The Performance Monitor counters on the MDX Studio show the following information.

On cold cache…..

        Time              : 1 sec 453 ms
        Calc covers       : 0
        Cells calculated  : 427
        Sonar subcubes    : 2393
        NON EMPTYs        : 1189
        Autoexists        : 1194
        EXISTINGs         : 0
        SE queries        : 1198
        Flat cache insert : 0
        Cache hits        : 2378
        Cache misses      : 27
        Cache inserts     : 11
        Cache lookups     : 2405
        Memory Usage KB   : 160

On warm cache…..

	Time              : 1 sec 78 ms
	Calc covers       : 0
	Cells calculated  : 390
	Sonar subcubes    : 2377
	NON EMPTYs        : 1189
	Autoexists        : 1188
	EXISTINGs         : 0
	SE queries        : 1188
	Flat cache insert : 0
	Cache hits        : 2379
	Cache misses      : 0
	Cache inserts     : 0
	Cache lookups     : 2379
	Memory Usage KB   : 4

As you can see there is no significant improvement in the query performance even in warm cache. The number of Sonar subcubes and Subcube Event queries (SE queries) fired does not change. The explanation is that having NonEmpty or Exists inside an MDX calculation is not a good idea. In our case the set inside the Exists does not change and hence context remains fixed. Almost all the Formula Engine requests (SONAR) are having cache hits and a significant number of Subcube Event queries (SE queries) are being fired. Now lets change the query and derive the calculated member using SUM function.

    WITH MEMBER [Measures].[Count Of Products] AS
        SUM([Product].[Product].[Product].Members,
            IIF(([Product].[Product Line].[Accessory],
                [Product].[Color].&[Black],
                [Measures].[Internet Sales Amount]) > 0, 1, NULL)
            )
    SELECT {[Measures].[Count Of Products]} ON 0,
    NON EMPTY [Date].[Date].[Date].Members ON 1
    FROM [Adventure Works]

The Performance Monitor counters show the following information.

On cold cache…..

	Time              : 875 ms
	Calc covers       : 0
	Cells calculated  : 427
	Sonar subcubes    : 14
	NON EMPTYs        : 1
	Autoexists        : 6
	EXISTINGs         : 0
	SE queries        : 10
	Flat cache insert : 0
	Cache hits        : 4
	Cache misses      : 29
	Cache inserts     : 12
	Cache lookups     : 33
	Memory Usage KB   : 848

As you can see there is significant reduction in number of formula engine requests (SONAR subcubes) and subcube event queries (SE queries) even though the cells calculated remains same. This is because now the query is executing under block-computation mode rather than the cell-by-cell computation.

Block-computation allows MDX queries to calculate a cell value once and use it numerous times. Please note that this does not work in all situations, but wherever possible it should be used.

Now if you u look at the same query in the warm cache scenario there is only one SONAR subcube request and SE queries and cache hits fills up the rest.

On warm cache…..

	Time              : 578 ms
	Calc covers       : 0
	Cells calculated  : 390
	Sonar subcubes    : 1
	NON EMPTYs        : 1
	Autoexists        : 0
	EXISTINGs         : 0
	SE queries        : 1
	Flat cache insert : 0
	Cache hits        : 7
	Cache misses      : 0
	Cache inserts     : 0
	Cache lookups     : 7
	Memory Usage KB   : 0

EXISTING and Block-Computation mode

EXISTING is an operator that can also be best avoided but certain situations may force you to use it. Block-computation with EXISTING is also possible that will give a performance boost to the query. Consider the following query

    WITH MEMBER [Measures].[Count Of Products] AS
        COUNT(EXISTING NONEMPTY([Product].[Product].[Product].Members
                                , [Measures].[Internet Sales Amount]))
    SELECT { [Measures].[Count Of Products], [Measures].[Internet Sales Amount]} ON 0,
    NONEMPTY(
        {[Product].[Product Line].&[S]}
        * {[Product].[Color].&[Black]}
        * {[Date].[Date].[Date].Members}
    ,[Measures].[Internet Sales Amount]) ON 1
    FROM [Adventure Works]

The Performance Monitor counters on the MDX Studio show the following information.

	Time              : 1 sec 328 ms
	Calc covers       : 0
	Cells calculated  : 817
	Sonar subcubes    : 798
	NON EMPTYs        : 391
	Autoexists        : 6
	EXISTINGs         : 390
	SE queries        : 403
	Flat cache insert : 390
	Cache hits        : 422
	Cache misses      : 746
	Cache inserts     : 370
	Cache lookups     : 1168
	Memory Usage KB   : 80

The warm cache for this query does not trouble the numbers much. Now lets change the query to use the block-computation mode.

    WITH MEMBER [Measures].[Count Of Products] AS
        SUM(EXISTING [Product].[Product].[Product].Members ,
            IIF([Measures].[Internet Sales Amount] > 0, 1, NULL))
    SELECT { [Measures].[Count Of Products], [Measures].[Internet Sales Amount]} ON 0,
    NONEMPTY(
        {[Product].[Product Line].&[S]}
        * {[Product].[Color].&[Black]}
        * {[Date].[Date].[Date].Members}
    ,[Measures].[Internet Sales Amount]) ON 1
    FROM [Adventure Works]

On cold cache the following are the perfmon numbers from MDX Studio.

	Time              : 421 ms
	Calc covers       : 0
	Cells calculated  : 817
	Sonar subcubes    : 19
	NON EMPTYs        : 1
	Autoexists        : 6
	EXISTINGs         : 2
	SE queries        : 13
	Flat cache insert : 0
	Cache hits        : 8
	Cache misses      : 32
	Cache inserts     : 12
	Cache lookups     : 40
	Memory Usage KB   : 80

Check how the number of EXISTING usage has changed from 390 to 2. Now on the warm cache, even though the EXISTING usage remains the same, most of the query data is drawn from the cache.

	Time              : 62 ms
	Calc covers       : 0
	Cells calculated  : 780
	Sonar subcubes    : 5
	NON EMPTYs        : 1
	Autoexists        : 0
	EXISTINGs         : 2
	SE queries        : 2
	Flat cache insert : 0
	Cache hits        : 10
	Cache misses      : 2
	Cache inserts     : 0
	Cache lookups     : 12
	Memory Usage KB   : 0

To summarize, it is important to understand how the Analysis Services engine analyses MDX queries and very important to find a way to use block-computation wherever possible.

Written by Gautham KAMATH

08/02/2012 at 10:07

Posted in Analysis Services, MDX

Tagged with block computation, Formula Engine, query subcube, SE queries, sonar, Storage Engine, subcube, subcube query event

Gautham KAMATH