Gautham KAMATH

BI Developer specializing in SSAS and MDX

Processing using Snapshot Isolation in Analysis Services (SSAS)

Just a few months ago I was hit with a dimension processing failure. The error read:

Errors in the OLAP storage engine: The attribute key cannot be found when processing: Table: ‘<<TableName>>’, Column: ‘<<ColumnName>>’, Value: ‘XXXXXXX’. The attribute is ‘<<AttributeName>>’.

On investigation I found that constant updates to the underlying dimension table were the main cause. One of my dimensions had as many as 15 attributes, and processing failed whenever an update occurred while the dimension was being processed. Because each attribute is read with its own SELECT DISTINCT statement, rows changed between the first attribute's SELECT DISTINCT and those of the later attributes leave the attributes inconsistent with one another, which surfaces as the attribute key error above.

One way to get around this was to change the ProcessingGroup property of the dimension to ByTable (instead of the default ByAttribute). As the ByTable option brings the entire table data for that dimension into memory, every attribute would be processed from the same data without having to worry about the constant updates; new updates would then get incorporated in the next processing cycle.

This option did not work, as ByTable behaves that way only when the dimension draws its data from a solitary table. With multiple tables in a star or snowflake schema, or with an RDBMS view that encapsulates multiple tables, Analysis Services still issues multiple queries against the data source.

P.S. Try processing the Product and Geography dimensions with the ByAttribute and ByTable options. The profiler trace for the Product dimension in ByTable mode will still show multiple SELECT DISTINCT statements being fired, whereas the Geography dimension, because of its single-table data source, fires only one SELECT DISTINCT statement.
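
For illustration, the statements SSAS generates are more verbose than this, but the ByAttribute trace for Product boils down to one distinct query per attribute, whereas a single-table ByTable dimension such as Geography collapses to one scan (a simplified sketch; table and column names are from the AdventureWorksDW relational database):

        -- ByAttribute: one query per attribute
        SELECT DISTINCT ProductLine FROM dbo.DimProduct
        SELECT DISTINCT Color FROM dbo.DimProduct
        -- ByTable on a single-table dimension: one scan feeds every attribute
        SELECT GeographyKey, City, StateProvinceName, EnglishCountryRegionName
        FROM dbo.DimGeography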

As I am using RDBMS views that join multiple tables, this option was also a dead end for me. The only feasible solution left was Snapshot Isolation.

Snapshot Isolation

Snapshot Isolation works on SQL Server's row versioning framework.

Row versioning-based isolation levels reduce the number of locks acquired by a transaction by eliminating the use of shared locks on read operations.

For the Analysis Services database to work in Snapshot Isolation mode, the source RDBMS database needs the READ_COMMITTED_SNAPSHOT and ALLOW_SNAPSHOT_ISOLATION options set to ON.

  • When READ_COMMITTED_SNAPSHOT is ON, READ COMMITTED transactions provide statement-level read consistency using row versioning.
  • When ALLOW_SNAPSHOT_ISOLATION is ON, SNAPSHOT transactions provide transaction-level read consistency using row versioning.

The following SQL commands enable these options in your RDBMS database. (Note that the READ_COMMITTED_SNAPSHOT change only succeeds when there are no other active connections to the database.)

        USE <<DatabaseName>>
        GO
        -- Statement-level read consistency for READ COMMITTED transactions
        ALTER DATABASE <<DatabaseName>>
        SET READ_COMMITTED_SNAPSHOT ON
        GO
        -- Allow transactions to run under SNAPSHOT isolation
        ALTER DATABASE <<DatabaseName>>
        SET ALLOW_SNAPSHOT_ISOLATION ON
        GO
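
To confirm the options took effect, you can query sys.databases:

        SELECT name,
               is_read_committed_snapshot_on,   -- 1 when READ_COMMITTED_SNAPSHOT is ON
               snapshot_isolation_state_desc    -- ON when ALLOW_SNAPSHOT_ISOLATION is ON
        FROM sys.databases
        WHERE name = '<<DatabaseName>>'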

Enabling these options causes all data modifications in the database to be versioned: before a row is updated, a copy of the existing data is stored in the version store in tempdb, and readers running under Snapshot Isolation are served from those versioned rows. After the dimension processing operation (under Snapshot Isolation) finishes, tempdb frees the space occupied by the versioned rows.
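
Here is a minimal sketch of what row versioning buys us during processing, assuming the two options above are ON (table and column names are from AdventureWorksDW, purely for illustration):

        SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
        BEGIN TRANSACTION;
            -- Both reads see the same committed state of the data, even if
            -- another session updates DimProduct between the two statements.
            SELECT DISTINCT ProductLine FROM dbo.DimProduct;
            SELECT DISTINCT Color FROM dbo.DimProduct;
        COMMIT TRANSACTION;

This mirrors ByAttribute dimension processing: every attribute's SELECT DISTINCT reads from the same snapshot, so the attribute keys can no longer drift apart mid-process.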

Please note, tempdb must have enough space to hold the row-versioned data generated for the source RDBMS database.
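
While processing runs, you can watch the version store footprint in tempdb, for example via this DMV:

        -- Approximate space reserved by the version store in tempdb
        SELECT SUM(version_store_reserved_page_count) * 8 AS version_store_kb
        FROM tempdb.sys.dm_db_file_space_usage;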

Another important thing is to enable the Multiple Active Result Sets (MARS) connection property on the Data Source of the Analysis Services database; MARS is required when the Data Source's Isolation property is set to Snapshot.

Not enabling MARS will give you the following error:

OLE DB error: OLE DB or ODBC error: Cannot create new connection because in manual or distributed transaction mode.
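
For reference, the Data Source connection string then looks something like the following. The exact keyword depends on the provider; MarsConn is the SQL Server Native Client OLE DB keyword (the .NET SqlClient equivalent is MultipleActiveResultSets=True), and the provider version, server, and database names here are placeholders:

        Provider=SQLNCLI10.1;Data Source=<<ServerName>>;
        Integrated Security=SSPI;Initial Catalog=<<DatabaseName>>;MarsConn=Yes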

Since enabling Snapshot Isolation for my Analysis Services database, dimension processing has not had a single error due to constant updates at the source, and the tempdb size has stayed well within acceptable limits.

Written by Gautham KAMATH

24/02/2012 at 16:34

Examining MDX Query performance using Block Computation

In 2007 I read a very interesting blog post by Mosha outlining how to optimize a Count(Filter(…)) expression in MDX. Since then I have been particularly careful when writing MDX queries, and I make sure to run them past MDX Studio. Yet I frequently come across queries on forums and blogs that take the machine and its resources for a ride, whose authors still ask why their queries aren't faster when they have plenty of resources (disk space, RAM, paging, processors and so on). At times the query is simple to change, but on most occasions it is simply not possible to change the way the query returns data.

Let's first get a sense of how Analysis Services processes MDX queries. There are two major parts of the Analysis Services engine.

Formula Engine: The Formula Engine (FE) parses the incoming MDX query and figures out what data is required to satisfy it. That information is then forwarded to the Storage Engine.

Storage Engine: Apart from reading and writing data, the Storage Engine (SE) retrieves and aggregates the data that the Formula Engine requests.

In a nutshell, when an MDX query is executed it is parsed by the Formula Engine, which then passes its data requests to the Storage Engine. The raw data comes back from the Storage Engine (with the necessary aggregations) and the Formula Engine performs any calculations required on that data before returning the results.

The Formula Engine uses a mechanism called SONAR to create subcube requests to the Storage Engine (in MDX Studio these appear as Sonar subcubes). The Storage Engine serves them through subcube query events (in a profiler trace they appear as the Query Subcube event class; in MDX Studio as SE queries), using the aggregation data cache or the aggregation data registry to retrieve the aggregated data. The data then comes back to the Formula Engine, where the necessary calculations are applied before the results are returned.

Now let me take up a simple query that returns, by date, the count of products with Product Line = Accessory and Color = Black that were sold.

    WITH MEMBER [Measures].[Count Of Products] AS
        IIF({NonEmpty(
                    Exists([Product].[Product].[Product].Members,
                        ([Product].[Product Line].[Accessory], [Product].[Color].&[Black]))
                    , [Measures].[Internet Sales Amount]) AS [ProductSet]}.Count = 0
            , NULL, [ProductSet].Count)
    SELECT {[Measures].[Count Of Products]} ON 0,
    NON EMPTY [Date].[Date].[Date].Members ON 1
    FROM [Adventure Works]

The Performance Monitor counters in MDX Studio show the following information.

On cold cache…..

        Time              : 1 sec 453 ms
        Calc covers       : 0
        Cells calculated  : 427
        Sonar subcubes    : 2393
        NON EMPTYs        : 1189
        Autoexists        : 1194
        EXISTINGs         : 0
        SE queries        : 1198
        Flat cache insert : 0
        Cache hits        : 2378
        Cache misses      : 27
        Cache inserts     : 11
        Cache lookups     : 2405
        Memory Usage KB   : 160

On warm cache…..

	Time              : 1 sec 78 ms
	Calc covers       : 0
	Cells calculated  : 390
	Sonar subcubes    : 2377
	NON EMPTYs        : 1189
	Autoexists        : 1188
	EXISTINGs         : 0
	SE queries        : 1188
	Flat cache insert : 0
	Cache hits        : 2379
	Cache misses      : 0
	Cache inserts     : 0
	Cache lookups     : 2379
	Memory Usage KB   : 4

As you can see, there is no significant improvement in query performance even on a warm cache; the number of Sonar subcubes and subcube event queries (SE queries) fired does not change. The explanation is that having NonEmpty or Exists inside an MDX calculation is not a good idea: in our case the set inside the Exists never changes and the context remains fixed, yet the calculation is evaluated cell by cell, so almost all of the Formula Engine requests (SONAR) are cache hits and a significant number of SE queries are still being fired. Now let's change the query and derive the calculated member using the SUM function.

    WITH MEMBER [Measures].[Count Of Products] AS
        SUM([Product].[Product].[Product].Members,
            IIF(([Product].[Product Line].[Accessory],
                [Product].[Color].&[Black],
                [Measures].[Internet Sales Amount]) > 0, 1, NULL)
            )
    SELECT {[Measures].[Count Of Products]} ON 0,
    NON EMPTY [Date].[Date].[Date].Members ON 1
    FROM [Adventure Works]

The Performance Monitor counters show the following information.

On cold cache…..

	Time              : 875 ms
	Calc covers       : 0
	Cells calculated  : 427
	Sonar subcubes    : 14
	NON EMPTYs        : 1
	Autoexists        : 6
	EXISTINGs         : 0
	SE queries        : 10
	Flat cache insert : 0
	Cache hits        : 4
	Cache misses      : 29
	Cache inserts     : 12
	Cache lookups     : 33
	Memory Usage KB   : 848

As you can see, there is a significant reduction in the number of Formula Engine requests (Sonar subcubes) and subcube event queries (SE queries), even though the number of cells calculated remains the same. This is because the query now executes in block computation mode rather than cell by cell.

Block computation allows Analysis Services to evaluate an expression once for a whole subspace and reuse the result across many cells, instead of computing every cell individually. Please note that this does not work in all situations, but wherever possible it should be used.
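
This is the same pattern behind Mosha's Count(Filter(…)) optimization mentioned at the start: a Filter inside the calculation forces cell-by-cell evaluation, while SUM over a fixed set with an IIF that returns NULL for the empty branch can run in block mode. A minimal side-by-side sketch against Adventure Works (the measure names are mine, purely illustrative; both members return the same counts):

    WITH MEMBER [Measures].[Count CellByCell] AS
        COUNT(Filter([Product].[Product].[Product].Members,
                     [Measures].[Internet Sales Amount] > 0))
    MEMBER [Measures].[Count Block] AS
        SUM([Product].[Product].[Product].Members,
            IIF([Measures].[Internet Sales Amount] > 0, 1, NULL))
    SELECT {[Measures].[Count CellByCell], [Measures].[Count Block]} ON 0,
    NON EMPTY [Date].[Date].[Date].Members ON 1
    FROM [Adventure Works]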

Now if you look at the same query in the warm cache scenario, there is only one Sonar subcube request and one SE query, and cache hits fill up the rest.

On warm cache…..

	Time              : 578 ms
	Calc covers       : 0
	Cells calculated  : 390
	Sonar subcubes    : 1
	NON EMPTYs        : 1
	Autoexists        : 0
	EXISTINGs         : 0
	SE queries        : 1
	Flat cache insert : 0
	Cache hits        : 7
	Cache misses      : 0
	Cache inserts     : 0
	Cache lookups     : 7
	Memory Usage KB   : 0

EXISTING and Block-Computation mode

EXISTING is another operator best avoided, but certain situations may force you to use it. Block computation is possible with EXISTING too, and it gives the query a performance boost. Consider the following query:

    WITH MEMBER [Measures].[Count Of Products] AS
        COUNT(EXISTING NONEMPTY([Product].[Product].[Product].Members
                                , [Measures].[Internet Sales Amount]))
    SELECT { [Measures].[Count Of Products], [Measures].[Internet Sales Amount]} ON 0,
    NONEMPTY(
        {[Product].[Product Line].&[S]}
        * {[Product].[Color].&[Black]}
        * {[Date].[Date].[Date].Members}
    ,[Measures].[Internet Sales Amount]) ON 1
    FROM [Adventure Works]

The Performance Monitor counters in MDX Studio show the following information on a cold cache.

	Time              : 1 sec 328 ms
	Calc covers       : 0
	Cells calculated  : 817
	Sonar subcubes    : 798
	NON EMPTYs        : 391
	Autoexists        : 6
	EXISTINGs         : 390
	SE queries        : 403
	Flat cache insert : 390
	Cache hits        : 422
	Cache misses      : 746
	Cache inserts     : 370
	Cache lookups     : 1168
	Memory Usage KB   : 80

The warm cache does not change the numbers much for this query. Now let's change the query to use block computation mode.

    WITH MEMBER [Measures].[Count Of Products] AS
        SUM(EXISTING [Product].[Product].[Product].Members ,
            IIF([Measures].[Internet Sales Amount] > 0, 1, NULL))
    SELECT { [Measures].[Count Of Products], [Measures].[Internet Sales Amount]} ON 0,
    NONEMPTY(
        {[Product].[Product Line].&[S]}
        * {[Product].[Color].&[Black]}
        * {[Date].[Date].[Date].Members}
    ,[Measures].[Internet Sales Amount]) ON 1
    FROM [Adventure Works]

On a cold cache, the following are the perfmon numbers from MDX Studio.

	Time              : 421 ms
	Calc covers       : 0
	Cells calculated  : 817
	Sonar subcubes    : 19
	NON EMPTYs        : 1
	Autoexists        : 6
	EXISTINGs         : 2
	SE queries        : 13
	Flat cache insert : 0
	Cache hits        : 8
	Cache misses      : 32
	Cache inserts     : 12
	Cache lookups     : 40
	Memory Usage KB   : 80

Note how the number of EXISTING invocations has dropped from 390 to 2. On a warm cache, even though the EXISTING count remains the same, most of the query data is drawn from the cache.

	Time              : 62 ms
	Calc covers       : 0
	Cells calculated  : 780
	Sonar subcubes    : 5
	NON EMPTYs        : 1
	Autoexists        : 0
	EXISTINGs         : 2
	SE queries        : 2
	Flat cache insert : 0
	Cache hits        : 10
	Cache misses      : 2
	Cache inserts     : 0
	Cache lookups     : 12
	Memory Usage KB   : 0

To summarize, it is important to understand how the Analysis Services engine evaluates MDX queries, and even more important to find a way to use block computation wherever possible.

Written by Gautham KAMATH

08/02/2012 at 10:07