Gautham KAMATH

BI Developer specializing in SSAS and MDX

Processing using Snapshot Isolation in Analysis Services (SSAS)


Just a few months ago I was hit by a dimension processing failure. The error read:

Errors in the OLAP storage engine: The attribute key cannot be found when processing: Table: ‘<<TableName>>’, Column: ‘<<ColumnName>>’, Value: ‘XXXXXXX’. The attribute is ‘<<AttributeName>>’.

On investigation I found that constant updates to the underlying dimension table were the main cause. One of my dimensions had as many as 15 attributes, and processing failed whenever an update occurred while the dimension was being processed. With the default ByAttribute processing, Analysis Services issues a separate SELECT DISTINCT per attribute; because of the constant updates, the rows returned by the first attribute’s SELECT DISTINCT no longer matched the rows returned by the later ones, leaving attribute keys that could not be found.
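To illustrate the kind of conflict, here is a rough sketch of the per-attribute queries fired during ByAttribute processing (the view and column names are made up for illustration, not taken from my schema). If rows change between the first and the last statement, the keys returned by one query may have no match in another:

        -- Illustration only: one SELECT DISTINCT per attribute, fired one after another.
        -- If the source is updated between these statements, the result sets disagree.
        SELECT DISTINCT CategoryKey, CategoryName FROM dbo.vwDimProduct
        SELECT DISTINCT SubcategoryKey, SubcategoryName, CategoryKey FROM dbo.vwDimProduct
        SELECT DISTINCT ProductKey, ProductName, SubcategoryKey FROM dbo.vwDimProduct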

One way to get around this was to change the ProcessingGroup property of the dimension to ByTable (instead of the default value of ByAttribute). Because the ByTable option reads the entire table for that dimension into memory in one go, all attributes are processed from the same data without having to worry about the constant updates. Any new updates would then get incorporated in the next processing cycle.

This option did not work for me because ByTable is effective only when the dimension draws its data from a single table. With multiple tables in a star or snowflake schema, or an RDBMS view that encapsulates multiple tables, Analysis Services will still issue multiple queries against the data source.

P.S. Try processing the Product and Geography dimensions using the ByAttribute and ByTable options. The Profiler trace for the Product dimension in ByTable ProcessingGroup mode will still show multiple SELECT DISTINCT statements being fired, whereas the Geography dimension, because of its single-table data source, fires only one SELECT DISTINCT statement.

As I am using RDBMS views that join multiple tables, this option was also of no use to me. The only feasible solution was to use Snapshot Isolation.

Snapshot Isolation

Snapshot Isolation works on the row versioning framework.

Row versioning based isolation levels reduce the number of locks acquired by a transaction by eliminating the use of shared locks on read operations.

For the Analysis Services database to process under Snapshot Isolation, the source RDBMS database needs the READ_COMMITTED_SNAPSHOT and ALLOW_SNAPSHOT_ISOLATION options to be ON.

  • When READ_COMMITTED_SNAPSHOT is ON, READ COMMITTED transactions provide statement-level read consistency using row versioning.
  • When ALLOW_SNAPSHOT_ISOLATION is ON, SNAPSHOT transactions provide transaction-level read consistency using row versioning.

The following SQL commands enable these options on your RDBMS database.

        USE <<DatabaseName>>
        GO
        ALTER DATABASE <<DatabaseName>>
        SET READ_COMMITTED_SNAPSHOT ON
        GO
        ALTER DATABASE <<DatabaseName>>
        SET ALLOW_SNAPSHOT_ISOLATION ON
        GO
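Once the commands have run, a quick check against sys.databases confirms that both settings are in effect (1 and ON respectively):

        SELECT name,
               is_read_committed_snapshot_on,
               snapshot_isolation_state_desc
        FROM sys.databases
        WHERE name = '<<DatabaseName>>'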

Enabling these options causes all data modifications in the database to be versioned. Before a row is updated (or deleted), a copy of the existing data is stored in tempdb, and the processing queries read from these versioned rows instead of the rows being changed underneath them. After the dimension processing operation (under Snapshot Isolation) is finished, tempdb frees the space occupied by the versioned rows.

Please note, tempdb must have enough space to hold the row versioned data generated for the source RDBMS database.
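If you want to keep an eye on how much of tempdb the version store is using while processing runs, something along these lines will report it (128 pages equal 1 MB):

        USE tempdb
        GO
        SELECT SUM(version_store_reserved_page_count) / 128.0 AS [Version store (MB)]
        FROM sys.dm_db_file_space_usage
        GO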

Another important thing is to enable the Multiple Active Result Sets (MARS) property on the Data Source connection in the Analysis Services database.

Not enabling MARS will give you the following error

OLE DB error: OLE DB or ODBC error: Cannot create new connection because in manual or distributed transaction mode.
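For reference, with the SQL Server Native Client OLE DB provider this amounts to adding MARS Connection=True to the Data Source connection string. The sketch below is only an example; the provider name varies with your client version, and the server and catalog are placeholders:

        Provider=SQLNCLI10;Data Source=<<ServerName>>;Initial Catalog=<<DatabaseName>>;
        Integrated Security=SSPI;MARS Connection=True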

Since enabling Snapshot Isolation for my Analysis Services database, dimension processing has not had any errors due to constant updates at the source. The tempdb size has also stayed well within acceptable limits.


Written by Gautham KAMATH

24/02/2012 at 16:34

7 Responses


  1. Hi Gautham,

    Thank you for this article. It was very helpful. We implemented a real-time cube (SSAS 2008 R2 Enterprise). The cubing job runs every 5 minutes to do incremental updates against a SQL Server database that is being updated while the cube is processed. And this causes the ‘Attribute key cannot be found’ error. I applied Snapshot Isolation, using all the settings you mentioned in your article, and I don’t get this error anymore. But cubing time increased almost 2 times.

    I am trying to understand what causes the performance degradation. The server has Windows 2003 Enterprise x64 with 24 CPU cores and 64GB of RAM. I monitored the tempdb file size growth during the cubing and didn’t see huge growth. Please let me know if you experienced a performance hit with the Snapshot Isolation changes.

    Thank you.

    Vadim Krasnov (SSAS developer)

    Vadim Krasnov

    29/04/2012 at 18:57

    • Hi Vadim

      I did see some increase in processing time but it was well within the limit that I set; as a matter of fact, I was expecting it. This is because during processing there is an extra step involving reading the data from the source table and copying it into tempdb. If there are indexes then that data has to be re-indexed in tempdb. All in all there will be an IO overhead.

      For you to get an increase of 100% is alarming. There are a couple of reasons that I can think of:
      1. Tempdb causing a general slowness.
      2. ProcessUpdate on a dimension with millions of records. ProcessUpdate is much slower compared to ProcessFull or ProcessData + ProcessIndex.

      If you have many dimensions with many members then there is an overhead (almost twice), and in that case tempdb comes under pressure.

      The following Perfmon counters can give you information on how tempdb is doing:
      • LogicalDisk: Avg. Disk sec/Read: the average time, in seconds, of a read of data from the disk.
        o If the value is > 50 ms then there is an IO issue.
        o 20-50 ms is slow and, if possible, needs attention.
      • LogicalDisk: Avg. Disk sec/Write: the average time, in seconds, of a write of data to the disk.

      The following article can help you if tempdb is crying foul:
      http://bradmcgehee.com/wp-content/uploads/presentations/Optimizing_tempdb_Performance_chicago.pdf
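      If Perfmon is not at hand, a rough equivalent from inside SQL Server is to compute the average read and write latency per tempdb file from sys.dm_io_virtual_file_stats, for example:

        SELECT file_id,
               io_stall_read_ms / NULLIF(num_of_reads, 0) AS [Avg read (ms)],
               io_stall_write_ms / NULLIF(num_of_writes, 0) AS [Avg write (ms)]
        FROM sys.dm_io_virtual_file_stats(DB_ID('tempdb'), NULL)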

      regards
      Gautham

      Gautham KAMATH

      30/04/2012 at 16:39

      • Hi Gautham,
        Thank you for your fast response. Yes, you’re absolutely right – we have multiple dimensions with millions of members.

        The incremental update involves:
        - Process Update on all dimensions,
        - Process Full on the partitions for which new data was loaded,
        - Process Index on the rest of the partitions.

        The dimension data is not updated or deleted; we only do inserts into the dim tables. Will look into replacing Process Update with Process Data.

        I will definitely try to monitor I/O during the cube processing. Thank you for this information.

        Vadim.

        Vadim Krasnov

        02/05/2012 at 16:42

  2. I forgot to mention that we use page compression on the Fact and the largest dimension tables. Can it also be a factor affecting tempdb data I/O?

    best regards

    Vadim

    Vadim Krasnov

    02/05/2012 at 21:10

    • The following is an extract from http://msdn.microsoft.com/en-us/library/dd894051(v=sql.100).aspx

      “Apart from the space savings, data compression provides another benefit: Because compressed data is stored in fewer pages, queries need to read fewer pages from the disk, thereby improving the performance of I/O intensive workloads. However, extra CPU resources are required on the database server to compress and decompress the data, while data is exchanged with the application.”

      Yes, the data compression can be a factor because your CPU is doing some extra work.

      Also, you mentioned that you do only inserts on your dimension tables. Instead of ProcessUpdate, try using ProcessAdd. The beauty of ProcessAdd is that it processes only the new records, without having to scan the entire dimension, and creates aggregations only for the new members.
      ProcessUpdate, on the other hand, does extra work by scanning all the members in the dimension and checking whether they have been updated or not (which is not required here since you only insert data). Also, all your flexible aggregations are dropped and recreated during ProcessUpdate. This is an unnecessary overhead.

      Hope this helps.

      regards
      Gautham

      Gautham KAMATH

      03/05/2012 at 11:13

  3. Thanks for your post. I am trying to use Snapshot Isolation (SI) for cube processing, and it seems as though you have to enforce SI at the server level for this to work, is that true? In my source DW, I have set ALLOW_SNAPSHOT_ISOLATION ON, but have not set READ_COMMITTED_SNAPSHOT ON as I don’t want to force all queries to generate snapshots. My impression was that setting the Read Committed property in the SSAS Connection Manager to Snapshot would cause the cube processing to invoke a snapshot, however I am having difficulty ensuring that. Can you provide some ideas for how to test whether the snapshot is in effect during cube processing?

    I have created an SSIS package with two SSAS steps. The first step performs a Process Update on my dimensions, and the second step performs a Process Full on the current partition of my cube. If I process the dimensions, and then insert a new record into the fact table while the dimensions are processing, then the fact processing fails with the classic ‘Attribute Not Found’ error. How can I make this invoke a snapshot?

    Thanks in advance for any assistance.

    – Jim

    Jim

    12/03/2013 at 15:15

    • Followup… After checking the sessions running during the package execution, I am invoking a snapshot, however each Analysis Services task invokes ITS OWN snapshot, because each opens its own session. Making all the cube processing run in a single Analysis Services task (i.e. dimension AND partition processing) will work, but would mean that I can’t process in parallel (I believe). So I’m still wondering if there is a way to configure multiple SSAS tasks in SSIS to operate within the same session. (FYI – I do have transactions configured, but that doesn’t seem to affect how MSAS calls the source database.)

      Jim

      12/03/2013 at 17:31

