Once you have created your multi-dimensional data model (schema) in your development environment, it’s time to move on to the testing phase. Testing not only means testing the model from a functional point of view but also from an architectural point of view and deciding how the system is going to be running in production.
You may have hundreds of data models or a single huge data model for which you have to solve the deployment strategy. Deployment depends on architecture, and architecture depends on schema sizing, schema loading time and MDX query time of your schemas. These 3 elements then become obvious determinant factors that need to be examined, and if possible, optimized.
This first optimization chapter will focus on sizing, i.e. how to optimize your icCube schema RAM usage.
- Schema Statistics Analysis
- Hierarchy/Level, Attribute or Member Properties?
- Key types
- Order by fact indexing
- Facts in files
1. Schema Statistics Analysis
The best way to start your schema’s RAM analysis is to take a look at the schema statistics file.
Once the schema is loaded you can find this file under Admin > Schemas > Loaded, click right on your schema and select ‘View Schema Statistics’.
The file shows the estimated total RAM usage as well as a breakdown by facts index, dimensions, etc.
This file will therefore give you the indicators of most used elements of the schema which will be guiding you on the optimization.
2. Hierarchy/Level, Attribute or Member Properties?
Depending on what is going to be the usage of a dimension, its structure can be optimized in order to save your dimensions total RAM.
Let’s take a Customer dimension example, containing its id, name, age and email. The three most logical ways to create this dimension would be:
A – Dimension with 3 hierarchies
(D) Customer (H) Name (L) Name Key column: id Name Column: name (H) Age (L) Age (H) Email (L) Email
B – Single level dimension with attributes
(D) Customer (H) Name (L) Name Key column: id Name Column: name Attributes: Age Email
C – Single level dimension with member properties
(D) Customer (H) Name (L) Name Key column: id Name Column: name Member properties: Age Email
Now, what’s the difference?
Consequence on the dimension’s total RAM
Between A and B, using attributes (B) slightly enhances RAM usage. However the big improvement will be on the processing time on some queries (NOT on the schema’s RAM). Say you’d like to list your sales by customer, showing all customer information:
| Name | Age | Email | Amount | | ... | ... | ... | ... |
The underlying query will contain a crossjoin of Name * Age * Email. With A, the query ‘cost’ will be based on: number of existing names * number of existing ages * number of existing emails.
If using B, then the query ‘cost’ will be reduced to only be based on the number of existing names.
Read more about attributes here.
Member properties (C) could also be an option, similar to attributes, and here we could have a significant memory usage improvement. However, there are two main elements to note regarding member properties:
- In order for properties to be displayed on a table you need to define a new member that is fetching the property, for example:
CREATE MEMBER [Measures].[Email] AS [Customer].[Name].[Name].getProperty('Email')
- You can filter member properties only if using the Filter Panel, but NOT if using other filters. You can use attributes (B) or hierarchies (A) for any type of filter.
Read more about member properties here.
Consequence on the Facts Index RAM
Options A and B will create indexing for name, age and email, whereas option C will only create indexing for name. Depending on the size of the fact tables the difference can be important.
As a general rule, If memory is an issue, do not create dimensions, hierarchies, levels or properties that are not going to be used.
3. Key types
Member keys also play an important role on memory usage. The id field should whenever possible be numeric, ideally an int (32-bit), a short (16-bit) would be even better, or if necessary, a long (64-bit).
Other data types can also be small such as char or boolean, but are unusual for ids. The most important takeaway is to avoid strings as key ids.
4. Fact indexing
On paragraph 2, we saw how creating dimensions, hierarchies and levels can affect the Facts Index RAM.
In more general terms, to reduce the Facts Index RAM (see your schema statistics file), for a given facts table, order it by the dimension taking the most space in its facts index. It’s not the same to have all data nearby rather than scattered in a big space.
For example, if your large dimension is Customer, prefer ordering the facts table on the customer key:
| Date | customer_id | Age | … | 28.01.23 | 1 | 25 | … | 08.11.22 | 2 | 48 | … | 15.02.23 | 3 | 27 | … | 04.05.21 | 4 | 36 | …
Note this will as well have an impact on performance due to data fragmentation changes.
For performance, you will order on the most used dimension (in queries) that depends on your model and this might not be the same one as for sizing. Make sure to test different solutions to find the one that works best for you.
5. Facts in Files
icCube being an in-memory server, is saving data in RAM by default.
If your memory is limited or if you have large amounts of data, an aggressive memory optimization option is to save Facts in Files (under Storage Policy of your schema properties). This allows to store the data in files on the server’s file system, instead of keeping it in memory.
Saving facts in files will reduce Facts Columns RAM on your schema statistics file.
Note that in this case, accessing the disk and reading the data from files may be slower, but requires less RAM.
Let’s recap. To optimize your schema’s memory usage:
- Create dimensions using attributes or properties whenever possible, and do not create useless dimensions
- Optimize key types
- Order a facts table by the dimension taking the most space in its facts index (careful, might impact performance)
- Save facts in files