To get insights from large amounts of data, the partitioning edition of the icCube MDX/OLAP server is dedicated to supporting large cubes, i.e., facts with more than 1 billion rows. To that end, this edition mainly improves both the speed of loading such cubes and the speed of processing MDX requests against a large number of facts.
When loading large cubes, the bottleneck is most likely the speed of the underlying DB server. One way to improve the loading time is to take advantage of DB table partitioning: icCube generates one SQL load request per actual partition and processes these requests in parallel (the number of parallel requests is a configuration property). Note that while other icCube editions can load several tables in parallel (the asynchronous and parallel table processing feature), only this edition can process a single table using several parallel SQL load requests.
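The idea of loading one table through several parallel requests can be sketched as follows. This is only an illustration in Python with SQLite, not icCube's loader: the `sales` table, the `year` partitioning column, and the `MAX_PARALLEL_REQUESTS` constant are hypothetical names chosen for the example.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Hypothetical setting for this sketch (not an icCube property name).
MAX_PARALLEL_REQUESTS = 4


def create_demo_db(path):
    """Create a small 'sales' table whose 'year' column plays the partitioning role."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE sales (year INTEGER, amount REAL)")
    rows = [(2019 + i % 4, float(i)) for i in range(1000)]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()
    conn.close()


def load_partition(path, year):
    """Run one load request for a single partition key on its own connection."""
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute(
            "SELECT year, amount FROM sales WHERE year = ?", (year,)
        ).fetchall()
    finally:
        conn.close()
    return year, rows


def load_in_parallel(path, partition_keys, workers=MAX_PARALLEL_REQUESTS):
    """Issue one request per partition key, at most `workers` at a time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda key: load_partition(path, key), partition_keys)
        return dict(results)


def run_demo():
    """Build a temporary database, load it partition by partition, clean up."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        create_demo_db(path)
        return load_in_parallel(path, [2019, 2020, 2021, 2022])
    finally:
        os.remove(path)


if __name__ == "__main__":
    for year, rows in sorted(run_demo().items()):
        print(year, len(rows))
```

Each worker opens its own connection, so the database server (here a local SQLite file) sees independent requests; whether this actually shortens the load depends on how well the server handles concurrent reads.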
Even if the underlying DB table is not partitioned, it might be worth investigating whether parallel requests improve the loading time of the cube, since the DB server may still process those SQL requests efficiently.
Table partitioning is currently supported for DB tables only; please contact icCube support if you'd like other types of data sources to support this feature.
Activating table partitioning is achieved by defining a "Partitioning Column" as shown in the following picture. How does this work? icCube uses a "SELECT DISTINCT partitioning-column FROM data-table" SQL statement to determine the actual partition keys and generates one SQL request per partition key value. Note that a configuration property (maxTablePartitionCount) defines the maximum number of rows returned by this SELECT DISTINCT request, to prevent creating too many partitions. Its default value is 1024; creating too many partitions makes no real sense and might be inefficient.
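As a rough sketch of this key discovery step, the Python/SQLite example below runs a SELECT DISTINCT on the partitioning column, enforces a cap on the number of keys, and builds one load statement per key. The function names, the `sales` table, and the choice to raise an error when the cap is exceeded are assumptions made for the illustration; the source does not say how icCube reacts in that case.

```python
import sqlite3

# Mirrors the default of maxTablePartitionCount quoted above.
MAX_TABLE_PARTITION_COUNT = 1024


def partition_keys(conn, table, column, max_count=MAX_TABLE_PARTITION_COUNT):
    """Return the distinct partition keys, refusing to go past max_count."""
    # Identifiers cannot be bound as SQL parameters; table and column are
    # assumed to come from trusted configuration in this sketch.
    cursor = conn.execute(f"SELECT DISTINCT {column} FROM {table} ORDER BY {column}")
    keys = [row[0] for row in cursor.fetchmany(max_count + 1)]
    if len(keys) > max_count:
        # Assumption: this sketch fails loudly when there are too many keys.
        raise ValueError(f"more than {max_count} partitions for {table}.{column}")
    return keys


def partition_requests(table, column, keys):
    """Build one parameterized load statement per partition key."""
    sql = f"SELECT * FROM {table} WHERE {column} = ?"
    return [(sql, (key,)) for key in keys]


def demo():
    """Discover the keys of a tiny table and count the rows of each partition."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (year INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [(2020, 10.0), (2021, 20.0), (2021, 5.0), (2022, 7.5)],
    )
    keys = partition_keys(conn, "sales", "year")
    requests = partition_requests("sales", "year", keys)
    counts = {
        params[0]: len(conn.execute(sql, params).fetchall())
        for sql, params in requests
    }
    conn.close()
    return keys, counts


if __name__ == "__main__":
    print(demo())
```

Fetching at most `max_count + 1` rows is enough to detect that the cap is exceeded without reading every distinct value.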
Typically facts tables are several orders of magnitude larger than dimension tables. Therefore, for now, partitioning in the context of icCube is about facts partitioning.
This edition uses an improved columnar data storage that improves performance for 'partitioned' MDX queries and removes the ~1 billion rows constraint of the other enterprise editions.
Without revealing too many secrets about the guts of icCube's in-memory MDX/OLAP engine, we can say that fast MDX processing requires efficient data indexing and memory access. Splitting facts into several separate, smaller physical memory areas opens the door to more efficient implementations. For example, when it is possible to detect that several partitions are not used by an MDX request (e.g., when filtering by a given year with [Year] being a partition), processing that request requires fewer costly memory accesses and obviously less CPU.
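The benefit of skipping unused partitions can be illustrated with a toy model. The Python sketch below is not icCube's engine; the `PartitionedFacts` class and its `scanned` counter are hypothetical, and only show that a filter on the partitioning key lets a query touch a subset of the stored facts.

```python
from collections import defaultdict


class PartitionedFacts:
    """Toy fact store: one list of (year, amount) rows per partition key."""

    def __init__(self):
        self.partitions = defaultdict(list)
        self.scanned = 0  # number of partitions touched by the last query

    def add(self, year, amount):
        self.partitions[year].append((year, amount))

    def total(self, years=None):
        """Sum amounts, scanning only the partitions matching the year filter."""
        if years is None:
            selected = list(self.partitions)
        else:
            wanted = set(years)
            selected = [y for y in self.partitions if y in wanted]
        self.scanned = len(selected)
        return sum(amount for y in selected for _, amount in self.partitions[y])


def demo():
    facts = PartitionedFacts()
    for year in (2019, 2020, 2021, 2022):
        for amount in (1.0, 2.0, 3.0):
            facts.add(year, amount)
    filtered = facts.total(years=[2021])
    scanned_filtered = facts.scanned
    everything = facts.total()
    scanned_all = facts.scanned
    return filtered, scanned_filtered, everything, scanned_all


if __name__ == "__main__":
    print(demo())
```

With the year filter, only one of the four partitions is read; without it, all four are scanned.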
Activating the facts partitioning is done via an advanced property at schema level as shown in the following picture:
Then, for each facts definition, you can specify whether partitioning applies and how it is defined: data-table based or MDX-level based.
Facts Partitioning Mode : Data-Table Based
In this mode, each data-table partition maps to a facts partition.
Defining a data-table based partitioning is shown in the following picture:
Facts Partitioning Mode : MDX-Level Based
DB table partitioning is not required to benefit from the facts partitioning feature: you can define facts partitions using any MDX level. For example, as shown in the following picture, a date dimension is a good candidate for partitioning your facts, as you are likely to filter your MDX requests by year quite often.
Defining an MDX-level based partitioning is shown in the following picture:
Advanced: note that even if you've defined the table-based partitioning mode, you can still specify a level. It is used as a performance hint during tuple evaluations. This level must be consistent with the table-based partitioning; e.g., combining a geography-based table partitioning with a date level makes no sense.
Memory Mapped Files
This edition unlocks the memory mapped file feature. A very large Java heap is likely to cause long GC pauses (> 1 minute) that might freeze MDX request processing. One way to avoid that is to take advantage of a well-known OS feature: memory mapped files. With this feature, facts are saved into files instead of being held in-memory. The OS caches them in RAM (therefore the more RAM the better) and you should expect the same level of performance as with the in-memory storage engine. We strongly encourage you to use this feature with Linux rather than Windows, as I/O is faster and steadier and does not suffer from files being locked.
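To make the OS feature itself concrete, here is a minimal Python sketch that stores fixed-size fact records in a file and reads them back through a memory mapping. It says nothing about icCube's actual file format or implementation: the record layout (`RECORD`) and the function names are hypothetical.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fact layout: little-endian int32 year followed by float64 amount.
RECORD = struct.Struct("<id")


def write_facts(path, facts):
    """Append each (year, amount) fact to the file as one fixed-size record."""
    with open(path, "wb") as f:
        for year, amount in facts:
            f.write(RECORD.pack(year, amount))


def sum_for_year(path, year):
    """Map the file read-only and sum the amounts of the requested year."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the OS pages data in on demand and
        # keeps recently used pages in its cache, outside the process heap.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            total = 0.0
            for offset in range(0, len(view), RECORD.size):
                y, amount = RECORD.unpack_from(view, offset)
                if y == year:
                    total += amount
            return total


def demo():
    fd, path = tempfile.mkstemp(suffix=".facts")
    os.close(fd)
    try:
        write_facts(path, [(2020, 1.5), (2021, 2.0), (2021, 3.0)])
        return sum_for_year(path, 2021)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

The data lives in the file and in the OS page cache rather than in objects managed by the runtime's garbage collector, which is the property the paragraph above relies on.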
Hardware / OS Requirements
Last but not least, the partitioning edition is still available on commodity machines able to run a Java virtual machine, making it ideal for deployment on private or public cloud platforms. "OLAP with pleasure" remains both our motto and a reality.