What is Olap ?

OLAP is not a technology, nor a language (unlike SQL or Java)

Author
David Alvarez Debrot
Date
2/12/2024
Share

OLAP (Online AnaLytical Processing) is a concept for conducting swift multidimensional analysis of data (MDA), enabling the examination of data from multiple dimensions. This allows for complex calculations, trend analyses, and sophisticated data modeling.

It's important to clarify that OLAP is not a technology, nor a language (unlike SQL or Java).  OLAP is a concept / method for organizing data in a certain structure to facilitate multidimensional analysis. The term “OLAP tool” refers to platforms, such as icCube and Microsoft Analysis Services (SSAS), that provide the necessary infrastructure for building OLAP data structures.

Why use Olap?

The core features of OLAP are : 

Multidimensional:  transforms complex data into structural and intuitive business concepts, making it easier to analyze data across various dimensions (e.g., time, geography, product). 

  • Analysis: data aggregation such as sum, minimum, maximum, average,vector and more. Supports advanced analytics such as statistical methods, regressions, algorithms, etc.
  • Fast Query Performance: Optimized for reading and analyzing multidimensional data.
  • Ad-hoc Analysis: Users can conduct spontaneous, on-the-fly queries to explore data without needing pre-defined reports or dashboards.

Schema or Model

Before doing OLAP analysis on your data, you first create a model or schema from your data sources or data warehouse. The most common data sources are SQL databases, though they can be data coming from any source (e.g., flat files, IoT, Google Analytics). The goal is to transform the data from technical nomenclature into domain-specific terms that a business user can easily use. 

Dimensions are a categorization of your data. Typical dimensions are country, people, age, product, color. Dimensions can be hierarchical, like in a time dimension with years, quarters and days, or in a geography dimension with continent, region, country and city.

Measures are metrics, that can be aggregated as sum, minimum, maximum, average (e.g., sales amount, temperatures).

Once the schema is defined, users can now start querying the data. Drag and drop dimensions and measures to generate information, such as a table with continent, year, and sales amount. This is called dicing (choose specific values of multiple dimensions to create a subcube): 

What is OLAP : table dicing example

The ability to show aggregated information in each cell is known as roll up. Although thousands of sales may occur in Africa in 2018, only the summarized, or aggregated, information is displayed.

Filtering the above table, e.g. for a given product, is called slicing (choose a single value for one dimension to create a subcube)

Drill down, the opposite of roll up, allows for going into the details of a continent or a year, taking advantage of the hierarchical structure of the dimension. The end result will look like : 

What is OLAP : table drilldown example

Roll up and Drill down, along with dicing and slicing, form the basis of OLAP interactions. 

Business Intelligence (BI) tools may serve as clients to an OLAP server, providing the visual result (through charts and filters) of OLAP queries.

The view of dimensions in the axis and values in the cells is why you can think of OLAP as a cube. But this is a conceptual view, not a physical one.

How does OLAP work?

We can use a very basic example, as input data we have a table with a few column : 

What is OLAP : input table example

OLAP when calculating the sales for Asia in 2018 is going to filter the table for all rows with Asia and 2018. On this filtered subtable aggregate the measure for the expected value, sum for sales (500+500 = 1000). For those familiar with SQL, this is similar to a group by operation.

This is a very basic example as OLAP solutions support advanced aggregation and calculations, including statistical, regression, comparison, and ranking analysis. More advanced solutions allow the dimensions themselves to be mathematical transformations.

OLAP query languages

Unlike relational databases that use SQL, OLAP does not have a standardized query language.

However,  in 2001 Microsoft released MDX that stands for Multidimensional Expression. It's a language used by different tools that support OLAP. Excel is an example of a client using MDX, an Excel Pivot Table can be connected to an MDX-supporting server through the XMLA protocol.

Other tools not supporting MDX will have their own language that might be translated to SQL. If you are interested in MDX, you can read this Gentle Introduction to MDX.

Types of OLAP

MOLAP (Multidimensional OLAP): The system loads the underlying data into an internal structure. This structure can be a file on disk or it can be an in-memory structure.. This provides fast answers with the loading time equal to that of the memory/file system. The advantage of having this seperate structure is that queries access data in this internal system, and not directly the data sources. This means that analytical queries, which can be quite heavy, do not slow down production servers.

ROLAP (Relational OLAP):  The system will perform the analysis directly into the underlying relational data source, an SQL compatible database. This provides a slower answer and does not have the analytical power and flexibility of MOLAP but there is no need to load the data and require local resources such as memory or disk space. Tools transform end user queries into SQL than is performed on the production database.

HOLAP (Hybrid OLAP): Combines MOLAP and ROLAP, presenting technical challenges due to the complexity of integrating both models.

icCube OLAP

icCube is based on in-memory columnar data stores and lifts several of the constraints that made OLAP cubes IT intensive and rigid, while at the same time maintaining the compatibility with industry standards in the reporting and analytics world.

Here are a few of the constraints icCube lifted:

No more need to restructure data into a star or snowflake schema, which is complicated to perform at source level.

  • Moving out of trivial key-like associations (e.g., many-to-many, ranged).
  • Categories allow introduction of new dimensions on the fly to simplify the modeling process.
  • Dimension members that are calculations (e.g. convert temperature to Celsius or calculate a moving average).
  • No pre-aggregation is required (all aggregations are computed on the fly).
  • MDX+ language (functional support, object-oriented extensions, and many new helper functions).
  • Client interface to manage schemas, deployment and loading.

When not to use Olap?

OLAP is designed as a read-only system, distinguishing it from transactional systems such as relational databases (RDBMS) or Online Transaction Processing (OLTP) platforms. Unlike tools for data scientists, OLAP is not optimized for performing extensive calculations on vast datasets.