Cubes introduction
In this chapter we're going to describe how to build facts and link them to previously defined dimensions. Facts allow us to define measures - representing values as temperature, sales amount - and how they are linked with the dimensions - e.g. city, date -. Measures are mainly scalar values defined by basic types : floating point, integer, string or boolean.
Before starting to build our facts we will introduce several basic concepts that allow to better understand how icCube handles measures.
Facts are represented as a table or list of rows. Each row is a list of column values. Columns match dimension keys and measures. They allow to link a measure, e.g. temperature, to a list of attributes or dimensions, e.g. city, date. This matches scenarios as :
The temperature the 23 March 2010 in New York was 45F, with a fact table as following:
date temperature
23-Mar-2010 45
24-Mar-2010 48
John sold an enterprise license the 1 Jan 2012, with a fact table as following:
employee_id date license_id amount
34 1-Jan-2012, 2 1
Retrieving information in MDX is done via a tuple. This tuple represents dimension coordinates; note that missing dimensions are filled with default values. These coordinates are simply members of the dimensions, e.g. employee or a date. So the value (i.e., sales amount) of the tuple ([John],[1-Jan-2012],[Enterprise]) would be 1.
How does icCube evaluate a tuple ?
We will handle the case where the tuple has no calculated member.
It's important to note that icCube works with rows and not with cells as other MDX servers. A cell in MDX represents the smallest coordinate. If you visualize an MDX cube as an hypercube it is the smallest subcube ([John],[1-Jan-2012],[Enterprise]) is a cell, ([January 2012],[Enterprise]) is a sub-cube or a set of cells). A more formal definition would be a cell is represented by a tuple in which all members, default included, have no children and therefore cannot be decomposed into smaller units.
So, in icCube a cell might contain a list of rows. It's possible to ensure a cell contains none or one row at most by aggregating rows (UI setting in facts) when creating the cubes. Pay attention, this will slow down the facts loading process and ideally should never be set.
In the standard scenario a tuple will be transformed in a query over a set of rows. If the aggregation is sum, we will sum over the selected rows values. So ([John],[Jan 2012]) will sum all the rows of January 2012 for John. If you're familiar with SQL this is similar to a query with a where clause and a group by.
How does this work for parent's members in hierarchical dimensions? In the normal scenario parent members rows are the union of their children rows (e.g., rows of [Jan 2012] are all days of January rows). This is the standard behavior and might be changed as explained in the coming pages.
As a resume and reminder:
- A tuple defines a set of rows (you can't add twice the same row).
- In a tuple a row can only be present or not (you can't have it twice).
- A parent member 'rows' has not to be the union of all it's children 'rows'.
Next chapter: Facts: Cubes, Measure Groups and Measures shows how to create your first cube.