Real-Time (aka. Incremental Load) Walkthrough
To demonstrate the incremental load feature we are going to create a cube that shows the world population per country and per year. We assume the reader is familiar with the features as described in the previous walkthrough and has a basic understanding of increment load as described here.
Creating the Schema
Once the schema has been created, we are activating the incremental load using the advanced property Incremental Load Active:
Incremental Tables
For the sake of simplicity we are using a file datasource but the principles are the same for a relational DB. We are adding three files : country.csv, year.csv and population.csv. For each table we have to specify the incremental load strategy. The list of countries is immutable in our example, so we are specifying static. For both year and population, we are specifying incremental load and indicate at the same time the column that acts as the incremental marker : INCR_ID. This is a unique value that is incremented with each new row.
Creating Dimensions / Cubes
Here there is nothing specific to do. We are building a time and geography dimension based on our previously defined data tables. And eventually we are adding a cube that contains a single measure : population.
Deploying / Analysis
Once deployed we can see the world population for both USA and France for the year 2000:
Updating Cube
To simulate new values are available, comment out the following lines in the file year.csv and population.csv and save them:
INCR_ID, YEAR 1, 2000 #2, 2010 <= comment out this line INCR_ID, YEAR, COUNTRY, POPULATION 1, 2000, USA , 282 2, 2000, France, 59 #3, 2010, USA , 309 <= comment out this line #4, 2010, France, 62 <= comment out this line
and refresh the schema from the monitoring application:
The previous request is now showing the population for the year 2010 as well. Indeed, the [Time] dimension has been updated with one new member [2010] and the facts are containing two new rows for [USA] and [France] for the year [2010]:
Trigger Tables
Built on the previous example, we are going to define a new schema "World Population (Trigger)" that is demonstrating usage of trigger tables. From the documentation [...] Trigger tables are very similar to incremental tables. The difference is that trigger tables are incrementally loaded first and if any changes is detected in these tables, the other (that is the tables with no trigger) incremental tables will be checked for new data. [...].
So let's add a new table from the file trigger.csv and define the incremental load (trigger) loading strategy, and then create dimensions and cubes as previously. Note that a schema can contain more than one trigger tables and that a trigger table can be used as a regular table for defining dimension and facts. In this example, the trigger table is simply used as a trigger:
Updating Cube
Let's update the data as previously by commenting out the following lines in the file year.csv and population.csv and refresh the cube.
As you can see below, nothing happened. This is because the table year/population have not been checked for update because the trigger table has not been updated.
So let's modify the trigger table as following; note there is no need to add a new row, you can replace the existing one; simply make sure to increment the value in the INCR_ID column:
INCR_ID 2
and refresh the cube. Now you can see the new values in year/population table being used:
Last Words
That's about it. This tutorial has demonstrated how to define schema with incremental load tables and to refresh the cubes instead of reloading them from scratch. You can now schedule some tasks in the scheduler to automate this process. Using (or not) trigger tables, you can define small refresh intervals to check for new data and have your cube always ready with up to date information.
Both the schema and the data of this example are available for download here.
Next chapter: Walkthrough Localization.