During one of my icCube demo meetings last year, one of the attendees asked whether it is possible to migrate from Microsoft Analysis Services to icCube and what the potential challenges would be. Well, what do you answer to such a question? … Honestly?

Having quite a bit of in-depth knowledge of both icCube and Analysis Services, I knew it could be done, but I had never done it myself. So, last autumn, having a couple of days available for research, I started the mini-project of migrating Microsoft’s flagship demo model “AdventureWorks” to icCube.

So, can it be done? Can you migrate an Analysis Services model to icCube?

Yes, definitely. It was surprisingly simple. And as a bonus you get all the additional benefits provided by the icCube platform too. The most important ones (for me and my clients):

  • With icCube you get an actively maintained and developed business intelligence platform, something that Analysis Services has really lacked over the last couple of years. And that is especially painful if you run into a critical or performance bug.
  • With icCube you get a complete all-in-one analytical software package, containing ETL, a builder tool, a server management tool, an admin tool and a dashboard editor & publisher. There is no need for additional (expensive) tools.

The result: AdventureWorks in icCube, accessed as Excel 2016 pivot
(check the connection).

Below is a summary of points to note on the migration. If you want to learn more about the migration process itself, you can read my other blog post on this topic here.

Migrating AdventureWorks to icCube

There were no real challenges in doing the migration; things are just done in different ways. The following table gives a summary:

Table 1. AdventureWorks in Analysis Services vs icCube

topic                  | Analysis Services           | icCube                                             | verdict
set-up the data source | to SQL Server database      | to SQL Server database                             | same
tables & views         | tables with SQL expressions | use of views                                       | similar
dimensions             | allows NULL values          | does not allow NULL values, assign a default value | similar
snowflake              | multi-level                 | on a view                                          | similar
discretization         | bucket count                | indexing by range                                  | matter of taste
dimension usage        | referenced                  | regular with view                                  | similar
measures               |                             |                                                    | same
calculated measures    | scoped functions            | using MDX++ functions                              | similar
other calculations     |                             | 3 minor changes in MDX                             | same

As you can see from this table, there were no real issues in the migration. But be your own judge: you can check out the result for yourself in icCube, using the migrated “AdventureWorks” model.

This is what you need to experience “AdventureWorks” in icCube:

  1. a license that contains the backup functionality (under big data features, click self-hosted first). You can ask support for help with the license.
  2. the backup schema for AdventureWorks, you can request this here.

And while you are at it, please do not forget to evaluate the dashboard editor. The “AdventureWorks” model provides a wealth of information to display in icCube’s interactive reports.

Here’s already a sample image of one, just to give you a taste of it.

Cheers,

Arthur

about the author

Arthur van den Berg, www.inside-vision.com

It’s a kind of magic … to transform loads of data into insight giving dashboard products that show your audience what step to take next, based on your data.

I work closely together with icCube being a Dutch reseller. I love their product that allows me and my clients to launch meaningful dashboard products and to embed analytical insights into their software.

Many-to-many (“m2m”) relations are relations in which one object is related to multiple other objects and vice versa; e.g., a bank account can be shared by multiple people, and one person can own multiple accounts. This article will show how to create these particular aggregations.

In the example we will use for this article, we have four bank customers: Phil, Kate, Josh and Samantha, where Kate & Josh as well as Kate & Samantha have joint accounts, respectively named “Account – KJ” and “Account – KS”.

The difficulty of managing these relations when data modelling lies in the fact that you want to avoid counting the same amount several times.

On the left, you have the balances per account, and on the right, the assets of each customer. If you sum the amounts of the customers’ assets you end up with a total of 800, which is more than the sum of all bank accounts (600). Why? Kate and Josh’s joint account has a total balance of 100; Kate’s balance in this account is 100 and so is Josh’s, but when aggregating both, the total should still be 100, not 200.
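The double count is easy to reproduce outside any BI tool. A minimal JavaScript sketch (sample data from this article; the naive sum walks the ownership relation, the correct one deduplicates accounts first):

```javascript
// Balance per account: each account counted once, total 600
const accounts = {
  "Account - P": 100, "Account - K": 100, "Account - J": 100,
  "Account - S": 100, "Account - KJ": 100, "Account - KS": 100,
};

// Many-to-many ownership: who owns which account
const owners = {
  Phil: ["Account - P"],
  Kate: ["Account - K", "Account - KJ", "Account - KS"],
  Josh: ["Account - J", "Account - KJ"],
  Samantha: ["Account - S", "Account - KS"],
};

// Naive per-customer sum: joint accounts are counted once per owner
const naiveTotal = Object.values(owners).flat()
  .reduce((sum, a) => sum + accounts[a], 0);

// Correct total: count each distinct account exactly once
const distinctTotal = [...new Set(Object.values(owners).flat())]
  .reduce((sum, a) => sum + accounts[a], 0);

console.log(naiveTotal, distinctTotal); // 800 600
```

The 800 vs 600 gap is exactly the double-counted joint accounts (“Account – KJ” and “Account – KS”, once per extra owner).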

The figure below represents the same data as above, showing in one table only, both balances per account and per customer.

This ‘strange’ visual effect in the pivot table is due to the m2m relations: parent members (the account’s total balance) are possibly no longer the aggregation of their children (the customers). Note that, in fact, Kate & Josh’s joint account has a total balance of 100.

Be careful when manipulating m2m relations, as not all analytical and reporting tools offer this functionality out of the box, and you can end up with miscalculated results.

MANY-TO-MANY RELATIONS IN ICCUBE

There are two ways of setting up (out-of-the-box!) m2m relations in icCube: the first is creating two separate dimensions and using a bridge table; the second is creating one dimension with two hierarchies.

The difference between the two methods is that the first shows all possible combinations, even those that do not exist by construction, while the second removes non-existent combinations.
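The difference between the two methods can be sketched in a few lines of JavaScript (an illustration only, not icCube code): the first method exposes the full cross join of both dimensions, while autoexist keeps only the {Customer, Account} pairs present in the source table:

```javascript
// {AccountName, Customer} pairs as they appear in the Customers' table
const rows = [
  ["Account - P", "Phil"],
  ["Account - KJ", "Kate"], ["Account - KJ", "Josh"],
  ["Account - KS", "Kate"], ["Account - KS", "Samantha"],
];

const accounts  = [...new Set(rows.map(r => r[0]))];
const customers = [...new Set(rows.map(r => r[1]))];

// Method 1 (two separate dimensions): every combination exists
const crossJoin = accounts.flatMap(a => customers.map(c => [a, c]));

// Method 2 (two hierarchies in one dimension): autoexist keeps only
// the combinations found in the table
const existing  = new Set(rows.map(r => r.join("|")));
const autoexist = crossJoin.filter(t => existing.has(t.join("|")));

console.log(crossJoin.length, autoexist.length); // 12 5
```

Three accounts by four customers gives 12 combinations with the first method; autoexist keeps only the 5 pairs that actually occur.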

Let’s explore how to construct the data model using both these methods: Many-to-Many relations using a Bridge table and Many-to-Many dimensions.

For both methods, the data is set as follows:

  • Customers’ table:

    AccountName  , Customers
    Account – P  , Phil
    Account – K  , Kate
    Account – J  , Josh
    Account – S  , Samantha
    Account – KJ , Kate
    Account – KJ , Josh
    Account – KS , Kate
    Account – KS , Samantha

  • Balance table:

    AccountName  , Amount
    Account – P  , 100
    Account – K  , 100
    Account – J  , 100
    Account – S  , 100
    Account – KJ , 100
    Account – KJ , 100
    Account – KS , 100
    Account – KS , 100

FIRST METHOD: MANY-TO-MANY USING A BRIDGE TABLE

The bridge table method is the usual m2m method. Construct both your dimensions separately: a multi-level dimension for Customers and another for AccountName.

Now, on the facts table, to link customers to their corresponding data, create a bridge table as follows:

By doing so, you are creating a link between your two tables through the AccountName. Therefore, the customers are now associated with their corresponding account(s).
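Conceptually, the bridge resolves each fact row to all owners of its account through AccountName. As a plain JavaScript sketch (sample data from this article; not icCube code):

```javascript
// Bridge: AccountName -> customers, from the Customers' table
const bridge = {
  "Account - P":  ["Phil"],
  "Account - KJ": ["Kate", "Josh"],
  "Account - KS": ["Kate", "Samantha"],
};

// Facts: one balance row per account
const facts = [
  { account: "Account - P",  amount: 100 },
  { account: "Account - KJ", amount: 100 },
];

// Each fact row is linked through the bridge to all owners of
// its account, giving the per-customer view of the balances
const perCustomer = {};
for (const f of facts) {
  for (const c of bridge[f.account] ?? []) {
    perCustomer[c] = (perCustomer[c] ?? 0) + f.amount;
  }
}
console.log(perCustomer); // { Phil: 100, Kate: 100, Josh: 100 }
```

Note that the joint account KJ appears once in the facts but surfaces under both Kate and Josh, which is exactly the m2m behavior described above.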

The bridge schema is available for download here. Unzip and then import into the icCube builder.

SECOND METHOD: MANY-TO-MANY DIMENSIONS

You may create m2m relations by defining several hierarchies in the same dimension. This results in showing only the {Customer, Account} combinations that exist in your table. This behavior, called autoexist, performs a NON EMPTY to remove empty tuples and is in fact faster than an explicit NON EMPTY.

The data used is exactly the same as the method above. Create the dimension BankAccount from your Customers’ table containing two hierarchies: AccountName and Customers.

No need to create a bridge on your facts table. Find below a screenshot of the BankAccount dimension browsing, as well as the facts table for this second method.

The m2m dimension schema is available for download here. Unzip and then import into the icCube builder.

Find more information on Many-to-Many relations in our Documentation: Many-to-Many using a Bridge Table and Many-to-Many dimensions.

icCube now integrates native Java and R. This means that you can build Java and R scripts into your MDX statements. The advantage of having those two languages natively integrated into icCube is that you are able to expand your analytical capabilities (predictive analytics, time series, data mining algorithms, etc.) with two of the richest mathematical programming languages available today, and directly apply those methods on your production data, analysis and dashboards.

Find the detailed documentation here: Java integration and R integration.

In this article we’ll show the same example, once in R then in Java with icCube. The schema used is the “Sales” schema that is available when you download icCube’s trial version.

In order to build a Java or R function, you need to define it as “native” (not defined using MDX). Native functions are declared with the NATIVE keyword. Here is a simple R example of a weekday count between two dates: the function counts all Mondays, Tuesdays, Wednesdays, Thursdays and Fridays between the two dates evaluated.

 WITH
    NATIVE FUNCTION RweekdaysBetween(date1,date2) AS
        /* R
sum(!weekdays(seq(date1, date2, "days")) %in% c("Saturday", "Sunday"))
        */
     MEMBER days as RweekdaysBetween(
                        [Time].[Calendar].[Day].[1 Jan 2005].key,
                        [Time].[Calendar].[Day].[1 Jan 2006].key
                    )
SELECT
days on 0
FROM [Sales]

Let’s show this same example using Java.

In order to put the algorithms in production, just add your corresponding Java or R code into your schema’s Script. Remember you can create a function in the MDX IDE on an existing schema without the need to reload it (DROP and CREATE the functions in the MDX IDE.)

 create native function weekDaysBetween(d1, d2 /* exclusive */) as
/* JAVA
	import org.joda.time.*;
    final int WEEK_START = DateTimeConstants.MONDAY;
	final LocalDate start = toLocalDate( toWeekDay( toLocalDate( d1)));
    final LocalDate end   = toLocalDate( toWeekDay( toLocalDate( d2)));
    final int daysBetween = Days.daysBetween(start, end).getDays();
    final int weekendsBetween = Weeks.weeksBetween(start.withDayOfWeek(WEEK_START), end.withDayOfWeek(WEEK_START)).getWeeks();
    return daysBetween - (weekendsBetween * 2 /* DAYS_PER_WEEKEND */);
*/
create native function toWeekDay(d) as
/* JAVA
	import org.joda.time.*;
    final int WEEK_END = DateTimeConstants.FRIDAY;
    final LocalDate day = toLocalDate( d );
	if (day.getDayOfWeek() > WEEK_END)
    {
      return day.plusDays(DateTimeConstants.DAYS_PER_WEEK - day.getDayOfWeek() + 1);
    }
    return day;
*/
Then those functions are available for querying in icCube’s IDE.
WITH
MEMBER days as weekDaysBetween(
                        [Time].[Calendar].[Day].[1 Jan 2005].key,
                        [Time].[Calendar].[Day].[1 Jan 2006].key
                    )
SELECT
days on 0
FROM [Sales]

(btw, results should both be 260 🙂 )

What language should I use, Java or R ?

From a performance point of view, Java is faster, and potentially by quite a lot, as you are using the same code icCube is running. Once the Java code is compiled, it’s as fast as icCube’s code. But you need to have a Java JDK on one of your servers (not needed in production, as the bytecode is saved in the schema definition). Remember you can add your own libraries (jar files) to icCube and they will be visible to your Java code.

From a functionality point of view, R has an amazing set of existing libraries that do not exist in Java. Please note that icCube uses Renjin, which does not support 100% of all R libraries (check here).

From my side, this is just the first step in allowing users to do amazing maths in icCube, so stay tuned for what’s coming!

I knew it from the start.

The moment I started presenting general ledger data in icCube dashboards, I knew the issue of intercompany elimination would eventually pop up. I had an easy elimination solution at one of my clients that worked for three years. During fall 2017, the business controllers of that client requested a full elimination solution that would eliminate intercompany deals on all levels of the organization; functionality you see in specialized financial consolidation applications like Oracle HFM or SAP BFC.

In this post, I present you the solution for intercompany elimination when processing financial G/L data in icCube.

What is “intercompany elimination” and why would you care?

Intercompany elimination can be defined as the process for removal of transactions between companies included in a group in the preparation of consolidated accounts. The process of intercompany elimination is helpful in managing eliminations of operations among companies within a single group (see link).

Organizations do business with the outside world and as a result transactions flow into the general ledger. Suppose we have a company, ‘EasyDrone’, that sells and repairs drones and within this company an engineer doing the repairs, Monica.

Using our definition of intercompany elimination, and for the sake of this example, EasyDrone would be the group with two companies (in our example, the Sales and Engineering departments). Our interest is to calculate the actual value of these two departments as separate entities.

Let’s start by creating a few financial transactions:

Monica repairs a drone for one of EasyDrone’s clients. After the work is done, the hours are billed and the results are recorded in the general ledger as “revenue from repair” for the cost center “engineering agency”. The revenue rolls up into the organizational structure as indicated in the figure below. This transaction with a real client is called a regular “external transaction” (action 1).

So far so good. Next, a second-hand drone and a training drone need to be repaired by Monica. In this case, the clients are not “real” clients, but two internal departments: training and sales. When the job is done, Monica records her hours worked and an internal invoice is sent to the training department (action 2) and the sales department (action 3). The G/L will record these as “revenue from consulting” and “repair cost” transactions.

The transactions between organization structures like these are called “intercompany transactions”. The departments roll-up into the organization structure as indicated in the following figure:

Figure 1: the result of three repair actions on the accounts “revenue from repair” and “repair costs”, rolled-up into the organization structure.

Adding up the revenue and costs for these actions gives the following result:

Figure 2: total revenue and costs for the three actions, including the gross profit, without eliminating intercompany transactions.

Gross profit looks nice, but what happens if we look at the ‘revenue from repair’ and ‘repair costs’? Are you able to detect the issue with the intercompany transactions?

Let me give you a clue. What if the Engineering department and the Sales department were separate legal entities and you wanted to buy only the former? What would be the “real” revenue of the Engineering department?

Think about it for a moment.

Would it really be € 190? I do not think so. As a buyer, I would argue that the internal revenue from “training” is not real revenue; therefore I would value the revenue at € 155 only, eliminating the internal transaction with training.

This process of eliminating the internal transactions is called “intercompany elimination”.

Applied to the example above, the following additional eliminating transactions are required to give the eliminated view of the complete organization:

Figure 3: Eliminated view on the organization structure, eliminating the internal transactions at the lowest level possible.

In the figure above, you can see the “real” revenue from repair for each level in the organization. The “engineering agency” can report € 190 revenue in the internal management reports, but as you zoom out to the higher levels the internal transactions are deducted, eventually resulting in just € 100 revenue for the Total Organization, which is the revenue of the external client.

Without having the “eliminated” view on the organization, the revenues and costs are reported too high due to intercompany transactions.

This example showed intercompany bookings for internal services (revenue & costs), but intercompany effects also apply to internal debts, loans and internal stock ownership.

The icCube solution for intercompany elimination

In the following steps I will show you how you can build intercompany elimination into your financial icCube model. I will use the data in the example above to keep the explanation short and simple, but you can easily modify it to your needs.

To make it practical, a ready-to-use schema can be downloaded here. Unzip and then import into the icCube builder.

step 1 – model exploration

The model contains three in-memory tables:

  • 1. data, the data for the revenue in the previous chapter
  • 2. costcenter structure parent-child, the structure for the organization
  • 3. elimination structure, the flat hierarchy structure for the elimination dimension

The solution for intercompany elimination can be found in the icCube “data views”: createPath and elimination. Both views are ETL procedures based on JavaScript.

dataview: createPath

This procedure generates a path string in which the parent-child hierarchy is stored. The hierarchy in this example is very simple, but in reality, organization structures tend to be very “ragged”, meaning that some parts might just contain a few levels of depth while other parts are very detailed. Typically, the organization structure ends with the cost centers. That is why I call the organization structure “Costcenter”.

Here is the Init. Code:

ctxt.map = [[], []];

Row Processing Code:

// Get current node data
var node = get("costcenter");
var parent = get("parent");

// Find the path of the parent 
var idx = ctxt.map[0].indexOf(parent);
var parent_path = ctxt.map[1][idx] || "";

// Create path of current node
var path = parent_path + "|" + node;
set("PATH", path);

// Add row to map array
ctxt.map[0].push(node);
ctxt.map[1].push(path);

fire();

Completion Code:

ctxt.map = [];

This is the result:

costcenter parent PATH
Organization null |Organization
Sales dept. Organization |Organization|Sales dept.
Engineering dept. Organization |Organization|Engineering dept.
sales second-hand machines Sales dept. |Organization|Sales dept.|sales second-hand machines
sales new machines Sales dept. |Organization|Sales dept.|sales new machines
engineering agency Engineering dept. |Organization|Engineering dept.|engineering agency
training Engineering dept. |Organization|Engineering dept.|training
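Outside icCube, the Row Processing logic above can be verified as a standalone script; a sketch using a few rows of the sample structure (as in the source table, parents must appear before their children):

```javascript
// Parent-child rows, parents listed before their children
const rows = [
  { costcenter: "Organization", parent: null },
  { costcenter: "Engineering dept.", parent: "Organization" },
  { costcenter: "engineering agency", parent: "Engineering dept." },
];

// Same two-array map as the view: [0] holds the names, [1] the paths
const map = [[], []];
const paths = {};
for (const { costcenter, parent } of rows) {
  const idx = map[0].indexOf(parent);
  const parentPath = idx >= 0 ? map[1][idx] : "";
  const path = parentPath + "|" + costcenter;
  map[0].push(costcenter);
  map[1].push(path);
  paths[costcenter] = path;
}

console.log(paths["engineering agency"]);
// |Organization|Engineering dept.|engineering agency
```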

dataview: elimination

This procedure processes all rows; if a row is an intercompany booking, it “calculates” the elimination booking that counterparts that row.

The logic is as follows:

  • 1. For each normal row, set the elimination type to 1 (= original). If the row does not have an intercompany value, set the intercompany value to “external”.
  • 2. For each row that has an intercompany value, find the lowest common parent for the “costcenter” – “intercompany” combination using the commonPath matrix.
    When found, create an elimination booking with coordinates on the common parent, elimination type 2 (= elimination booking) and -1 x amount.

Here is the Init. Code:

//
// get the first common parent of all combinations
//
var df = table("createPath");
var n = df.count();

// lookup matrix, 2 rows and n columns: row 0 the element, row 1 its path
var nrow = 2;
var ncol = n;

ctxt.m = new Array(nrow);
for (var i = 0; i < nrow; i++) {
    ctxt.m[i] = new Array(ncol);
}

for (var i = 1; i < n; i++) {
    ctxt.m[0][i] = df.row(i).get("costcenter");
    ctxt.m[1][i] = df.row(i).get("PATH");
}

ctxt.row_nb = 0;

Row Processing Code:

var costcenter = get("costcenter");
var intercompany = get("intercompany");

if (intercompany == null) {
    intercompany = "external";
}

// 1st, emit the regular input row as output
set("costcenter", costcenter);
set("intercompany", intercompany);
set("type", 1);
set("amount", get("amount"));
fire(ctxt.row_nb++);

// 2nd, if the row is an intercompany booking, prepare the
// counterpart elimination row
if (intercompany != "external") {

    // look up the paths of both the costcenter and the intercompany
    // in the predefined matrix
    var cc_path = "";
    var ic_path = "";
    for (var k = 0; k < ctxt.m[0].length; k++) {
        if (ctxt.m[0][k] == costcenter) {
            cc_path = ctxt.m[1][k].split("|");
            if (ic_path.length > 0) {
                break;
            }
        }
        if (ctxt.m[0][k] == intercompany) {
            ic_path = ctxt.m[1][k].split("|");
            if (cc_path.length > 0) {
                break;
            }
        }
    }

    // now, compare the paths from the deepest common level upwards
    var common_parent = "";
    for (var i = Math.min(cc_path.length, ic_path.length); i > 0; i--) {
        if (cc_path[i - 1] == ic_path[i - 1]) {
            common_parent = cc_path[i - 1];
            break;
        }
    }

    set("costcenter", common_parent);
    set("intercompany", common_parent);
    set("type", 2);
    set("amount", -1 * get("amount"));
    fire(ctxt.row_nb++);
}

This is the result:

costcenter intercompany amount type
engineering agency external 100.0 1
engineering agency sales second-hand machines 55.0 1
Organization Organization -55.0 2
engineering agency training 35.0 1
Engineering dept. Engineering dept. -35.0 2

As you can see, the procedure has generated two elimination bookings, exactly at the right level.


step 2 – model generation

Deploy the schema, navigate to the MDX editor and copy the following MDX statement:

select {[Measures].[amount]} * descendants([Elimination].[Elimination].[Eliminated]) on 0,
       descendants([CostCenter].[CostCenter].[Total CostCenter]) on 1
from finance

This gives as result:

Figure 4: intercompany elimination in icCube, result from the MDX statement

Exactly as we want it to be!

Feel free to explore this model in more detail. Although the Intercompany dimension is not strictly needed, I would advise you to keep it as is, since it provides very useful insights into the intercompany process for the financial controllers.

practical things to consider in real-life implementation

The code presented in the previous paragraph has been implemented at a client and is used on a cost center structure of about 1,500 members and 1 million fact rows. It has worked flawlessly and fast for the past few months, and I assume it will continue to do so for years to come.

When implementing intercompany elimination, consider the following:

  • If intercompany elimination is introduced in an existing solution, make agreements with the controllers on how to proceed with the historical data. Most likely they do not want it to be changed, so intercompany elimination should only be applied to the data starting from January 1st of this year.
  • Loading the elimination bookings on subtotal levels might be difficult for end users to swallow. For example, have a look at the column “Eliminated” in Figure 4. If you only see the figures for this element, you could get nervous that the figures “do not add up”. Also, the € -90 does not help a lot, as you cannot see the elimination booking of € -55 separately; instead you see the summarized result. A possible solution is to add an additional input level for each subtotal to the Costcenter hierarchy. Obviously, this also has its drawbacks, as it “pollutes” the structure.
  • Regardless of how you choose to load the elimination bookings, make sure to create sufficient explanatory dashboards. Intercompany elimination is a difficult process, so help your controllers and other end users review, analyse and explain the results they see. The most efficient solution is the one that leaves no unanswered questions for your audience.

Good luck implementing this in your dashboard solutions!


We’re working hard on the next version that will include some important performance improvements such as smart cache and pre-compilation.

In the meantime, there are already a couple of “tricks” you can use to improve performance. Because they are specific to icCube’s MDX+, they are not yet well known by all icCube users. So, we’re going to use a customer request as an excuse to present several ways of improving a classical MDX expression:

Sum( {a Set}, [some value] ).

It all started with a customer telling us that some icCube requests were too slow, taking 20 seconds. That is certainly a lot of time, so we decided to dive into the query. To simplify, we are going to use parts of the original query, which used several calculated measures. It looks like this:

SET [MTD] as Mtd( ClosingPeriod([Calendar].[day]) )
SET [YTD] as Ytd( ParallelPeriod( [Calendar].[day], 1, ClosingPeriod([Calendar].[day]) ) )
member [Measures].[Sales MTD] as Sum( [MTD], [Measures].[sales] )
member [Measures].[Sales Last Year YTD] as Sum( [YTD], [Measures].[sales] )

We have a couple of measures that calculate sales for different periods, namely YTD and MTD. The hierarchy [Calendar] has been built so that ClosingPeriod([Calendar].[day]) returns the last day with data. YTD and MTD sum all days from the start of the year and the start of the month, respectively. For YTD we can have up to 365 days, so it’s time-consuming, especially if the query is run on several calculated measures, as was the case.

The first quick improvement is to use the CompactSet function, which transforms the input set into a smaller one. If all days of a month are in the set, they are replaced by the month. The size of the set therefore decreases from hundreds of members to around 10–20 members.

This transformation is possible because the numerical expression used in the Sum function uses the sum aggregation method. This would also be true for Min, Max and Count, but not for aggregation methods such as Distinct Count, Open Period or End Period. It would also not be possible if the numerical expression contained a multiplication, as for example in a market share.
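The idea behind CompactSet can be sketched as follows (an illustration of the principle, not icCube’s implementation): whenever all the days of a month are present in the set, they collapse into the month member, and summing over the compacted set gives the same result.

```javascript
// Number of days in a month (month is 1-based)
function daysInMonth(year, month) {
  return new Date(year, month, 0).getDate();
}

// Day members as "YYYY-MM-DD"; compact full months into "YYYY-MM"
function compactSet(days) {
  const byMonth = {};
  for (const d of days) {
    const m = d.slice(0, 7);
    (byMonth[m] ??= []).push(d);
  }
  const out = [];
  for (const [m, ds] of Object.entries(byMonth)) {
    const [y, mo] = m.split("-").map(Number);
    // A full month collapses to one member; a sum over the month
    // member equals the sum over all its days
    if (ds.length === daysInMonth(y, mo)) out.push(m);
    else out.push(...ds);
  }
  return out;
}

// Jan 2005 (31 days) compacts to one member; 2 days of Feb stay as days
const jan = Array.from({ length: 31 },
  (_, i) => `2005-01-${String(i + 1).padStart(2, "0")}`);
const set = compactSet([...jan, "2005-02-01", "2005-02-02"]);
console.log(set.length); // 3
```

This is why the trick only works for sum-like aggregations: the month member must carry exactly the sum of its days.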

The new MDX looks like this:

SET [MTD] as CompactSet( Mtd( ClosingPeriod([Calendar].[day]) ) )
SET [YTD] as CompactSet( Ytd( ParallelPeriod( [Calendar].[day], 1, ClosingPeriod([Calendar].[day]) ) ) )
member [Measures].[Sales MTD] as Sum( [MTD], [Measures].[sales] )
member [Measures].[Sales Last Year YTD] as Sum( [YTD], [Measures].[sales] )

This new query is already dropping the response time to 2.3 seconds—and this is a major improvement.

However, CompactSet will not work in all scenarios. So let’s try an alternative solution using our latest Eval() MDX+ function. This function is similar to the Aggregate() function but uses a subcube. The Eval() function is a bit smarter and will reuse subcubes defined with the same set.

SET [MTD] as Mtd( ClosingPeriod([Calendar].[day]) )
SET [YTD] as Ytd( ParallelPeriod( [Calendar].[day], 1, ClosingPeriod([Calendar].[day]) ) )
member [Measures].[Sales MTD] as Eval( [MTD], [Measures].[sales] )
member [Measures].[Sales Last Year YTD] as Eval( [YTD], [Measures].[sales] )

This query needs 3.2 seconds. It’s not as fast as the Compact version, but it is seven times faster than the initial version. If we combine it with CompactSet, the time drops to 1.7 seconds. This is our fastest version for now.

Finally, we are going to use Categories. Categories allow defining new members based on a set or a subcube, and are similar to the Eval function.

As [MTD] and [YTD] sets are defined at schema level, we can add them into the Advanced/Scripts panel:

CREATE STATIC SET [MTD Set] as Mtd( ClosingPeriod( [Calendar].[day] ) )
CREATE STATIC SET [YTD Set] as Ytd( ParallelPeriod( [Calendar].[day], 1, ClosingPeriod([Calendar].[day]) ) )
CREATE CATEGORY MEMBER [Util].[Categories].[default].[MTD] as [MTD Set]
CREATE CATEGORY MEMBER [Util].[Categories].[default].[YTD] as [YTD Set]

To use them is even simpler:

member [Measures].[Sales MTD] as ([Util].[Categories].[MTD], [Measures].[sales] )
member [Measures].[Sales Last Year YTD] as ([Util].[Categories].[YTD], [Measures].[sales] )

This is the fastest version, taking 1.6 seconds. It’s also, IMHO, the cleanest and the easiest to reuse.

I hope these few examples will help you write more efficient and faster MDX statements.

In this post we’re going to explain one of the latest functions in icCube, SumOn / VectorK, and how it can be used.

This started with a use case in the healthcare sector that was tricky to solve. It’s a simplification of the actual scenario, but we have data about the hospital stays of a list of patients that looks like:

Stay PatientId Departement Traitement Cost
1 1 Cardiology Ex 75
1 1 Radiology IRM 750
1 1 Cardiology Op 150
2 2 Psychiatry Ex 150

We want to be able to calculate the cost of the whole stay, using Department and Traitement as filters.

The standard solution in MDX would be to filter the Stay members and afterwards calculate the stay cost (not the traitement cost), something like:

WITH
// we need to reset the filter so we have all the traitements for a stay
MEMBER [Stay Cost] as ( [Patient].defaultMember, [Department].defaultMember, [Traitement].defaultMember, [Measures].[Cost] )
SELECT
  [Measures].[Stay Cost] on 0,
  NonEmpty( [Stay].[Stay].[Stay - L] , [Measures].[Cost] ) on 1
FROM [Cube]
WHERE [Department].[Department].[Department].[Radiology]

This will work for small schemas, when the [Stay] dimension is not very large, but it’s going to get worse once we have millions of rows and an increasing number of stays.

One solution would be to create a fact table that uses many-to-many relations (check ‘GroupBy on Stay’ view in the schema enclosed at the end of this post).

Stay Departement Traitement Cost
1 [Cardiology Radiology Cardiology] [Ex IRM Op] 975.0
2 [Psychiatry] [Ex] 150.0
3 [Oncology Laboratory Laboratory Radiology Oncology] [Ex GF1 FG2 IRM Op] 7365.0

If this is possible, go for it. In our case it was not possible, as we wanted to be able to filter on multiple dimensions using an AND relation. For example, we want to get the stays that had a row with Cardiology AND the Ex traitement. The many-to-many relation answers the question Cardiology OR Ex, so it cannot be used; as you can see in the table, this information was lost in the transformation. We could indeed have created a composed dimension (Department + Traitement), but this was not ideal, as we wanted the end users to be able to create their own reports with easy filters.

So eventually, the solution is a mix of icCube features based around the Vector aggregation. As a summary of the solution: the new calculation gets a measure that returns a vector with the unique stays, and we use this vector to calculate the cost. How?

a) Create a measure with ‘Vector Unique’ as the aggregation method on the stayId column -> [V – Stay]
b) Create a vector that holds, for each member id, the stay cost at the position defined by the id of the member (which has to be an integer). For example, the [Stay 23] member, which has an id of 23, will put its cost at position 23 of the vector. To be memory efficient, there should be no holes in the stayId values and they should start at 0, so the vector is as small as possible.

We can already check our idea:

WITH
   // create a vector with the cost by stay for each stay 
   STATIC FUNCTION VStayCost() as VectorK( [Stay].[Stay].[Stay - L] , [Measures].[Cost] )
   // This will sum, for each stay defined in the measure, its cost
   MEMBER [Stay Cost2] as SumOn( VStayCost() , [Measures].[V - Stay ] )
SELECT
   { [Measures].[Stay Cost2], [Measures].[Stay Cost] } on 0,
   NonEmpty( [Stay].[Stay].[Stay - L] , [Measures].[Cost] ) on 1
FROM [Cube]
WHERE [Department].[Department].[Department].[Radiology]

STATIC FUNCTION allows the expression to be evaluated only once, ignoring the current cell context (filters, subselects and the WHERE clause).

VectorK is a function that evaluates the expression and sets the value at the position defined by the member key. SumOn sums two vectors: the first defines the values to sum, the second the positions at which to sum them.

As a final step, we move the function and the measure declaration to the schema script definition, and it's done. Our robust and amazingly fast new calculation is ready.

You can download the self-contained schema from here.

Hope you enjoyed it!

Shopping Cart Advanced Analytics with icCube and MongoDB

In part 1 and part 2 of our blog posts about icCube and MongoDB, we introduced how to load shopping cart data stored in MongoDB and how to use some SubCube functions. In this third part, we introduce two icCube MDX+ object-oriented extensions: using vectors and defining the same hierarchy on both axes. MDX+ is the set of icCube-specific extensions to the MDX standard.

We aim to answer the question, ‘How similar are two customers’ shopping carts?’

For this, we’re going to calculate a cart proximity number and present the result as a heat map between customers.

Shopping cart proximity is a loose concept, indeed. For simplicity's sake, we can start by visualizing the shopping carts of two customers:

[icCubeReport reportName=”/shared/icCube Blogg/Shopping Cart – Compare Two Customers Chart” width=760 height=350 iframe=false]

How to calculate proximity?

As we want to express the distance as a single number, we need a way of calculating Distance( Cust1, Cust2 ). Let's represent a customer as a vector whose entries are the number of each product bought by the customer:

Cust1 = ( Number of Plasma, Number of LED, … ) – ( 2, 0, 3, … )
Cust2 = ( Number of Plasma, Number of LED, … ) – ( 1, 3, 12, … )

Using the number of products measure is arbitrary, and we could choose other measures as well as other dimensions. But let’s continue with this choice.

The first idea is to use the classical Euclidean vector distance. However, the result does not match our visual intuition, so it's not a valid option. Instead, we use a new distance definition based on a graphical concept: a distance function based on the visual chart surface.

The distance would be the absolute value of the difference divided by the maximum value (all values are positive, so sign is not a concern). The distance equation would look like this:

Dist = Abs( Cust1 – Cust2 ) / Max( Cust1, Cust2 )

Translating this to icCube is easy thanks to the OO vector support. In order to build a vector object, we can write the following MDX statement:

Vector( [Products Set], [Measures].[Number Carts])

The Vector() function returns a vector object after iterating over each member of the first parameter. MDX+ object-oriented support allows us to perform some math directly on vectors (check here for more information):

If a and b are vectors for customers a and b:

FUNCTION distance(Value a, Value b) AS ( a->pDif(b)->sum() ) / ( a->pMax(b)->sum() )

The notation -> is the MDX+ operator calling the underlying Java function on an object (here vector). The functions used in this calculation are defined as:

pDif() calculates, for each vector element, abs( ai – bi );
pMax() returns the maximum value at each index;
sum() returns the sum of all elements.

In our case, taking 1 for the length, this matches the surface of the chart.
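To make the calculation concrete, here is a plain-Python sketch of the pDif / pMax / sum pipeline, using the two invented cart vectors from above; the real query uses the MDX+ `->` operators instead:

```python
# Plain-Python equivalents of the MDX+ vector operators used in the post
def p_dif(a, b):
    # element-wise absolute difference: abs(ai - bi)
    return [abs(x - y) for x, y in zip(a, b)]

def p_max(a, b):
    # element-wise maximum
    return [max(x, y) for x, y in zip(a, b)]

def distance(a, b):
    # Dist = sum(|ai - bi|) / sum(max(ai, bi))
    return sum(p_dif(a, b)) / sum(p_max(a, b))

cust1 = [2, 0, 3]   # products bought by customer 1
cust2 = [1, 3, 12]  # products bought by customer 2

print(distance(cust1, cust2))      # 0 for identical carts, towards 1 for disjoint ones
print(1 - distance(cust1, cust2))  # the proximity shown in the heat map
```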

Putting all this together in an MDX (the cache is for improving performance):

WITH
  // calculates the distance between our shopping carts
  FUNCTION distance(Value a, Value b) as ( a->pDif(b)->sum() ) / ( a->pMax(b)->sum() )
  // FUNCTION distance(Value a, Value b) as a->corr( b )
  // true if we're on the top triangle of our axis
  FUNCTION isAbove() as [Customer].[Customer].current(0).name > [Customer].[Customer].current(1).name
  // a little bit of MDX leveraging request cache so we don't calculate each vector N times
  MEMBER Vect as Vector( [Product].[Product].[Product L], [Measures].[Number Carts]), REQUEST_CACHED = true
  // basket for each customer, to leverage cache use only one hierarchy
  MEMBER Basket1 as ([Customer].[Customer].current(0), [Measures].[Vect])
  MEMBER Basket2 as ([Customer].[Customer].current(1), [Measures].[Vect])
  // MEMBER Basket2 as ([Customer 2].[Customer 2].defaultMember, LinkMember( [Customer 2].[Customer 2].currentmember,[Customer].[Customer]) , [Measures].[Vect])
  // proximity between two baskets (1 if they are the same, going towards 0 if fully different)
  MEMBER Dist as IIF( isAbove(), null, 1-distance([Basket1],[Basket2]) ), FORMAT_STRING="percent" //, BACK_COLOR=paletteOutliersLight( currentCellValue() )
  SET [Cust1] as nonempty( [Customer].[Customer].[Customer - L], [Measures].[Number Carts]).head(25)
SELECT
  [Cust1] on 0,
  [Cust1] on 1
FROM [Cube]
WHERE [Dist]

Note this MDX statement uses a 4.8 functionality that allows defining the same hierarchy on both axis-0 and axis-1. The Current(0) function picks the member of the hierarchy currently on axis-0.

Eventually, all this can be put together in a single dashboard for experimenting that you can check here.

Shopping Cart Advanced Analytics with icCube and MongoDB

In our previous post, we introduced how to load shopping cart data stored in MongoDB into icCube, leveraging the array support. Here we introduce the MDX+ SubCube functions of icCube. MDX+ is the set of icCube-specific extensions to the MDX standard.

The kind of question we're going to answer here looks like: 'How do we count, or compute stats over, all shopping carts with TVs but no cheese sales?'

Even though this query might not look like the most thrilling one, we can easily extend it to other scenarios and businesses, such as surveys and other customer behaviour analyses.

Coming back to our query, there is no straightforward way to translate the question into MDX language. There is no way to create a filter defined as ‘all shopping carts with TVs but without cheese’. Why?

A short introduction to standard MDX

The MDX language works nicely in a universe without many-to-many relations but less so when those relations exist. By many-to-many we mean a hierarchy where several members, e.g., [United Kingdom] and [USA], point to the same data. By definition, those members are from the same hierarchy, [Geo], and level, [Country]. Without many-to-many, defining a filter like [UK] – [US] makes no sense as it is empty by construction.

An approach to understanding what is going on in MDX is to see MDX tuples as a way of defining/filtering a set of fact rows. In standard MDX, and with tuples without calculated measures, we have two ways of combining rows: union and intersect.

The tuple ( [USA], [2010] ): will be an intersect, aka logical-and, of the fact rows defined by [USA] and [2010].

A set {[USA],[UK]}: will be a union, aka logical-or, of the fact rows defined by [USA] and [United Kingdom].

We can add sets and tuples to the WHERE clause and to the subquery, but they impact how tuples and calculated members are evaluated. Unfortunately, this is all we have in standard MDX; combining these set operations is an exercise in technique, experience and imagination. Some operations are just overly complicated, and using calculated measures might be error prone, unstable and/or time consuming. Let's say there is no proper support in MDX for basic set operations: combining union, intersection and complement (logical AND, OR & NOT).
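The union/intersect reading of tuples and sets can be illustrated with plain Python sets standing in for fact rows (the row ids below are hypothetical):

```python
# Set-based reading of MDX tuples and sets
rows_usa  = {1, 2, 3, 4}   # fact rows defined by [USA]
rows_uk   = {5, 6}         # fact rows defined by [United Kingdom]
rows_2010 = {2, 4, 6}      # fact rows defined by [2010]

# the tuple ( [USA], [2010] ) -> intersect, aka logical AND
print(rows_usa & rows_2010)  # {2, 4}

# the set { [USA], [UK] } -> union, aka logical OR
print(sorted(rows_usa | rows_uk))  # [1, 2, 3, 4, 5, 6]
```

What standard MDX lacks is a clean way to combine these with a complement ("not these rows"), which is exactly what the SubCube functions below provide.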

How to overcome this limitation

Since icCube 4.7, we have the MDX+ SubCube functions family:

SubCubeUnion, SubCubeIntersect, SubCubeComplement, SubCubeMinus, and SubCubeSymDifference.

Minus and Symmetrical Difference are shortcuts for combinations of the first three basic 'brick' operations. SubCubeMinus( A, B ) is a shortcut for SubCubeIntersect( A, SubCubeComplement(B) ). You can define your own set functions using MDX+ functional support (link here).

How do these functions work?

For example, let's take SubCubeMinus( [USA], [United Kingdom] ).

This will filter all rows defined by [USA] minus the rows defined by [United Kingdom]. Note that the parameters of SubCube functions do not need to have the same dimensionality; SubCubeMinus( [USA], ([UK],[2010]) ) is a valid MDX statement.
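The shortcut identity above is easy to verify with plain Python sets standing in for SubCubes (the row ids are hypothetical):

```python
# Checking SubCubeMinus(A, B) == SubCubeIntersect(A, SubCubeComplement(B))
all_rows = set(range(10))     # the whole fact table
rows_usa = {0, 1, 2, 3}       # rows defined by [USA]
rows_uk_2010 = {2, 3, 4}      # rows defined by the tuple ([UK], [2010])

minus = rows_usa - rows_uk_2010
intersect_complement = rows_usa & (all_rows - rows_uk_2010)

print(minus == intersect_complement)  # True
print(minus)                          # {0, 1}
```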

Before entering into an MDX example, feel free to try the following reporting example:

[icCubeReport reportName=”/shared/icCube Blogg/Shopping Cart – SubCube” width=760 height=820 iframe=false]

This is a dynamic report embedded with icCube’s WordPress plugin

An example of MDX syntax looks like:

WITH
  CATEGORY HIERARCHY [Stats].[Filter]
  CATEGORY STATIC MEMBER [Stats].[Filter].[All-M].[filter] AS SubCubeMinus( [Product].[Product Family].[Product Family L].[TV], [Product Plain].[Product Plain].[Product L].[Crab] )
SELECT
  [Measures].members ON COLUMNS,
  NON EMPTY [Age Group].[Age Group - L] ON ROWS
FROM [Cube]
WHERE [Stats].[Filter].[All-M].[filter]

SubCube functions return a SubCube object and currently can only be attached to category members. On the above MDX, we’re attaching our SubCube to a filter member that we can use as a normal MDX member.

We can also use categories and SubCube functions for axis, as in this example:

WITH
  CATEGORY HIERARCHY [Stats].[Filter]
  CATEGORY MEMBER [Stats].[Filter].[All-M].[With TV] AS {[Product].[Product Family].[Product Family L].[TV]}
  CATEGORY MEMBER [Stats].[Filter].[All-M].[TV without Seefood] AS SubCubeMinus( [Product].[Product Family].[Product Family L].[TV], [Product].[Product Family].[Product Family L].[Seafood] )
  CATEGORY MEMBER [Stats].[Filter].[All-M].[TV without VideoGames] AS SubCubeMinus( [Product].[Product Family].[Product Family L].[TV], [Product].[Product Family].[Product Family L].[Video Games] )
  CATEGORY MEMBER [Stats].[Filter].[All-M].[TV without Cheese] AS SubCubeMinus( [Product].[Product Family].[Product Family L].[TV], [Product].[Product Family].[Product Family L].[Cheese] )
SELECT
  [Measures].members ON COLUMNS,
  [Stats].[Filter].[All-M].children ON ROWS
FROM [Cube]

In our next post, we’re going to use object-oriented extensions to make some calculations. See you in Chapter III, presenting object-oriented icCube extensions.

Shopping carts are a good example of big data with a need for advanced analytics. Here we present new MDX+ features of icCube – categories, subcubes and arrays – coupled with the power and flexibility of MongoDB, applied to a shopping cart model.

Shopping transactions can be massive. A typical model might have millions, or hundreds of millions, of shopping carts with an average of 10 to 50 products. Doing a bit of math, 10 million shopping carts with an average of 50 products produce 500 million single-product transactions; 100 million shopping carts produce 5 billion. That's starting to be big. To focus on customer behaviour, our analysis will center on shopping carts' profitability, not individual products. Why we're interested in handling this problem at the shopping cart level instead of the product transaction level is easy to understand: we're speaking about 50 times fewer rows. This matters for loading data, storing data in memory and on disk, and performing analytical operations. Our model will have a large number of dimensions – e.g., customer information, geo information, product categories, etc. – with high sparsity, so pre-aggregation is not a valid solution.

Let's start by showing what a typical shopping cart MongoDB document looks like:

{
  "_id" : ObjectId("53bfe3f3578ddc736f679423"),
  "id" : 2,
  "customer" : { "id" : 46, "name" : "Danielle", "age" : 24, "gender" : "M" },
  "date" : ISODate("2014-05-10T14:00:00Z"),
  "Shopping Cart" : [
    { "Group" : "Foodstuff", "Family" : "Fresh fruits", "Product" : "Berries",
      "Catalogue Price" : 4.0, "Customer Price" : 4.0, "Margin" : 0.61 },
    { "Group" : "Foodstuff", "Family" : "Seafood", "Product" : "Lobster",
      "Catalogue Price" : 12.0, "Customer Price" : 12.0, "Margin" : 1.81 }
    ..
  ]
}

To make it simple, we’ve included just a bit of information about the customer and a shopping cart with a list of products. In a relational database, the same structure would require a couple of tables (Shopping_Cart, Customer, Shopping_Cart_Product_Transaction…) with their foreign keys.

How would we map this structure in icCube?

Thanks to icCube's MongoDB driver, we can directly embed JSON arrays that will be transparently transformed into a many-to-many relation (each cart has N products). No worries, icCube will handle this for you.

Without the array support, we would have to solve this with a denormalization in MongoDB, or in a relational database by performing a 'cross-product' between two tables, both resulting in a row for each product transaction. There are two problems with this approach. The first is the need for a lot more memory and disk I/O, as we store each single-product transaction (fifteen times more rows). The second is that, in order to count the number of shopping carts, we have to perform a distinct count on the shopping cart id, and this is, by far, the slowest aggregation method available in an OLAP server.

Fortunately, icCube can directly handle arrays as returned by MongoDB, so we can use the fastest aggregation method available: 'count rows'. This is quicker than a distinct count by orders of magnitude for the same number of fact rows, and it's a huge difference as we have 15 times fewer fact rows.
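The difference between the two aggregation strategies can be sketched in plain Python (invented carts, just to show the shape of the data each approach works on):

```python
# Denormalized: one row per (cart, product) -> counting carts needs a distinct count
denormalized = [
    (1, "TV"), (1, "Banana"),
    (2, "TV"), (2, "Cheese"), (2, "Lobster"),
]
n_carts_distinct = len({cart_id for cart_id, _ in denormalized})  # slow in an OLAP server

# Array form (as stored in MongoDB): one row per cart -> a plain row count is enough
carts = [
    {"id": 1, "products": ["TV", "Banana"]},
    {"id": 2, "products": ["TV", "Cheese", "Lobster"]},
]
n_carts_rows = len(carts)  # the fast 'count rows' aggregation

print(n_carts_distinct, n_carts_rows)  # 2 2 -- same answer, very different cost
```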

There is a bit of work to create the dimensions and facts to build a schema. We did it for you, so you can check the resulting icCube schema here.

Once the schema is created, we can start playing with icCube.

Note that this schema has a few dimensions that present a many-to-many relation.

What is the impact of a many-to-many relation?

In the classical relation, each parent is the aggregation of its children. If we use the sum, a parent is the sum of its children; that's simple and intuitive. What is also important is that two children do not share any value (or row); their intersection is always empty. Put differently, in a simple model no fact row can carry both the 'Banana' and 'TV' products.

If the relation is many-to-many, as with products and shopping carts, this principle is broken. Not only is a parent no longer the sum of its children, but the intersection of two children might not be empty: there are shopping carts with both 'Bananas' and 'TVs'. We will have shopping carts containing two different products, and the total number of shopping carts is no longer the sum of the shopping carts for each product.
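A tiny plain-Python sketch (invented carts) shows both effects at once:

```python
# Many-to-many: the parent is not the sum of its children,
# and two children can overlap.
carts = [
    {"id": 1, "products": {"Banana", "TV"}},
    {"id": 2, "products": {"Banana"}},
    {"id": 3, "products": {"TV"}},
]

def carts_with(product):
    # set of cart ids containing the given product
    return {c["id"] for c in carts if product in c["products"]}

print(len(carts_with("Banana")))                     # 2
print(len(carts_with("TV")))                         # 2
print(len(carts_with("Banana") & carts_with("TV")))  # 1: the intersection is not empty
print(len(carts))                                    # 3, not 2 + 2
```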

If you're not used to MDX, you might be saying: yes, and what's the problem?

MDX was mainly built to handle simple relations, so set operations between different members are not properly supported. For people used to an RDBMS, there is no equivalent of SQL join clauses. For example, filtering all shopping carts that have 'Bananas' but no 'TVs' is complicated or just impossible. Bear in mind that we may have an already complex MDX query with calculated members over which we want to apply this condition.

How could you solve this with icCube?

In order to support this kind of operation, our latest icCube version introduces a family of SubCube functions ( SubCubeUnion, SubCubeIntersect, … ):

SubCubeMinus( [Products].[Bananas], [Products].[Tvs] )

This will create a SubCube with all shopping carts that have 'Bananas' but no 'TVs'.

It’s a lot for a single blog post. Stay tuned for the second and third part.

For a couple of versions now, icCube has included out-of-the-box support for basic matrix algebra in MDX. There is support for two basic building blocks: vectors and matrices.

In this short blog post, we're going to use icCube to solve a simple linear equation. To make it self-contained, I'm going to write MDX that does not rely on any specific schema.

Heavily inspired by this article, we're going to solve the following system:

x + y + z = 6
0 + 2y + 5z = -4
2x + 5y - z = 27

In matrix syntax, we can write it as A * X = B, where A is a matrix and both X and B are vectors.

What we need is to build a matrix. In icCube a matrix can be built from a set of vectors, so first we create our three vectors:

Vector(1,0,2)
Vector(1,2,5)
Vector(1,5,-1)

Let's put the three vectors together to form our matrix:

Matrix( Vector(1,0,2),Vector(1,2,5),Vector(1,5,-1) )

You can check the output directly in icCube's IDE with the following statement (note that vectors and matrices are like any other scalar entities – e.g., the number 2.0):

WITH
  MEMBER A AS Matrix( Vector(1,0,2), Vector(1,2,5), Vector(1,5,-1) )
SELECT
  A on 0
FROM [Sales]




In order to solve the equation, we can calculate the inverse and multiply by B: X = A^-1 * B. The icCube version would be:
WITH
  MEMBER InvA AS Matrix( Vector(1,0,2), Vector(1,2,5), Vector(1,5,-1) )->inverse() * Vector(6,-4,27)
SELECT
  InvA on 0
FROM [Sales]



 

That gives a result pretty close to what we expect; the small differences are due to rounding issues. To solve this minor problem, instead of using the inverse matrix we can directly use the solve method available in icCube, which uses an LU decomposition:

WITH
  FUNCTION Solve(a_) AS Matrix( Vector(1,0,2), Vector(1,2,5), Vector(1,5,-1) )->solve( a_ )
  MEMBER InvA AS Solve( Vector(6,-4,27) )
SELECT
  InvA on 0
FROM [Sales]



 

And here we get our result without rounding issues, with the bonus of showing icCube's support for functions.
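As a cross-check outside icCube, the same system can be solved with a minimal Gaussian elimination in plain Python (no icCube or schema involved; this is just an illustration of the math behind solve):

```python
# Solve A * X = B with Gaussian elimination and partial pivoting
def solve(a, b):
    n = len(b)
    # augmented matrix [A | B]
    m = [row[:] + [bv] for row, bv in zip(a, b)]
    # forward elimination
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # back substitution
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

# the system from the post: x + y + z = 6; 2y + 5z = -4; 2x + 5y - z = 27
A = [[1, 1, 1], [0, 2, 5], [2, 5, -1]]
B = [6, -4, 27]
print(solve(A, B))  # [5.0, 3.0, -2.0]
```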

Most probably, you will build your vectors and matrices with data coming out of your model. For this you can use the same functions with a slightly different syntax. Feel free to check the online documentation (Vector doc, Matrix doc) and to experiment on your own.

That's the end of this tiny intro to doing a bit of linear algebra with icCube and its object-oriented support. If you're wondering what this is useful for: matrix calculations are a central part of many businesses, for example risk calculations and cost allocation.

If you are interested in learning more about MDX and MDX+, please take a look at our MDX Tutorial and MDX Documentation.