Shopping Cart Advanced Analytics with icCube and MongoDB
In part 1 and part 2 of our blog post about icCube and MongoDB, we introduced how to load shopping cart data stored in MongoDB and how to perform some SubCube functions. In this third part, we introduce two icCube MDX+ object-oriented extensions: using vectors and defining the same hierarchy on both axes. MDX+ are icCube specific extensions to the MDX standard.
We aim to answer the question, ‘How similar are two customers’ shopping carts?’
For this, we’re going to calculate a cart proximity number and present the result as a heat map between customers.
The shopping cart proximity is a loose concept, indeed. For simplicity’s sake, we can start by visualizing the shopping cart for two customers:
[icCubeReport reportName=”/shared/icCube Blogg/Shopping Cart – Compare Two Customers Chart” width=760 height=350 iframe=false]How to calculate proximity?
As we want to put in a single number the distance we need to find a way for calculating, Distance( Cust1, Cust2 ), let’s represent a customer as a vector. The members of this vector will represent the number of products bought by the customer:
Cust1 = ( Number of Plasma , Number LED … ) – (2,0,3 …) Cust2 = ( Number of Plasma , Number LED … ) – (1,3,12 …)
Using the number of products measure is arbitrary, and we could choose other measures as well as other dimensions. But let’s continue with this choice.
The first idea is using the classical Euclidian vector distance. However, the result does not match our visual intuition, so it’s not a valid option. Thus, we use a new distance definition based on a graphical concept. Let’s use a distance function based on the visual chart surface.
The distance would be the absolute value of the difference divided by the maximum value (all values are positive, so sign is not a concern). The distance equation would look like this:
Dist = Abs( Cust1 – Cust2 ) / Max( Cust1, Cust2 )
Translating this to icCube is easy thanks to the OO vector support. In order to build a vector object, we can write the following MDX statement:
Vector( [Products Set], [Measures].[Number Carts])
The Vector() function will return a vector object after iterating for each member of the first parameter. MDX+ object-oriented support allows us to perform some math directly on vectors (check here for more information):
If a and b are vectors for customers a and b:
FUNCTION distance(Value a, Value b) AS ( a->pDif(b)->sum() ) / ( a->pMax(b)->sum() )
The notation -> is the MDX+ operator calling the underlying Java function on an object (here vector). The functions used in this calculation are defined as:
pDif(), calculated for each vector element the abs( ai – bi ) pMax(), the maximum value for each index sum(), the sum function will return the sum of each elementIn our case, taking 1 for the length, this matches the surface of the chart.
Putting all this together in an MDX (the cache is for improving performance):
WITH // calculates the distance our shopping carts , FUNCTION distance(Value a, Value b) as ( a->pDif(b)->sum() ) / ( a->pMax(b)->sum() ) // FUNCTION distance(Value a, Value b) as a->corr( b ) // true if we're on the top triangle of our axis FUNCTION isAbove() as [Customer].[Customer].current(0).name > [Customer].[Customer].current(1).name // a little bit of MDX leveraging request cache so we don't calculate each vector N times MEMBER Vect as Vector( [Product].[Product].[Product L], [Measures].[Number Carts]), REQUEST_CACHED = true // basket for each customer, to leverage cache use only one hierarchy MEMBER Basket1 as ([Customer].[Customer].current(0), [Measures].[Vect]) MEMBER Basket2 as ([Customer].[Customer].current(1), [Measures].[Vect]) // MEMBER Basket2 as ([Customer 2].[Customer 2].defaultMember, LinkMember( [Customer 2].[Customer 2].currentmember,[Customer].[Customer]) , [Measures].[Vect]) // a distance between two basket (1 if they are the same going somehow to 0 if they are fully different) MEMBER Dist as IIF( isAbove(), null, 1-distance([Basket1],[Basket2]) ), FORMAT_STRING="percent" //, BACK_COLOR=paletteOutliersLight( currentCellValue() ) SET [Cust1] as nonempty( [Customer].[Customer].[Customer - L], [Measures].[Number Carts]).head(25) SELECT [Cust1] on 0, [Cust1] on 1 FROM [Cube] WHERE [Dist]
Note this MDX statement uses a 4.8 functionality that allows defining the same hierarchy on both axis-0 and axis-1. The Current(0) function is picking the first (on axis-0) member for the hierarchy.
Eventually, all this can be put together in a single dashboard for experimenting that you can check here.
By David Alvarez