Group By (ETL)

Allows for merging several rows together and creating arrays for many2many relations.

This transformation allows for creating a 'group by' similar to the one existing in SQL. Additionally, the view can create join values in arrays that are handy for generating with a single table many-2-many relations.

In this example we are going to group rows by COUNTRY and perform several aggregations of the AMOUNT column; remaining columns values are grouped into arrays defining a many-2-many relation.

Input : js_filter_cols

COUNTRY, PRODUCT, AMOUNT
Canada,  Bike,    10
Canada,  Bike,    20
Canada,  Car,     20
Canada,  Car,     40
USA,     Bike,    100
USA,     Bike,    200
USA,     Car,     500
USA,     Car,     1000
France,  Bike,    10
France,  Bike,    20
Spain,   Car,     100
Spain,   Car,     200

Output : sales_group_by

COUNTRY, PRODUCT, SUM(AMOUNT)
Canada,  [Bike, Bike, Car, Car], 90
USA,     [Bike, Bike, Car, Car], 1800
France,  [Bike, Bike],           30

Walkthrough

This new table reduces the number of rows and allows to easily calculate the number of countries. Instead of a measure with a 'distinct count' we can use a 'count rows' that is an order of magnitude faster. This kind of transformations is handy for fields as basket analysis.

The schema mentioned in this page can be downloaded here. It requires icCube v5.x onwards and a JRE 1.8 (for the Javascript code transformation).

Next chapter: Join describes how to join two tables