MongoDB
The icCube MongoDB datasource provides access to a MongoDB server using native features such as the aggregation framework and map/reduce commands. Its implementation is based on the MongoDB Java driver (www).
When creating a new MongoDB datasource, the usual configuration properties are required, such as the server address and the database name. Note that the user's credentials are optional: they are only needed to connect to MongoDB running in 'secure mode'.
Default Port Number
Details about the MongoDB server port number can be found here. The default MongoDB port is 27017.
Date Time Zone
The time zone used for formatting datetime types can be specified; if left blank, the icCube server's default time zone (i.e., the default time zone of the Java virtual machine) is used. Note that, unless specified otherwise, date types are handled in the UTC/GMT time zone.
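As a rough illustration of this setting, the Java sketch below formats the same stored instant in different time zones with java.time. It assumes, as described above, that a blank setting falls back to the JVM default zone; the class name, method names and output pattern are hypothetical and do not reflect icCube's actual formatting.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

final class DateTimeZoneSketch {

    // Arbitrary output pattern chosen for the illustration.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Hypothetical helper: uses the configured zone, or the JVM default
    // when the setting is left blank.
    static ZoneId resolveZone(String configuredZone) {
        if (configuredZone == null || configuredZone.isBlank()) {
            return ZoneId.systemDefault();
        }
        return ZoneId.of(configuredZone);
    }

    // A BSON date is an instant (milliseconds since the epoch, UTC);
    // only its textual representation depends on the chosen zone.
    static String format(long epochMillis, String configuredZone) {
        return Instant.ofEpochMilli(epochMillis)
                .atZone(resolveZone(configuredZone))
                .format(FORMAT);
    }

    public static void main(String[] args) {
        long millis = Instant.parse("2014-01-01T00:00:00Z").toEpochMilli();
        System.out.println(format(millis, "UTC"));           // 2014-01-01 00:00:00
        System.out.println(format(millis, "Europe/Zurich")); // 2014-01-01 01:00:00
        System.out.println(format(millis, ""));              // depends on the JVM default zone
    }
}
```

The stored value never changes; only the zone applied when turning it into text does, which is why the same date can appear shifted by a few hours depending on this setting.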
Advanced Properties
The advanced properties allow defining any extra property of the underlying Java driver (www) using the syntax of a standard Java properties file.
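To show what "the syntax of a standard Java properties file" means in practice, here is a minimal sketch that parses such text with java.util.Properties. The option names used (connectTimeout, socketTimeout) are hypothetical examples only; the names actually accepted depend on icCube and on the Java driver version.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class AdvancedPropertiesSketch {

    // Parses text written in the standard Java properties syntax:
    // one key=value (or key: value) entry per line, '#' or '!' starts a comment.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // A StringReader does not actually fail, but load() declares the exception.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical option names: check the Java driver documentation for the real ones.
        String text = String.join("\n",
                "# extra options passed to the MongoDB Java driver",
                "connectTimeout=10000",
                "socketTimeout = 30000");
        Properties props = parse(text);
        System.out.println(props.getProperty("connectTimeout")); // 10000
        System.out.println(props.getProperty("socketTimeout"));  // 30000
    }
}
```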
Aggregate Table
An icCube aggregate table is based on the MongoDB aggregation framework (www).
Pipeline
The pipeline syntax is essentially the same as what you would use in the MongoDB shell (www). However, because the pipeline is a JSON document that must be parsed by the underlying Java driver's BSON parser, some differences apply (see, e.g., the date handling below).
Example:
{ '$unwind' : '$Shopping Cart' },
{ '$group' : {
    '_id' : { '_id' : '$_id' },
    "Customer" : { $first : "$customer" },
    "Shopping Date" : { $first : "$date" },
    "Shopping Cart Items" : { $sum : 1 },
    "Shopping Cart Customer Price" : { $sum : "$Shopping Cart.Customer Price" },
    "Shopping Cart Catalogue Price" : { $sum : "$Shopping Cart.Catalogue Price" },
    "Shopping Cart Margin" : { $sum : "$Shopping Cart.Margin" },
    "Shopping Cart" : { $push : "$Shopping Cart" }
} }
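To make the semantics of this example pipeline concrete, here is a minimal in-memory Java sketch of what '$unwind' followed by '$group' on '_id' computes. It is an illustration only, not how MongoDB or icCube executes the pipeline; the record types are hypothetical stand-ins for the source documents, and the '$push' of the cart items is omitted for brevity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class UnwindGroupSketch {

    // Hypothetical stand-ins for the source documents and their cart items.
    record CartItem(double customerPrice, double cataloguePrice, double margin) {}
    record Shopping(String id, String customer, String date, List<CartItem> cart) {}

    // One output row per source document, mirroring the '$group' stage.
    record Summary(String id, String customer, String date, int items,
                   double customerPrice, double cataloguePrice, double margin) {}

    static List<Summary> unwindAndGroup(List<Shopping> docs) {
        Map<String, Summary> groups = new LinkedHashMap<>();
        for (Shopping doc : docs) {
            // '$unwind': one row per cart item; a document with an empty cart yields
            // no rows and therefore does not appear in the grouped output.
            for (CartItem item : doc.cart()) {
                Summary row = new Summary(doc.id(), doc.customer(), doc.date(), 1,
                        item.customerPrice(), item.cataloguePrice(), item.margin());
                // '$group' on _id: keep the first customer/date ($first), add up the rest ($sum).
                groups.merge(doc.id(), row, (acc, r) -> new Summary(
                        acc.id(), acc.customer(), acc.date(), acc.items() + r.items(),
                        acc.customerPrice() + r.customerPrice(),
                        acc.cataloguePrice() + r.cataloguePrice(),
                        acc.margin() + r.margin()));
            }
        }
        return new ArrayList<>(groups.values());
    }

    public static void main(String[] args) {
        List<Shopping> docs = List.of(
                new Shopping("s1", "alice", "2014-01-01",
                        List.of(new CartItem(10, 12, 2), new CartItem(5, 6, 1))),
                new Shopping("s2", "bob", "2014-01-02", List.of()));
        unwindAndGroup(docs).forEach(System.out::println);
    }
}
```

One consequence worth noting: with default '$unwind' behaviour, documents whose cart is empty or missing drop out of the result, which the sketch reproduces (only "s1" is printed).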
Map/Reduce Table
An icCube map/reduce table is based on the MongoDB map/reduce framework (www).
Commands
The commands syntax is essentially the same as what you would use in the MongoDB shell (www). However, because the commands form a JSON document that must be parsed by the underlying Java driver's BSON parser, some differences apply: dates must be written as described below, and functions must be enclosed in strings, so any strings embedded in those functions must be properly escaped.
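Escaping by hand is error-prone, so here is a small Java sketch that turns JavaScript source into a valid JSON string literal before embedding it in a command. The command layout (mapReduce, map, reduce, out) follows the MongoDB shell's mapReduce command, but the exact structure icCube expects and the collection name 'articles' are assumptions for illustration.

```java
final class MapReduceCommandSketch {

    // Turns JavaScript source into a JSON string literal: quotes, backslashes and
    // control characters are escaped so the surrounding JSON stays valid.
    static String toJsonString(String source) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : source.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) {
                        // Other control characters use the JSON hex escape form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
                }
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        String map = "function() { emit(this.status || \"unknown\", 1); }";
        String reduce = "function(key, values) { return Array.sum(values); }";
        // Hypothetical command layout and collection name, for illustration only.
        String command = "{ mapReduce: \"articles\", map: " + toJsonString(map)
                + ", reduce: " + toJsonString(reduce) + ", out: { inline: 1 } }";
        System.out.println(command);
    }
}
```

In the printed command, the string "unknown" inside the map function appears as \"unknown\", which is exactly the escaping the BSON parser needs.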
Common Features
Here are several features that apply both to the aggregate and map/reduce tables.
Prototype
The prototype is an optional JSON object (document) that defines the actual columns. It is required when the first document returned by the aggregate (or map/reduce) table does not contain all the requested columns. The prototype contains every required field, each defined with a dummy value of its expected type (note that the type shown in the table's column definitions is not relevant; icCube uses the output type). Here is an example defining several columns of simple types as well as arrays (more on this later):
{
  _id: "",
  name: "",
  status: { created: { $date: "2000-01-01T00:00:00.000Z" }, state: "" },
  keywords: [ "" ],
  references: [ { name: "", code: "" } ]
}
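The following Java sketch illustrates why a prototype is needed: any field declared in the prototype but absent from the first returned document would be missed if columns were inferred from that document alone. It only compares top-level field names with plain maps; it is a hypothetical illustration of the problem, not icCube's column detection code.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class PrototypeSketch {

    // Top-level fields declared by the prototype but absent from the first result document.
    // If columns were inferred from that first document alone, these would be missed.
    static Set<String> missingFromFirst(List<Map<String, Object>> results, Map<String, Object> prototype) {
        Set<String> missing = new TreeSet<>(prototype.keySet());
        if (!results.isEmpty()) {
            missing.removeAll(results.get(0).keySet());
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<String, Object> first = Map.of("_id", "1", "name", "A");
        Map<String, Object> second = Map.of("_id", "2", "name", "B", "keywords", List.of("x"));
        Map<String, Object> prototype = Map.of("_id", "", "name", "", "keywords", List.of(""));
        System.out.println(missingFromFirst(List.of(first, second), prototype)); // [keywords]
    }
}
```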
Date
To specify a date (e.g., within an aggregation pipeline or map/reduce commands), use { $date : "2014-01-01T00:00:00.000Z" } instead of, for example, ISODate("2014-01-01T00:00:00.000Z"). This is required by the underlying Java driver's BSON parser.
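When a pipeline is generated programmatically, a small helper can produce this { $date : "..." } form from a Java instant. The sketch below is hypothetical (the class name and the 'date' field in the $match stage are illustrative); it formats the instant in UTC with millisecond precision, matching the string shape used in the examples of this page.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class DateLiteralSketch {

    // UTC timestamp with milliseconds, e.g. 2014-01-01T00:00:00.000Z
    private static final DateTimeFormatter ISO_MILLIS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'").withZone(ZoneOffset.UTC);

    // Builds the { $date : "..." } form accepted by the Java driver's parser,
    // instead of the shell-only ISODate("...") helper.
    static String dateLiteral(Instant instant) {
        return "{ $date : \"" + ISO_MILLIS.format(instant) + "\" }";
    }

    public static void main(String[] args) {
        String stage = "{ '$match' : { 'date' : { '$gte' : "
                + dateLiteral(Instant.parse("2014-01-01T00:00:00Z")) + " } } }";
        System.out.println(stage);
        // { '$match' : { 'date' : { '$gte' : { $date : "2014-01-01T00:00:00.000Z" } } } }
    }
}
```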
Nested Object
Documents returned by the aggregate (or map/reduce) table can contain nested objects (see also arrays below). icCube creates a new column for each nested field, using a path-like notation:
{ _id: "", name: "", status: { created: { $date: "2000-01-01T00:00:00.000Z" }, state: "" } }
The following columns will be created:
_id [ String ]
name [ String ]
status.created [ DateTime ]
status.state [ String ]
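The path-like naming can be sketched as a recursive walk over the document. In this hypothetical Java illustration, documents are plain maps, dates are represented as java.time.Instant for simplicity, and the type labels other than String and DateTime are illustrative; it is not icCube's implementation.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class NestedColumnsSketch {

    // One column per leaf field; nested objects produce dotted paths such as "status.state".
    static Map<String, String> columns(Map<String, ?> doc) {
        Map<String, String> out = new LinkedHashMap<>();
        collect("", doc, out);
        return out;
    }

    private static void collect(String prefix, Map<String, ?> doc, Map<String, String> out) {
        for (Map.Entry<String, ?> e : doc.entrySet()) {
            String path = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            Object value = e.getValue();
            if (value instanceof Map<?, ?> nested) {
                // Recurse into the nested object (keys are strings, as in BSON documents).
                collect(path, castKeys(nested), out);
            } else {
                out.put(path, typeName(value));
            }
        }
    }

    @SuppressWarnings("unchecked")
    private static Map<String, ?> castKeys(Map<?, ?> map) {
        return (Map<String, ?>) map;
    }

    // Illustrative type labels only.
    private static String typeName(Object value) {
        if (value instanceof Instant) return "DateTime";
        if (value instanceof List<?>) return "Array";
        return value.getClass().getSimpleName(); // e.g. "String"
    }

    public static void main(String[] args) {
        Map<String, Object> status = new LinkedHashMap<>();
        status.put("created", Instant.parse("2000-01-01T00:00:00Z"));
        status.put("state", "");
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", "");
        doc.put("name", "");
        doc.put("status", status);
        columns(doc).forEach((name, type) -> System.out.println(name + " [ " + type + " ]"));
        // _id [ String ]
        // name [ String ]
        // status.created [ DateTime ]
        // status.state [ String ]
    }
}
```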
Arrays (aka many-to-many)
In a document-oriented database, many-to-many relations are quite common. For example, let's assume an icCube table represents a list of articles, each article being indexed by a set of keywords. This could be modeled as follows:
{ _id: "...", title: "icCube MongoDB datasource", keywords: [ "icCube", "MongoDB" ] }
From that data model, we define a 'keyword' dimension to analyse keyword usage. This is a many-to-many relation, as each article may be indexed by more than one keyword. For that purpose, icCube supports the 'array' type: the facts of each article are linked to several members of the 'keyword' dimension. No additional configuration is required for this many-to-many relation.
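A consequence of this many-to-many link is that per-keyword figures do not add up to the overall total. The Java sketch below counts articles per keyword for hypothetical sample articles; each article contributes once to every keyword in its array (duplicates within one array are ignored here, which is a choice of the sketch, not a documented icCube rule).

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class KeywordArraySketch {

    // Hypothetical article type: a title plus the array of keywords indexing it.
    record Article(String title, List<String> keywords) {}

    // Each article is counted once under every keyword in its array (a many-to-many link),
    // so the per-keyword counts can add up to more than the number of articles.
    static Map<String, Integer> articlesPerKeyword(List<Article> articles) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Article article : articles) {
            for (String keyword : new LinkedHashSet<>(article.keywords())) {
                counts.merge(keyword, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Article> articles = List.of(
                new Article("icCube MongoDB datasource", List.of("icCube", "MongoDB")),
                new Article("MongoDB aggregation", List.of("MongoDB")));
        System.out.println(articlesPerKeyword(articles)); // {MongoDB=2, icCube=1}
    }
}
```

Two articles yield three keyword hits in total, which is the expected behaviour of a many-to-many relation rather than double counting by mistake.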
Incremental Load
Incremental load is supported via an additional pipeline (or map/reduce command) that is executed when the schema is updated. The special icCube $ic3incrValue statement is replaced with the value of the incremental column, taken from the previous full load or update of the schema. Note that icCube merely performs a search and replace of this statement with the actual value; it is up to the user to properly format the value (e.g., using quotes for strings or the $date statement as explained above). More details about incremental load in general are available here.
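Because the substitution is purely textual, the quotes or the $date wrapper must already surround the placeholder in the incremental pipeline. The Java sketch below mimics that search and replace; the field names ('date', 'code'), the sample values and the textual date format substituted for $ic3incrValue are assumptions for illustration.

```java
final class IncrementalLoadSketch {

    static final String PLACEHOLDER = "$ic3incrValue";

    // Plain textual search and replace: the value is inserted verbatim, so quotes or a
    // { $date : ... } wrapper must already be written around the placeholder.
    static String substitute(String pipeline, String value) {
        return pipeline.replace(PLACEHOLDER, value);
    }

    public static void main(String[] args) {
        // Hypothetical field names; the date text format is an assumption.
        String byDate = "{ '$match' : { 'date' : { '$gt' : { $date : \"" + PLACEHOLDER + "\" } } } }";
        System.out.println(substitute(byDate, "2014-01-01T00:00:00.000Z"));
        // { '$match' : { 'date' : { '$gt' : { $date : "2014-01-01T00:00:00.000Z" } } } }

        String byCode = "{ '$match' : { 'code' : { '$gt' : '" + PLACEHOLDER + "' } } }";
        System.out.println(substitute(byCode, "A-0042"));
        // { '$match' : { 'code' : { '$gt' : 'A-0042' } } }
    }
}
```

String.replace works on literal text (not regular expressions), so the '$' in the placeholder needs no escaping and every occurrence is replaced.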