Saturday, January 17, 2015

MongoDB Quarter Aggregation

Since last spring I've been working on a new project using MongoDB as the backend data store.  We're using MongoDB's aggregation framework for our data analysis and one thing we are frequently doing is slicing data by date parts, such as months, days, weeks, etc, for further aggregation and analysis.  The aggregation framework includes date operators to help with extracting date parts from date fields, but there is no operator to extract a three-month business quarter.  This may be in part because the definition of when quarter starts and ends can vary.  For us, the first quarter starts on January 1st and the last quarter ends on December 31st.

By joining some math and string operators together you can extract the business quarter from a MongoDB date field.  Be warned, however, that this returns the quarter number as a string, not as a number as other aggregation framework date operators do.

db.dates.aggregate([
 {$match: {date: {$exists: true}}},
 {$project: {dateQuarter: {$substr: [{$add: [{$divide: [{$subtract: [{$month: "$date"}, 1]}, 3]}, 1]}, 0, 1]}}}
])
For this example, the query could be executed on a collection of documents containing a single date field:

{
    "_id" : ObjectId("54baa73bfccd5e75836d0c6f"),
    "date" : ISODate("2015-05-20T01:20:13.551Z")
}

The first $match in the pipeline ensures that the document actually has a date field. This is important because without this you'll get a nasty error that MongoDB "can't convert from BSON type EOO to Date" coming from any documents in the collection missing the date field. It would be nice if the aggregation framework was smart enough to exclude any documents without the field on its own.

The second operation in the pipeline is the $project (this could be a $group operator as well) which extracts the quarter from the date.  The first step is to extract the month from the date field, then subtract 1 from the month number.  The result is then divided by 3 and 1 is added to the quotient.  This is where some trickery is involved.  Since there is no way to round a number in the aggregation framework, we are left with a few places to the right of the decimal we need to get rid of.  I used the $substr operator to remove everything but the first number in the result, the downside of this is the resuling dateQuarter field is now a string.  In our application having the quarter as a string wasn't a problem, so this approach worked for us.  You could potentially cast the quarter in your application code if necessary.

No comments:

Post a Comment