Wednesday, March 21, 2012

Finding Max And Min Values with Versioned Documents

Finding Max And Min Values with Versioned Documents Credit: Amos King Problem You want to list the latest version numbers of a set of documents. Each document contains a field that represents the version of the document and a field representing which document that is a version of: { "document_id" : "mongoDB How-To", "author" : "Amos King", "content" : "...", "version" : 1.0 } We want to end up with a collection of document_ids and their largest version number: {"_id" : "mongoDB How To", "value" : 1.1} {"_id" : "Resume", "value" : 6} {"_id" : "Schema", "value" : 1} Solution Use the mapreduce database command. Emit each document_id and version in the map function, then use the reduce function to find the max version. 1. Map The map function is very simple. We use our common element between all versions as the key and the version as the value: map = function () { emit(this.document_id, this.version); } 2. Reduce The reduce function is also very simple but has a little bit of javascript magic. Math.max normally takes in any number of arguments(ie. Math.max(1,2,3) ), but we need to call it with an array. So we call Math.max with apply so that we can pass in an array of values to max. The apply breaks the array into individual arguments to pass to Math.max. The first argument to apply is the context in which we want to run; Math will do fine here. reduce = function (key, values) { return Math.max.apply(Math, values); } Finding the minimum value is as easy as replacing Math.max with Math.min. 3. Call the mapreduce command Now it's time to get our result set. We'll set the output collection name parameter to 'newest_versions' so that we'll have an appropriately named set to work with: > result = db.runCommand({ ... "mapreduce" : "documents", ... "map" : map, ... "reduce" : reduce, ... "out" : "newest_versions"}) Now, we query the 'newest_versions' collection. Each document is exactly what we're looking for: > db.newest_versions.find() {"_id" : "mongoDB How To", "value" : 1.1} {"_id" : "Resume", "value" : 6} {"_id" : "Schema", "value" : 1} Extras The Map and Reduce Functions can be rewritten slightly to return the Maximum and Minimum versions of each document. For the purpose of this example, the input collection is as follows: (The _id values have been truncated for brevity.) > db.documents.find() { "_id" : 1, "document_id" : "mongoDB How-To", "author" : "Amos King", "content" : "...", "version" : 1 } { "_id" : 2, "document_id" : "mongoDB How-To", "author" : "Amos King", "content" : "...", "version" : 1.1 } { "_id" : 3, "document_id" : "Resume", "author" : "Author", "content" : "...", "version" : 6 } { "_id" : 4, "document_id" : "Schema", "author" : "Someone Else", "content" : "...", "version" : 0.9 } { "_id" : 5, "document_id" : "Schema", "author" : "Someone Else", "content" : "...", "version" : 1 } > Map The new Map function emits documents containing the document_id, and "value" key containing a list of embedded documents, each containing the keys, "max" and "min". Both keys are initially set to be equal to the "version" key of the current document. Because there is only one document containing the "document_id" : "Resume", this output will not need to be reduced.

No comments: