Transforming Views with List Functions

Just as show functions convert documents to arbitrary output formats, CouchDB list functions allow you to render the output of view queries in any format. The powerful iterator API allows for flexibility to filter and aggregate rows on the fly, as well as output raw transformations for an easy way to make Atom feeds, HTML lists, CSV files, config files, or even just modified JSON.

List functions are stored under the lists field of a design document. Here’s an example design document that contains two list functions:

{
  "_id" : "_design/foo",
  "_rev" : "1-67at7bg",
  "lists" : {
    "bar" : "function(head, req) { var row; while (row = getRow()) { ... } }",
    "zoom" : "function() { return 'zoom!' }",
  }
}

Arguments to the List Function

The function is called with two arguments, which can sometimes be ignored, as the row data itself is loaded during function execution. The first argument, head, contains information about the view. Here’s what you might see looking at a JSON representation of head:

{total_rows:10, offset:0}

The request itself is a much richer data structure. This is the same request object that is available to show, update, and filter functions. We’ll go through it in detail here as a reference. Here’s the example req object:

{
  "info": {
    "db_name": "test_suite_db","doc_count": 11,"doc_del_count": 0,
    "update_seq": 11,"purge_seq": 0,"compact_running": false,"disk_size": 4930,
    "instance_start_time": "1250046852578425","disk_format_version": 4},

The database information, as available in an information request against a database’s URL, is included in the request parameters. This allows you to stamp rendered rows with an update sequence and know the database you are working with.

  "method": "GET",
  "path": ["test_suite_db","_design","lists","_list","basicJSON","basicView"],

The HTTP method and the path in the client from the client request are useful, especially for rendering links to other resources within the application.

  "query": {"foo":"bar"},

If there are parameters in the query string (in this case corresponding to ?foo=bar), they will be parsed and available as a JSON object at req.query.

  "headers":
    {"Accept": "text/html,application/xhtml+xml ,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Charset": "ISO-8859-1,utf-8;q=0.7,*;q=0.7","Accept-Encoding":
    "gzip,deflate","Accept-Language": "en-us,en;q=0.5","Connection": "keep-alive",
    "Cookie": "_x=95252s.sd25; AuthSession=","Host": "127.0.0.1:5984",
    "Keep-Alive": "300",
    "Referer": "http://127.0.0.1:5984/_utils/couch_tests.html?script/couch_tests.js",
    "User-Agent": "Mozilla/5.0 Gecko/20090729 Firefox/3.5.2"},
  "cookie": {"_x": "95252s.sd25","AuthSession": ""},

Headers give list and show functions the ability to provide the Content-Type response that the client prefers, as well as other nifty things like cookies. Note that cookies are also parsed into a JSON representation. Thanks, MochiWeb!

  "body": "undefined",
  "form": {},

In the case where the method is POST, the request body (and a form-decoded JSON representation of it, if applicable) are available as well.

  "userCtx": {"db": "test_suite_db","name": null,"roles": ["_admin"]}
}

Finally, the userCtx is the same as that sent to the validation function. It provides access to the database the user is authenticated against, the user’s name, and the roles they’ve been granted. In the previous example, you see an anonymous user working with a CouchDB node that is in “admin party” mode. Unless an admin is specified, everyone is an admin.

That’s enough about the arguments to list functions. Now it’s time to look at the mechanics of the function itself.

An Example List Function

Let’s put this knowledge to use. In the chapter introduction, we mentioned using lists to generate config files. One fun thing about this is that if you keep your configuration information in CouchDB and generate it with lists, you don’t have to worry about being able to regenerate it again, because you know the config will be generated by a pure function from your database and not other sources of information. This level of isolation will ensure that your config files can be generated correctly as long as CouchDB is running. Because you can’t fetch data from other system services, files, or network sources, you can’t accidentally write a config file generator that fails due to external factors.

J. Chris got excited about the idea of using list functions to generate config files for the sort of services people usually configure using CouchDB, specifically via Chef, an Apache-licensed infrastructure automation tool. The key feature of infrastructure automation is that deployment scripts are idempotent—that is, running your scripts multiple times will have the same intended effect as running them once, something that becomes critical when a script fails halfway through. This encourages crash-only design, where your scripts can bomb out multiple times but your data remains consistent, because it takes the guesswork out of provisioning and updating servers in the case of previous failures.

Like map, reduce, and show functions, lists are pure functions, from a view query and an HTTP request to an output format. They can’t make queries against remote services or otherwise access outside data, so you know they are repeatable. Using a list function to generate an HTTP server configuration file ensures that the configuration is generated repeatably, based on only the state of the database.

Imagine you are running a shared hosting platform, with one name-based virtual host per user. You’ll need a config file that starts out with some node configuration (which modules to use, etc.) and is followed by one config section per user, setting things like the user’s HTTP directory, subdomain, forwarded ports, etc.

function(head, req) {
  // helper function definitions would be here...
  var row, userConf, configHeader, configFoot;
  configHeader = renderTopOfApacheConf(head, req.query.hostname);
  send(configHeader);

In the first block of the function, we’re rendering the top of the config file using the function renderTopOfApacheConf(head, req.query.hostname). This may include information that’s posted into the function, like the internal name of the server that is being configured or the root directory in which user HTML files are organized. We won’t show the function body, but you can imagine that it would return a long multi-line string that handles all the global configuration for your server and sets the stage for the per-user configuration that will be based on view data.

The call to send(configHeader) is the heart of your ability to render text using list functions. Put simply, it just sends an HTTP chunk to the client, with the content of the strings pasted to it. There is some batching behind the scenes, as CouchDB speaks with the JavaScript runner with a synchronous protocol, but from the perspective of a programmer, send() is how HTTP chunks are born.

Now that we’ve rendered and sent the file’s head, it’s time to start rendering the list itself. Each list item will be the result of converting a view row to a virtual host’s configuration element. The first thing we do is call getRow() to get a row of the view.

  while (row = getRow()) {
    var userConf = renderUserConf(row);
    send(userConf)
  }

The while loop used here will continue to run until getRow() returns null, which is how CouchDB signals to the list function that all valid rows (based on the view query parameters) have been exhausted. Before we get ahead of ourselves, let’s check out what happens when we do get a row.

In this case, we simply render a string based on the row and send it to the client. Once all rows have been rendered, the loop is complete. Now is a good time to note that the function has the option to return early. Perhaps it is programmed to stop iterating when it sees a particular user’s document or is based on a tally it’s been keeping of some resource allocated in the configuration. In those cases, the loop can end early with a break statement or other method. There’s no requirement for the list function to render every row that is sent to it.

  configFoot = renderConfTail();
  return configFoot;
}

Finally, we close out the configuration file and return the final string value to be sent as the last HTTP chunk. The last action of a list function is always to return a string, which will be sent as the final HTTP chunk to the client.

To use our config file generation function in practice, we might run a command-line script that looks like:

curl http://localhost:5984/config_db/_design/files/_list/apache/users?hostname=foobar > apache.conf

This will render our Apache config based on data in the user’s view and save it to a file. What a simple way to build a reliable configuration generator!

List Theory

Now that we’ve seen a complete list function, it’s worth mentioning some of the helpful properties they have.

The most obvious thing is the iterator-style API. Because each row is loaded independently by calling getRow(), it’s easy not to leak memory. The list function API is capable of rendering lists of arbitrary length without error, when used correctly.

On the other hand, this API gives you the flexibility to bundle a few rows in a single chunk of output, so if you had a view of, say, user accounts, followed by subdomains owned by that account, you could use a slightly more complex loop to build up some state in the list function for rendering more complex chunks. Let’s look at an alternate loop section:

var subdomainOwnerRow, subdomainRows = [];
while (row = getRow()) {

We’ve entered a loop that will continue until we have reached the endkey of the view. The view is structured so that a user profile row is emitted, followed by all of that user’s subdomains. We’ll use the profile data and the subdomain information to template the configuration for each individual user. This means we can’t render any subdomain configuration until we know we’ve received all the rows for the current user.

  if (!subdomainOwnerRow) {
    subdomainOwnerRow = row;

This case is true only for the first user. We’re merely setting up the initial conditions.

  } else if (row.value.user != subdomainOwnerRow.value.user) {

This is the end case. It will be called only after all the subdomain rows for the current user have been exhausted. It is triggered by a row with a mismatched user, indicating that we have all the subdomain rows.

    send(renderUserConf(subdomainOwnerRow, subdomainRows));

We know we are ready to render everything for the current user, so we pass the profile row and the subdomain rows to a render function (which nicely hides all the gnarly nginx config details from our fair reader). The result is sent to the HTTP client, which writes it to the config file.

    subdomainRows = [];
    subdomainOwnerRow = row;

We’ve finished with that user, so let’s clear the rows and start working on the next user.

  } else {
    subdomainRows.push(row);

Ahh, back to work, collecting rows.

  }
}
send(renderUserConf(subdomainOwnerRow, subdomainRows));

This last bit is tricky—after the loop is finished (we’ve reached the end of the view query), we’ve still got to render the last user’s config. Wouldn’t want to forget that!

The gist of this loop section is that we collect rows that belong to a particular user until we see a row that belongs to another user, at which point we render output for the first user, clear our state, and start working with the new user. Techniques like this show how much flexibility is allowed by the list iterator API.

More uses along these lines include filtering rows that should be hidden from a particular result set, finding the top N grouped reduce values (e.g., to sort a tag cloud by popularity), and even writing custom reduce functions (as long as you don’t mind that reductions are not stored incrementally).

Querying Lists

We haven’t looked in detail at the ways list functions are queried. Just like show functions, they are resources available on the design document. The basic path to a list function is as follows:

/db/_design/foo/_list/list-name/view-name

Because the list name and the view name are both specified, this means it is possible to render a list against more than one view. For instance, you could have a list function that renders blog comments in the Atom XML format, and then run it against both a global view of recent comments as well as a view of recent comments by blog post. This would allow you to use the same list function to provide an Atom feed for comments across an entire site, as well as individual comment feeds for each post.

After the path to the list comes the view query parameter. Just like a regular view, calling a list function without any query parameters results in a list that reflects every row in the view. Most of the time you’ll want to call it with query parameters to limit the returned data.

You’re already familiar with the view query options from Chapter 6, Finding Your Data with Views. The same query options apply to the _list query. Let’s look at URLs side by side; see Example 1, “A JSON view query”.

GET /db/_design/sofa/_view/recent-posts?descending=true&limit=10

Example 1. A JSON view query

This view query is just asking for the 10 most recent blog posts. Of course, this query could include parameters like startkey or skip—we’re leaving them out for simplicity. To run the same query through a list function, we access it via the list resource, as shown in Example 2, “The HTML list query”.

GET /db/_design/sofa/_list/index/recent-posts?descending=true&limit=10

Example 2. The HTML list query

The index list here is a function from JSON to HTML. Just like the preceding view query, additional query parameters can be applied to paginate through the list. As we’ll see in Part III, “Example Application”, once you have a working list, adding pagination is trivial. See Example 3, “The Atom list query”.

GET /db/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom

Example 3. The Atom list query

The list function can also look at the query parameters and do things like switch that output to render based on parameters. You can even do things like pass the username into the list using a query parameter (but it’s not recommended, as you’ll ruin cache efficiency).

Lists, Etags, and Caching

Just like show functions and view queries, lists are sent with proper HTTP Etags, which makes them cacheable by intermediate proxies. This means that if your server is starting to bog down in list-rendering code, it should be possible to relieve load by using a caching reverse proxy like Squid. We won’t go into the details of Etags and caching here, as they were covered in Chapter 8, Show Functions.