Say you are building a message service with CouchDB. Each user has an inbox database and other users send messages by dropping them into the inbox database. When users want to read all messages received, they can just open their inbox databases and see all messages.
So far, so simple, but now you’ve got your users hitting the Refresh button all the time once they’ve looked at their messages to see if there are new messages. This is commonly referred to as polling. A lot of users are generating a lot of requests that, most of the time, don’t show anything new, just the list of all the messages they already know about.
Wouldn’t it be nice to ask CouchDB to give you notice when a new message arrives? The _changes database API does just that.
The scenario just described can be seen as the cache invalidation problem; that is, when do I know that what I am displaying right now is no longer an apt representation of the underlying data store? Any sort of cache invalidation, not only backend/frontend-related, can be built using _changes.
_changes is also designed and suited to extract an activity stream from a database, whether for simple display or, equally important, to act on a new document (or a document change) when it occurs.
The beauty of systems that use the changes API is that they are decoupled. A program that is interested only in latest updates doesn’t need to know about programs that create new documents and vice versa.
Here’s what a changes item looks like:
{"seq":12,"id":"foo","changes":[{"rev":"1-23202479633c2b380f79507a776743d5"}]}
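There are three fields:
seq: The update_seq of the database that was created when the document with the id got created or changed.
id: The document ID.
changes: An array of objects, one per reported revision, each holding a revision ID in its rev field.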
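As a rough illustration of how a client might read these fields, the following JavaScript sketch parses the row shown above. The helper name describeChange is made up for this example and is not part of CouchDB.

```javascript
// Hypothetical helper: summarizes one row from a _changes response.
function describeChange(row) {
  // Each entry in `changes` carries the revision ID of a reported revision.
  const revs = row.changes.map((change) => change.rev);
  return { seq: row.seq, id: row.id, revs: revs };
}

// The sample row from the text above.
const row = JSON.parse(
  '{"seq":12,"id":"foo","changes":[{"rev":"1-23202479633c2b380f79507a776743d5"}]}'
);
const summary = describeChange(row);
console.log(summary.seq, summary.id, summary.revs);
```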
The changes API is available for each database. You can get changes that happen in a single database per request. But you can easily send multiple requests to multiple databases’ changes API if you need that.
Let’s create a database that we can use as an example later in this chapter:
> HOST="http://127.0.0.1:5984"
> curl -X PUT $HOST/db
{"ok":true}
There are three ways to request notifications: polling (the default), long polling and continuous. Each is useful in a different scenario, and we’ll discuss all of them in detail.
In the previous example, we tried to avoid the polling method, but it is very simple and in some cases the only one suitable for a problem. Because it is the simplest case, it is the default for the changes API.
Let’s see what the changes for our test database look like. First, the request (we’re using curl again):
curl -X GET $HOST/db/_changes
The result is simple:
{"results":[ ], "last_seq":0}
There’s nothing there because we didn’t put anything in yet—no surprise. But you can guess where we’d see results—when they start to come in. Let’s create a document:
curl -X PUT $HOST/db/test -d '{"name":"Anna"}'
CouchDB replies:
{"ok":true,"id":"test","rev":"1-aaa8e2a031bca334f50b48b6682fb486"}
Now let’s run the changes request again:
{"results":[ {"seq":1,"id":"test","changes":[{"rev":"1-aaa8e2a031bca334f50b48b6682fb486"}]} ], "last_seq":1}
We get a notification about our new document. This is pretty neat! But wait—when we created the document and got information like the revision ID, why would we want to make a request to the changes API to get it again? Remember that the purpose of the changes API is to allow you to build decoupled systems. The program that creates the document is very likely not the same program that requests changes for the database, since it already knows what it put in there (although this is blurry, the same program could be interested in changes made by others).
Behind the scenes, we created another document. Let’s see what the changes for the database look like now:
{"results":[ {"seq":1,"id":"test","changes":[{"rev":"1-aaa8e2a031bca334f50b48b6682fb486"}]}, {"seq":2,"id":"test2","changes":[{"rev":"1-e18422e6a82d0f2157d74b5dcf457997"}]} ], "last_seq":2}
See how we get a new line in the result that represents the new document? In addition, the first document we put in there got listed again. The default result for the changes API is the history of all changes that the database has seen.
We’ve already seen the change for "seq":1, and we’re no longer really interested in it. We can tell the changes API about that by using the since=1 query parameter:
curl -X GET $HOST/db/_changes?since=1
This returns all changes after the seq specified by since:
{"results":[ {"seq":2,"id":"test2","changes":[{"rev":"1-e18422e6a82d0f2157d74b5dcf457997"}]} ], "last_seq":2}
While we’re discussing options, use style=all_docs to get more revision and conflict information in the changes array for each result row. If you want to specify the default explicitly, the value is main_only. If you only want a specific number of result rows, you can use the limit=N parameter, where N is the number of rows you would like to retrieve.
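To show how these options combine in a request URL, here is a small JavaScript sketch. The function name changesUrl and its options object are assumptions made for this example; only the since, style, and limit query parameters come from the text.

```javascript
// Hypothetical helper: builds a _changes URL from the options discussed above.
function changesUrl(host, db, options = {}) {
  const params = new URLSearchParams();
  for (const name of ["since", "style", "limit"]) {
    // Only send parameters the caller actually set.
    if (options[name] !== undefined) {
      params.set(name, String(options[name]));
    }
  }
  const query = params.toString();
  return `${host}/${encodeURIComponent(db)}/_changes` + (query ? `?${query}` : "");
}

console.log(changesUrl("http://127.0.0.1:5984", "db", { since: 1, limit: 10 }));
// http://127.0.0.1:5984/db/_changes?since=1&limit=10
```

Leaving an option out of the object keeps it out of the URL, so CouchDB falls back to its defaults.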
The technique of long polling was invented for web browsers to remove one of the problems with the regular polling approach: requests that return nothing new. With long polling, a request does not complete until something has changed. It works like this: you make a request to the long polling API, and both you and CouchDB keep the HTTP connection open until a new row appears in the changes result. As soon as a result appears, CouchDB sends it and the connection is closed.
This works well for low-frequency updates. If a lot of changes occur for a client, you find yourself opening many new requests, and the usefulness of this approach over regular polling declines. Another general consequence of this technique is that for each client requesting a long polling change notification, CouchDB will have to keep an HTTP connection open. CouchDB is well capable of doing so, as it is designed to handle many concurrent requests. But you need to make sure your operating system allows CouchDB to use at least as many sockets as you have long polling clients (and a few spare for regular requests, of course).
To make a long polling request, add the feed=longpoll query parameter. For this listing, we added timestamps to show you when things happen.
00:00: > curl -X GET "$HOST/db/_changes?feed=longpoll&since=2"
00:00: {"results":[
00:10: {"seq":3,"id":"test3","changes":[{"rev":"1-02c6b758b08360abefc383d74ed5973d"}]}
00:10: ],
00:10: "last_seq":3}
At 00:10, we create another document behind the scenes again, and CouchDB promptly sends us the change. Note that we used since=2 to avoid getting any of the previous notifications. Also note that we have to use double quotes for the curl command because we are using an ampersand, which is a special character for our shell.
The style option works for long polling requests just like for regular polling requests.
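Because each long polling request ends once a result has been delivered, a client typically issues a new request that resumes from the last_seq it received. The following JavaScript sketch shows one way such a loop might look. The names followLongpoll and requestChanges are assumptions for this example: requestChanges(since) stands for whatever code performs the GET request with feed=longpoll and resolves with the parsed JSON body, and it is replaced here by a canned stand-in so the sketch runs without a server.

```javascript
// Hypothetical long polling loop. `requestChanges(since)` is an assumed
// function that requests /db/_changes?feed=longpoll&since=<since>
// and resolves with the parsed JSON body.
async function followLongpoll(requestChanges, since, onChange, maxRequests) {
  for (let i = 0; i < maxRequests; i++) {
    const body = await requestChanges(since);
    for (const row of body.results) {
      onChange(row);
    }
    // Resume after the last sequence CouchDB reported.
    since = body.last_seq;
  }
  return since;
}

// Stand-in for a real HTTP call: answers with the response from the listing above.
const fakeRequest = async (since) => ({
  results: [
    { seq: 3, id: "test3", changes: [{ rev: "1-02c6b758b08360abefc383d74ed5973d" }] },
  ],
  last_seq: 3,
});

followLongpoll(fakeRequest, 2, (row) => console.log("changed:", row.id), 1)
  .then((lastSeq) => console.log("resume with since =", lastSeq));
```

The maxRequests argument only exists to keep the example finite; a real client would more likely loop until it is told to stop.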
Networks are a tricky beast, and sometimes you don’t know whether there are no changes coming or your network connection went stale. If you add another query parameter, heartbeat=N, where N is a number, CouchDB will send you a newline character every N milliseconds. As long as you are receiving newline characters, you know the connection is alive: there are no new change notifications yet, but CouchDB is still ready to send you the next one when it occurs.
Long polling is great, but you still end up opening an HTTP request for each change notification. For web browsers, this is the only way to avoid the problems of regular polling. But web browsers are not the only client software that can be used to talk to CouchDB. If you are using Python, Ruby, Java, or any other language really, you have yet another option.
The continuous changes API allows you to receive change notifications as they come in using a single HTTP connection. You make a request to the continuous changes API and both you and CouchDB will hold the connection open “forever.” CouchDB will send you new lines for notifications when they occur and—as opposed to long polling—will keep the HTTP connection open, waiting to send the next notification.
This is great for both infrequent and frequent notifications, and it has the same consequence as long polling: you’re going to have a lot of long-living HTTP connections. But again, CouchDB easily supports these.
Use the feed=continuous parameter to make a continuous changes API request. Following is the result, again with timestamps. At 00:10 and again at 00:15, we create a new document:
00:00: > curl -X GET "$HOST/db/_changes?feed=continuous&since=3"
00:10: {"seq":4,"id":"test4","changes":[{"rev":"1-02c6b758b08360abefc383d74ed5973d"}]}
00:15: {"seq":5,"id":"test5","changes":[{"rev":"1-02c6b758b08360abefc383d74ed5973d"}]}
Note that the continuous changes API result doesn’t include a wrapping JSON object with a results member with the individual notification results as array items; it includes only a raw line per notification. Also note that the lines are no longer separated by a comma. Whereas the result of the regular and long polling APIs is a complete, valid JSON object once the HTTP request returns, the continuous changes API sends each row as a valid JSON object of its own. The difference makes it easier for clients to parse the respective results. The style and heartbeat parameters work as expected with the continuous changes API.
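Since every line of the continuous feed is a JSON object of its own, a client can parse the stream line by line. The JavaScript sketch below shows one possible approach, assuming the response arrives as text chunks that may end in the middle of a line. The name createChangesParser is invented for this example; empty lines are treated as heartbeats and skipped.

```javascript
// Hypothetical parser for the continuous feed: feed it raw text chunks,
// get back the change rows completed so far.
function createChangesParser() {
  let buffer = "";
  return function push(chunk) {
    buffer += chunk;
    const lines = buffer.split("\n");
    // The last element is an incomplete line (or ""); keep it for later.
    buffer = lines.pop();
    return lines
      .filter((line) => line.trim() !== "") // empty lines are heartbeats
      .map((line) => JSON.parse(line));
  };
}

const push = createChangesParser();
// The first chunk ends with a heartbeat newline and half of the next row.
console.log(push('{"seq":4,"id":"test4","changes":[]}\n\n{"seq":5,'));
// The second chunk completes that row.
console.log(push('"id":"test5","changes":[]}\n'));
```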
The change notification API and its three modes of operation already give you a lot of options for requesting and processing changes in CouchDB. Filters for changes give you an additional level of flexibility. Let’s say the messages from our first scenario have priorities, and a user is interested only in notifications about messages with a high priority.
Enter filters. Similar to view functions, a filter is a JavaScript function that gets stored in a design document and is later executed by CouchDB. Filters live in the special member filters under a name of your choice. Here is an example:
{
  "_id": "_design/app",
  "_rev": "1-b20db05077a51944afd11dcb3a6f18f1",
  "filters": {
    "important": "function(doc, req) { if(doc.priority == 'high') { return true; } else { return false; }}"
  }
}
To query the changes API with this filter, use the filter=designdocname/filtername query parameter:
curl "$HOST/db/_changes?filter=app/important"
The result now includes only rows for document updates for which the filter function returns true (in our case, where the priority property of our document has the value high). This is pretty neat, but CouchDB takes it up another notch.
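To get a feel for what the filter lets through, you can run the same logic outside CouchDB. The JavaScript sketch below applies the important filter to a few sample documents; the documents and the empty req object are made up for this example, and CouchDB itself runs the function once per changed document.

```javascript
// The filter logic from the design document, written as a named function.
function important(doc, req) {
  if (doc.priority == "high") {
    return true;
  } else {
    return false;
  }
}

// Made-up sample documents.
const docs = [
  { _id: "msg1", priority: "high" },
  { _id: "msg2", priority: "low" },
  { _id: "msg3" },
];

// Keep only the documents the filter accepts, as the filtered changes feed would.
const passed = docs.filter((doc) => important(doc, {})).map((doc) => doc._id);
console.log(passed); // [ 'msg1' ]
```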
Let’s take the initial example application where users can send messages to each other. Instead of having a database per user that acts as the inbox, we now use a single database as the inbox for all users. How can a user register for changes that represent a new message being put in her inbox?
We can make the filter function use a request parameter:
function(doc, req) {
  if(doc.name == req.query.name) {
    return true;
  }
  return false;
}
If you now run a request with an added name=Steve query parameter, the result will only contain rows for documents that have the name field set to “Steve.” If you are running a request for a different user, just change the request parameter (name=Joe).
Now, adding a query parameter to a filtered changes request is easy. What would stop Steve from passing in name=Joe as the parameter and seeing Joe’s inbox? Not much. Can CouchDB help with this? We wouldn’t bring this up if it couldn’t, would we?
The req parameter of the filter function includes a member userCtx, the user context. This contains information about the user, who has already been authenticated over HTTP earlier in the request. Specifically, req.userCtx.name contains the username of the user who makes the filtered changes request. We can be sure that users are who they say they are because they have been authenticated against one of the authentication schemes in CouchDB. With this, we don’t even need the dynamic filter parameter (although it can still be useful in other situations).
If you have configured CouchDB to use authentication for requests, a user will have to make an authenticated request and the result is available in our filter function:
function(doc, req) {
  if(doc.name) {
    if(doc.name == req.userCtx.name) {
      return true;
    }
  }
  return false;
}
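The difference from the query-parameter version can be checked locally. In the JavaScript sketch below, the function name ownMessages, the sample documents, and the req object are all made up for this example; the req object is a simplified stand-in for what CouchDB passes to a filter. Steve is the authenticated user but passes name=Joe, and the filter ignores that parameter.

```javascript
// The userCtx-based filter from above, written as a named function.
function ownMessages(doc, req) {
  if (doc.name) {
    if (doc.name == req.userCtx.name) {
      return true;
    }
  }
  return false;
}

// Made-up sample documents.
const docs = [
  { _id: "a", name: "Steve" },
  { _id: "b", name: "Joe" },
  { _id: "c" },
];

// Simplified stand-in for the request: Steve is authenticated but asks for name=Joe.
const req = { query: { name: "Joe" }, userCtx: { name: "Steve" } };

const visible = docs.filter((doc) => ownMessages(doc, req)).map((doc) => doc._id);
console.log(visible); // [ 'a' ]
```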
The changes API lets you build sophisticated notification schemes useful in many scenarios with isolated and asynchronous components yet working to the same beat. In combination with replication, this API is the foundation for building distributed, highly available, and high-performance CouchDB clusters.