Error Handling in NodeJS

Error handling in Node is kind of weird. And by "kind of weird", I mean "different than other languages". The semantics of which I shan't get into, but it is my opinion that Node was designed... oddly. At least in terms of error handling. There are several non-obvious ways in which an error can bubble up or otherwise make your program crash, so here's a brief-ish rundown of what to look out for. And how to not do things.

try, catch and throw

This is the easy one. Except for the parts where it's not easy. Those are hard.

The normal Java/C#/C++/etc. way of handling errors is via the familiar try..catch construct. It behaves like you would expect:

try {
  var json = JSON.parse('nope');
} catch (e) {
  console.log(e); //[SyntaxError: Unexpected token o]
}

JSON.parse is a synchronous function, and therefore it throws errors. That's important to remember. You should never, never, never, ever throw an exception from asynchronous code. Ever. Srsly.
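
To see why, here's a contrived sketch (the file path is made up): the try..catch has already exited by the time the callback runs, so the throw has nowhere to go and the process crashes.

var fs = require('fs');

function readConfig(callback) {
  fs.readFile('/nonexistent/config.json', function(err, contents) {
    if (err) {
      //don't do this: nothing can catch it, so the process crashes
      throw err;
    }

    callback(JSON.parse(contents));
  });
}

try {
  readConfig(function(config) {
    //do something with config
  });
} catch (e) {
  //never reached: the throw happens on a later tick of the event loop
  console.error(e);
}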

error events

error events are kind of stupid. They are a "special" event within Node, and their "special" behavior is that if an error event is emitted without any listeners, it crashes the program. Let me repeat that, because repeating things is something you do when writing to help emphasize something. Or so my high school English teachers taught me.

error events without listeners crash programs.

There are no exceptions to this rule, and you can't get around it. Except by listening for the error event. They are very similar to checked exceptions in Java, so those of you who sold your soul to the AbstractStrategyFactoryFactory can rest easy, as this paradigm should be familiar to you already.
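
Here's a contrived example with a plain EventEmitter showing both behaviors (the messages are made up):

var EventEmitter = require('events').EventEmitter;

var thing = new EventEmitter();

//no 'error' listener: uncommenting this line crashes the program
//thing.emit('error', new Error('oh no'));

//with a listener, you decide what happens
thing.on('error', function(err) {
  console.error('something broke, carrying on anyway:', err.message);
});
thing.emit('error', new Error('oh no'));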

Now, normally, this isn't a huge problem, as it's documented and standardized, so if you care about something emitting an error and want it to not crash your program, you can add a listener and handle it yourself. But it's not so easy.

It's not easy because it's turtles all the way down. Many userland libraries do some sort of networking, e.g. connecting to a database, which means they have to open a socket. Usually that's all abstracted away so that the library is easy to use. Often it's the reason for the library. Abstractions are supposed to make things easy (and why FactoryFactorys seem to be so abundant).

So, now you're connecting to your super awesome NoSQL CouchiakooseandradisDB database using some random dude's library, which happens to abstract a socket connection. But is that random dude's library handling the error event? If not, you'll need to dig through that dude's probably well-tested and intelligently-factored code to figure it out. And if it isn't handled, then you'll have to figure out how to get a hold of the underlying socket connection and handle the error yourself. Which kind of defeats the purpose of an abstraction. I assure you this has happened to me before. It's rather annoying.

The point is, error events can bite you in the ass, because they're hard to pinpoint, the stack trace isn't always useful and they crash your program for seemingly no reason. Be careful and be aware.

Asynchronous errors

Now the real fun. Node is all about those non-blocking system calls. It's what makes it semi-awesome. Or, at the very least, it's what makes it different. Depending on how angry you are at random dudes, it might be the worst thing since the goto statement.

First, let's distinguish between "asynchronous" and "functions that return stuff via callback". Asynchronous means it's literally not blocking execution. When you call an asynchronous function, it's going off to do something and meanwhile, the rest of your program is executing until that other thing finishes. The act of passing around a callback function does not mean that it is asynchronous. If you can remember this fact, you'll have a leg up on many random dudes. But not in that way, pervert.
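
For example, this function takes a callback but there is nothing asynchronous about it; it blocks until the loop finishes (eachSync is just a made-up name):

function eachSync(items, callback) {
  //completely synchronous: the callback runs before eachSync returns
  for (var i = 0; i < items.length; i++) {
    callback(items[i], i);
  }
}

eachSync([ 1, 2, 3 ], function(item) {
  console.log(item);
});

//this line doesn't run until all three items have been logged
console.log('done');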

The Node convention for passing errors/results in asynchronous functions is to pass a callback function as the last argument which takes two arguments: an err and a result. The err is an error, and if it's truthy, that means something broke.

fs.stat('/nonexistent/file', function(err, stat) {
  if (err) {
    console.error(err);
    return;
  }

  //do something with the stat object
});

And that's pretty much it. Always remember to handle your errors. Oftentimes the result will be null or simply undefined if an error occurs. Oh yeah. One other thing. Don't ever mix try..catch idioms with callback idioms. I'll hate you forever if you do. And it's bad practice and very confusing, and your stack trace will not be what you think it should be.
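
If you need both (say, parsing a file you just read), keep the try..catch confined to the synchronous part and hand the error to the callback. A sketch (readJson is a made-up helper):

var fs = require('fs');

function readJson(path, callback) {
  fs.readFile(path, 'utf8', function(err, contents) {
    if (err) {
      callback(err);
      return;
    }

    var data;
    try {
      //JSON.parse is synchronous, so try..catch is fine here
      data = JSON.parse(contents);
    } catch (e) {
      callback(e);
      return;
    }

    callback(null, data);
  });
}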

The only time an asynchronous function (or one that returns its result via a callback) should throw an error is if the arguments are bad. Those are programmer errors (the kind a compiler would catch in other languages) and should be fixed by a programmer. For example, if a function expects an object but you pass it a string, it's okay if it throws an exception in that case.

function asyncSomething(someObject, callback) {
  //okay to throw an exception in this case, although just passing back an error
  //in the callback is totally reasonable as well
  if (typeof(someObject) !== 'object') {
    throw new Error('First argument should be an object');
  }
}

A seemingly common error

I've noticed that several well-known and well-used libraries (node-redis and mongoose, specifically) fall into this trap of poor error handling. There are probably others as well, but those are the two where I've noticed this.

Basically, they have a function definition like so:

function doSomething(callback) {
  //do something asynchronously, and then...
  try {
    callback();
  } catch (e) {
    this.emit('error', e);
  }
}

Can you spot the problem?

The problem is that it catches an error that occurred outside of its scope and emits it as if it were an error that occurred inside of the library. The following code sample should demonstrate why that is bad:

doSomething(function(err) {
  //ReferenceError: foo is not defined, i.e. programmer error
  //this should crash the program, as it needs to be fixed by a programmer
  foo; 
});

What happens in this situation? Suppose the library that has the doSomething function was not super important to your application, and you don't really care if it error'd. So that means you'll be handling the error event, logging it, and ignoring it. What happens when the above code is executed? Will the program crash as it should?

The answer is a quizzical "No". doSomething will catch the ReferenceError and "re-throw" it by emitting it as an error event. Since you are listening for those error events and ignoring them, your program will carry on as if nothing bad happened. But in fact, something terrible happened. This is an uncaught exception from the runtime's point of view, and here is what the Node docs have to say about those:

An unhandled exception means your application - and by extension node.js itself - is in an undefined state. Blindly resuming means anything could happen.

By catching every random error and emitting it as if it originated from itself, the doSomething function is potentially placing Node itself into an undefined state. This is bad.
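
The less dangerous version is pretty boring: pass errors that the library itself generated to the callback (or emit them), and let anything thrown by the caller's code propagate. A rough sketch, where somethingBroke and result are placeholders:

function doSomething(callback) {
  //do something asynchronously, and then...
  if (somethingBroke) {
    //an error that actually originated here: hand it to the caller
    callback(new Error('doSomething failed'));
    return;
  }

  //if the callback itself throws, that's the caller's bug; let it crash
  callback(null, result);
}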

You might be tempted to say "But why don't you just add an error event listener and crash the program yourself?" The answer is because that's a super brittle way of handling errors, e.g. doing a regex test on the error message looking for "SyntaxError".

The point is that a library should never, ever capture an error that did not originate from itself. I actually discovered this by accidentally spelling a variable wrong and being extremely confused when the stack trace implied that it originated from Redis, when in fact it was just me being an idiot. I was even more confused when my program didn't crash. Redis was not abundantly important to my application, so if it was unavailable for whatever reason (e.g. connection issues) I didn't really care and wanted things to keep running. So I had an error event listener on Redis which logged the error and let the program keep on running. So imagine my surprise when a legitimate programmer error didn't crash the program as it should have.

In conclusion, always handle your errors. But don't handle errors that don't belong to you.

Constructors should not have side effects

I was recently reading a pretty well-written article about Angular vs Backbone, and I noticed this little snippet of code:

Router = Backbone.Router.extend({
    routes: {
        '': 'phonesIndex',
    },
 
    phonesIndex: function () {
        new PhonesIndexView({ el: 'section#main' });
    }
});

Naturally the lack of a var keyword is generally a really bad thing (particularly in a tutorial), but whatever. Maybe he's one of those weird, archaic people like Crockford who still insist on putting all of their var declarations at the top of a function. Like a caveman.

But I digress.

The real annoying thing about that code snippet is the constructor that does nothing. Or rather, the constructor that does too much. That kind of code really only seems to crop up in JavaScript and PHP (from what I've seen), and I'm not completely sure if that correlation means something. I've also occasionally seen property accessors in C♯ that have side effects, which is not quite as terrible but still really stupid.

Constructors are aptly named: their only job is to construct things. I mean, it's not even confusing, so it's pretty weird when people get it wrong.

If you read the next paragraph of the linked article, you'll see the following snippet, which explains the reason why nothing is done with the constructed object.

PhonesIndexView = Backbone.View.extend({
 
    initialize: function () {
        this.render();
    },
 
    render: function () {
        this.$el.html(JST['phones/index']());
    }
});

In Backbone, the initialize property doubles as a constructor. It's kind of weird, but it's how a lot of libraries handle classical inheritance. Basically, when new PhonesIndexView() is executed it internally calls PhonesIndexView.initialize(). So effectively initialize is a constructor.

The astute reader will also notice the this.render() call inside initialize, which, not surprisingly, renders the view onto the page. Hence the reason for not doing anything after constructing the object: the constructor does all the relevant work.

There are a few objective reasons why this is bad:

  1. It violates the purity of a constructor.
  2. It makes the code harder to read and understand.

Purity

A pure function or method is one that doesn't have side effects. So if your function modifies state (e.g. adds or deletes something), then your function is no longer pure. Note that it's not a code smell if a function is not pure; in fact, it's kind of impossible (or at least extremely annoying) to write completely pure code.
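
A trivial illustration of the difference:

//pure: same input, same output, nothing else touched
function add(a, b) {
  return a + b;
}

//not pure: it modifies state outside of itself
var total = 0;
function addToTotal(n) {
  total += n;
}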

Constructors, however, should always be pure. They shouldn't be establishing network connections (e.g. connecting to a database), or modifying state (e.g. rendering a view); they should just declare some local variables and exit. Methods and functions on the constructed object should be the ones doing things, not the constructor.

Take this example: which code do you like better?

Pure constructor

function Database(credentials) {
  this.username = credentials.username;
  this.password = credentials.password;
  this.conn = null;
}

Database.prototype.connect = function() {
  if (!this.conn) {
    this.conn = someDatabaseLibrary.connect(this.username, this.password);
  }
};

var db = new Database({
  username: 'tmont',
  password: 'heartbleed'
});

try {
  db.connect();
} catch (e) {
  console.error('failed to connect');
}

Constructor with side effects

function Database(credentials) {
  this.username = credentials.username;
  this.password = credentials.password;
  this.conn = someDatabaseLibrary.connect(this.username, this.password);
}

try {
  var db = new Database({
    username: 'tmont',
    password: 'heartbleed'
  });
} catch (e) {
  console.error('failed to connect');
}

The difference between the two is that the connection in the second example occurs implicitly. The constructor has side effects: it's attempting to connect to the database server. This makes it harder to pinpoint errors (say the db server is down) because too many things occur by magic. If you explicitly call connect(), and an exception is thrown, it's not too hard to deduce there is a problem with trying to connect. You wouldn't expect merely invoking a constructor to give a connection error.

And if one side effect occurs in the constructor, why not more? May as well do some logging while we're in there (hopefully the log file is writable and the disk isn't full, or there will be more exceptions), or send an email notification (hopefully the email server is up!).

And suddenly you've got code that looks like this:

function Database(credentials) {
  this.username = credentials.username;
  this.password = credentials.password;
  this.logger = Logger.get('myApp');

  this.logger.debug('attempting to connect to db with username ' + this.username);
  this.conn = someDatabaseLibrary.connect(this.username, this.password);
  this.logger.debug('successfully connected to db');

  var connections = parseInt(this.conn.get('dbstats:connect')) + 1;
  this.conn.set('dbstats:connect', connections);

  if (connections % 100 === 0) {
    var emailer = new Emailer('emailserver.com:25');
    emailer.send(
      'tmont@tmont.com', 
      'Database connection update', 
      'Connected to db server ' + connections + ' times'
    );
  }
}

There are so many areas of possible failure that it's impossible to keep it straight in your head. Particularly if you have a codebase full of code like this.

The Single Responsibility Principle is usually in reference to classes: each class should have a single responsibility. But functions and methods should also follow that rule. If you have a method called connect() it shouldn't do anything but connect.

JavaScript is kind of a weird entity, since it's not classical, but is often written classically for larger applications. For this reason, large JavaScript code bases can get out of hand very quickly and turn into giant piles of spaghetti if you're not careful. Following best practices and design/architecture patterns can alleviate a lot of mess with very little effort.

A Convenient Way to Write a jQuery Plugin

I've written quite a few jQuery plugins in my day. The early ones were horrible mishmashes of spaghetti code interlaced with $.fn. Back then I didn't even know what $.fn was supposed to be. I thought fn was just some magic string that did something magical, and then you suddenly had written a jQuery plugin!

Since then, for decently sophisticated plugins (i.e. ones that create state) I've started constructing them in a more consistent manner. More specifically, I totally ripped off the format that the Bootstrap authors used for their plugins, like $('selector').button('reset'). I had to hack the button plugin because for some reason it put every action in the event loop with setTimeout and I couldn't rely on things occurring in the proper order. As I was cursing the authors out for doing something so annoying, I was simultaneously praising them for writing their plugins in a way that was consistent and easy to work with. Even if they don't use semicolons. The heathen bastards.

Before I get to the actual format, I want to emphasize that for simple plugins, doing things like this is fine (and probably preferable):

$.fn.redOrBlueLol = function() {
  return this.each(function() {
    var $this = $(this);
    $this.css('color', $this.css('color') === 'red' ? 'blue' : 'red');
  });
};

If all you're doing is some minor DOM manipulation, then you don't really need to worry about writing your plugin in some consistent format.

The Format

First, you have your stateful object and its prototype:

var ns = 'my-plugin';
function MyStatefulObject($element, options) {
  this.$element = $element;
  this.doStuff = !!options.doStuff;
}

MyStatefulObject.prototype = {
  helloWorld: function(name) {
    name = name || 'world';
    this.$element.text('Hello ' + name + '!');
  },
  
  destroy: function() {
    //stuff to tear down all the mess you've made
    this.$element.off('.' + ns);
  }
};

Then the actual definition of the plugin:

$.fn.myPlugin = function(options) {
  options = options || {};
  //grab any extra arguments here; inside the each() callback, "arguments"
  //would refer to that callback's own (index, element) arguments
  var args = [].slice.call(arguments, 1);

  return this.each(function() {
    var $element = $(this),
        thing = $element.data(ns),
        method = typeof(options) === 'string' ? options : '';

    if (!thing) {
      var realOptions = !options || typeof(options) !== 'object' ? {} : options;
      $element.data(ns, (thing = new MyStatefulObject($element, realOptions)));
    }

    if (typeof(thing[method]) === 'function') {
      thing[method].apply(thing, args);
    }
  });
};

And that's it. Now you can use your plugin as such:

$('selector').myPlugin('helloWorld');
$('.billy').myPlugin('helloWorld', 'Billy');
$('selector, .billy').myPlugin('destroy');
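
And if the first argument is an object instead of a string, the plugin above treats it as options for the initial construction (doStuff comes from the example options shown earlier):

//construct with options the first time...
$('selector').myPlugin({ doStuff: true });

//...then call methods on subsequent invocations
$('selector').myPlugin('helloWorld');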

You can see an example of this in action in an audio player plugin I've been working on lately: jquery.rach3.js

And that's it. Nothing very complicated or magical about it. Just convenient.

Uploading to S3 in Bash

There are already a couple of ways to do this using a 3rd party library, but I didn't really feel like including and sourcing several hundred lines of code just to run a curl command. So here's how you can upload a file to S3 using the REST API.

This example uploads a gzipped tarball; you'll need to adjust the content-type accordingly. And obviously use a real API key and secret.

file=/path/to/file/to/upload.tar.gz
bucket=your-bucket
resource="/${bucket}/${file}"
contentType="application/x-compressed-tar"
dateValue=`date -R`
stringToSign="PUT\n\n${contentType}\n${dateValue}\n${resource}"
s3Key=xxxxxxxxxxxxxxxxxxxx
s3Secret=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
signature=`echo -en ${stringToSign} | openssl sha1 -hmac ${s3Secret} -binary | base64`
curl -X PUT -T "${file}" \
  -H "Host: ${bucket}.s3.amazonaws.com" \
  -H "Date: ${dateValue}" \
  -H "Content-Type: ${contentType}" \
  -H "Authorization: AWS ${s3Key}:${signature}" \
  https://${bucket}.s3.amazonaws.com/${file}

As someone who isn't abundantly talented at writing shell scripts, the tricky part was finding the -e option for echo, which makes it handle character escapes (e.g. \n). It's kind of annoyingly complex to actually have a newline character in a string in bash.

Anyway, this little snippet is suitable for running as a cron job or just a one-off from the shell. Note that if you want to add other Amazon-specific headers (such as x-amz-acl for setting permissions) you'll need to manually add those to stringToSign since they need to be part of the authorization signature.

Backup Script

The reason I needed to figure this out was that I wanted to run a backup script that uploaded stuff to an S3 bucket. I run this in a cron job once a week. It backs up a Git server, a MySQL database and some nginx configuration files. It's just a real-world example of how to upload to S3 from the shell.

#!/bin/bash

cd /tmp
rm -rf backup
mkdir backup
cd backup

mkdir sql && cd sql
databases=`echo 'show databases;' | mysql -u backup | tail -n +2 | grep -v _schema | grep -v mysql`
for database in $databases
do
    mysqldump -u backup --databases $database > "${database}.sql"
done

cd ..
mkdir nginx && cd nginx
cp -R /etc/nginx/sites-enabled .
cp /etc/nginx/nginx.conf .

cd ..
mkdir git && cd git
repos=`ls -1 /home/git | grep '.git$'`
for repo in $repos; do
    cp -R "/home/git/${repo}" .
done    

cd ..
date=`date +%Y%m%d`
bucket=my-bucket
for dir in git nginx sql; do
    file="${date}-${dir}.tar.gz"
    cd $dir && tar czf $file *
    resource="/${bucket}/${file}"
    contentType="application/x-compressed-tar"
    dateValue=`date -R`
    stringToSign="PUT\n\n${contentType}\n${dateValue}\n${resource}"
    s3Key=xxxxxxxxxxxxxxxxxxxx
    s3Secret=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    signature=`echo -en ${stringToSign} | openssl sha1 -hmac ${s3Secret} -binary | base64`
    curl -X PUT -T "${file}" \
        -H "Host: ${bucket}.s3.amazonaws.com" \
        -H "Date: ${dateValue}" \
        -H "Content-Type: ${contentType}" \
        -H "Authorization: AWS ${s3Key}:${signature}" \
        https://${bucket}.s3.amazonaws.com/${file}
    cd ..
done

cd
rm -rf /tmp/backup

Writing to the Syslog with Winston

Update: this code is now in its own GitHub repository.

Many log aggregators (such as Papertrail or Loggly) have the ability to parse data from the syslog. This is often more convenient than writing to a file, handling rotation of said log file, handling archival of said rotated log file, and a myriad of other annoyances. Plus, many other system programs (ssh, cron, etc.) already log stuff to the syslog, so most things are already aggregated.

I use Winston as my logging library for Node. It works pretty well inasmuch as it lets you log stuff. Winston exposes a Transport object, which is an interface to a logging destination (a file, the console, a webhook, etc.). You can use this to log to the syslog.

Note that there are other modules for Winston that expose a syslog transport, but they all made me nervous, in that they weren't tested, and after glancing through the code, I wasn't convinced they were doing proper socket management. And the last thing I wanted was my logging library to create hundreds of undisposed file descriptors. So I decided to use node-posix and just use the OS to write to the syslog rather than UDP. Plus syslogd doesn't have datagram support enabled by default, so you need to tweak the configuration to make it accept UDP packets, and I don't know enough about syslog to feel confident in messing with the configuration. And also I didn't feel like spending an hour learning about it. Because I'm weak and stupid.

Anyway, here's some sample code to write to the syslog using the OS's POSIX functions.

var winston = require('winston'),
	util = require('util'),
	posix = require('posix');

function SyslogTransport(options) {
	options = options || {};
	winston.Transport.call(this, options);
	this.id = options.id || process.title;
	this.facility = options.facility || 'local0';
	this.showPid = !!options.showPid;
}

util.inherits(SyslogTransport, winston.Transport);

util._extend(SyslogTransport.prototype, {
	name: 'syslog',
	log: function(level, msg, meta, callback) {
		if (this.silent) {
			callback(null, true);
			return;
		}

		if (level === 'error') {
			level = 'err';
		} else if (level === 'warn') {
			level = 'warning';
		}

		var message = '[' + level + '] ' + msg;
		if (typeof(meta) === 'string') {
			message += ' ' + meta;
		} else if (meta && typeof(meta) === 'object' && Object.keys(meta).length > 0) {
			message += ' ' + util.inspect(meta, false, null, false);
		}

		message = message.replace(/\u001b\[(\d+(;\d+)*)?m/g, '');

		var options = {
			cons: true,
			pid: this.showPid
		};
		posix.openlog(this.id, options, this.facility);
		posix.syslog(level, message);
		posix.closelog();

		callback(null, true);
	}
});

module.exports = SyslogTransport;

And you could use it like so:

var transport = new SyslogTransport({
	level: 'debug',
	id: 'hello syslog',
	facility: 'user',
	showPid: true
});

var log = new winston.Logger({
	level: 'debug',
	transports: [ transport ]
});

log.info('hello world');

And then if you run tail /var/log/syslog (/var/log/system.log on OS X), you should see something like this:

Dec 23 12:01:50 mesia hello syslog[17605]: [info] hello world