Intro to Closures in PHP

PHP 5.3 introduced closures. It's already been pointed out that an instance of PHP's Closure class is not necessarily a closure: lexical scope is opt-in, and an anonymous function only closes over the variables you list in a use clause. That makes for some goofy-looking code:

//lambda ftw
$lambda = function ($x) {
  echo 'I am an anonymous function ftw.' . "\n" . $x;
};

//closure lol
$closure = function ($x) use ($lambda) {
  $lambda('Actually, I am an anonymous function lexically bound in a closure ' . $x . '.');
};

$closure('lol');

//I am an anonymous function ftw.
//Actually, I am an anonymous function lexically bound in a closure lol.

It's all very academic.
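
To make the "optional lexical scope" point concrete, here's a minimal sketch (the variable names are mine, purely for illustration). It shows that use copies a variable's value at the moment the closure is defined, unless you ask for a reference with &:

```php
<?php
$count = 1;

// Captured by value: the closure keeps the value $count had when it was defined.
$byValue = function () use ($count) {
    return $count;
};

// Captured by reference: the closure sees later changes to $count.
$byReference = function () use (&$count) {
    return $count;
};

$count = 2;

var_dump($byValue());     // int(1)
var_dump($byReference()); // int(2)
```

Anything not named in the use clause simply isn't visible inside the function body.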

Object-Oriented Closures

Creating a closure inside a class is easy enough… until the desideratum is a closure around $this. It's time for some JavaScript-style naming conventions!

class Foo {
  public function bar() {
    $closure = function() use ($this) {}; //fatal error: Cannot use $this as lexical variable
  }
}

This yields a compile-time fatal error: Cannot use $this as lexical variable. That is truly unfortunate. Fortunately, we can reach into our bag of JavaScript tricks and do something ridiculous:

class Foo {
  public function bar() {
    $that = $this; //lulz
    $closure = function() use ($that) {
      $that->baz = "I was set by " . __FUNCTION__;
    };

    $closure();
    var_dump($this);
  }
}

$x = new Foo();
$x->bar();

/*
object(Foo)#1 (1) {
  ["baz"]=>
  string(22) "I was set by {closure}"
}
*/

True to PHP's nature, it's all magic and voodoo. (ab)Use with discretion.
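
For what it's worth, PHP 5.4 and later bind $this automatically inside closures created in a method, so the $that dance is only needed on 5.3. A minimal sketch, assuming PHP 5.4+ and a made-up Foo with a public $baz property:

```php
<?php
class Foo {
    public $baz;

    public function bar() {
        // No use clause needed: on PHP 5.4+ $this is bound automatically.
        $closure = function () {
            $this->baz = 'I was set from inside the closure';
        };

        $closure();
        return $this->baz;
    }
}

$x = new Foo();
echo $x->bar(), "\n"; // I was set from inside the closure
```

Unlike the $that trick, a closure bound this way can also reach private and protected members of the object.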

A real world example

So, I was sitting there the other day, porting ASP.NET MVC to PHP, and I ran across this class:

internal sealed class ControllerDescriptorCache : ReaderWriterCache<Type, ControllerDescriptor> {
    public ControllerDescriptorCache() {
    }

    public ControllerDescriptor GetDescriptor(Type controllerType) {
        return FetchOrCreateItem(controllerType, () => new ReflectedControllerDescriptor(controllerType));
    }
}

Now, the significant part of this class actually takes place in the base class, ReaderWriterCache, which provides a thread-safe object cache. In this context, that's basically all it's doing. The FetchOrCreateItem() method either retrieves an already-existing item from the cache, or creates it, stores it in the cache keyed to the specified type, and then returns it. Simple enough.

Now, thread safety is not a concern in PHP. Well, it's of grave concern if you're writing an extension, or working on PHP itself, but if you're writing PHP code, you need not concern yourself with it. The point is that the locking in ReaderWriterCache is thoroughly useless in the context of PHP, so I needed to decide what to do with the class. It turns out that I do need the cache itself, but I already have a Collection class that does typesafe object storage, which is really all I need. The way Collection works is that it takes a Closure as a constructor parameter and uses it for validation: if the closure returns true, the value can be stored in the collection.

//a collection of stdClasses. how useful.
$stdClassCollection = new \PhpMvc\Framework\Collection(function($value) {
  return $value instanceof stdClass;
});

Now, how to implement a generic object cache? Well, Collection is already generic (as generic as a dynamically typed language gets, anyway), so really we just need an implementation of FetchOrCreateItem() and we're home free. But, since we're awesome, and this post is all about closures, we want to use closures. Also, if you dig a little deeper into the MVC source, you'll see the significance of the lambda expression in the second argument given to FetchOrCreateItem() (hint: look at line 15, the call to creator()):

protected TValue FetchOrCreateItem(TKey key, Func<TValue> creator) {
    // first, see if the item already exists in the cache
    _rwLock.AcquireReaderLock(Timeout.Infinite);
    try {
        TValue existingEntry;
        if (_cache.TryGetValue(key, out existingEntry)) {
            return existingEntry;
        }
    }
    finally {
        _rwLock.ReleaseReaderLock();
    }

    // insert the new item into the cache
    TValue newEntry = creator(); //OH HAI! I CAN HAZ LAMDUH!
    _rwLock.AcquireWriterLock(Timeout.Infinite);
    try {
        TValue existingEntry;
        if (_cache.TryGetValue(key, out existingEntry)) {
            // another thread already inserted an item, so use that one
            return existingEntry;
        }

        _cache[key] = newEntry;
        return newEntry;
    }
    finally {
        _rwLock.ReleaseWriterLock();
    }
}

The reason for passing a lambda expression instead of an object instance is that it's a cache: we don't want a bunch of object instances floating around, so we wait until the last possible moment to create one. Notice that the method checks whether the object exists in the cache before creating it; the caller can't have that foreknowledge (well, it could, but lambda expressions are fun, and I'm assuming that's why this method is needlessly complicated).
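
Before the full port, the deferred-creation idea can be sketched on its own with a plain array. This is just an illustration: fetchOrCreate() and the call counter are names I made up, not part of any framework.

```php
<?php
// Hypothetical helper: return the cached value for $key, invoking $creator only on a miss.
function fetchOrCreate(array &$cache, $key, Closure $creator) {
    if (!array_key_exists($key, $cache)) {
        $cache[$key] = $creator();
    }

    return $cache[$key];
}

$cache = array();
$calls = 0;

// The creator counts how many times it actually runs.
$creator = function () use (&$calls) {
    $calls++;
    return new stdClass();
};

$first = fetchOrCreate($cache, 'thing', $creator);
$second = fetchOrCreate($cache, 'thing', $creator);

var_dump($first === $second); // bool(true)
var_dump($calls);             // int(1)
```

The second lookup never touches the creator, which is the whole point of handing over a function instead of a ready-made object.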

So, let's port it to PHP.

class ObjectCache extends Collection {
	
	private $creator;
	
	public function __construct(Closure $creator, Closure $validator = null) {
		parent::__construct($validator);
		$this->creator = $creator;
	}
	
	public function offsetGet($key) {
		if (!$this->offsetExists($key)) {
			$this[$key] = call_user_func($this->creator, $key);
		}
		
		return parent::offsetGet($key);
	}
	
}

There's some serious magic going on here. I love it. First, recall that Collection implements the ArrayAccess interface, which means we can use array-style indexers on it. What happens behind the scenes when you do that is that it calls the underlying offset[Get|Set|Unset|Exists]() methods. So we override offsetGet() and either return the object if it already exists in the cache, or create it, store it, and then return it. Let's see it in action:
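
If the ArrayAccess mapping is hazy, here's a minimal sketch that logs which method each array-style operation ends up calling. TracingBag is a made-up class for illustration, and the return types assume PHP 8.0 or later:

```php
<?php
// Hypothetical class that records every ArrayAccess method invoked on it.
class TracingBag implements ArrayAccess {
    public $log = array();
    private $data = array();

    public function offsetExists($key): bool {
        $this->log[] = 'offsetExists';
        return array_key_exists($key, $this->data);
    }

    public function offsetGet($key): mixed {
        $this->log[] = 'offsetGet';
        return $this->data[$key];
    }

    public function offsetSet($key, $value): void {
        $this->log[] = 'offsetSet';
        $this->data[$key] = $value;
    }

    public function offsetUnset($key): void {
        $this->log[] = 'offsetUnset';
        unset($this->data[$key]);
    }
}

$bag = new TracingBag();
$bag['a'] = 1;              // calls offsetSet('a', 1)
$value = $bag['a'];         // calls offsetGet('a')
$exists = isset($bag['a']); // calls offsetExists('a')
unset($bag['a']);           // calls offsetUnset('a')

echo implode(', ', $bag->log), "\n"; // offsetSet, offsetGet, offsetExists, offsetUnset
```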

$cache = new ObjectCache(
  function ($key) { 
    return new \ReflectionClass($key); //when adding an object to the cache, this function creates that object
  }, 
  function ($value) { 
    return $value instanceof \ReflectionClass; //only allow instances of ReflectionClass in our collection
  }
);
$stdclass = $cache['stdClass'];
var_dump($stdclass);
var_dump($stdclass === $cache['stdClass']);

/*
object(ReflectionClass)#4 (1) {
  ["name"]=>
  string(8) "stdClass"
}
bool(true)
*/

That triple equals, for those of you who forgot, means the object instances are identical, i.e. both sides refer to the same instance in memory. Mostly what it means is that our object cache works.
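
As a quick refresher on the difference, a small sketch using throwaway stdClass objects:

```php
<?php
$a = new stdClass();
$b = new stdClass(); // a second, separate instance with the same (empty) contents
$c = $a;             // another handle to the very same instance as $a

var_dump($a == $b);  // bool(true): same class, equal properties
var_dump($a === $b); // bool(false): two different instances
var_dump($a === $c); // bool(true): one instance, two variables
```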

Making the Object Cache Better

So, we can still set stuff in our object cache, since I didn't override offsetSet(). Doing something like this will totally bypass the caching mechanism:

$cache['foo'] = new ReflectionClass('stdClass');

That's not really very good. There are a couple of ways to short-circuit that:

  1. Abuse the hell out of debug_backtrace() and fish the calling scope out of the stack frame (awesome)
  2. Make offsetSet() throw an exception every time it's called, and set the data another way

Well, it turns out that in my implementation #2 is not even possible, because the internal storage in Collection is a private variable not available to derived classes. What this means is that the only way to add an object to the collection is via offsetSet(). But there are ways around that, with a little refactoring. Since both #1 and #2 sound fun, let's do both.
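
Option #1 leans on the shape of the array that debug_backtrace() returns, so here's a minimal sketch of it first. The function names are made up for illustration. Index 0 describes the call to the current function, and index 1 describes the function that made that call:

```php
<?php
// Hypothetical helper: report the name of the function that called us.
function callerName() {
    $frames = debug_backtrace();

    // $frames[0] describes the call to callerName() itself;
    // $frames[1], if present, is the function that made that call.
    return isset($frames[1]) ? $frames[1]['function'] : '{main}';
}

function outer() {
    return callerName();
}

function outermost() {
    return outer();
}

echo outer(), "\n";     // outer
echo outermost(), "\n"; // outer
```

Note that the direct caller is always at index 1, no matter how deep the rest of the stack goes.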

#1: Stack Frame Abuse

public function offsetSet($key, $value) {
	//this method is only allowed to be called from within an offsetGet() in this ObjectCache instance
	$fail = true;
	$stackFrame = debug_backtrace(true);
	foreach ($stackFrame as $frame) {
		if (isset($frame['object']) && $frame['object'] === $this && $frame['function'] === 'offsetGet') {
			//called from $this->offsetGet(), that's okay
			$fail = false;
			break;
		}
	}
	
	if ($fail) {
		if (count($stackFrame) <= 1) {
			//called without scope
			$type = '{main}';
		} else {
			//the offending stack frame is the direct caller of offsetSet(), which sits at index 1
			$frame = $stackFrame[1];
			if (isset($frame['class'])) {
				//object scope
				$type = $frame['class'] . $frame['type'] . $frame['function'] . '()';
			} else if (substr($frame['function'], -9) === '{closure}') {
				$type = 'a closure';
			} else if (substr($frame['function'], -13) === '__lambda_func') {
				$type = 'an anonymous function';
			} else {
				$type = $frame['function'] . '()';
			}
		}

		throw new \LogicException(sprintf('You tried to set an object in the cache from %s, plz stop.', $type));
	}
	
	parent::offsetSet($key, $value);
}

I've never felt so dirty. This actually works, too.

$cache = new ObjectCache(
  function ($key) {
    return new \ReflectionClass($key); 
  }, 
  function ($value) { 
    return $value instanceof \ReflectionClass; 
  }
);
$stdclass = $cache['stdClass']; //ok
var_dump($stdclass);
var_dump($stdclass === $cache['stdClass']);

class Foo {
	function bar(ObjectCache $cache) {
		$cache['foo'] = new \ReflectionClass('ReflectionClass');
	}
}

$foo = new Foo();
$foo->bar($cache); //LogicException: You tried to set an object in the cache from PhpMvc\Descriptor\Foo->bar(), plz stop.

$cache['foo'] = new \ReflectionClass('ReflectionClass'); //LogicException: You tried to set an object in the cache from {main}, plz stop.

$lambda = function() use ($cache) {
	$cache['foo'] = new \ReflectionClass('ReflectionClass');
};

$lambda(); //LogicException: You tried to set an object in the cache from a closure, plz stop.

function func(ObjectCache $cache) {
	$cache['foo'] = new \ReflectionClass('ReflectionClass');
}

func($cache); //LogicException: You tried to set an object in the cache from PhpMvc\Descriptor\func(), plz stop.

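//note: create_function() was deprecated in PHP 7.2 and removed in 8.0, so this case only runs on older versions
$func = create_function('$cache', '$cache[\'foo\'] = new \ReflectionClass(\'ReflectionClass\');');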
$func($cache); //LogicException: You tried to set an object in the cache from an anonymous function, plz stop.

Let's just forget we ever saw that...

#2: Override offsetSet()

Like I mentioned earlier, this will require a bit of refactoring, because Collection hides the internal data storage from derived classes.

class Collection implements ArrayAccess, Countable, Iterator {
	
	private $validator;
	private $data;
	private $index;
	private $numericIndex;
	
	public function __construct(Closure $validator = null) {
		$this->validator = $validator ?: function($value) { return true; };
		$this->index = 0;
		$this->numericIndex = 0;
		$this->data = array();
	}

	//snip

	protected function valueIsValid($value) {
		return call_user_func($this->validator, $value);
	}
	
	public function offsetExists($key) {
		return array_key_exists($key, $this->data);
	}
	
	public function offsetGet($key) {
		if (!$this->offsetExists($key)) {
			throw new OutOfBoundsException('The key "' . $key . '" does not exist in the collection');
		}
		
		return $this->data[$key];
	}
	
	public function offsetSet($key, $value) {
		if (!$this->valueIsValid($value)) {
			throw new InvalidArgumentException('This collection does not allow values of type ' . (is_object($value) ? get_class($value) : gettype($value)));
		}
		
		//instead of $collection[] = $value, do $collection[null] = $value
		if ($key === null) {
			$key = $this->numericIndex++;
		}
		
		$this->data[$key] = $value;
	}
	
	public function offsetUnset($key) {
		unset($this->data[$key]);
	}

	//snip
}

What we can do is factor out the actual setting of the data into a protected function that offsetSet() and all derived classes can use.

protected final function set($key, $value) {
	if (!$this->valueIsValid($value)) {
		throw new InvalidArgumentException('This collection does not allow values of type ' . (is_object($value) ? get_class($value) : gettype($value)));
	}
	
	//instead of $collection[] = $value, do $collection[null] = $value
	if ($key === null) {
		$key = $this->numericIndex++;
	}
	
	$this->data[$key] = $value;
}

public function offsetSet($key, $value) {
	$this->set($key, $value);
}

And then in ObjectCache, we just call set() from offsetGet(), and throw an exception if someone tries to call offsetSet(). While we're at it, let's make a matching protected get() method so we can avoid the call super anti-pattern as much as possible. Now our get and set methods in ObjectCache are much cleaner:

public function offsetGet($key) {
	if (!$this->offsetExists($key)) {
		$this->set($key, call_user_func($this->creator, $key));
	}
	
	return $this->get($key);
}

public function offsetSet($key, $value) {
	throw new \LogicException('Cannot set objects in the object cache');
}
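
To see the pieces working together, here's a trimmed-down, self-contained sketch. It is not the real Collection: the Countable and Iterator parts are dropped, the body of get() is my guess at what the refactored lookup looks like, and the return types assume PHP 8.0 or later.

```php
<?php
// Hypothetical, trimmed-down Collection: only what's needed to show get() and set().
class Collection implements ArrayAccess {
    private $validator;
    private $data = array();

    public function __construct(?Closure $validator = null) {
        $this->validator = $validator ?: function ($value) { return true; };
    }

    // Assumed shape of get(): the lookup that offsetGet() used to do inline.
    final protected function get($key) {
        if (!array_key_exists($key, $this->data)) {
            throw new OutOfBoundsException('The key "' . $key . '" does not exist in the collection');
        }

        return $this->data[$key];
    }

    final protected function set($key, $value) {
        if (!call_user_func($this->validator, $value)) {
            throw new InvalidArgumentException('This collection does not allow that value');
        }

        $this->data[$key] = $value;
    }

    public function offsetExists($key): bool { return array_key_exists($key, $this->data); }
    public function offsetGet($key): mixed { return $this->get($key); }
    public function offsetSet($key, $value): void { $this->set($key, $value); }
    public function offsetUnset($key): void { unset($this->data[$key]); }
}

class ObjectCache extends Collection {
    private $creator;

    public function __construct(Closure $creator, ?Closure $validator = null) {
        parent::__construct($validator);
        $this->creator = $creator;
    }

    public function offsetGet($key): mixed {
        if (!$this->offsetExists($key)) {
            // Bypasses offsetSet() entirely, so the exception below never gets in our way.
            $this->set($key, call_user_func($this->creator, $key));
        }

        return $this->get($key);
    }

    public function offsetSet($key, $value): void {
        throw new LogicException('Cannot set objects in the object cache');
    }
}

$cache = new ObjectCache(function ($key) { return new ReflectionClass($key); });
var_dump($cache['stdClass'] === $cache['stdClass']); // bool(true)

$message = null;
try {
    $cache['foo'] = new ReflectionClass('stdClass');
} catch (LogicException $e) {
    $message = $e->getMessage();
}
echo $message, "\n"; // Cannot set objects in the object cache
```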

In Conclusion

Closures are fun. Learn to love them.