Add a two-pass implementation of this op. It uses Siarhei Siamashka's idea, described here: http://lists.freedesktop.org/archives/pixman/2011-April/001156.html