The difference in glass is noticeable.
However, there has been a surprising side effect: the new version is also faster (check the samples/sec above). I would have expected the new version to be slightly slower because of the small amount of synchronization it requires. This may be explained by better cache coherency in the new version. Anyway, I'm not going to complain.

The only unknown now is how it will work in OpenCL (the above results are on CPU). The code is on the "new_samplers" branch.
P.S. At this point I may also work on a new version of the TILEPATH sampler that uses a Sobol sequence instead of stratification + RANDOM.