Anyone who's watched that demo has noticed exactly what he said: the demo is basically a 2D tilemap of a few repeated 3D objects, most likely to make the memory requirements of storing voxels at that level of detail feasible.
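To put rough numbers on why a tilemap of repeated objects helps, here is a back-of-the-envelope sketch. Every figure in it (grid size, model count, bytes per model, bytes per tile reference) is a made-up assumption for illustration, not something measured from the demo, and the function names are hypothetical.

```python
def tilemap_bytes(grid_w, grid_h, unique_models, bytes_per_model, bytes_per_tile_ref=4):
    """Memory for a 2D grid of references into a small library of voxel models."""
    library = unique_models * bytes_per_model          # each model stored once
    references = grid_w * grid_h * bytes_per_tile_ref  # one small index per tile
    return library + references


def unique_bytes(grid_w, grid_h, bytes_per_model):
    """Memory if every grid cell stored its own distinct voxel model."""
    return grid_w * grid_h * bytes_per_model


# Illustrative numbers only (assumptions, not data from the demo).
MODEL = 50 * 2**20  # pretend one detailed voxel model takes 50 MiB
instanced = tilemap_bytes(1000, 1000, 20, MODEL)
naive = unique_bytes(1000, 1000, MODEL)

print(f"instanced: {instanced / 2**30:.2f} GiB")
print(f"all unique: {naive / 2**40:.2f} TiB")
```

The point is only that the instanced cost grows with the number of distinct models, while the naive cost grows with the number of tiles.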
If I recall correctly, they kept referring to procedural generation in the demo, so perhaps they have a few tricks up their sleeve for generating content (trees being an obvious example) that would be varied yet take up little memory.
Not with ray casting of sparse voxel octrees, which is probably what they're doing. Depending on how the procedural generation works, you only need to generate what you actually render.
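A minimal sketch of the "only generate what you render" idea, under loud assumptions: the content rule (hashing a node's address into a solid/empty bit) is invented for illustration, `LazyOctree` and `cast` are hypothetical names, and the ray is sampled at fixed steps rather than walked with a real octree traversal. It says nothing about how the demo itself works.

```python
import hashlib


class LazyOctree:
    """Octree over the unit cube whose nodes are generated on first visit."""

    def __init__(self, seed=0, max_depth=6):
        self.seed = seed
        self.max_depth = max_depth
        self.cache = {}  # (depth, x, y, z) -> solid?  Only visited nodes appear here.

    def _solid(self, depth, x, y, z):
        # Assumed content rule: a deterministic hash of the node address.
        key = (depth, x, y, z)
        if key not in self.cache:
            digest = hashlib.sha256(f"{self.seed}:{key}".encode()).digest()
            self.cache[key] = digest[0] < 192  # roughly 75% of nodes non-empty
        return self.cache[key]

    def query(self, px, py, pz):
        """True if the finest voxel containing a point in [0, 1)^3 is solid."""
        for depth in range(1, self.max_depth + 1):
            size = 1 << depth  # cells per axis at this depth
            x, y, z = int(px * size), int(py * size), int(pz * size)
            if not self._solid(depth, x, y, z):
                return False  # empty node: nothing beneath it is ever generated
        return True


def cast(tree, origin, direction, steps=64, step_len=1.0 / 64):
    """Fixed-step ray sampling; returns distance to the first solid voxel or None."""
    for i in range(steps):
        p = [o + d * i * step_len for o, d in zip(origin, direction)]
        if not all(0.0 <= c < 1.0 for c in p):
            return None  # ray left the unit cube
        if tree.query(*p):
            return i * step_len
    return None


tree = LazyOctree(seed=1, max_depth=6)
hit = cast(tree, (0.0, 0.5, 0.5), (1.0, 0.0, 0.0))
full_tree_nodes = sum(8 ** d for d in range(1, tree.max_depth + 1))
print(f"generated {len(tree.cache)} of {full_tree_nodes} possible nodes")
```

Because node content is a pure function of the seed and the node address, nothing has to be stored ahead of time: a ray touches a handful of nodes and the rest of the tree never exists in memory.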