As we see more and more about virtual reality in the media, we also see apps claiming they are the next big thing in VR. As with so many technologies though, it seems that some apps are “more” VR than others. What’s the difference and how can you be sure of what you’re getting?
Since VR Voyaging isn’t a gaming-focused site, we’ll focus our discussion on experiential apps that are location-based (visiting Pompeii or Stonehenge), educational, and/or related to cultural heritage. The promise of VR is to put you in the middle of a new experience where you feel immersed in an environment and can feel more like you are actually part of the action.
For an app to really be considered VR, I want to feel as if I’ve stepped into another world. You aren’t going to be fooled by the realism in VR yet, but it helps when you can move around and see things from every viewpoint, as naturally as possible.
Capturing VR scenes
In the end, there are several techniques to capture scenes for virtual reality. The realism is usually proportional to how well developers implement them. We’ll go over each of these, from least immersive to most immersive.
Flat images and videos (Least immersive)
Images are a nice window into a scene, but they don’t provide much in the way of immersion. Viewing them through a headset is slightly better than viewing them in an album in the real world, but it’s definitely still a photo. Ignoring completely flat images that we’ve all taken with phones and cameras, there are a few ways to make images more immersive. Keep in mind that images include photos, but also computer-generated scenes.
I use “flat” as part of the description since all of these formats are actually flat. Normal videos are made up of flat image frames. Spherical and 3D images are also flat, but they get stretched to shape when you view them. Think about how a spherical globe becomes a flat map. It’s a similar principle.
The above photo is a single frame from a spherical video. Drag your mouse around to watch the image get reprojected from different points of view. Click the VR button to be inside of it in your headset. As you move your mouse or look around, it adjusts the view based on where you look. The actual photo is just a flat rectangle though, as you can see below.
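To make the globe-and-map idea a bit more concrete, here is a minimal sketch of how a viewer might look up which pixel of the flat rectangle to show for a given viewing direction. It assumes the common equirectangular layout (the full 360° across the width, 180° from top to bottom), which is my assumption rather than something stated about the photo here. The function names are made up for illustration.

```python
import math

def direction_to_pixel(yaw_deg, pitch_deg, width, height):
    """Map a viewing direction to (x, y) in an equirectangular image.

    Assumed convention: yaw 0 looks at the image center, positive yaw
    turns right; pitch 0 is the horizon, +90 is straight up.
    """
    pitch_deg = max(-90.0, min(90.0, pitch_deg))   # clamp to valid range
    u = ((yaw_deg + 180.0) % 360.0) / 360.0        # 0..1, left to right
    v = (90.0 - pitch_deg) / 180.0                 # 0..1, top to bottom
    x = min(int(u * width), width - 1)
    y = min(int(v * height), height - 1)
    return x, y

def horizontal_stretch(pitch_deg):
    """How much wider a row is drawn in the flat image than on the sphere.

    Grows without bound toward the poles, which is why the top and
    bottom of these flat images look so smeared.
    """
    return 1.0 / math.cos(math.radians(pitch_deg))

# Looking straight ahead lands in the middle of the rectangle.
center = direction_to_pixel(0, 0, 4096, 2048)
```

Looking straight ahead picks the middle of the rectangle, while looking 60° up reads from a row that has been stretched to roughly twice its true width.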
Spherical (360°) 2D Images
These can be quite nice, but it’s like standing inside of a sphere with an image painted or projected around you. The biggest issue with these images is that you can’t move your body around. At all. You can turn your head in place to see every direction, but there’s absolutely no lateral movement since the image is captured from one single point of view. It’s easy to feel slightly nauseated if you do shift a bit since your eyes and inner ear don’t agree.
There’s also the fact that each eye is seeing the same thing, so there’s no depth. Even though spherical is better than flat, it’s hard to call this “VR.”
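To put a rough number on the “no lateral movement” problem, here is a small sketch of how far an object straight ahead should appear to shift when you lean sideways in the real world. The distances are made-up examples and the function name is hypothetical. A single-viewpoint spherical image delivers none of this shift, which is the mismatch your eyes and inner ear notice.

```python
import math

def expected_parallax_deg(head_shift_m, object_distance_m):
    """Angle (in degrees) by which an object straight ahead should
    appear to shift when the head moves sideways by head_shift_m."""
    return math.degrees(math.atan2(head_shift_m, object_distance_m))

# Leaning 10 cm to the side (illustrative values only):
near = expected_parallax_deg(0.10, 1.0)    # a column one meter away
far = expected_parallax_deg(0.10, 20.0)    # a wall twenty meters away
```

Nearby objects should shift by several degrees while distant ones barely move, so the missing parallax is most noticeable when something in the scene is close to the camera.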
Pros: Spherical photos can look pretty great. You can view them in your web browser by dragging your mouse around (think Google Street View). From a headset, you can simply turn your head around to see every angle. Creating them is fairly easy, simply requiring a two-sided camera with fish-eye lenses. More expensive cameras mean better quality, but you can get started without spending too much.
Cons: The scale is often way off with odd distortion and seams due to the way the two or more photos are processed (stitched). This can be mitigated somewhat with more expensive equipment, but it’s also a matter of choosing the right location to capture.
Half-dome (180°) 3D
A few years ago, Google announced a new format called VR180. This format uses two lenses to create a 3D photo, but instead of being flat or spherical it captures everything in front of, above, and around the camera (nothing behind) in 180°.
Pros and Cons: These images have the potential to look better with less distortion since they don’t require so many lenses. As with spherical 2D, half-dome 3D only requires two fisheye lenses. I find these images to provide pretty good clarity and comfort in a VR headset, but once again, don’t move your head!
Spherical (360°) 3D
Taking the best of spherical imagery and 3D, this format adds the third dimension (depth), so each eye gets its own spherical view. This requires much more expensive equipment to capture, but the image quality and depth can really improve the end result.
Pros: Spherical with depth can provide some very nice imagery for a given angle.
Cons: Since the camera rig has a certain spacing between the lenses which likely won’t line up with your own eyes’ spacing (interpupillary distance, or IPD), there’s likely to be some eye strain. Just like with the 2D spheres, scale remains challenging, and most importantly, you still can’t move your head laterally.
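Here is a rough sketch of why lens spacing affects scale. It uses a deliberately simplified model that I’m assuming for illustration: an object straight ahead, with the viewer’s eyes converging at the same angle the two camera lenses did. Real headsets and cameras are more complicated, and the function name and the numbers are hypothetical.

```python
import math

def perceived_distance(true_distance_m, camera_baseline_m, viewer_ipd_m):
    """Distance at which an object seems to sit, under a simplified
    vergence-only model (an assumption, not a full optical model)."""
    # Angle between the two camera lenses' lines of sight to the object.
    vergence = 2.0 * math.atan(camera_baseline_m / (2.0 * true_distance_m))
    # Distance at which the viewer's eyes would converge by that same angle.
    return viewer_ipd_m / (2.0 * math.tan(vergence / 2.0))

# Illustrative values: lenses 130 mm apart, viewer IPD of 65 mm.
shrunk = perceived_distance(2.0, 0.130, 0.065)
```

In this model, perceived distance works out to the true distance multiplied by viewer IPD divided by lens spacing. Lenses set twice as far apart as your eyes make everything seem half as far away, and so half the size.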
Videos (180°/360°, 2D/3D)
Here we’ll just repeat the above section but add motion to the images! Sadly, I see a large amount of content marked as “VR” that’s just 2D or 3D spherical videos. This is simply not VR. Yes, you can watch it in a headset, but you can watch Netflix videos in VR too (on a virtual flat screen). VR implies an immersive experience. Immersion requires that you can move around within the scene. Just because you can turn your head doesn’t make it virtual reality. Basically, if you can find it on YouTube, it’s not really VR.
Pros: You can feel like you’re in the middle of a scene when it unfolds around you in every direction. Apps like Ecosphere make use of 3D spherical video to great effect.
Cons: Now you can feel nauseous even without turning your head! The distortion is present in every frame, and especially if the scale is off, it is often a strain to view this video.
Artistic hand-modeling (More immersive)
I struggled with what to call this one. You could say CG art or “computer generated”, except that could imply that a person isn’t involved, which is definitely not true. Typically, an artist works in special software to create 3D models (shapes) and 2D textures (surfaces). So, an artist could create a marble block by first making a rectangular model shape, then painting a marble pattern texture in a paint program, and finally wrapping the texture onto the model (like wrapping paper). This marble texture could be painted or even taken from a photograph of marble for an ultra-realistic look.
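As a loose illustration of the wrapping-paper idea, here is a minimal sketch of one face of that marble block. Each corner of the shape carries a position in space plus a “UV” coordinate saying which part of the texture belongs there. The names and the tiny 2×2 “marble” texture are invented for the example, and real modeling tools handle all of this for the artist.

```python
from dataclasses import dataclass

@dataclass
class Vertex:
    position: tuple  # (x, y, z) corner of the model in space
    uv: tuple        # (u, v) location in the texture, each from 0 to 1

def make_block_face(width, height, z=0.0):
    """One rectangular face, built from two triangles, with the whole
    texture stretched across it."""
    vertices = [
        Vertex((0.0, 0.0, z), (0.0, 0.0)),
        Vertex((width, 0.0, z), (1.0, 0.0)),
        Vertex((width, height, z), (1.0, 1.0)),
        Vertex((0.0, height, z), (0.0, 1.0)),
    ]
    triangles = [(0, 1, 2), (0, 2, 3)]  # indices into the vertex list
    return vertices, triangles

def sample_texture(texture, uv):
    """Nearest-pixel lookup; texture is a list of rows of colors."""
    rows, cols = len(texture), len(texture[0])
    u, v = uv
    col = min(int(u * cols), cols - 1)
    row = min(int(v * rows), rows - 1)
    return texture[row][col]

# A stand-in "marble" texture, either painted or taken from a photo.
marble = [["white", "grey"],
          ["grey", "white"]]
face_vertices, face_triangles = make_block_face(2.0, 1.0)
corner_colors = [sample_texture(marble, vtx.uv) for vtx in face_vertices]
```

A full block would repeat this for all six faces. The same model could get a completely different look just by swapping in another texture.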
By building blocks, columns, and tiles, you can assemble a full scene. It might not look completely real, but it can look pretty convincing depending on time, talent, and resources.
Pros: We’ve all seen incredible computer-generated imagery using artistic models. From movies to games to immersive environments, it’s amazing how good it can look.
Cons: It takes a lot of talent to achieve realism with hand-modeled computer graphics. With large enough budgets, you can do amazing things, but not every small studio can do that. Worse still, computers and standalone headsets like the Meta Quest 2 have limited resources to render scenes so there’s always a trade-off between realism and performance.
Photogrammetry/volumetric capture (Most immersive)
The best way to capture all the details in a scene is to capture everything from every angle. Yes, every angle. This sounds ridiculous on the surface, but it’s actually a fairly common method of reproducing a scene in its entirety. You can read more about it in our article all about photogrammetry.
Pros: Very high fidelity. You can look around and see things from every angle. Unlike hand-modeled graphics which can often look too “clean” and generic, every last speck, crack, and stain is preserved.
Cons: Even though you save the time needed to create all the elements of the scene from scratch, time is spent taking the photos, processing the photos (using specialized software), cleaning up the scene (looking for holes or other problems), and optimizing it for headsets. Just like with hand-modeled graphics, it can require a powerful computer/headset to render everything at full detail.
Finding the best experiences
There are only so many experiences out there to choose from, so you may or may not have much choice for a given subject. If you really want to feel immersed in a scene and you do have choices, you’ll want either computer-generated (hand-modeled) scenes or photogrammetry/volumetric capture. Apps often don’t make this clear, so you may not even realize you’re just getting a video rather than something you can explore. Look for tags like “360 Experience” or “360 Video” if you want to avoid these.
Some apps will make use of 360 videos in some parts, but fully immersive scenes in others, so it’s not necessarily a dichotomy. Videos may work fine for things like watching a ceremony, while exploring a temple in a natural manner needs an immersive rendered scene. Hopefully being aware of the different techniques will help you to get the types of experiences you’ll enjoy!