MIT uses shadows to model 3D scenes, including occluded objects

24 Jun 2024

Partnership with Meta creates new vision technique that could enable smart cars to “see through traffic”.

Shadows analyzed to model 3D scenes including objects blocked from view.

Researchers from MIT and Meta have developed a computer vision technique that could enable an autonomous vehicle to perceive its wider surroundings – including right through nearby vehicles.

The partners have introduced a method that creates physically accurate, 3D models of an entire scene, including areas blocked from view, using images from a single camera position. The technique uses shadows to determine what lies in obstructed portions of the scene.

They call their approach PlatoNeRF, based on Plato’s allegory of the cave, a passage from the Greek philosopher’s Republic in which prisoners chained in a cave discern the reality of the outside world based on shadows cast on the cave wall.

By combining lidar technology with machine learning, PlatoNeRF can generate more accurate reconstructions of 3D geometry than some existing AI techniques, say the team. Additionally, PlatoNeRF is better at smoothly reconstructing scenes where shadows are hard to see, such as those with high ambient light or dark backgrounds.

In addition to improving the safety of autonomous vehicles, PlatoNeRF could make AR/VR headsets more efficient by enabling a user to model the geometry of a room without the need to walk around taking measurements. It could also help warehouse robots find items in cluttered environments faster.

“Our key idea was taking these two things that have been done in different disciplines before and pulling them together: multibounce lidar and machine learning. It turns out that when you bring these two together, that is when you find a lot of new opportunities to explore and get the best of both worlds,” commented Tzofi Klinghoffer, an MIT graduate student in media arts and sciences and an affiliate of the MIT Media Lab.

Klinghoffer is also lead author of a CVPR paper on PlatoNeRF. The research was presented last week at the Conference on Computer Vision and Pattern Recognition in Seattle, WA.

Shedding light on the problem

Reconstructing a full 3D scene from one camera viewpoint is a complex problem. Some machine-learning approaches employ generative AI models that try to guess what lies in the occluded regions, but these models can hallucinate objects that are not really there. Other approaches attempt to infer the shapes of hidden objects using shadows in a color image, but these methods can struggle when shadows are hard to see.

For PlatoNeRF, the MIT researchers built on these approaches using a new sensing modality called single-photon lidar. The researchers use a single-photon lidar to illuminate a target point in the scene. Some light bounces off that point and returns directly to the sensor. However, most of the light scatters and bounces off other objects before returning to the sensor. PlatoNeRF relies on these second bounces of light.

By calculating how long it takes light to bounce twice and then return to the lidar sensor, PlatoNeRF captures additional information about the scene, including depth. The second bounce of light also contains information about shadows.
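The timing argument can be illustrated with a little geometry. The sketch below is not taken from the PlatoNeRF paper; it assumes, for simplicity, that the laser and sensor sit together at the origin, that the illuminated point p is already known from the first bounce, and that each sensor pixel looks along a known unit direction d. The function name and parameters are hypothetical. Under those assumptions, the total travel time fixes the length of the path origin → p → x → origin, and the depth of the second-bounce point x along d follows in closed form.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def two_bounce_depth(t_total, p, d):
    """Distance r along unit direction d to the second-bounce point x = r * d.

    Assumes laser and sensor are co-located at the origin (an illustrative
    simplification). The light path is origin -> p -> x -> origin, so
    |p| + |x - p| + |x| = C * t_total.
    """
    p_norm = math.sqrt(sum(v * v for v in p))
    # Path length left after the first leg: |x - p| + |x|
    remaining = C * t_total - p_norm
    d_dot_p = sum(a * b for a, b in zip(d, p))
    denom = 2.0 * (remaining - d_dot_p)
    if denom <= 0.0:
        raise ValueError("travel time too short for a two-bounce path")
    # From |r*d - p| = remaining - r, squaring both sides and solving for r
    return (remaining ** 2 - p_norm ** 2) / denom
```

For example, with p = (0, 0, 2) and a pixel looking along the diagonal (1, 0, 1)/√2, a total path length of 2 + 2√2 metres places the second-bounce point √2 metres from the sensor, at (1, 0, 1).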

The system traces the secondary rays of light — those that bounce off the target point to other points in the scene — to determine which points lie in shadow (due to an absence of light). Based on the location of these shadows, PlatoNeRF can infer the geometry of hidden objects. The lidar sequentially illuminates 16 points, capturing multiple images that are used to reconstruct the entire 3D scene.
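The shadow test itself can be pictured as a visibility check along each secondary ray. The following is a minimal sketch only, using a single sphere as a stand-in occluder; the function names and the sphere are hypothetical, and the real system learns the scene geometry with a neural model rather than testing against a known shape. A scene point is in shadow for a given illuminated point if the straight segment between the two passes through the occluder.

```python
def in_shadow(light_point, scene_point, sphere_center, sphere_radius):
    """True if the segment light_point -> scene_point passes through the sphere."""
    seg = [s - l for s, l in zip(scene_point, light_point)]
    to_center = [c - l for c, l in zip(sphere_center, light_point)]
    seg_len_sq = sum(v * v for v in seg)
    if seg_len_sq == 0.0:
        return False
    # Closest point on the segment to the sphere centre, clamped to the segment
    t = sum(a * b for a, b in zip(seg, to_center)) / seg_len_sq
    t = max(0.0, min(1.0, t))
    closest = [l + t * v for l, v in zip(light_point, seg)]
    dist_sq = sum((c - q) ** 2 for c, q in zip(sphere_center, closest))
    return dist_sq < sphere_radius ** 2


def shadow_mask(light_points, scene_point, sphere_center, sphere_radius):
    """One shadow flag per illuminated point, for a single scene point."""
    return [
        in_shadow(lp, scene_point, sphere_center, sphere_radius)
        for lp in light_points
    ]
```

Each illuminated point yields a different pattern of lit and shadowed scene points, and it is the combination of these patterns across all illuminations that constrains where a hidden object can be.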

“Every time we illuminate a point in the scene, we are creating new shadows. Because we have all these different illumination sources, we have a lot of light rays shooting around, so we are carving out the region that is occluded and lies beyond the visible eye,” said Klinghoffer.

In the future, the researchers want to try tracking more than two bounces of light to see how that could improve scene reconstructions. In addition, they are interested in applying more deep learning techniques and combining PlatoNeRF with color image measurements to capture texture information.