We are also releasing Synthetic Homes, a large-scale dataset of synthetic home interiors, and the associated dataset generator. Use cases for in-home computer vision are innumerable, but gathering diverse home interior data in the real world is notoriously difficult due to privacy concerns and data collection restrictions. Synthetic Homes aims to accelerate model training for such applications by providing a large dataset of varied home interiors with accurate and rich labeling, as well as a configurable dataset generator.

We include a variety of randomizations to maximize diversity. These include materials, furniture type and configuration, sunlight angle and color temperature, day/night switching, interior lighting color temperature, camera angles, clutter, skybox, door and curtain animations, and more. The dataset generator gives you control over many of these elements, enabling you to tune them to your liking.
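As a rough illustration, the sketch below shows what a randomizer configuration for such a generator could look like. The parameter names, value ranges, and structure here are hypothetical and chosen for readability; they are not the generator's actual schema.

```python
# Hypothetical randomizer configuration (illustrative only; not the actual
# Synthetic Homes generator schema). Each entry maps a scene property to the
# range or set of values it could be sampled from per capture.
scenario_config = {
    "sunlight": {
        "elevation_deg": [10, 80],            # sampled uniformly per capture
        "color_temperature_k": [5000, 7500],
    },
    "interior_lighting": {
        "color_temperature_k": [2700, 6500],
        "day_night_switching": True,
    },
    "camera": {
        "height_m": [1.0, 2.2],
        "pitch_deg": [-20, 10],
        "motion_blur_probability": 0.25,
    },
    "furniture": {
        "randomize_materials": True,
        "randomize_placement": True,
        "clutter_density": [0.0, 0.6],
    },
    "output": {
        "image_size": [1280, 720],
        "frames": 100_000,
    },
}
```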

Interior lighting in homes is complex and intentional, making photorealism especially important. We used Unity’s multi-bounce path tracing to achieve physically accurate global illumination and reflections. This accuracy can help bridge the so-called “Sim2Real gap”, improving a model’s ability to perform well in the real world after training on synthetic data.

The Synthetic Homes project includes a 100,000-image dataset, a configurable dataset generator, and a notebook for data analysis. The dataset includes rich labels for semantic and instance segmentation, bounding boxes, depth, and normals. It also includes environmental information such as occlusion percentage and camera position. To enable you to iterate on the data, we also provide the dataset generator, where you can tweak parameters like camera positioning, blur randomization, and image size.
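For example, a first analysis pass over such a dataset might resemble the following sketch. The directory layout, file names, and label fields used here are assumptions made for illustration, not the dataset's actual format.

```python
import json
from pathlib import Path

import numpy as np
from PIL import Image

# Hypothetical layout (illustration only): each capture directory holds an RGB
# image, an instance-segmentation PNG, a depth map, and a JSON file with
# bounding boxes plus environmental metadata such as occlusion percentage and
# camera position.
def summarize_capture(capture_dir: Path) -> dict:
    labels = json.loads((capture_dir / "labels.json").read_text())

    # Count how many distinct object instances are visible in the frame.
    instance_ids = np.unique(
        np.array(Image.open(capture_dir / "instance_segmentation.png"))
    )

    # Depth assumed to be stored as a 16-bit PNG in millimeters.
    depth_mm = np.array(Image.open(capture_dir / "depth.png"), dtype=np.float32)

    return {
        "num_boxes": len(labels["bounding_boxes"]),
        "num_instances": int(instance_ids.size),
        "mean_depth_m": float(depth_mm.mean() / 1000.0),
        "occlusion_pct": labels["occlusion_percentage"],
        "camera_position": labels["camera_position"],
    }

if __name__ == "__main__":
    captures = sorted(d for d in Path("synthetic_homes").iterdir() if d.is_dir())
    stats = [summarize_capture(d) for d in captures]
    print(f"captures analyzed: {len(stats)}")
```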

Source: Unity Technologies Blog