description |
---|
Come learn how to generate photorealistic images in Nvidia Replicator and build object detection model using Edge Impulse. |
Created By: George Igwegbe
Public Project Link: GitHub | Edge Impulse
Building an object detection model can be tricky since it requires a large dataset. Sometimes, data can be few or not diverse enough to train a robust model. Synthetic data offers an alternative to generating well-represented datasets to build a quality model. By applying domain randomization, we developed photorealistic datasets, trained a neural network, and validated the model using real datasets. To create a diverse dataset, we created a variety of simulated environments with randomized properties: changing lighting conditions, camera position, and material textures. We also show that synthetic, randomized datasets can help generalize a model to adapt to the real-world environment.
We wanted to replicate the object detection work by Louis Moreau, but this time using synthetic data rather than real data. The project aims to demonstrate how to build and deploy the Edge Impulse object detection model using synthetic datasets generated by Nvidia Omniverse Replicator. The Replicator is an Nvidia Omniverse extension that provides means of generating physically accurate synthetic data.
Computer vision tasks such as classification, object detection, and segmentation require a large-scale dataset. Data collected from some real-world applications tend to be narrow and less diverse, often collected from a single environment, and sometimes is unchanged and stays the same for the most time. In addition, data collected from a single field tend to have fewer examples of tail-end scenarios and rare events, and we cannot easily replicate these situations in the real world.
Andrej Karpathy's presentation - (source: Tesla AI Day, 2021) |
---|
Consequently, models trained in a single domain are brittle and often fail when deployed in another environment; thus, it requires another training cycle to adapt to the new environment. It raises the question, how can we efficiently and cheaply collect generalized data across several domains? A simple unreasonable effective solution is Domain Randomization, which varies the texture and colour of the foreground object, the background image, the number of lights in the scene, the pose of the lights, and the camera position etc. Domain randomization can further improve the variability in the texture of synthetic data of rare events generated in the simulator.
The purpose of domain randomization is to provide enough simulated variability at training time such that at test time the model is able to generalize to real-world data.” - Tobin et al, Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World, 2017
Domain Randomization for Transferring Deep Neural Networks - source: Tobin et al, 2017) |
---|
Nvidia Replicator enables us to perform Domain Randomization. The Replicator is one module within the Omniverse family, and it offers tools and workflow to generate data for various computer vision and non-visual tasks. The Replicator is a highly interoperable tool that integrates with over 40+ modelling/rendering applications across different verticals. The seamless integration is possible thanks to Pixar's Universal Scene Description(USD), which serves as a protocol for various applications such as Blender, 3DMax, Maya, Revit, C4D etc., to work with the Nvidia Replicator.
Traditional machine learning workflow is often model-centric, focusing more on the model's development by iteratively improving the algorithm design, etc. In this project, we chose the Data-centric approach, where we fixed the model and iteratively improved the quality of the generated dataset. This approach is more robust since we know our model is as good as the dataset. This method hence systematically changes the dataset performance on an AI task. At its core, it is thinking about ML in terms of data, not the model.
Data generation and model building workflow |
---|
- Nvidia Omniverse Replicator
- Edge Impulse Studio
- Logitech Webcam HD Pro - C920
Nvidia Omniverse Replicator is a computation-intensive application requiring a moderate-size GPU and decent RAM. My hardware setup consists of 32GB RAM, 1TB storage space and 8GB GPU with an Intel i9 processor.
Hardware Specification | Hardware Specification |
---|---|
The application can run on both Windows and Linux operating systems. For this experiment, we used Ubuntu 20.04 LTS distro, given Ubuntu 18.04 is no longer supported by Nvidia Omniverse as of November 2022. In addition, we selected the appropriate Nvidia driver, v510.108.03 and installed it on a Linux machine.
Software Specification | Software Specification |
---|---|
The environment for the experiment consists of movable and immovable objects (dynamic and static positioning objects). The immovable object consists of Lights, a Table and two Cameras. At the same time, the movable objects are the cutlery which is a spoon, fork and knife. We will use domain randomization to alter the properties of some of the movable and immovable objects. Assets which include objects and scenes are represented in the Replicator as USD.
Experimental Setup |
---|
Every object in Omniverse Replicator is represented as USD. A 3D model file with varying extensions such as obj, fbx, and glif can be imported into the Replicator using Nvidia Omniverse's CAD Importer extension. The extension converts the 3D files into USD. We imported our assets (Table, knife, spoon and fork) into the simulator by specifying the path of the assets.
Rectangular Light | Dome Light |
---|---|
Lightning plays a crucial role in data generation. There are different built-in lighting types in the Nvidia replicator. We choose two rectangular lights and a dome light since they give us better lighting options and capabilities for generating photorealistic images. The rectangular light emulates light generated from a panel, and the dome light lets you dynamically lighten the entire scene. We randomized some light parameters such as temperature and intensity, and both parameters were sampled from a normal distribution. In addition, the scale parameter was sampled from a uniform distribution while keeping the rotation and position of the lights fixed.
# Lightning setup for Rectangular light and Dome light
def rect_lights(num=2):
lights = rep.create.light(
light_type="rect",
temperature=rep.distribution.normal(6500, 500),
intensity=rep.distribution.normal(0, 5000),
position=(-131,150,-134),
rotation=(-90,0,0),
scale=rep.distribution.uniform(50, 100),
count=num
)
return lights.node
def dome_lights(num=1):
lights = rep.create.light(
light_type="dome",
temperature=rep.distribution.normal(6500, 500),
intensity=rep.distribution.normal(0, 1000),
position=(0,0,0),
rotation=(270,0,0),
count=num
)
return lights.node
We fixed the position and rotation, selected the tabletop materials, chose an additional Mahogany material, and alternated the material in the data generation process.
# Import and position the table object
def table():
table = rep.create.from_usd(TABLE_USD, semantics=[('class', 'table')])
with table:
rep.modify.pose(
position=(-135.39745, 0, -140.25696),
rotation=(0,-90,-90),
)
return table
To improve our dataset's quality further, we chose two cameras of different resolutions, which we strategically positioned in various locations within the scene. In addition, we varied the position of the cameras in a different version of the data generation process.
# Multiple setup cameras and attach it to render products
camera = rep.create.camera(focus_distance=focus_distance, focal_length=focal_length, position=cam_position, rotation=cam_rotation, f_stop=f_stop)
camera2 = rep.create.camera(focus_distance=focus_distance2, focal_length=focal_length2, position=cam_position2, rotation=cam_rotation, f_stop=f_stop)
# Will render 1024x1024 images and 512x512 images
render_product = rep.create.render_product(camera1, (1024, 1024))
render_product2 = rep.create.render_product(camera2, (512, 512))
Finally, for the movable objects, which include a knife, spoon and fork, we ensure that these objects can only translate within the bound of the table. So we chose a bounding position where the objects were expected to translate and rotate with the table. We sampled position and rotation from a uniform distribution while maintaining the number of movable objects generated at each iteration to be five.
# Define randomizer function for CULTERY assets.
def cutlery_props(size=5):
instances = rep.randomizer.instantiate(rep.utils.get_usd_files(current_cultery), size=size, mode='point_instance')
with instances:
rep.modify.pose(
position=rep.distribution.uniform((-212, 76.2, -187), (-62, 76.2, -94)),
rotation=rep.distribution.uniform((-90,-180, 0), (-90, 180, 0)),
)
return instances.node
At this juncture, we have instantiated all objects in our scene. We can now run the randomizer to generate 50 images at each synthetic generation cycle.
# Register randomization
with rep.trigger.on_frame(num_frames=50):
rep.randomizer.table()
rep.randomizer.rect_lights(1)
rep.randomizer.dome_lights(1)
rep.randomizer.cutlery_props(5)
# Run the simulation graph
rep.orchestrator.run()
To ensure we generated photorealistic images, we switched to RTXinteractive(Path Tracing) mode, which gave high-fidelity renderings.
Data generation process |
---|
V1 - Normal to the object |
---|
V2 - Angled to the object | V3 - Normal to the object and object suspended in space |
Generated Dataset - V2 | Generated Dataset - V3 |
Data Labeler | Data Annotation |
We uploaded the generated images to Edge Impulse Studio, where we annotated the dataset into different classes. We carefully annotated each dataset version and trained using the Yolov5 object detection model. We tried a couple of input sizes ranging from 320, 512 and 1024 pixels before settling with 320. Edge Impulse provided an excellent version control system for models, which enabled us to track model performance across different dataset versions and hyperparameters.
Create Impulse | Generate Feature |
Version Control in Edge Impulse |
---|
We used the Edge Impulse CLI tool to evaluate the model's accuracy by downloading, building and running the model locally. A Logitech C920 webcam streamed the live video of objects on a table from 50 cm to 80 cm from the camera. The position of the camera remains fixed during the experiment. The clips below show that the trained model does not generalize well to real-world objects. Thus we needed to improve the model by uploading, annotating and training the model with the V2 dataset.
V1 failure - model failed to identify objects |
---|
We observed improved model performance when trained with the V2 dataset. The model could identify various objects distinctly, although the model failed when we changed the objects' orientations. Thus, we trained the model with the remaining V3 dataset to mitigate these issues and increase other hyperparameters, such as epochs from 500 to 2000. We also tested the performance of our object detector on real objects with different background textures, and the model performed well in these conditions.
V2 success - model can identify objects |
---|
V2 failure - model failed to identify objects in different orientations |
---|
After several cycles of iterating over various hyperparameters, we got a model that generalizes well across different orientations.
V3 success - model can identify objects in different orientations |
---|
V3 success - model can identify different materials |
---|
The core idea behind the data-centric approach to solving ML problems is to create more data around the failure points of the model. We improved the model by iteratively improving the data generation, especially in areas where the model had previously failed.
In this work, we learned how the domain randomization approach helps generate quality and well-generalized datasets for the object detection task. We also demonstrated the effectiveness of data-centric machine learning workflow in improving the model performance. Although this work is restricted to visual problems, we can extend domain randomization to other sensors such as lidar, accelerometer, and ultrasonic sensors.