Physical AI & Humanoid Robotics


Topic 3 — Sensor Simulation, Ground Truth & Validation

Digital twins are only useful if their sensors and ground truth are realistic enough to support perception and control algorithms. This topic dives deeper into camera, depth, and LiDAR simulation, explains how to extract ground truth and metrics from Gazebo and Isaac Sim, and outlines a validation workflow for comparing simulated behavior to real or expected behavior.


3.1 Camera & Depth Simulation

Camera Intrinsics

A simulated pinhole camera is defined by:

  • Focal length (f_x, f_y), or equivalently the field of view.
  • Principal point (c_x, c_y), the image center.
  • Resolution (width × height).

In Gazebo, these appear as:

  • horizontal_fov
  • <image><width>...</width><height>...</height></image>
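A minimal Gazebo camera sensor in SDF might look like the following sketch (the FOV and clip values are illustrative, roughly matching a RealSense D435 RGB stream, not copied from any official model):

```xml
<sensor name="rgb_camera" type="camera">
  <camera>
    <!-- ~69 degrees horizontal FOV, expressed in radians -->
    <horizontal_fov>1.204</horizontal_fov>
    <image>
      <width>640</width>
      <height>480</height>
      <format>R8G8B8</format>
    </image>
    <clip>
      <near>0.1</near>
      <far>10.0</far>
    </clip>
  </camera>
  <update_rate>30</update_rate>
</sensor>
```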

In Isaac Sim, you control intrinsics via camera prim attributes (or through the GUI). To match a real camera (e.g., Intel RealSense):

  • Use known intrinsics from calibration.
  • Match resolution and FOV.
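The intrinsics above assemble into the standard 3×3 pinhole matrix K, which maps camera-frame points to pixels. A minimal sketch (the fx/fy/cx/cy values are illustrative, loosely in the range of a 640×480 RealSense-class RGB stream, not calibrated values):

```python
import numpy as np

def intrinsic_matrix(fx, fy, cx, cy):
    """Build the 3x3 pinhole intrinsic matrix K."""
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])

def project(K, point_cam):
    """Project a 3D point in the camera frame to pixel coordinates (u, v)."""
    x, y, z = point_cam
    u = K[0, 0] * x / z + K[0, 2]
    v = K[1, 1] * y / z + K[1, 2]
    return u, v

# Illustrative values for a 640x480 stream (not from a real calibration).
K = intrinsic_matrix(fx=615.0, fy=615.0, cx=320.0, cy=240.0)
# A point on the optical axis 1 m away lands at the principal point.
u, v = project(K, (0.0, 0.0, 1.0))  # → (320.0, 240.0)
```

Matching a real camera then amounts to plugging its calibrated fx, fy, cx, cy into this matrix and using the same width/height in the simulator.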

Realistic Artifacts

Real cameras are imperfect:

  • Lens distortion: radial and tangential distortion that bends straight lines.
  • Sensor noise: shot noise, read noise, quantization.
  • Motion blur: from moving camera or objects during exposure.
  • Rolling shutter: different rows exposed at slightly different times.

Both Gazebo (via plugins) and Isaac Sim (via camera settings and post-processing) allow you to:

  • Add noise (Gaussian or more complex).
  • Approximate distortion.
  • Simulate rolling shutter and motion blur.

Rule of thumb: Start with clean images to debug geometry and lighting, then progressively add realistic noise and artifacts until simulated images match real samples.
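As a first step along that progression, additive Gaussian noise followed by re-quantization is a crude but useful stand-in for shot and read noise. A minimal sketch (the sigma value is an assumption to be tuned against real frames):

```python
import numpy as np

def add_camera_noise(image, sigma=2.0, rng=None):
    """Add zero-mean Gaussian noise to an 8-bit image and re-quantize.

    sigma is in intensity levels; a rough stand-in for shot/read noise.
    """
    rng = rng or np.random.default_rng(0)
    noisy = image.astype(np.float64) + rng.normal(0.0, sigma, image.shape)
    # Clip back to the valid 8-bit range and quantize, as a real ADC would.
    return np.clip(noisy, 0, 255).astype(np.uint8)

clean = np.full((480, 640), 128, dtype=np.uint8)  # synthetic flat gray frame
noisy = add_camera_noise(clean, sigma=2.0)
```

Comparing the histogram of `noisy` against a real frame of a flat surface is a quick way to check whether the chosen sigma is plausible.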

Depth Cameras

Depth cameras (e.g., RealSense) measure per-pixel distance. In simulation:

  • Depth is often rendered from a perfect z-buffer (Isaac Sim) or ray cast (Gazebo).
  • Real devices exhibit:
    • Greater noise at longer ranges.
    • Invalid pixels on reflective or transparent surfaces.
    • Quantization and temporal jitter.

To approximate this:

  • Add distance-dependent noise.
  • Mask out pixels based on surface properties.
  • Downsample or quantize depth maps.
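The three approximations above can be combined into one post-processing step on a perfect simulated depth map. A minimal sketch, with all noise coefficients as illustrative assumptions rather than device-calibrated values:

```python
import numpy as np

def degrade_depth(depth_m, base_sigma=0.002, range_coeff=0.004,
                  dropout_mask=None, quant_step=0.001, rng=None):
    """Approximate real depth-camera behavior on a perfect depth map (meters).

    - noise std grows with distance: sigma = base_sigma + range_coeff * z
    - quant_step rounds depth, mimicking disparity quantization
    - dropout_mask marks pixels (e.g., reflective/transparent surfaces)
      as invalid, using 0.0 as the invalid-depth convention
    """
    rng = rng or np.random.default_rng(0)
    sigma = base_sigma + range_coeff * depth_m
    noisy = depth_m + rng.normal(0.0, 1.0, depth_m.shape) * sigma
    noisy = np.round(noisy / quant_step) * quant_step
    if dropout_mask is not None:
        noisy[dropout_mask] = 0.0
    return noisy

perfect = np.full((4, 4), 2.0)            # a 2 m planar surface everywhere
mask = np.zeros_like(perfect, dtype=bool)
mask[0, 0] = True                         # pretend one pixel hit glass
measured = degrade_depth(perfect, dropout_mask=mask)
```

Which pixels to mask would in practice come from surface properties in the scene (material tags), not a hand-set flag as here.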

3.2 LiDAR Simulation

LiDAR sensors emit beams and measure return times. In simulation:

  • Gazebo’s ray or gpu_ray sensors produce point clouds or scan messages.
  • Isaac Sim’s LiDAR sensors generate dense point clouds with intensity and multiple returns.

Key parameters:

  • Field of view (horizontal and vertical).
  • Angular resolution (degrees per step).
  • Range (min/max distance).
  • Scan rate (Hz).

Environmental Effects

Advanced simulations may model:

  • Rain/dust/fog: scattering reduces returns and adds noise.
  • Reflectance: different materials reflect differently (intensity values).
  • Motion distortion: the robot moves during a scan.

For this course, you can approximate:

  • Slight Gaussian noise on ranges.
  • Intensity variations based on material tags.
  • Optional motion distortion (important for fast-moving platforms).
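The first of these approximations, Gaussian noise on ranges, can be sketched in a few lines. The sigma and max-range values are illustrative assumptions; tune them against your sensor's datasheet or logged scans:

```python
import numpy as np

def simulate_scan_noise(ranges, sigma=0.01, max_range=30.0, rng=None):
    """Add Gaussian range noise to an ideal LiDAR scan and clamp to limits.

    sigma=0.01 m (1 cm) is a plausible ballpark for a mechanical LiDAR,
    but is an assumption here, not a measured value.
    """
    rng = rng or np.random.default_rng(0)
    noisy = ranges + rng.normal(0.0, sigma, ranges.shape)
    return np.clip(noisy, 0.0, max_range)

ideal = np.full(360, 5.0)  # one beam per degree, 5 m wall all around
noisy = simulate_scan_noise(ideal)
```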

3.3 Ground Truth: What the Simulator Knows

One of the biggest advantages of simulation is access to perfect ground truth that is hard or impossible to measure directly on hardware.

Types of ground truth you can extract:

  • Robot pose: position and orientation over time (world or map frame).
  • Object poses: 6D pose for any object in the scene.
  • Segmentation masks: per-pixel class or instance labels.
  • Depth maps: per-pixel distance from the camera.
  • Camera intrinsics/extrinsics: exact parameters used for rendering.

In Gazebo, you can:

  • Use world or model plugins to query entity poses each time step.
  • Publish them as ROS 2 messages (e.g., geometry_msgs/PoseStamped).

In Isaac Sim, you can:

  • Use the Python API or replicator tools to request:
    • Object poses.
    • Segmentation masks.
    • Bounding boxes (2D and 3D).

3.4 Evaluation Metrics

To validate algorithms and models, you need quantitative metrics:

  • Pose estimation:
    • Translation error (e.g., Euclidean distance between estimated and true pose).
    • Rotation error (e.g., angle between quaternions).
  • Detection:
    • Precision, recall, F1 score.
    • Average Precision (AP) over IoU thresholds.
  • Segmentation:
    • Intersection over Union (IoU) per class.
    • Mean IoU across classes.
  • SLAM / localization:
    • Absolute Trajectory Error (ATE).
    • Relative Pose Error (RPE).
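The pose-estimation metrics at the top of this list are straightforward to compute; the main subtlety is the quaternion double cover (q and −q represent the same rotation). A minimal sketch, assuming quaternions in (w, x, y, z) order:

```python
import numpy as np

def translation_error(p_est, p_true):
    """Euclidean distance between estimated and true positions (meters)."""
    return float(np.linalg.norm(np.asarray(p_est) - np.asarray(p_true)))

def rotation_error(q_est, q_true):
    """Angle (radians) between two unit quaternions in (w, x, y, z) order."""
    dot = abs(float(np.dot(q_est, q_true)))  # abs handles the double cover
    return 2.0 * np.arccos(np.clip(dot, -1.0, 1.0))

t_err = translation_error([1.0, 0.0, 0.0], [1.0, 0.0, 0.1])  # → 0.1 m
identity = np.array([1.0, 0.0, 0.0, 0.0])
rot_90z = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
r_err = rotation_error(identity, rot_90z)                    # → pi/2 rad
```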

Simulation lets you compute these metrics:

  1. Log algorithm outputs (estimated poses, detections, masks).
  2. Log ground truth from the simulator.
  3. Align timestamps or frame indices.
  4. Compute metrics offline (e.g., Python/NumPy/Pandas).
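Steps 3 and 4 can be sketched together: match each estimate to the nearest ground-truth timestamp, then compute ATE as an RMSE over matched positions. This simplified sketch assumes both trajectories are already expressed in the same frame; a full ATE implementation would first align them with a rigid (Umeyama) fit:

```python
import numpy as np

def align_nearest(est_times, gt_times):
    """For each estimate timestamp, index of the nearest ground-truth stamp."""
    return np.array([int(np.argmin(np.abs(gt_times - t))) for t in est_times])

def ate_rmse(est_xyz, gt_xyz):
    """Absolute Trajectory Error as RMSE over matched positions (meters)."""
    err = np.linalg.norm(est_xyz - gt_xyz, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))

# Synthetic example: 100 Hz ground truth, 10 Hz estimates with 5 cm offset.
gt_times = np.arange(0.0, 1.0, 0.01)
gt_xyz = np.stack([gt_times, np.zeros(100), np.zeros(100)], axis=1)
est_times = np.arange(0.0, 1.0, 0.1)
idx = align_nearest(est_times, gt_times)
est_xyz = gt_xyz[idx] + np.array([0.0, 0.05, 0.0])
ate = ate_rmse(est_xyz, gt_xyz[idx])  # ≈ 0.05
```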

3.5 Validation: Comparing Simulated and Real Data

Your goal is not just to succeed in simulation; it is to predict performance on real hardware.

Validation workflow:

  1. Collect real data:
    • Record RGB-D, LiDAR, and IMU streams from your real robot in a simple environment.
    • Log ground truth if possible (motion capture, AprilTags, or careful measurements).
  2. Recreate the environment in simulation:
    • Match wall positions, floor textures, object locations.
    • Match sensor intrinsics and extrinsics.
    • Tune lighting to resemble the real scene.
  3. Collect simulated data:
    • Run the same trajectories or sensor motions in the digital twin.
    • Log simulated sensor data and ground truth.
  4. Compare:
    • Are noise statistics similar? (histograms, variance, bias).
    • Do perception algorithms perform similarly on both datasets?
    • Do trajectories and contact events look similar?

If there is a large gap:

  • Adjust noise models and physics parameters (system identification in Topic 5).
  • Introduce more domain randomization.
  • Re-run validation until metrics converge to an acceptable band.
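For the "are noise statistics similar?" check, a concrete approach is to compute residuals (measured minus reference) for both datasets and compare summary statistics. A minimal sketch with synthetic stand-in residuals; real and simulated residuals would come from your logs:

```python
import numpy as np

def noise_stats(samples):
    """Summarize a 1-D array of sensor residuals (measured - reference)."""
    samples = np.asarray(samples, dtype=float)
    return {"bias": float(samples.mean()),
            "std": float(samples.std()),
            "p95": float(np.percentile(np.abs(samples), 95))}

def stats_gap(real, sim):
    """Absolute per-statistic gap between real and simulated residuals."""
    r, s = noise_stats(real), noise_stats(sim)
    return {k: abs(r[k] - s[k]) for k in r}

rng = np.random.default_rng(0)
real_resid = rng.normal(0.002, 0.010, 5000)  # stand-in for real logged residuals
sim_resid = rng.normal(0.000, 0.008, 5000)   # stand-in for simulated residuals
gap = stats_gap(real_resid, sim_resid)
# Tighten the sim noise model until each gap entry falls below a chosen tolerance.
```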

3.6 Hands-On Lab: Digital Twin Validation for Reach-and-Grasp

In this lab, you will validate a reach-and-grasp behavior using your digital twin.

Scenario

Your humanoid’s arm must reach for and grasp a simple object (e.g., a box) on a table.

Tasks

  1. Build the scene in Gazebo (or optionally Isaac Sim):
    • Humanoid arm with gripper (URDF/SDF).
    • Table and object with appropriate collision and friction properties.
  2. Configure physics:
    • Set gravity, friction, and damping to plausible values.
    • Ensure the object does not slide unrealistically.
  3. Instrument sensors:
    • Joint state publisher (angles, velocities).
    • Gripper force/torque sensing (if available).
    • Camera and/or depth sensor observing the grasp.
  4. Plan and execute motion:
    • Use a simple planner or scripted trajectory to:
      • Reach a pre-grasp pose.
      • Close gripper.
      • Lift object slightly.
  5. Record ground truth:
    • Robot end-effector pose over time.
    • Object pose over time.
    • Contact events between gripper and object.
  6. Analyze success:
    • Did the object remain in the gripper?
    • How closely did executed trajectories follow planned ones?
    • Did sensor data match expectations (e.g., depth, contact forces)?

Success Criteria

  • Grasp success rate is effectively 100% under ideal simulated conditions.
  • No major physics artifacts (object tunneling through gripper, unrealistic bounces).
  • Ground truth logs are sufficient to compute:
    • End-effector tracking error.
    • Object displacement during grasp.
    • Time to grasp and lift.
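Given logged trajectories, the first two of these quantities reduce to a few numpy operations. A minimal sketch over synthetic logs; the 5 cm success tolerance is an assumption, not a standard threshold:

```python
import numpy as np

def tracking_error(executed_xyz, planned_xyz):
    """Per-sample end-effector tracking error and its maximum (meters)."""
    err = np.linalg.norm(executed_xyz - planned_xyz, axis=1)
    return err, float(err.max())

def object_displacement(obj_xyz):
    """Total object drift between first and last logged pose (meters)."""
    return float(np.linalg.norm(obj_xyz[-1] - obj_xyz[0]))

def grasp_succeeded(gripper_xyz, obj_xyz, tol=0.05):
    """Success proxy: object ends within tol of the gripper's final pose."""
    return float(np.linalg.norm(gripper_xyz[-1] - obj_xyz[-1])) < tol

# Synthetic logs: a straight 30 cm reach, executed with a 2 mm uniform offset.
planned = np.stack([np.linspace(0.0, 0.3, 50),
                    np.zeros(50),
                    np.full(50, 0.8)], axis=1)
executed = planned + 0.002
errs, max_err = tracking_error(executed, planned)
```

In the lab, `planned` would come from your planner and `executed`, along with the object and gripper poses, from the simulator's ground-truth logs.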

This lab builds your intuition for what a trustworthy digital twin looks like, and sets the stage for more complex validation in navigation and whole-body motion.