Physical AI & Humanoid Robotics


Topic 5 — Integrating Perception with Planning and Control

Perception is only useful if it meaningfully influences where the robot goes and what it does. This topic explains how to connect your perception stack to planning and control, define ROS 2 interfaces, and ensure the same architecture works both in simulation and on real hardware.


5.1 Interfaces Between Perception and Planning

Your goal is to design clear contracts between perception and the rest of the system.

Common ROS 2 interfaces:

  • Topics from perception:
    • /perception/detections — List of objects with classes, scores, and 2D/3D poses.
    • /perception/scene_graph — Higher-level structure of the scene (objects + relationships).
    • /perception/maps — Occupancy grids and/or 3D maps from SLAM.
  • Topics/services to planning:
    • /planning/request_goal — Service for requesting navigation or manipulation goals.
    • /planning/goal_status — Feedback about goal progress.
  • Topics to control:
    • /control/trajectory — Planned joint or base trajectories.
    • /control/state — Current joint states and base pose.
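To make the "clear contract" idea concrete, here is a minimal sketch of what a detection payload could carry, written as plain Python dataclasses rather than real ROS 2 messages. The field and type names are illustrative assumptions; in a real system you would typically use a standard interface such as `vision_msgs/Detection3DArray` instead of defining your own.

```python
from dataclasses import dataclass, field

@dataclass
class Pose3D:
    # Position (metres) and orientation quaternion, expressed in frame_id.
    x: float
    y: float
    z: float
    qx: float = 0.0
    qy: float = 0.0
    qz: float = 0.0
    qw: float = 1.0

@dataclass
class Detection:
    object_id: str      # stable ID if tracked, else per-frame
    class_label: str    # e.g. "mug", "door"
    score: float        # detector confidence in [0, 1]
    pose: Pose3D

@dataclass
class DetectionArray:
    stamp_sec: float    # sensor timestamp, not wall-clock receipt time
    frame_id: str       # TF frame the poses are expressed in
    detections: list = field(default_factory=list)
```

Note that the message carries the sensor timestamp and frame ID: downstream planners need both to transform poses into their own working frame and to reject stale data.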

Principles:

  • Keep perception outputs declarative:
    • Describe what the world looks like, not how to move.
  • Let planners and controllers handle how to reach goals:
    • They consume world state and produce actions.

5.2 World State Representations

Planning and control need a consistent view of the world, even as sensors update.

Typical components:

  • Occupancy grid / costmap:
    • 2D grid with:
      • Free, occupied, and unknown cells.
      • Inflated obstacles to account for robot footprint.
  • Object database:
    • Each object has:
      • ID, class label, pose, size, and confidence.
      • Optional attributes (color, affordances).
  • Scene graph:
    • Graph nodes: objects and key locations.
    • Edges: spatial/semantic relations (near, on, behind).
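The obstacle-inflation step above can be sketched in a few lines. This is a deliberately naive pure-Python version (real costmaps, e.g. in Nav2, use efficient distance transforms and graded cost values); the cell constants follow the common occupancy-grid convention of 0 = free, 100 = occupied, -1 = unknown.

```python
import math

FREE, OCCUPIED, UNKNOWN = 0, 100, -1  # common occupancy-grid cell values

def inflate(grid, radius_cells):
    """Mark every free cell within `radius_cells` (Euclidean distance) of an
    occupied cell as occupied, approximating the robot's footprint.
    Unknown cells are left untouched."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != OCCUPIED:
                continue
            # Stamp a disc of occupied cells around each obstacle cell.
            for dr in range(-radius_cells, radius_cells + 1):
                for dc in range(-radius_cells, radius_cells + 1):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < rows and 0 <= cc < cols
                            and math.hypot(dr, dc) <= radius_cells
                            and out[rr][cc] == FREE):
                        out[rr][cc] = OCCUPIED
    return out
```

Inflating by (roughly) the robot's radius lets the planner treat the robot as a point, which greatly simplifies collision checking.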

Design choices:

  • Decide which representations are authoritative:
    • For example, the costmap might be the source of truth for navigation.
    • The object database might be the source of truth for manipulation.
  • Decide how often representations are updated and who owns them:
    • A dedicated world model node can maintain and publish them.
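A world model node of this kind is mostly a store plus queries. The sketch below (class and field names are assumptions, not a fixed API) shows an object database that perception updates and planners query, e.g. to answer "where is the nearest red mug?"; a real node would wrap this in ROS 2 subscriptions and services, and fuse repeated detections instead of overwriting.

```python
import math
from dataclasses import dataclass

@dataclass
class WorldObject:
    object_id: str
    class_label: str
    x: float            # position in the map frame (metres)
    y: float
    confidence: float
    color: str = ""     # optional attribute

class WorldModel:
    """Owns the object database: perception writes, planners read."""

    def __init__(self):
        self._objects = {}

    def update(self, obj):
        # Latest detection wins; a real system would track and fuse instead.
        self._objects[obj.object_id] = obj

    def find_nearest(self, class_label, robot_xy, color=None, min_conf=0.5):
        """Return the nearest sufficiently confident object of a class,
        optionally filtered by an attribute, or None if nothing matches."""
        candidates = [
            o for o in self._objects.values()
            if o.class_label == class_label
            and o.confidence >= min_conf
            and (color is None or o.color == color)
        ]
        if not candidates:
            return None
        rx, ry = robot_xy
        return min(candidates, key=lambda o: math.hypot(o.x - rx, o.y - ry))
```

Keeping queries like this in one owning node means planners never parse raw detections themselves; they consume a single, consistent world state.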

5.3 Simulation vs Real Hardware

One of the key goals of this course is to ensure the same ROS 2 graph works in:

  • Gazebo/Isaac Sim (digital twin).
  • Real humanoid hardware.

Guidelines:

  • Use the same topics and message types in both worlds:
    • /camera/rgb/image_raw should exist in sim and real.
    • /perception/detections should look identical.
  • Keep hardware-specific details in:
    • Driver nodes.
    • Parameter files (e.g., camera intrinsics, frame IDs).
  • Use launch files to:
    • Swap simulators and real drivers without changing core perception/planning logic.
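A launch file implementing this swap might look like the sketch below. It is illustrative only: the package and executable names are placeholders, but the pattern of a `use_sim` argument gating driver nodes via `IfCondition`/`UnlessCondition` is the standard ROS 2 launch idiom.

```python
# launch/bringup.launch.py -- illustrative; package/executable names are placeholders.
from launch import LaunchDescription
from launch.actions import DeclareLaunchArgument
from launch.conditions import IfCondition, UnlessCondition
from launch.substitutions import LaunchConfiguration
from launch_ros.actions import Node

def generate_launch_description():
    use_sim = LaunchConfiguration('use_sim')
    return LaunchDescription([
        DeclareLaunchArgument('use_sim', default_value='true'),
        # Simulated sensor bridge (only when use_sim:=true).
        Node(package='sim_sensors', executable='camera_bridge',
             condition=IfCondition(use_sim)),
        # Real camera driver with its hardware-specific parameters
        # (only when use_sim:=false).
        Node(package='real_sensors', executable='camera_driver',
             parameters=['config/camera_real.yaml'],
             condition=UnlessCondition(use_sim)),
        # Perception and planning nodes are identical in both worlds.
        Node(package='perception', executable='detector'),
        Node(package='planning', executable='planner'),
    ])
```

Launching with `use_sim:=false` swaps only the driver layer; everything downstream of `/camera/rgb/image_raw` is untouched.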

If you follow these patterns, moving from simulation to hardware becomes a matter of swapping launch configurations, not rewriting perception code.

5.4 Evaluation and Monitoring

To ensure your integrated system is working:

  • Monitor key metrics:
    • Detection accuracy on relevant objects.
    • SLAM trajectory error and map quality.
    • Planning success rate (reaching goals without collision).
    • Control tracking error (planned vs executed motion).
  • Use tools from previous chapters:
    • RViz for visualization.
    • ros2 bag for logging and offline analysis.
    • Digital twin (Chapter 3) for regression tests.

Establish a small test suite:

  • Fixed sensor logs and simulated scenarios.
  • Scripts to re-run perception and planning.
  • Automatic checks to catch regressions.
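Such a regression check can be very small. The sketch below assumes a scenario format of `(name, expected_labels, detected_labels)`, where the detected labels come from re-running perception on a fixed log; the metric (per-scenario recall) and the threshold are illustrative choices, not a prescribed standard.

```python
def detection_recall(expected_labels, detected_labels):
    """Fraction of expected objects that were detected (order-insensitive,
    counting duplicates at most once each)."""
    if not expected_labels:
        return 1.0
    remaining = list(detected_labels)
    hits = 0
    for label in expected_labels:
        if label in remaining:
            remaining.remove(label)
            hits += 1
    return hits / len(expected_labels)

def check_regressions(scenarios, min_recall=0.9):
    """scenarios: iterable of (name, expected_labels, detected_labels).
    Return the names of scenarios whose recall falls below the threshold."""
    failures = []
    for name, expected, detected in scenarios:
        if detection_recall(expected, detected) < min_recall:
            failures.append(name)
    return failures
```

Wiring a script like this into CI (replaying the same bags after every change) turns "did we break perception?" from a manual check into an automatic one.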

5.5 Capstone Alignment

At this point, your capstone humanoid should have:

  • A working perception stack from Chapter 4:
    • Real-time detections and segmentations.
    • SLAM-based maps and pose estimates.
    • Optional Vision-Language querying.
  • Clear ROS 2 interfaces:
    • World state topics and services for planners and controllers.
  • Integration with the digital twin:
    • Ability to run the full perception stack on simulated sensor data.

In Chapter 5, you will:

  • Design and implement navigation and manipulation policies.
  • Use the perception outputs defined here to drive autonomous behavior.
  • Validate the complete loop: perception → planning → control → perception.

If you can already answer questions like "Where is the nearest free space?" or "Where is the red mug?" and plan motions accordingly (even in simulation), you are well-prepared for the autonomy-focused work ahead.