Topic 2 — Technical Stack & Platform Choices
This topic explains why the course uses Python, ROS 2, Gazebo, NVIDIA Isaac Sim, Whisper/LLMs, Intel RealSense, and Nav2, and how these tools support the capstone goal: an autonomous humanoid that understands voice, navigates, perceives, and manipulates objects.
It is derived from the research findings in specs/1-physical-ai-course/research.md and is intentionally technology‑focused to complement the conceptual foundations in Chapter 1.
1. Language Choice: Python 3.x
The primary language for this course is Python 3.x.
- Why Python?
- Excellent support across ROS 2, NVIDIA Isaac Sim, and AI/ML libraries.
- High readability and rapid prototyping for complex, multi‑component systems.
- Strong ecosystem for data processing, numerical computing, and experimentation.
- What about C++?
- C++ remains important for performance‑critical ROS 2 nodes.
- In this course, Python is the default for teaching and integration; C++ can be introduced where low‑level optimization is required.
In practice, you should expect to write most ROS 2 nodes, Isaac scripts, and lab code in Python, while being aware that many underlying libraries are implemented in C++.
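To give a flavor of that Python-first lab code, here is a small helper of the kind that typically sits alongside a ROS 2 node: extracting a yaw angle from a quaternion orientation. The function name and usage are illustrative, not part of the course codebase:

```python
import math

def yaw_from_quaternion(x: float, y: float, z: float, w: float) -> float:
    """Extract the yaw (rotation about Z) from a unit quaternion.

    A ROS orientation message carries (x, y, z, w), while planners
    usually want a single heading angle in radians. This is the kind
    of pure-Python utility that lives next to rclpy node code.
    """
    siny_cosp = 2.0 * (w * z + x * y)
    cosy_cosp = 1.0 - 2.0 * (y * y + z * z)
    return math.atan2(siny_cosp, cosy_cosp)
```

Keeping utilities like this free of ROS imports also makes them trivially unit-testable, which matters for the testing strategy later in this topic.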
2. Core Dependencies: ROS 2, Gazebo, NVIDIA Isaac, Whisper, LLMs, Intel RealSense, Nav2
The course standardizes on the following stack:
- ROS 2 — Middleware and “nervous system” for all humanoid behaviors.
- Gazebo — Open‑source physics simulator for early digital twin work.
- NVIDIA Isaac Sim — High‑fidelity, GPU‑accelerated simulation and synthetic data.
- OpenAI Whisper & LLMs — Voice recognition and natural language understanding.
- Intel RealSense SDK — RGB‑D and IMU‑based perception (SLAM, object detection).
- Nav2 — Navigation stack for path planning, localization, and control in ROS 2.
Why this combination?
- Ecosystem strength: These tools are widely used in both academia and industry.
- Interoperability: ROS 2 acts as the integration layer; Gazebo and Isaac Sim both have ROS 2 integrations; Nav2 sits on top of ROS 2; RealSense has ROS 2 drivers.
- Future‑proofing: Isaac Sim and Isaac ROS align with NVIDIA’s roadmap for robotics and accelerated perception.
- Course alignment:
- ROS 2 → Constitution Principle II (ROS 2 Mastery)
- Gazebo / Isaac Sim → Principle III (Simulation‑First) & IV (NVIDIA Isaac Integration)
- RealSense, Nav2, LLMs, Whisper → Principles V (Humanoid Design) & VI (Conversational Robotics)
3. Storage Model: Real‑Time Robotics, Not a Data Warehouse
The capstone system is primarily real‑time:
- Sensor data (images, depth, IMU, LiDAR) and control commands are transient.
- Perception, planning, and control loops operate in memory, inside ROS 2 nodes.
Therefore, the default design uses no dedicated database inside the robot:
- Long‑term logging, dataset creation, and model training can write to files or external services.
- This keeps the on‑robot runtime focused on latency, safety, and determinism rather than data warehousing.
You can still add databases later (for telemetry, experiment tracking, or student projects), but they are not a core requirement for the course’s learning outcomes.
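For the optional file-based logging mentioned above, an append-only JSON Lines writer is often all that is needed. This is a sketch under assumptions (the function name and record fields are hypothetical), not a prescribed course API:

```python
import json
import time
from pathlib import Path

def log_telemetry(path: Path, record: dict) -> None:
    """Append one telemetry record as a single JSON line.

    Appending JSON lines keeps the on-robot runtime simple: no
    database, no schema migrations, just a file that offline
    training and analysis jobs can consume later.
    """
    stamped = {"t": time.time(), **record}  # add a wall-clock timestamp
    with path.open("a") as f:
        f.write(json.dumps(stamped) + "\n")
```

Each line is one self-contained record, so the file can be tailed live or loaded afterward with standard tools (e.g., pandas can read this format with its line-delimited JSON option).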
4. Testing Strategy: ROS 2 Tests + pytest
Reliable physical AI requires testing at multiple levels:
- ROS 2 test tools:
- launch_testing with gtest (the ROS 2 equivalents of ROS 1's rostest) for node‑to‑node interactions, message flow, and integration tests.
- Ideal for verifying that perception, planning, and control nodes coordinate correctly.
- Python unit tests:
- pytest for algorithms, utility functions, planners, and VLA logic.
In labs and projects, you will:
- Write pytest tests for pure Python logic (e.g., planners, data transforms).
- Use ROS 2 test patterns for end‑to‑end behaviors (e.g., "publisher sends message, subscriber reacts correctly").
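A pytest test for pure Python logic can be as small as the following; the planner utility is a made-up example (not course code), but the shape of the test file is exactly what pytest discovers and runs:

```python
# test_planner.py — run with: pytest test_planner.py
def clamp_velocity(v: float, v_max: float) -> float:
    """Toy planner utility: limit a commanded velocity to [-v_max, v_max]."""
    return max(-v_max, min(v, v_max))

def test_clamp_within_limits():
    # A command inside the envelope passes through unchanged.
    assert clamp_velocity(0.5, 1.0) == 0.5

def test_clamp_saturates():
    # Commands beyond the envelope saturate at the limit, in both directions.
    assert clamp_velocity(5.0, 1.0) == 1.0
    assert clamp_velocity(-5.0, 1.0) == -1.0
```

Because the logic has no ROS dependencies, these tests run in milliseconds on any machine, while the slower ROS 2 integration tests are reserved for behaviors that genuinely need running nodes.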
5. Target Platforms: Ubuntu 22.04 & Jetson Orin
The course standardizes on:
- Ubuntu 22.04 LTS for the Digital Twin Workstation:
- Stable ROS 2 (Humble) support.
- Officially supported by NVIDIA for Isaac Sim and GPU drivers.
- Ideal for running full‑fidelity simulations and training workloads.
- NVIDIA Jetson Orin Nano for the Physical AI Edge Kit:
- Edge deployment of ROS 2 nodes and trained models.
- Represents realistic constraints: limited power, memory, and compute compared to a desktop GPU.
This split enforces a sim‑to‑real mindset:
- Heavy training and experimentation run on the workstation.
- Deployed behaviors are flashed or synced to the edge device for on‑robot execution.
6. Performance & Constraints
The research defines clear performance and constraint targets:
- Performance goals:
- Real‑time control loops with tight latency requirements.
- Real‑time sensor processing.
- ~60 FPS simulation for smooth, realistic digital twin behavior.
- Low‑latency NLP for conversational interaction.
- Constraints:
- Shared GPUs and lab equipment (you are not alone on the cluster).
- Limited numbers of RealSense cameras, LiDARs, and robots.
- Ethical and safety constraints around physical interaction.
In practice, this means:
- You must be mindful of GPU usage, batch sizes, and simulation complexity.
- You should design controllers and planners that degrade gracefully when resources are tight.
- Safety and transparency are treated as non‑negotiable design requirements, not nice‑to‑haves.
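One concrete way to "degrade gracefully" is to adapt the control-loop rate to measured compute cost. This standalone sketch (class and method names are invented for illustration) halves the target frequency whenever an iteration blows its time budget, down to a floor:

```python
class AdaptiveRate:
    """Lower the control-loop frequency when iterations overrun their budget.

    A real controller would also recover (raise the rate again when
    load drops) and log each change; this sketch shows only the
    degradation path.
    """

    def __init__(self, hz: float, min_hz: float = 5.0):
        self.hz = hz          # current target loop frequency
        self.min_hz = min_hz  # never degrade below this floor

    @property
    def budget(self) -> float:
        """Seconds allowed per iteration at the current rate."""
        return 1.0 / self.hz

    def report(self, elapsed: float) -> None:
        """Call after each iteration with its measured duration."""
        if elapsed > self.budget and self.hz / 2.0 >= self.min_hz:
            self.hz /= 2.0  # degrade: half the rate, double the budget
```

The design choice here is to fail predictably: a slower but steady loop is safer for a physical robot than a nominally fast loop that misses deadlines erratically.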
7. Scope: A Hybrid Simulation + Physical Robotics System
The capstone is explicitly hybrid:
- Simulation:
- Design and validate behaviors in Gazebo and Isaac Sim.
- Generate synthetic data for training and evaluation.
- Physical deployment:
- Deploy validated behaviors to Jetson‑powered edge kits.
- Connect RealSense, microphones, and (where available) humanoid platforms.
The target level of capability:
- Voice‑driven commands (Whisper + LLM).
- Path planning and navigation (Nav2).
- Object recognition and manipulation using RGB‑D perception.
- Robustness to environment variation within the constraints of the lab.
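To make the voice-to-navigation flow concrete, here is a deliberately tiny stand-in for the step between transcription (Whisper) and goal dispatch (Nav2). In the capstone this mapping would be done by an LLM; every name, phrase, and coordinate below is invented for illustration:

```python
# Hypothetical lookup of a few known destinations to map coordinates.
KNOWN_GOALS = {
    "kitchen": (2.0, 3.5),
    "charging dock": (0.0, 0.0),
}

def parse_nav_command(transcript: str):
    """Return (x, y) for a recognized destination, or None.

    Stands in for the LLM step: transcribed text in, structured goal
    out. The coordinates would then feed a Nav2 navigation goal.
    """
    text = transcript.lower()
    for name, xy in KNOWN_GOALS.items():
        if name in text:
            return xy
    return None
```

The point of the sketch is the interface, not the matching logic: keeping "understand the command" and "execute the goal" as separate, testable stages is what lets you swap a keyword lookup for an LLM without touching the navigation side.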
This topic should give you a clear picture of why the course uses this stack and how it maps directly onto the project constitution and capstone goals.