Topic 1 — ROS 2 Architecture & Core Concepts
This topic introduces the fundamental question: Why do robots need middleware? We explore the distributed architecture problem in robotics, compare ROS 1 and ROS 2, and establish the conceptual foundation for everything that follows in this chapter.
1.1 What is ROS 2? The Middleware Problem
Why Robots Need Middleware
Imagine building a humanoid robot from scratch. You need:
- Sensors streaming data: cameras capturing RGB-D images at 30 FPS, LiDAR scanning 360° environments, IMUs tracking balance
- Perception algorithms processing sensor data: object detection, SLAM, state estimation
- Planning systems computing trajectories: path planning, manipulation planning, gait generation
- Control systems executing commands: motor controllers, balance controllers, safety monitors
In a monolithic architecture, all of this runs in a single process. This approach has critical flaws:
- Tight coupling: Changing the camera driver breaks the planner
- No modularity: Can't reuse perception code on a different robot
- Single point of failure: One crash brings down the entire system
- Resource contention: Heavy perception blocks time-critical control loops
- Testing difficulty: Can't test components in isolation
A distributed architecture solves these problems by separating concerns:
- Sensors run on dedicated hardware (edge devices, specialized boards)
- Perception runs on powerful GPUs (workstation, Jetson)
- Planning runs on CPUs with access to maps and models
- Control runs on real-time hardware (robot's onboard computer)
But distributed systems create a new problem: How do these components communicate?
This is where middleware comes in. Middleware provides:
- Standardized communication protocols — All components speak the same language
- Discovery and naming — Components find each other automatically
- Type safety — Messages are validated before transmission
- Quality of Service (QoS) — Guarantees about delivery, latency, reliability
- Hardware abstraction — Same code works with different sensors/actuators
ROS 2 as a Publish-Subscribe Message Bus
Robot Operating System 2 (ROS 2) is middleware specifically designed for robotics. At its core, ROS 2 provides:
- Nodes: Independent units of computation that perform specific tasks (typically one per process)
- Topics: Named data streams for publish-subscribe communication
- Services: Request-response operations for discrete queries
- Actions: Goal-oriented tasks with feedback and cancellation
- Parameters: Configuration values accessible at runtime
ROS 2 uses DDS (Data Distribution Service) as its underlying communication layer. DDS is an industry-standard middleware used in aerospace, defense, and industrial automation. It provides:
- Real-time support: QoS policies such as deadlines help bound latency, though determinism also depends on the OS and network
- Type safety: Strong typing prevents message mismatches
- Discovery: Automatic detection of publishers and subscribers
- QoS policies: Fine-grained control over reliability, durability, history
ROS 1 vs ROS 2: Why ROS 2 Matters
ROS 1 (the original ROS) revolutionized robotics but had fundamental limitations:
| Feature | ROS 1 | ROS 2 |
|---|---|---|
| Real-time support | Limited, not deterministic | Designed for real-time use; determinism depends on the OS, DDS implementation, and configuration |
| Type safety | Typed messages; mismatches detected when nodes connect (MD5 checksum) | Typed interfaces defined in .msg/.srv/.action files and mapped to DDS types |
| Network security | None built in (unauthenticated, unencrypted TCP/UDP) | Optional authentication, encryption, and access control (DDS-Security) |
| Multi-robot support | Difficult, namespace hacks | Native multi-robot support |
| Cross-platform | Primarily Linux | Linux, Windows, macOS (support level varies by release); RTOS targets via micro-ROS |
| Lifecycle management | Manual, error-prone | Managed nodes with state machines |
| QoS control | Minimal (queue size, latching, transport hints) | Granular QoS policies |
Key improvements in ROS 2:
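- Real-time readiness: Control loops can achieve bounded latency when paired with a real-time OS and a suitably configured DDS implementation
- Production-oriented: Designed for deployment in commercial products, not only research prototypes
- Security: DDS-Security adds authentication, encryption, and access control when enabled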
- Modularity: Better separation of concerns, easier testing
For this course, we use ROS 2 Humble Hawksbill, an LTS (Long-Term Support) release. Iron Irwini can also be used, but it is not an LTS release.
1.2 The ROS 2 Computation Graph
The computation graph is the conceptual model of how ROS 2 systems are organized. It consists of:
Nodes
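Nodes are independent units of computation that perform specific tasks. A node typically runs in its own process, although several nodes can be composed into one process. Each node has: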
- Single responsibility: One node does one thing well
- Unique name: Identified by namespace and name (e.g., /perception/camera_driver)
- Lifecycle: Managed startup, shutdown, and error recovery
- Interfaces: Publishes/subscribes to topics, provides/uses services, handles actions
Example nodes in a humanoid robot:
- /sensors/camera: Publishes RGB-D images
- /perception/object_detector: Subscribes to images, publishes detections
- /planning/navigator: Provides navigation service
- /control/motor_controller: Subscribes to commands, controls motors
Topics
Topics are named data streams for asynchronous, one-to-many communication. They use the publish-subscribe pattern:
- Publishers send messages without knowing who receives them
- Subscribers receive messages without knowing who sends them
- Decoupling: Publishers and subscribers are independent
Example topics:
- /camera/rgb: RGB images (published by camera driver)
- /lidar/scan: LiDAR point clouds
- /joint_states: Current joint positions and velocities
- /motor_commands: Desired joint velocities
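The decoupling described above can be illustrated with a small in-process model. This is only a sketch: `ToyBus` is a hypothetical class written for this example, not part of rclpy, and it delivers messages with direct function calls rather than over DDS.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


class ToyBus:
    """In-process stand-in for a topic-based message bus (not the rclpy API)."""

    def __init__(self) -> None:
        # Maps a topic name to the callbacks subscribed to it.
        self._subscribers: DefaultDict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: Any) -> int:
        """Deliver message to every subscriber of topic; return how many received it."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


bus = ToyBus()
detector_inbox: List[str] = []
logger_inbox: List[str] = []

# Two independent subscribers on the same topic (one-to-many).
bus.subscribe("/camera/rgb", detector_inbox.append)
bus.subscribe("/camera/rgb", logger_inbox.append)

delivered = bus.publish("/camera/rgb", "frame_0001")
unheard = bus.publish("/lidar/scan", "scan_0001")  # nobody subscribed yet
```

The publisher never refers to a subscriber by name: it only knows the topic string, which is why either side can be replaced without touching the other.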
Services
Services provide request-response communication. Unlike topics (which stream continuously), services are:
- One-to-one: One client calls one server
- Request-response: The client either waits for the reply or receives it asynchronously through a future (in rclpy, call_async)
- Discrete: Used for queries, not continuous data
Example services:
- /get_robot_pose: Returns current robot position
- /plan_trajectory: Takes start/goal, returns path
- /set_parameters: Updates configuration
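As a rough illustration of the request-response pattern, the sketch below uses a worker thread from Python's standard library to stand in for the server. The handler `plan_trajectory` and its request/response dictionaries are hypothetical, chosen to mirror the /plan_trajectory example; a real ROS 2 service would use a typed .srv interface instead.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def plan_trajectory(request: Dict[str, Point]) -> Dict[str, List[Point]]:
    """Hypothetical server handler: return a straight-line path with one midpoint."""
    (x0, y0), (x1, y1) = request["start"], request["goal"]
    midpoint = ((x0 + x1) / 2.0, (y0 + y1) / 2.0)
    return {"path": [request["start"], midpoint, request["goal"]]}


# The single worker thread stands in for the server.
with ThreadPoolExecutor(max_workers=1) as server:
    request = {"start": (0.0, 0.0), "goal": (2.0, 4.0)}

    # The client gets a future back immediately and could keep working.
    future: Future = server.submit(plan_trajectory, request)

    # Here the client chooses to wait; the timeout avoids hanging forever
    # if the server never answers.
    response = future.result(timeout=1.0)

path = response["path"]
```

Exactly one response comes back for each request, which is what makes this pattern a better fit for discrete queries than for continuous data.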
Actions
Actions are asynchronous, goal-oriented tasks with feedback. They combine:
- Goal: Client sends a goal (e.g., "navigate to position X")
- Feedback: Server reports progress (e.g., "50% complete")
- Result: Server returns final outcome (e.g., "goal reached" or "failed")
Example actions:
- /navigate_to_goal: Long-running navigation task
- /grasp_object: Manipulation with progress updates
- /execute_trajectory: Motion execution with feedback
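The goal, feedback, and result roles can be sketched in plain Python. The function `navigate_to_goal` and its parameters are hypothetical names for this example. A real action server runs in a separate node and talks to the client over the network, whereas here everything runs in one function call.

```python
from typing import Callable, List


def navigate_to_goal(distance_m: float, step_m: float,
                     on_feedback: Callable[[float], None],
                     is_cancelled: Callable[[], bool]) -> str:
    """Hypothetical action server loop: advance in steps, report progress, honor cancel."""
    travelled = 0.0
    while travelled < distance_m:
        if is_cancelled():
            return "canceled"
        travelled = min(travelled + step_m, distance_m)
        on_feedback(travelled / distance_m)  # fraction complete in [0, 1]
    return "succeeded"


# Goal: travel 2 m in 0.5 m steps, never cancel.
progress: List[float] = []
result = navigate_to_goal(2.0, 0.5, progress.append, lambda: False)

# A client that requests cancellation once half the distance is covered.
partial: List[float] = []
cancelled_result = navigate_to_goal(
    2.0, 0.5, partial.append, lambda: bool(partial) and partial[-1] >= 0.5
)
```

The first call ends with `"succeeded"` after four feedback updates. The second stops early with `"canceled"`, which a single blocking service call could not express.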
Parameters
Parameters are configuration values accessible at runtime. They enable:
- Dynamic reconfiguration: Change behavior without restarting nodes
- Environment-specific settings: Different values for sim vs. real
- Tuning: Adjust control gains, thresholds, limits
Example parameters:
- control/max_velocity: Maximum joint velocity
- perception/confidence_threshold: Object detection threshold
- planning/timeout: Planning timeout in seconds
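One way to picture runtime parameters is as a store of declared values, each with a check on what may be written. The `ToyParameters` class below is a hypothetical model for illustration, not the rclpy parameter API, and the limits used are made-up values.

```python
from typing import Any, Callable, Dict


class ToyParameters:
    """Toy runtime parameter store with per-parameter validation (not the rclpy API)."""

    def __init__(self) -> None:
        self._values: Dict[str, Any] = {}
        self._validators: Dict[str, Callable[[Any], bool]] = {}

    def declare(self, name: str, default: Any,
                validator: Callable[[Any], bool] = lambda value: True) -> None:
        """Register a parameter with a default value and an optional validity check."""
        self._values[name] = default
        self._validators[name] = validator

    def get(self, name: str) -> Any:
        return self._values[name]

    def set(self, name: str, value: Any) -> bool:
        """Apply the new value if it passes validation; report whether it was accepted."""
        if name not in self._values:
            raise KeyError(f"undeclared parameter: {name}")
        if not self._validators[name](value):
            return False
        self._values[name] = value
        return True


params = ToyParameters()
params.declare("control/max_velocity", 1.0, lambda v: 0.0 < v <= 2.0)
params.declare("perception/confidence_threshold", 0.5, lambda v: 0.0 <= v <= 1.0)

accepted = params.set("control/max_velocity", 1.5)   # within the assumed limits
rejected = params.set("control/max_velocity", 10.0)  # rejected, value stays at 1.5
```

Rejecting an invalid update while the node keeps running is what makes tuning possible without a restart.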
1.3 Node Lifecycle & Executors
Node Lifecycle States
ROS 2 supports lifecycle-managed nodes that transition through well-defined states:
- Unconfigured — Node created but not initialized
- Inactive — Node configured but not active
- Active — Node running and processing
- Finalized — Node cleaned up and shut down
Lifecycle transitions:
- configure: Initialize node (load parameters, setup)
- activate: Start processing (begin publishing/subscribing)
- deactivate: Stop processing (pause, but keep state)
- cleanup: Clean up resources
- shutdown: Final shutdown
Why lifecycle management matters:
- Predictable startup: Nodes initialize in correct order
- Graceful shutdown: Clean resource cleanup
- Error recovery: Nodes can restart without full system reboot
- Safety: Critical nodes can be paused without losing state
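The states and transitions above form a small state machine, sketched below. This is a simplified model: `ToyLifecycleNode` is a hypothetical class, and the intermediate transition states and error handling of real managed nodes are left out.

```python
from typing import Dict, Tuple

# (current state, transition) -> next state, covering only the primary states.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("unconfigured", "configure"): "inactive",
    ("inactive", "activate"): "active",
    ("active", "deactivate"): "inactive",
    ("inactive", "cleanup"): "unconfigured",
    ("unconfigured", "shutdown"): "finalized",
    ("inactive", "shutdown"): "finalized",
    ("active", "shutdown"): "finalized",
}


class ToyLifecycleNode:
    """Toy lifecycle state machine (not the rclpy lifecycle API)."""

    def __init__(self) -> None:
        self.state = "unconfigured"

    def trigger(self, transition: str) -> bool:
        """Apply the transition if it is legal from the current state."""
        target = TRANSITIONS.get((self.state, transition))
        if target is None:
            return False  # illegal transition, state unchanged
        self.state = target
        return True


node = ToyLifecycleNode()
skipped = node.trigger("activate")      # illegal: must configure first
node.trigger("configure")               # unconfigured -> inactive
node.trigger("activate")                # inactive -> active
node.trigger("deactivate")              # active -> inactive (paused, state kept)
state_when_paused = node.state
node.trigger("shutdown")                # inactive -> finalized
```

Because illegal transitions are refused, a supervisor can bring up many nodes in a known order and rely on each one being in the state it reports.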
Executors: Single-Threaded vs Multi-Threaded
Executors control how nodes process callbacks (messages, service requests, timers). ROS 2 provides two main executor types:
SingleThreadedExecutor:
- All callbacks run in one thread, one at a time
- Predictable: Callbacks never run concurrently, so no locking is needed between them
- Real-time friendly: No thread contention inside the executor
- Use case: Control loops, time-critical nodes
MultiThreadedExecutor:
- Callbacks run in a thread pool
- Higher throughput: Callbacks can run in parallel when their callback groups allow it
- Non-deterministic: Order depends on thread scheduling
- Use case: Perception, planning (can tolerate jitter)
Best practice: Use single-threaded executors for control, multi-threaded for perception/planning.
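The difference between the two models can be shown with standard-library threads. This sketch does not use the rclpy executors: a plain loop stands in for single-threaded dispatch and `ThreadPoolExecutor` stands in for a thread pool. The callback names are illustrative.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def make_callback(name: str, log: List[str], lock: threading.Lock) -> Callable[[], None]:
    """Build a callback that records its own name when it runs."""
    def callback() -> None:
        with lock:  # protect the shared log when callbacks run in parallel
            log.append(name)
    return callback


names = ["timer", "imu", "camera", "lidar"]
lock = threading.Lock()

# Single-threaded model: callbacks run one after another in queue order.
single_log: List[str] = []
for callback in [make_callback(n, single_log, lock) for n in names]:
    callback()

# Multi-threaded model: callbacks are handed to a pool; completion order
# depends on thread scheduling and is not guaranteed.
multi_log: List[str] = []
with ThreadPoolExecutor(max_workers=4) as pool:
    for callback in [make_callback(n, multi_log, lock) for n in names]:
        pool.submit(callback)
```

After both runs, `single_log` always matches `names` exactly. `multi_log` contains the same entries, but only its contents, not its order, can be relied on. The lock in the shared callback is the cost of parallelism that the single-threaded model avoids.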
1.4 Quality of Service (QoS) Profiles
QoS policies control how messages are delivered. This is critical for robotics where different data types have different requirements.
Reliability
- Reliable: Delivery is guaranteed as long as the connection holds; lost messages are retransmitted
- Best-effort: Each message is sent once and may be lost, for example on a congested or unreliable network
Use cases:
- Reliable: Motor commands, safety signals (must not be lost)
- Best-effort: Camera images, high-frequency sensor data (can tolerate drops)
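The trade-off can be made concrete with a toy lossy channel. Everything here is an illustrative assumption: the fixed drop pattern, the function names, and the retry limit. Real DDS reliability uses acknowledgements and retransmission inside the middleware rather than a simple retry loop.

```python
from typing import Iterator, List


def lossy_channel(drop_pattern: List[bool]) -> Iterator[bool]:
    """Yield True when a transmission attempt is dropped, cycling through the pattern."""
    while True:
        for dropped in drop_pattern:
            yield dropped


def send_best_effort(messages: List[str], channel: Iterator[bool]) -> List[str]:
    """Send each message once; dropped messages are simply lost."""
    return [m for m in messages if not next(channel)]


def send_reliable(messages: List[str], channel: Iterator[bool],
                  max_attempts: int = 5) -> List[str]:
    """Retry each message until one attempt gets through or attempts run out."""
    delivered = []
    for m in messages:
        for _ in range(max_attempts):
            if not next(channel):
                delivered.append(m)
                break
    return delivered


messages = ["a", "b", "c", "d"]
pattern = [False, True, False]  # every second attempt out of three is dropped

best_effort_result = send_best_effort(messages, lossy_channel(pattern))
reliable_result = send_reliable(messages, lossy_channel(pattern))
```

With this pattern the best-effort sender loses `"b"`, while the reliable sender delivers all four messages at the cost of extra transmissions. That extra traffic and delay is why high-rate sensor streams often prefer best-effort.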
Durability
- Volatile: No messages are kept for subscribers that join later
- Transient Local: The publisher keeps its most recent messages (up to the history depth) and delivers them to subscribers that join later
Use cases:
- Volatile: Real-time sensor streams
- Transient Local: Robot state, map data (new nodes need current state)
History
- Keep Last: Keep N most recent messages
- Keep All: Keep all messages (may grow unbounded)
Use cases:
- Keep Last (depth=1): Latest state only
- Keep Last (depth=10): Small buffer for jitter tolerance
- Keep All: Debugging, logging (use with caution)
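Durability and history interact: the history depth bounds what is stored, and durability decides whether late joiners see it. The sketch below models both with a bounded deque. `ToyPublisher` is a hypothetical class for this example, not the rclpy publisher.

```python
from collections import deque
from typing import Any, Callable, Deque, List


class ToyPublisher:
    """Toy publisher with Keep Last history and optional transient-local replay."""

    def __init__(self, depth: int, transient_local: bool) -> None:
        self._history: Deque[Any] = deque(maxlen=depth)  # Keep Last (depth)
        self._transient_local = transient_local
        self._subscribers: List[Callable[[Any], None]] = []

    def publish(self, message: Any) -> None:
        self._history.append(message)  # oldest entry is discarded when full
        for callback in self._subscribers:
            callback(message)

    def add_subscriber(self, callback: Callable[[Any], None]) -> None:
        if self._transient_local:
            # Replay stored messages to the late joiner.
            for message in self._history:
                callback(message)
        self._subscribers.append(callback)


# Map data: transient local, depth 1, so late joiners get the latest map.
map_pub = ToyPublisher(depth=1, transient_local=True)
map_pub.publish("map_v1")
map_pub.publish("map_v2")
late_map_inbox: List[str] = []
map_pub.add_subscriber(late_map_inbox.append)

# Sensor stream: volatile, so late joiners only see what comes next.
scan_pub = ToyPublisher(depth=5, transient_local=False)
scan_pub.publish("scan_1")
scan_pub.publish("scan_2")
late_scan_inbox: List[str] = []
scan_pub.add_subscriber(late_scan_inbox.append)
scan_pub.publish("scan_3")
```

The late map subscriber receives only `"map_v2"`, because depth 1 discarded the older map. The late scan subscriber receives only `"scan_3"`, because a volatile publisher replays nothing.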
Common QoS Profiles
ROS 2 provides pre-configured profiles:
- Sensor Data: Best-effort, volatile, keep last (depth=5)
- Services: Reliable, volatile, keep last (depth=10)
- Parameters: Reliable, volatile, keep last (depth=1000)
- Default: Reliable, volatile, keep last (depth=10)
- System Default: Defers every policy to the defaults of the underlying RMW/DDS implementation
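Matching QoS: Publishers and subscribers must have compatible QoS, meaning the publisher offers at least what the subscriber requests. For example, a best-effort publisher will not connect to a subscriber that requests reliable delivery. If the profiles are incompatible, no connection is made and no messages flow.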
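The compatibility rule for reliability and durability can be written as an "offered versus requested" comparison. The sketch below covers only those two policies. The enum and function names are hypothetical and do not come from rclpy.

```python
from enum import IntEnum


class Reliability(IntEnum):
    BEST_EFFORT = 0
    RELIABLE = 1


class Durability(IntEnum):
    VOLATILE = 0
    TRANSIENT_LOCAL = 1


def qos_compatible(pub_reliability: Reliability, pub_durability: Durability,
                   sub_reliability: Reliability, sub_durability: Durability) -> bool:
    """A connection forms only if the publisher offers at least what the subscriber requests."""
    return pub_reliability >= sub_reliability and pub_durability >= sub_durability


# Reliable publisher, best-effort subscriber: the offer exceeds the request.
ok = qos_compatible(Reliability.RELIABLE, Durability.VOLATILE,
                    Reliability.BEST_EFFORT, Durability.VOLATILE)

# Best-effort publisher (e.g. a sensor profile), reliable subscriber: no connection.
mismatch = qos_compatible(Reliability.BEST_EFFORT, Durability.VOLATILE,
                          Reliability.RELIABLE, Durability.VOLATILE)

# Volatile publisher, subscriber requesting transient local: no connection.
late_join_mismatch = qos_compatible(Reliability.RELIABLE, Durability.VOLATILE,
                                    Reliability.RELIABLE, Durability.TRANSIENT_LOCAL)
```

A common symptom of the second case is a subscriber that stays silent even though the topic is clearly being published. Comparing the two QoS profiles is a good first debugging step.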
1.5 Hands-On Preview
In the following topics, you will build:
- Minimal nodes: Publisher and subscriber nodes
- Service nodes: Server and client for planning queries
- Action nodes: Goal-oriented navigation with feedback
- Parameter nodes: Dynamic configuration management
- Multi-node system: Complete 3-node pipeline
Each topic includes:
- Conceptual explanation: Why and when to use each pattern
- Code examples: Working Python (rclpy) implementations
- Best practices: Common mistakes and how to avoid them
- Debugging tips: Tools and techniques for troubleshooting
Summary
ROS 2 solves the distributed coordination problem in robotics by providing:
- Standardized communication: Topics, services, actions
- Real-time support: DDS-based communication designed for predictable delivery
- Modularity: Independent nodes with clear interfaces
- Type safety: Strong typing prevents message type mismatches
- QoS control: Fine-grained control over message delivery
Understanding these fundamentals is essential for building robust, scalable robot systems. The next topics dive deep into implementation details, code examples, and hands-on labs.
References
- ROS 2 Design: https://design.ros2.org/
- DDS Specification: https://www.omg.org/spec/DDS/
- ROS 2 Humble Documentation: https://docs.ros.org/en/humble/