Topic 1 — ROS 2 Architecture & Core Concepts

This topic introduces the fundamental question: Why do robots need middleware? We explore the distributed architecture problem in robotics, compare ROS 1 and ROS 2, and establish the conceptual foundation for everything that follows in this chapter.

1.1 What is ROS 2? The Middleware Problem

Why Robots Need Middleware

Imagine building a humanoid robot from scratch. You need:

Sensors streaming data: cameras capturing RGB-D images at 30 FPS, LiDAR scanning 360° environments, IMUs tracking balance
Perception algorithms processing sensor data: object detection, SLAM, state estimation
Planning systems computing trajectories: path planning, manipulation planning, gait generation
Control systems executing commands: motor controllers, balance controllers, safety monitors

In a monolithic architecture, all of this runs in a single process. This approach has critical flaws:

Tight coupling: Changing the camera driver breaks the planner
No modularity: Can't reuse perception code on a different robot
Single point of failure: One crash brings down the entire system
Resource contention: Heavy perception blocks time-critical control loops
Testing difficulty: Can't test components in isolation

A distributed architecture solves these problems by separating concerns:

Sensors run on dedicated hardware (edge devices, specialized boards)
Perception runs on powerful GPUs (workstation, Jetson)
Planning runs on CPUs with access to maps and models
Control runs on real-time hardware (robot's onboard computer)

But distributed systems create a new problem: How do these components communicate?

This is where middleware comes in. Middleware provides:

Standardized communication protocols — All components speak the same language
Discovery and naming — Components find each other automatically
Type safety — Messages are validated before transmission
Quality of Service (QoS) — Guarantees about delivery, latency, reliability
Hardware abstraction — Same code works with different sensors/actuators

ROS 2 as a Publish-Subscribe Message Bus

Robot Operating System 2 (ROS 2) is middleware specifically designed for robotics. At its core, ROS 2 provides:

Nodes: Independent units of computation that perform specific tasks (usually one per process, though several nodes can share a process)
Topics: Named data streams for publish-subscribe communication
Services: Request-response operations for discrete queries
Actions: Goal-oriented tasks with feedback and cancellation
Parameters: Configuration values accessible at runtime

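ROS 2 uses DDS (Data Distribution Service) as its default underlying communication layer, accessed through the RMW (ROS middleware) abstraction. DDS is an industry-standard middleware used in aerospace, defense, and industrial automation. It provides:

Real-time orientation: Designed for low, predictable latency; actual determinism also depends on the operating system, the network, and the application code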
Type safety: Strong typing prevents message mismatches
Discovery: Automatic detection of publishers and subscribers
QoS policies: Fine-grained control over reliability, durability, history

ROS 1 vs ROS 2: Why ROS 2 Matters

ROS 1 (the original ROS) revolutionized robotics but had fundamental limitations:

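Feature	ROS 1	ROS 2
Real-time support	Limited, not deterministic	Designed for real-time use; determinism still depends on the OS and application code
Type safety	Weak, runtime errors common	Strong typing (compile-time checks in C++)
Network security	None built in (unauthenticated TCPROS/UDPROS)	Optional authentication, encryption, and access control (DDS Security)
Multi-robot support	Difficult (single master, namespace workarounds)	No central master; namespaces and domain IDs
Cross-platform	Primarily Linux	Linux, Windows, macOS; microcontrollers and RTOS via micro-ROS
Lifecycle management	Manual, error-prone	Managed nodes with state machines
QoS control	Minimal (queue size, transport hints)	Granular QoS policies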

Key improvements in ROS 2:

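Real-time capability: Control loops can achieve bounded latency when paired with a real-time operating system and careful node design
Production focus: Designed for commercial and industrial deployment, not only research prototypes
Security: DDS Security, when enabled, adds authentication, encryption, and access control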
Modularity: Better separation of concerns, easier testing

For this course, we use ROS 2 Humble, a Long-Term Support (LTS) release. ROS 2 Iron should also work, but it is not an LTS release.

1.2 The ROS 2 Computation Graph

The computation graph is the conceptual model of how ROS 2 systems are organized. It consists of:

Nodes

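Nodes are independent units of computation that perform specific tasks. A node usually runs in its own process, although several nodes can be composed into a single process. Each node has: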

Single responsibility: One node does one thing well
Unique name: Identified by namespace and name (e.g., /perception/camera_driver)
Lifecycle (optional): Managed nodes add controlled startup, shutdown, and error recovery
Interfaces: Publishes/subscribes to topics, provides/uses services, handles actions

Example nodes in a humanoid robot:

/sensors/camera — Publishes RGB-D images
/perception/object_detector — Subscribes to images, publishes detections
/planning/navigator — Provides navigation service
/control/motor_controller — Subscribes to commands, controls motors

Topics

Topics are named data streams for asynchronous, many-to-many communication: any number of publishers and subscribers can share a topic. They use the publish-subscribe pattern:

Publishers send messages without knowing who receives them
Subscribers receive messages without knowing who sends them
Decoupling: Publishers and subscribers are independent
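
The decoupling described above can be illustrated with a toy, in-process message bus. This is only a sketch of the pattern, not the ROS 2 API: the `ToyBus` class and its methods are hypothetical names introduced here, and real ROS 2 topics add typing, discovery, and QoS on top of this idea.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


class ToyBus:
    """In-process stand-in for a topic-based message bus (not the ROS 2 API)."""

    def __init__(self) -> None:
        # Maps a topic name to the callbacks subscribed to it.
        self._subscribers: DefaultDict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: Any) -> int:
        """Deliver the message to every subscriber; return how many received it."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


bus = ToyBus()
detector_inbox: List[str] = []
logger_inbox: List[str] = []

# Two independent subscribers on the same topic; neither knows about the other.
bus.subscribe("/camera/rgb", detector_inbox.append)
bus.subscribe("/camera/rgb", logger_inbox.append)

# The publisher only names the topic, never the receivers.
delivered = bus.publish("/camera/rgb", "frame-0001")
unheard = bus.publish("/lidar/scan", "scan-0001")  # no subscribers yet
```

Publishing to a topic with no subscribers is not an error here, which mirrors how a publisher can start before any consumer exists.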

Example topics:

/camera/rgb — RGB images (published by camera driver)
/lidar/scan — LiDAR point clouds
/joint_states — Current joint positions and velocities
/motor_commands — Desired joint velocities

Services

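Services provide request-response communication. Unlike topics (which stream continuously), services are:

One server per service: Any number of clients can call it, and each request gets exactly one response
Call-and-wait: The client waits for the response, either by blocking or, preferably, through an asynchronous call that returns a future
Discrete: Used for queries, not continuous data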
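
As a rough illustration of the request-response shape, the sketch below uses a worker thread as a stand-in for a service server and a `Future` as the pending response. `PlanRequest`, `PlanResponse`, and `plan_trajectory` are hypothetical names chosen for this example; this is plain Python, not rclpy.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class PlanRequest:  # hypothetical request type
    start: Point
    goal: Point


@dataclass
class PlanResponse:  # hypothetical response type
    path: List[Point]


def plan_trajectory(request: PlanRequest) -> PlanResponse:
    """Toy server handler: returns a straight line with one midpoint."""
    (x0, y0), (x1, y1) = request.start, request.goal
    midpoint = ((x0 + x1) / 2, (y0 + y1) / 2)
    return PlanResponse(path=[request.start, midpoint, request.goal])


# The worker thread plays the role of the server.
with ThreadPoolExecutor(max_workers=1) as server:
    request = PlanRequest(start=(0.0, 0.0), goal=(2.0, 4.0))
    future: Future = server.submit(plan_trajectory, request)  # send the request
    # The client could do other work here instead of blocking.
    response = future.result(timeout=1.0)  # wait for exactly one response
```

Holding a future rather than blocking immediately lets the caller keep processing other work while the request is in flight.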

Example services:

/get_robot_pose — Returns current robot position
/plan_trajectory — Takes start/goal, returns path
/set_parameters — Updates configuration

Actions

Actions are asynchronous, goal-oriented tasks with feedback and cancellation. They combine:

Goal: Client sends a goal (e.g., "navigate to position X")
Feedback: Server reports progress (e.g., "50% complete")
Result: Server returns final outcome (e.g., "goal reached" or "failed")
Cancellation: Client can cancel a goal that is still executing
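
The goal, feedback, and result flow can be sketched with a Python generator: each `yield` is a feedback message and the generator's return value is the result. This is a toy model under stated assumptions, not the rclpy action API; `navigate`, `run_action`, and the `cancel_after` parameter are hypothetical names used only for illustration.

```python
from typing import Callable, Generator, List, Tuple


def navigate(distance_m: float, step_m: float,
             cancel_requested: Callable[[], bool]) -> Generator[float, None, str]:
    """Toy action server: yields progress in [0, 1], returns a final result string."""
    travelled = 0.0
    while travelled < distance_m:
        if cancel_requested():
            return "canceled"
        travelled = min(travelled + step_m, distance_m)
        yield travelled / distance_m  # feedback
    return "goal reached"


def run_action(goal_m: float, cancel_after: int) -> Tuple[List[float], str]:
    """Toy action client: collects feedback and cancels after N updates."""
    feedback: List[float] = []
    execution = navigate(goal_m, step_m=1.0,
                         cancel_requested=lambda: len(feedback) >= cancel_after)
    while True:
        try:
            feedback.append(next(execution))
        except StopIteration as done:
            return feedback, done.value  # the generator's return value is the result


completed = run_action(goal_m=4.0, cancel_after=99)  # never cancels
canceled = run_action(goal_m=4.0, cancel_after=2)    # cancels after two updates
```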

Example actions:

/navigate_to_goal — Long-running navigation task
/grasp_object — Manipulation with progress updates
/execute_trajectory — Motion execution with feedback

Parameters

Parameters are per-node configuration values that can be read and changed at runtime. They enable:

Dynamic reconfiguration: Change behavior without restarting nodes
Environment-specific settings: Different values for sim vs. real
Tuning: Adjust control gains, thresholds, limits
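
One way to picture runtime reconfiguration is a small per-node parameter store that validates each change before applying it. The sketch below is a toy model, assuming a declare-before-use rule and a per-parameter validator; `ToyParameters` and its methods are hypothetical and are not the rclpy parameter API.

```python
from typing import Any, Callable, Dict


class ToyParameters:
    """Per-node parameter store with declare-before-use and validation (toy model)."""

    def __init__(self) -> None:
        self._values: Dict[str, Any] = {}
        self._validators: Dict[str, Callable[[Any], bool]] = {}

    def declare(self, name: str, default: Any,
                validator: Callable[[Any], bool] = lambda value: True) -> None:
        self._values[name] = default
        self._validators[name] = validator

    def get(self, name: str) -> Any:
        return self._values[name]

    def set(self, name: str, value: Any) -> bool:
        """Apply the change only if the parameter exists and the value is valid."""
        if name not in self._values or not self._validators[name](value):
            return False
        self._values[name] = value
        return True


params = ToyParameters()
params.declare("max_velocity", 1.0, validator=lambda v: 0.0 < v <= 2.0)

accepted = params.set("max_velocity", 1.5)    # within limits: applied at runtime
rejected = params.set("max_velocity", 10.0)   # out of range: refused, old value kept
undeclared = params.set("min_velocity", 0.1)  # never declared: refused
```

Rejecting invalid values at the point of change keeps a tuning mistake from reaching the control loop.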

Example parameters:

control/max_velocity — Maximum joint velocity
perception/confidence_threshold — Object detection threshold
planning/timeout — Planning timeout in seconds

1.3 Node Lifecycle & Executors

Node Lifecycle States

ROS 2 supports lifecycle-managed nodes that transition through well-defined states:

Unconfigured — Node created but not initialized
Inactive — Node configured but not active
Active — Node running and processing
Finalized — Node cleaned up and shut down

Lifecycle transitions:

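configure (Unconfigured → Inactive): Initialize the node (load parameters, set up resources)
activate (Inactive → Active): Start processing (begin publishing/subscribing)
deactivate (Active → Inactive): Stop processing (pause, but keep state)
cleanup (Inactive → Unconfigured): Release resources so the node can be configured again
shutdown (any primary state → Finalized): Final shutdown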
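
The states and transitions above form a small state machine, which can be sketched as a lookup table. This is a simplified model that ignores the intermediate transition states and error handling of real managed nodes; `ToyLifecycleNode` and `trigger` are hypothetical names, not the ROS 2 lifecycle API.

```python
from enum import Enum
from typing import Dict, Tuple


class State(Enum):
    UNCONFIGURED = "unconfigured"
    INACTIVE = "inactive"
    ACTIVE = "active"
    FINALIZED = "finalized"


# (current state, transition name) -> next state
TRANSITIONS: Dict[Tuple[State, str], State] = {
    (State.UNCONFIGURED, "configure"): State.INACTIVE,
    (State.INACTIVE, "activate"): State.ACTIVE,
    (State.ACTIVE, "deactivate"): State.INACTIVE,
    (State.INACTIVE, "cleanup"): State.UNCONFIGURED,
    (State.UNCONFIGURED, "shutdown"): State.FINALIZED,
    (State.INACTIVE, "shutdown"): State.FINALIZED,
    (State.ACTIVE, "shutdown"): State.FINALIZED,
}


class ToyLifecycleNode:
    def __init__(self) -> None:
        self.state = State.UNCONFIGURED

    def trigger(self, transition: str) -> bool:
        """Apply a transition if it is legal from the current state."""
        next_state = TRANSITIONS.get((self.state, transition))
        if next_state is None:
            return False  # illegal transition: state is unchanged
        self.state = next_state
        return True


node = ToyLifecycleNode()
skipped = node.trigger("activate")     # cannot activate before configuring
startup = [node.trigger("configure"), node.trigger("activate")]
paused = node.trigger("deactivate")    # pause without losing configuration
state_when_paused = node.state
stopped = node.trigger("shutdown")
restarted = node.trigger("configure")  # Finalized is terminal
```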

Why lifecycle management matters:

Predictable startup: A supervising process can bring nodes up in a defined order
Graceful shutdown: Clean resource cleanup
Error recovery: Nodes can restart without full system reboot
Safety: Critical nodes can be paused without losing state

Executors: Single-Threaded vs Multi-Threaded

Executors control how nodes process callbacks (messages, service requests, timers). The two most commonly used executors are:

SingleThreadedExecutor:

All callbacks run in one thread
Predictable: Callbacks never run concurrently, so execution order is easier to reason about
Real-time friendly: No thread contention between callbacks
Use case: Control loops, time-critical nodes

MultiThreadedExecutor:

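Callbacks are dispatched to a thread pool
Higher throughput: Callbacks can run in parallel when their callback groups allow it (callbacks in the same mutually exclusive group, the default, still run one at a time)
Non-deterministic: Order depends on scheduling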
Use case: Perception, planning (can tolerate jitter)

Best practice: Use single-threaded executors for control, multi-threaded for perception/planning.
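
The difference between the two models can be demonstrated with plain Python threads. The sketch below is an analogy only: `spin_single_threaded` and `spin_multi_threaded` are hypothetical helpers, not rclpy executors, and the callbacks simply record the order in which they ran.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def spin_single_threaded(callbacks: List[Callable[[], None]]) -> None:
    """Run ready callbacks one after another on the calling thread."""
    for callback in callbacks:
        callback()


def spin_multi_threaded(callbacks: List[Callable[[], None]], num_threads: int = 4) -> None:
    """Hand ready callbacks to a thread pool; completion order is up to the scheduler."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        for callback in callbacks:
            pool.submit(callback)
    # Leaving the `with` block waits for all submitted callbacks to finish.


def make_callbacks(log: List[int], count: int) -> List[Callable[[], None]]:
    lock = threading.Lock()  # shared state must be protected once threads are involved

    def make(index: int) -> Callable[[], None]:
        def callback() -> None:
            with lock:
                log.append(index)
        return callback

    return [make(i) for i in range(count)]


single_log: List[int] = []
spin_single_threaded(make_callbacks(single_log, 5))

multi_log: List[int] = []
spin_multi_threaded(make_callbacks(multi_log, 5))
```

The single-threaded log always matches submission order. The multi-threaded log contains the same entries, but their order is not guaranteed, which is why shared state needs a lock.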

1.4 Quality of Service (QoS) Profiles

QoS policies control how messages are delivered. This is critical for robotics where different data types have different requirements.

Reliability

Reliable: Delivery is guaranteed; lost messages are retransmitted (may retry multiple times)
Best-effort: Messages are sent once without retransmission, so they may be lost if the network is not robust

Use cases:

Reliable: Motor commands, safety signals (must not be lost)
Best-effort: Camera images, high-frequency sensor data (can tolerate drops)

Durability

Volatile: Only current subscribers receive messages
Transient Local: The publisher retains its most recent messages (up to its history depth) and delivers them to late-joining subscribers

Use cases:

Volatile: Real-time sensor streams
Transient Local: Robot state, map data (new nodes need current state)
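
The effect of durability on a late-joining subscriber can be sketched with a toy topic that optionally caches what it has published. `ToyTopic` is a hypothetical class for illustration; in real ROS 2 the caching is done by the publisher's middleware, and both sides must request compatible durability.

```python
from collections import deque
from typing import Callable, Deque, List


class ToyTopic:
    """Toy topic: with transient_local=True, the last `depth` messages are replayed."""

    def __init__(self, transient_local: bool, depth: int = 1) -> None:
        self._transient_local = transient_local
        self._cache: Deque[str] = deque(maxlen=depth)
        self._subscribers: List[Callable[[str], None]] = []

    def publish(self, message: str) -> None:
        if self._transient_local:
            self._cache.append(message)  # retained for future subscribers
        for deliver in self._subscribers:
            deliver(message)

    def subscribe(self, deliver: Callable[[str], None]) -> None:
        for message in self._cache:  # replay retained messages to the late joiner
            deliver(message)
        self._subscribers.append(deliver)


volatile_topic = ToyTopic(transient_local=False)
latched_topic = ToyTopic(transient_local=True, depth=1)
for topic in (volatile_topic, latched_topic):
    topic.publish("map-v1")  # published before anyone subscribes

volatile_inbox: List[str] = []
latched_inbox: List[str] = []
volatile_topic.subscribe(volatile_inbox.append)
latched_topic.subscribe(latched_inbox.append)

for topic in (volatile_topic, latched_topic):
    topic.publish("map-v2")  # both subscribers are connected by now
```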

History

Keep Last: Keep N most recent messages
Keep All: Keep all messages, subject to the resource limits of the underlying middleware (memory use can grow large)

Use cases:

Keep Last (depth=1): Latest state only
Keep Last (depth=10): Small buffer for jitter tolerance
Keep All: Debugging, logging (use with caution)
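
The history depths above behave like a bounded queue that evicts its oldest entry when full. A minimal sketch using Python's `collections.deque`, where `buffered` is a hypothetical helper and `depth=None` stands in for Keep All:

```python
from collections import deque
from typing import List, Optional


def buffered(messages: List[str], depth: Optional[int]) -> List[str]:
    """Return what remains queued after all messages arrive before any is read."""
    queue = deque(maxlen=depth)  # Keep Last(depth): the oldest entry is evicted first
    for message in messages:
        queue.append(message)
    return list(queue)


burst = [f"scan-{i}" for i in range(5)]

latest_only = buffered(burst, depth=1)   # Keep Last (depth=1)
small_buffer = buffered(burst, depth=3)  # Keep Last (depth=3)
keep_all = buffered(burst, depth=None)   # Keep All: nothing is evicted
```

A slow subscriber with a shallow depth sees only the newest data, which is usually what a controller wants, while a deeper queue trades memory for tolerance to bursts.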

Common QoS Profiles

ROS 2 provides pre-configured profiles:

Sensor Data: Best-effort, volatile, keep last (depth=5)
Services: Reliable, volatile, keep last (depth=10)
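Parameters: Reliable, volatile, keep last (depth=1000)
Default (publishers and subscriptions): Reliable, volatile, keep last (depth=10)
System Default: Uses the underlying RMW implementation's defaults for every policy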

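Matching QoS: Publishers and subscribers must have compatible QoS, meaning the publisher offers at least what the subscriber requests. For example, a best-effort publisher and a reliable subscriber are incompatible: they will not connect, and no messages flow.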
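
Compatibility follows an offered-versus-requested rule: the publisher must offer at least as strong a guarantee as the subscriber requests. The sketch below checks only reliability and durability and is a simplified model; `ToyQoS` and `compatible` are hypothetical names, not part of rclpy.

```python
from dataclasses import dataclass
from enum import IntEnum


class Reliability(IntEnum):
    BEST_EFFORT = 0
    RELIABLE = 1


class Durability(IntEnum):
    VOLATILE = 0
    TRANSIENT_LOCAL = 1


@dataclass(frozen=True)
class ToyQoS:
    reliability: Reliability
    durability: Durability


def compatible(offered: ToyQoS, requested: ToyQoS) -> bool:
    """A publisher (offered) must meet or exceed what the subscriber (requested) asks for."""
    return (offered.reliability >= requested.reliability
            and offered.durability >= requested.durability)


sensor_pub = ToyQoS(Reliability.BEST_EFFORT, Durability.VOLATILE)
state_pub = ToyQoS(Reliability.RELIABLE, Durability.TRANSIENT_LOCAL)
strict_sub = ToyQoS(Reliability.RELIABLE, Durability.VOLATILE)
relaxed_sub = ToyQoS(Reliability.BEST_EFFORT, Durability.VOLATILE)
late_join_sub = ToyQoS(Reliability.RELIABLE, Durability.TRANSIENT_LOCAL)

sensor_to_strict = compatible(sensor_pub, strict_sub)      # best-effort cannot satisfy reliable
sensor_to_relaxed = compatible(sensor_pub, relaxed_sub)
state_to_relaxed = compatible(state_pub, relaxed_sub)      # offering more than requested is fine
state_to_late_join = compatible(state_pub, late_join_sub)
sensor_to_late_join = compatible(sensor_pub, late_join_sub)
```

A subscriber that requests less than the publisher offers always matches, so relaxed subscriber settings are the safer choice when the publisher's profile is unknown.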

1.5 Hands-On Preview

In the following topics, you will build:

Minimal nodes: Publisher and subscriber nodes
Service nodes: Server and client for planning queries
Action nodes: Goal-oriented navigation with feedback
Parameter nodes: Dynamic configuration management
Multi-node system: Complete 3-node pipeline

Each topic includes:

Conceptual explanation: Why and when to use each pattern
Code examples: Working Python (rclpy) implementations
Best practices: Common mistakes and how to avoid them
Debugging tips: Tools and techniques for troubleshooting

Summary

ROS 2 solves the distributed coordination problem in robotics by providing:

Standardized communication: Topics, services, actions
Real-time capability: DDS-based communication designed for predictable delivery
Modularity: Independent nodes with clear interfaces
Type safety: Strongly typed messages catch mismatches early
QoS control: Fine-grained control over message delivery

Understanding these fundamentals is essential for building robust, scalable robot systems. The next topics dive deep into implementation details, code examples, and hands-on labs.

References

ROS 2 Design: https://design.ros2.org/
DDS Specification: https://www.omg.org/spec/DDS/
ROS 2 Humble Documentation: https://docs.ros.org/en/humble/
