Topic 1 — Foundations of Autonomy & Agent-Based Robotics
Perception, mapping, and control are necessary but not sufficient for autonomy. This topic defines what it means for a humanoid robot to be autonomous, introduces the architecture of an agent-based robotics system, and explains how high-level reasoning, planning, and control fit together.
1.1 What Makes a Robot Autonomous?
An autonomous robot is more than a remote-controlled machine. It must:
- Perceive its environment (state awareness).
- Decide what to do next (planning and decision-making).
- Act on those decisions through motors and actuators.
- Adapt to changes and recover from failures without human intervention.
Key distinctions:
- Teleoperated robot:
- Human operator issues low-level commands ("move joint 3", "turn left 10 degrees").
- Robot has little or no independent decision-making.
- Autonomous robot:
- Receives high-level goals ("deliver this object to room B").
- Decides how to achieve them given its current state and environment.
Autonomy requires:
- A state estimate of the world (from perception and SLAM).
- A notion of goals and constraints.
- A policy or planning mechanism to choose actions.
- A way to monitor execution and react to deviations.
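These four requirements can be sketched as a sense-decide-act-monitor loop. The callback names (`perceive`, `plan`, `act`, `goal_reached`) are illustrative, not a real robot API:

```python
def autonomy_loop(perceive, plan, act, goal_reached, max_steps=100):
    """Repeat the sense-decide-act cycle until the goal is reached
    or the step budget runs out (the execution monitor)."""
    for step in range(max_steps):
        state = perceive()            # state estimate of the world
        if goal_reached(state):       # monitor: has the goal been met?
            return True, step
        action = plan(state)          # policy/planner chooses an action
        act(action)                   # actuate, changing the world
    return False, max_steps

# Toy 1-D example: drive a position counter from 0 to 5.
world = {"pos": 0}
ok, steps = autonomy_loop(
    perceive=lambda: world["pos"],
    plan=lambda state: +1,            # trivial policy: always step forward
    act=lambda a: world.update(pos=world["pos"] + a),
    goal_reached=lambda state: state >= 5,
)
```

The point of the sketch is the shape of the loop, not the trivial policy: perception, planning, and actuation are separate, replaceable functions closed over a shared world.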
1.2 Architecture of an Autonomous Agent
A common pattern for agentic robotics systems is a hierarchical stack:
- High-Level Reasoning (LLM / Task Planner)
- Interprets natural language commands.
- Decomposes goals into sub-tasks or skills.
- Chooses strategies (e.g., "search room", "replan route").
- Task Planner / Behavior Manager
- Represents tasks as graphs, behavior trees, or finite state machines.
- Sequences skills, handles branching logic, and implements retry/fallback.
- Skill Layer
- Encapsulates reusable behaviors such as:
- Navigate to pose.
- Pick up object.
- Follow person.
- Place object at target.
- Each skill has a defined API (inputs, outputs, success/failure conditions).
- Low-Level Control
- Executes trajectories and control laws:
- Base navigation controllers.
- Arm and hand controllers.
- Balance and compliance controllers.
- Runs at high frequency (tens to hundreds of Hz).
- Sensing and World Model
- Maintains maps, detections, and state estimates.
- Feeds into all higher layers for decision-making.
These components form a closed loop:
- High-level goals → plans → skills → motor commands → new sensor data → updated world model → revised decisions.
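A toy sketch of this closed loop, with hypothetical functions standing in for the reasoning, skill, and sensing layers:

```python
from dataclasses import dataclass, field

@dataclass
class WorldModel:
    """Sensing layer stand-in: maps, detections, and state estimates,
    simplified here to a dict of named facts."""
    facts: dict = field(default_factory=dict)

    def update(self, observation):
        self.facts.update(observation)

def reason(goal):
    """High-level layer stand-in: decompose a goal into skill calls
    (hypothetical skill names)."""
    return [("navigate", goal["room"]), ("grasp", goal["object"])]

def run_skill(skill, arg):
    """Skill layer stand-in: execute one skill and return an observation
    for the world model; the string stands in for motor commands."""
    return {f"done_{skill}": arg, "last_command": f"{skill}:{arg}"}

def closed_loop(goal, world):
    """goals -> plans -> skills -> commands -> observations -> updated model."""
    for skill, arg in reason(goal):
        observation = run_skill(skill, arg)
        world.update(observation)     # new sensor data revises the model
    return world

model = closed_loop({"room": "B", "object": "mug"}, WorldModel())
```

Each layer only talks to its neighbors: the reasoner never emits commands, and the skill never sees the goal.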
1.3 Roles of Each Layer
High-Level Reasoning (LLM / VLM)
Responsibilities:
- Interpret human commands (e.g., "bring me the red mug from the kitchen").
- Translate them into structured task descriptions:
- Objectives (what to achieve).
- Constraints (avoid the wet floor).
- Preferences (fastest vs safest path).
Constraints:
- Non-real-time: may be relatively slow and not suitable for millisecond-level decisions.
- Should not directly command actuators; instead, it generates plans and task graphs.
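For illustration, a structured task description of this kind might look as follows; the field names are hypothetical, not a standard schema:

```python
# Hypothetical structured task emitted by the reasoning layer for
# "bring me the red mug from the kitchen" — data, never motor commands.
task = {
    "objective": {"action": "fetch", "object": "red mug", "from": "kitchen"},
    "constraints": ["avoid: wet floor"],
    "preferences": {"route": "safest"},
}

def validate_task(task):
    """Reject malformed tasks before they reach the task planner, so
    downstream layers can rely on the required fields being present."""
    required = {"objective", "constraints", "preferences"}
    missing = required - task.keys()
    if missing:
        raise ValueError(f"task missing fields: {sorted(missing)}")
    return True
```

Keeping the reasoning layer's output declarative like this is what lets the task planner, not the language model, own execution.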
Task Planner / Behavior Manager
Responsibilities:
- Turn high-level tasks into sequences of skills.
- Handle:
- Success paths.
- Failure branches (e.g., object not found).
- Parallel tasks (scan while walking).
Tools:
- Behavior trees.
- Task graphs.
- Hierarchical finite state machines.
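As a sketch of the first of these tools, a minimal behavior tree can be built from just two composite nodes; the node and skill names below are illustrative:

```python
SUCCESS, FAILURE = "success", "failure"

def sequence(*children):
    """Tick children in order; fail fast on the first failure (success path)."""
    def tick(ctx):
        for child in children:
            if child(ctx) == FAILURE:
                return FAILURE
        return SUCCESS
    return tick

def fallback(*children):
    """Try children in order; succeed on the first success (failure branch)."""
    def tick(ctx):
        for child in children:
            if child(ctx) == SUCCESS:
                return SUCCESS
        return FAILURE
    return tick

def condition(key):
    """Leaf that checks a blackboard fact."""
    return lambda ctx: SUCCESS if ctx.get(key) else FAILURE

def action(name):
    """Leaf that records its execution and succeeds."""
    def tick(ctx):
        ctx.setdefault("log", []).append(name)
        return SUCCESS
    return tick

# "Fetch" tree: search the room only if the object is not already located.
tree = sequence(
    fallback(condition("object_located"), action("search_room")),
    action("navigate_to_object"),
    action("pick_object"),
)
```

The fallback node is how "object not found" branches are expressed without special-case code: recovery behaviors are just lower-priority children.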
Skills
Responsibilities:
- Provide reusable building blocks with clear contracts.
- Encapsulate:
- Perception queries (e.g., "where is object X?").
- Planning calls (e.g., "plan route to Y").
- Control calls (e.g., "execute grasp").
Example skills:
- navigate_to_pose(goal_pose)
- pick_object(object_id)
- place_object(object_id, target_pose)
- follow_person(target_id)
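A skill contract of this kind can be sketched as a small result type plus one toy skill; the `blocked` and `pose` world-model keys are assumptions made for the example:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SkillResult:
    """The contract every skill returns: explicit success/failure,
    named outputs, and a machine-readable failure reason."""
    success: bool
    outputs: dict = field(default_factory=dict)
    failure_reason: Optional[str] = None

def navigate_to_pose(goal_pose, world_model):
    """Toy implementation of the navigation skill's contract: teleport
    to the goal unless the world model marks it as blocked."""
    if goal_pose in world_model.get("blocked", set()):
        return SkillResult(False, failure_reason="path_blocked")
    world_model["pose"] = goal_pose
    return SkillResult(True, outputs={"pose": goal_pose})
```

Because every skill reports failure the same way, the behavior manager can branch on `failure_reason` without knowing anything about navigation internals.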
Low-Level Control
Responsibilities:
- Translate desired motions into motor commands.
- Close feedback loops using joint states, IMU, force/torque, etc.
- Guarantee:
- Stability (no falls).
- Safety (respect joint limits and forces).
This layer is largely independent of how tasks are defined; it just executes trajectories safely.
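As a toy illustration of such a feedback loop, here is a single-joint proportional position controller ticked at 100 Hz that saturates its command and clamps to joint limits; the gains and limits are made-up values, not from any real robot:

```python
def p_control_step(q, q_target, kp, dt, q_min, q_max, v_max):
    """One control cycle: compute a velocity command from position error,
    saturate it (safety), and clamp the result to joint limits."""
    error = q_target - q
    v = max(-v_max, min(v_max, kp * error))   # saturated command
    q_next = q + v * dt                       # integrate one dt
    return max(q_min, min(q_max, q_next))     # respect joint limits

def run_controller(q0, q_target, steps=200, dt=0.01):
    """Simulate 2 s of a 100 Hz loop (dt = 0.01 s)."""
    q = q0
    for _ in range(steps):
        q = p_control_step(q, q_target, kp=5.0, dt=dt,
                           q_min=-1.5, q_max=1.5, v_max=1.0)
    return q
```

Note what the controller does not know: why it is moving. It converges to whatever setpoint the skill layer hands it, which is exactly the task-independence described above.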
1.4 Feedback Loops and Continuous Reevaluation
Real environments are dynamic:
- People move.
- Objects get bumped.
- Doors might be closed or opened unexpectedly.
An autonomous agent must:
- Continuously re-evaluate its assumptions based on new sensor data.
- Detect when plans are invalidated:
- Path blocked by a new obstacle.
- Target object missing from expected location.
- Trigger replanning or alternative strategies:
- Choose a different path.
- Search another room.
- Ask the user for clarification.
Design implications:
- Planning should not be a one-shot calculation; use receding-horizon or continuous replanning.
- Behavior trees and task graphs should be designed with:
- Timeouts.
- Retry counts.
- Fallback behaviors.
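The retry and fallback parts of this pattern can be sketched as a small wrapper around skill calls (timeouts, the third ingredient, are omitted for brevity; the skill callables are hypothetical and return True on success):

```python
def with_recovery(skill, fallbacks=(), retries=2):
    """Run a skill with a retry budget; if it keeps failing, try the
    fallback behaviors in priority order."""
    for _ in range(retries + 1):
        if skill():
            return True
    for alternative in fallbacks:
        if alternative():
            return True
    return False

# A flaky skill that fails on its first attempt, then succeeds.
attempts = {"n": 0}
def flaky_grasp():
    attempts["n"] += 1
    return attempts["n"] >= 2

recovered = with_recovery(flaky_grasp)
```

In a behavior tree the same logic would be a retry decorator over the skill leaf with a fallback node above it; the wrapper just makes the control flow explicit.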
1.5 From Perception to Agency
Chapters 2–4 provided:
- ROS 2 communication and control.
- A digital twin for safe experimentation.
- Perception and mapping pipelines.
Chapter 5 adds:
- Mechanisms for choosing what to do given that information.
- Structures for sequencing and monitoring complex tasks.
- Interfaces that tie natural language to physical behavior.
Keep this mental model:
- Chapter 3: "Where am I and what can I simulate safely?"
- Chapter 4: "What do I see and how do I represent it?"
- Chapter 5: "Given what I see and what I want, what should I do next, and how?"
The remaining topics in this chapter turn this conceptual architecture into concrete navigation, task execution, and agentic control pipelines.