Physical AI & Humanoid Robotics


Topic 1 — Foundations of Autonomy & Agent-Based Robotics

Perception, mapping, and control are necessary but not sufficient for autonomy. This topic defines what it means for a humanoid robot to be autonomous, introduces the architecture of an agent-based robotics system, and explains how high-level reasoning, planning, and control fit together.


1.1 What Makes a Robot Autonomous?

An autonomous robot is more than a remote-controlled machine. It must:

  • Perceive its environment (state awareness).
  • Decide what to do next (planning and decision-making).
  • Act on those decisions through motors and actuators.
  • Adapt to changes and recover from failures without human intervention.

Key distinctions:

  • Teleoperated robot:
    • Human operator issues low-level commands ("move joint 3", "turn left 10 degrees").
    • Robot has little or no independent decision-making.
  • Autonomous robot:
    • Receives high-level goals ("deliver this object to room B").
    • Decides how to achieve them given its current state and environment.

Autonomy requires:

  • A state estimate of the world (from perception and SLAM).
  • A notion of goals and constraints.
  • A policy or planning mechanism to choose actions.
  • A way to monitor execution and react to deviations.
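The four requirements above can be sketched as a minimal sense-decide-act loop. This is a toy illustration, not a real robot API; all names (`AutonomousAgent`, `world_state`, the action strings) are hypothetical:

```python
class AutonomousAgent:
    """Minimal sense-decide-act loop (illustrative names only)."""

    def __init__(self, goal):
        self.goal = goal          # high-level goal, e.g. "deliver to room B"
        self.world_state = {}     # latest state estimate

    def perceive(self, sensor_data):
        # Update the state estimate from new sensor data.
        self.world_state.update(sensor_data)

    def decide(self):
        # Choose the next action given the goal and current state.
        if self.world_state.get("at_goal"):
            return "stop"
        if self.world_state.get("path_blocked"):
            return "replan"       # adapt: react to deviations
        return "follow_path"

    def step(self, sensor_data):
        self.perceive(sensor_data)
        return self.decide()

agent = AutonomousAgent(goal="room_B")
print(agent.step({"path_blocked": False}))  # follow_path
print(agent.step({"path_blocked": True}))   # replan
print(agent.step({"at_goal": True}))        # stop
```

Note that the decision depends only on the goal and the current state estimate, never on direct operator commands; that is the distinction from teleoperation drawn above.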

1.2 Architecture of an Autonomous Agent

A common pattern for agentic robotics systems is a hierarchical stack:

  1. High-Level Reasoning (LLM / Task Planner)
    • Interprets natural language commands.
    • Decomposes goals into sub-tasks or skills.
    • Chooses strategies (e.g., "search room", "replan route").
  2. Task Planner / Behavior Manager
    • Represents tasks as graphs, behavior trees, or finite state machines.
    • Sequences skills, handles branching logic, and implements retry/fallback.
  3. Skill Layer
    • Encapsulates reusable behaviors such as:
      • Navigate to pose.
      • Pick up object.
      • Follow person.
      • Place object at target.
    • Each skill has a defined API (inputs, outputs, success/failure conditions).
  4. Low-Level Control
    • Executes trajectories and control laws:
      • Base navigation controllers.
      • Arm and hand controllers.
      • Balance and compliance controllers.
    • Runs at high frequency (tens to hundreds of Hz).
  5. Sensing and World Model
    • Maintains maps, detections, and state estimates.
    • Feeds into all higher layers for decision-making.

These components form a closed loop:

  • High-level goals → plans → skills → motor commands → new sensor data → updated world model → revised decisions.
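One way to picture the layering is as a chain of interfaces, each layer consuming the output of the layer above. The functions below are stubs with hypothetical names, purely to make the data flow visible:

```python
def high_level_reasoner(command: str) -> list[str]:
    """Decompose a natural-language goal into sub-tasks (stubbed)."""
    if "deliver" in command:
        return ["navigate_to_pose", "pick_object",
                "navigate_to_pose", "place_object"]
    return []

def task_planner(subtasks: list[str]) -> list[dict]:
    """Turn sub-tasks into skill invocations with parameters (stubbed)."""
    return [{"skill": s, "params": {}} for s in subtasks]

def execute_skill(invocation: dict) -> bool:
    """Each skill would ultimately issue low-level control commands;
    here we only report success so the sequencing logic is visible."""
    print(f"executing {invocation['skill']}")
    return True

plan = task_planner(high_level_reasoner("deliver this object to room B"))
ok = all(execute_skill(step) for step in plan)
print("task succeeded:", ok)
```

In a real system each arrow in this chain is mediated by the world model: every layer reads the current state estimate before producing its output.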

1.3 Roles of Each Layer

High-Level Reasoning (LLM / VLM)

Responsibilities:

  • Interpret human commands (e.g., "bring me the red mug from the kitchen").
  • Translate them into structured task descriptions:
    • Objectives (what to achieve).
    • Constraints (avoid the wet floor).
    • Preferences (fastest vs safest path).

Constraints:

  • Non-real-time: a single inference can take hundreds of milliseconds or more, so this layer is not suitable for millisecond-level decisions.
  • Should not directly command actuators; instead, it generates plans and task graphs.
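A structured task description of this kind might look like the following sketch. The schema (`TaskDescription` and its fields) is an assumption for illustration, not a standard format:

```python
from dataclasses import dataclass, field

@dataclass
class TaskDescription:
    """Structured output the reasoning layer might emit (illustrative schema)."""
    objective: str                                         # what to achieve
    constraints: list[str] = field(default_factory=list)   # e.g. "avoid wet floor"
    preferences: dict = field(default_factory=dict)        # e.g. {"route": "safest"}

# What "bring me the red mug from the kitchen" might become:
task = TaskDescription(
    objective="fetch(red_mug, from=kitchen, to=user)",
    constraints=["avoid_wet_floor"],
    preferences={"route": "fastest"},
)
print(task.objective)
```

The point of the structure is that the task planner below consumes a machine-checkable object, not free-form text.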

Task Planner / Behavior Manager

Responsibilities:

  • Turn high-level tasks into sequences of skills.
  • Handle:
    • Success paths.
    • Failure branches (e.g., object not found).
    • Parallel tasks (scan while walking).

Tools:

  • Behavior trees.
  • Task graphs.
  • Hierarchical finite state machines.
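A minimal behavior-tree fragment shows how these tools compose sequencing with fallback. This is a toy implementation, not a real BT library; the leaf behaviors are stand-ins for actual skills:

```python
from typing import Callable

Status = str  # "SUCCESS" or "FAILURE"

def sequence(*children: Callable[[], Status]) -> Callable[[], Status]:
    """Run children in order; fail on the first failure."""
    def tick() -> Status:
        for child in children:
            if child() == "FAILURE":
                return "FAILURE"
        return "SUCCESS"
    return tick

def fallback(*children: Callable[[], Status]) -> Callable[[], Status]:
    """Try children in order; succeed on the first success."""
    def tick() -> Status:
        for child in children:
            if child() == "SUCCESS":
                return "SUCCESS"
        return "FAILURE"
    return tick

# Leaves are plain callables; real ones would wrap skills.
find_on_table = lambda: "FAILURE"       # object not on the table
find_in_cupboard = lambda: "SUCCESS"    # found in the cupboard
grasp = lambda: "SUCCESS"

fetch = sequence(fallback(find_on_table, find_in_cupboard), grasp)
print(fetch())  # SUCCESS: the fallback recovered from the first failure
```

The fallback node is what gives the tree its failure branch: "object not found on the table" does not abort the task, it redirects it.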

Skills

Responsibilities:

  • Provide reusable building blocks with clear contracts.
  • Encapsulate:
    • Perception queries (e.g., "where is object X?").
    • Planning calls (e.g., "plan route to Y").
    • Control calls (e.g., "execute grasp").

Example skills:

  • navigate_to_pose(goal_pose).
  • pick_object(object_id).
  • place_object(object_id, target_pose).
  • follow_person(target_id).
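A skill contract of this kind can be sketched as an abstract base class with an explicit success/failure result. The class and field names here are illustrative, not from any particular framework:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class SkillResult:
    """Explicit outcome: every skill reports success or failure."""
    success: bool
    message: str = ""

class Skill(ABC):
    """Contract every skill follows: named inputs, explicit outcome."""

    @abstractmethod
    def execute(self, **params) -> SkillResult: ...

class NavigateToPose(Skill):
    def execute(self, goal_pose=None, **_) -> SkillResult:
        if goal_pose is None:
            return SkillResult(False, "no goal pose given")
        # A real version would call a motion planner and base controller here.
        return SkillResult(True, f"reached {goal_pose}")

result = NavigateToPose().execute(goal_pose=(2.0, 1.5, 0.0))
print(result.success, result.message)
```

Because every skill returns the same result type, the behavior manager above can branch on failure uniformly without knowing what each skill does internally.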

Low-Level Control

Responsibilities:

  • Translate desired motions into motor commands.
  • Close feedback loops using joint states, IMU, force/torque, etc.
  • Guarantee:
    • Stability (no falls).
    • Safety (respect joint limits and forces).

This layer is largely independent of how tasks are defined; it just executes trajectories safely.


1.4 Feedback Loops and Continuous Reevaluation

Real environments are dynamic:

  • People move.
  • Objects get bumped.
  • Doors might be closed or opened unexpectedly.

An autonomous agent must:

  • Continuously re-evaluate its assumptions based on new sensor data.
  • Detect when plans are invalidated:
    • Path blocked by a new obstacle.
    • Target object missing from expected location.
  • Trigger replanning or alternative strategies:
    • Choose a different path.
    • Search another room.
    • Ask the user for clarification.

Design implications:

  • Planning should not be a one-shot calculation; use receding-horizon or continuous replanning.
  • Behavior trees and task graphs should be designed with:
    • Timeouts.
    • Retry counts.
    • Fallback behaviors.
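These three design elements can be sketched as a small wrapper around skill execution. This is a simplification under stated assumptions: a real system would cancel a running skill asynchronously on timeout rather than check elapsed time afterwards, and the names are hypothetical:

```python
import time

def run_with_retries(skill, max_retries=3, timeout_s=5.0, fallback=None):
    """Execute a skill with a retry budget and per-attempt timeout,
    invoking a fallback behavior if all attempts fail (illustrative sketch)."""
    for attempt in range(max_retries):
        start = time.monotonic()
        ok = skill()
        within_time = (time.monotonic() - start) <= timeout_s
        if ok and within_time:
            return True
    return fallback() if fallback else False

attempts = []
def flaky_skill():
    attempts.append(1)
    return len(attempts) >= 3   # succeeds on the third try

print(run_with_retries(flaky_skill))  # True, after two failed attempts
```

Embedding this pattern in the behavior tree itself (retry and fallback decorator nodes) keeps recovery logic declarative rather than scattered through skill code.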

1.5 From Perception to Agency

Chapters 2–4 provided:

  • ROS 2 communication and control.
  • A digital twin for safe experimentation.
  • Perception and mapping pipelines.

Chapter 5 adds:

  • Mechanisms for choosing what to do given that information.
  • Structures for sequencing and monitoring complex tasks.
  • Interfaces that tie natural language to physical behavior.

Keep this mental model:

  • Chapter 3: "Where am I and what can I simulate safely?"
  • Chapter 4: "What do I see and how do I represent it?"
  • Chapter 5: "Given what I see and what I want, what should I do next, and how?"

The remaining topics in this chapter turn this conceptual architecture into concrete navigation, task execution, and agentic control pipelines.