Physical AI & Humanoid Robotics


Topic 4 — LLM-Based Decision Making & Reasoning

Large language models (LLMs) and vision-language models (VLMs) can serve as powerful high-level planners for humanoid robots. This topic explains how to translate natural language instructions into structured task graphs, how to keep autonomy closed-loop with perception, and how reinforcement-style feedback can refine task execution over time.


4.1 Natural Language → Task Graph Translation

High-level instructions:

  • "Bring me the red mug from the kitchen."
  • "Follow the person in the blue shirt and stop two meters behind them."
  • "Inspect the room and tell me what changed from yesterday."

LLM responsibilities:

  • Parse the instruction:
    • Identify goals (deliver, follow, inspect).
    • Identify entities (red mug, kitchen, person, room).
    • Identify constraints (distance, order, time).
  • Produce a structured plan, such as:
    • List of steps.
    • Behavior tree skeleton.
    • Goal and sub-goal definitions.

Example plan structure:

  • Task: "Deliver red mug from kitchen to user."
    • Step 1: Navigate to kitchen.
    • Step 2: Search for red mug.
    • Step 3: Pick red mug.
    • Step 4: Navigate to user location.
    • Step 5: Hand over mug.
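The plan above can be encoded as a JSON task graph. A minimal sketch follows; the field names (`task`, `steps`, `skill`, `success`) are illustrative choices, not a fixed standard:

```python
import json

# Illustrative JSON encoding of the delivery plan; the schema is an
# assumption for this guide, not an established format.
plan = {
    "task": "Deliver red mug from kitchen to user",
    "steps": [
        {"id": 1, "skill": "navigate", "target": "kitchen"},
        {"id": 2, "skill": "search", "object": "red mug"},
        {"id": 3, "skill": "pick", "object": "red mug"},
        {"id": 4, "skill": "navigate", "target": "user_location"},
        {"id": 5, "skill": "handover", "object": "red mug"},
    ],
    "success": "user holds red mug",
}

print(json.dumps(plan, indent=2))
```

A machine-readable plan like this is what the executor consumes, and it is also what the LLM is prompted to emit.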

Implementation approach:

  • Define a schema for tasks:
    • JSON or YAML structure listing steps, preconditions, and success criteria.
  • Prompt the LLM to output tasks in that schema.
  • Validate and sanitize LLM outputs before execution.
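Validation can be a thin layer that rejects anything the executor cannot handle. A sketch, assuming a hypothetical allowed-skill set and the JSON plan shape described above:

```python
import json

# Assumed skill vocabulary; in practice this mirrors your skill library.
ALLOWED_SKILLS = {"navigate", "search", "pick", "handover"}

def validate_plan(raw: str) -> list[dict]:
    """Parse and sanity-check an LLM-produced plan before execution.

    Raises ValueError on malformed JSON, a missing/empty steps list,
    or any step referencing an unknown skill.
    """
    try:
        plan = json.loads(raw)
    except json.JSONDecodeError as e:
        raise ValueError(f"plan is not valid JSON: {e}") from e
    steps = plan.get("steps")
    if not isinstance(steps, list) or not steps:
        raise ValueError("plan must contain a non-empty 'steps' list")
    for step in steps:
        if step.get("skill") not in ALLOWED_SKILLS:
            raise ValueError(f"unknown skill: {step.get('skill')!r}")
    return steps

steps = validate_plan('{"steps": [{"skill": "navigate", "target": "kitchen"}]}')
```

Rejecting bad plans before they reach the robot is cheaper than detecting failure mid-execution.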

4.2 Closed-Loop Autonomy with Perception

Autonomy is not a one-shot plan; it is a loop:

  1. Sense (perception, SLAM, state estimation).
  2. Think (LLM + planners).
  3. Act (skills and controllers).
  4. Check (did the action succeed?).
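The four stages above can be sketched as a single loop. All four callables here are placeholders standing in for your perception stack, planner, and skill layer:

```python
def autonomy_loop(sense, plan_next, execute, succeeded, max_steps=100):
    """Run the sense-think-act-check cycle until the planner reports done.

    `sense`, `plan_next`, `execute`, and `succeeded` are placeholder
    callables; returns True if the task finishes within max_steps.
    """
    for _ in range(max_steps):
        state = sense()               # 1. Sense
        action = plan_next(state)     # 2. Think
        if action is None:            # planner reports the goal is reached
            return True
        result = execute(action)      # 3. Act
        if not succeeded(state, action, result):
            # 4. Check failed: fall through and replan from the new state
            continue
    return False
```

Because the loop re-senses every iteration, a failed action simply produces a new world state for the next planning step rather than crashing the plan.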

LLMs can participate in this loop by:

  • Re-evaluating plans when:
    • Objects are not detected where expected.
    • Paths are blocked.
    • Goals become unreachable.
  • Generating clarifying questions:
    • "I cannot find a red mug in the kitchen. Should I search the living room?"
    • "The path to the storage room is blocked. Do you want me to wait or take a longer route?"

Design considerations:

  • Limit how often you call the LLM:
    • Use it for macro-decisions, not low-level control.
  • Provide it with:
    • A summary of the current world state (detected objects, maps, robot pose).
    • A history of recent actions and outcomes.
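One way to keep LLM calls cheap is to hand the model a compact, bounded summary rather than raw sensor data. A sketch, with illustrative field names:

```python
import json

def build_llm_context(detected_objects, robot_pose, recent_events, max_events=5):
    """Assemble a compact world-state summary for an LLM prompt.

    Only a bounded slice of recent history is included to keep the
    prompt small; the record layout is an assumption of this sketch.
    """
    context = {
        "objects": detected_objects,             # e.g. [{"label": "mug", "room": "kitchen"}]
        "pose": robot_pose,                      # e.g. {"x": 1.2, "y": 0.4, "theta": 1.57}
        "history": recent_events[-max_events:],  # last few action/outcome pairs
    }
    return json.dumps(context)
```

Capping the history also keeps prompt cost predictable as a task runs for minutes rather than seconds.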

4.3 Self-Correction and Re-Scanning

Failures are inevitable:

  • Object not detected due to occlusion.
  • Grasp fails due to misalignment.
  • Localization drifts.

The agent must:

  • Detect failure conditions (e.g., no object in gripper, target not in view).
  • Trigger self-correction behaviors:
    • Re-scan the environment.
    • Adjust position or viewpoint.
    • Retry skills with slightly varied parameters.
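Retrying with slightly varied parameters can be as simple as perturbing numeric skill arguments between attempts. A minimal sketch, assuming a skill is any callable that returns True on success (names and jitter scale are illustrative):

```python
import random

def retry_with_jitter(skill, params, attempts=3, jitter=0.01, rng=None):
    """Retry a skill, perturbing float parameters slightly each attempt.

    Small random offsets on numeric values (e.g. approach height) mean
    repeated grasps don't fail in exactly the same way twice.
    """
    rng = rng or random.Random()
    for _ in range(attempts):
        trial = {
            k: v + rng.uniform(-jitter, jitter) if isinstance(v, float) else v
            for k, v in params.items()
        }
        if skill(trial):
            return True
    return False
```

More structured corrections (re-scanning, viewpoint changes) would wrap this same retry pattern around their own skills.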

LLM involvement:

  • Decide which correction strategy to use given:
    • The type of failure.
    • Task urgency and constraints.
  • Update the task graph or behavior tree accordingly:
    • Add new search steps.
    • Skip or reorder tasks if necessary.

4.4 Reinforcement-Based Task Optimization

Even with good planning, performance can improve with experience.

Concepts:

  • Reward signals:
    • Task success or failure.
    • Time to completion.
    • Smoothness and safety of motion (e.g., number of near-collisions).
  • Policy refinement:
    • Adjusting skill parameters (speeds, thresholds).
    • Selecting better sub-goal strategies (short vs long paths).
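The reward signals above can be folded into one scalar. A sketch with arbitrary starting weights that you would tune per task:

```python
def task_reward(success: bool, duration_s: float, near_collisions: int) -> float:
    """Combine the signals listed above into a scalar reward.

    The weights (10, -5, 0.1, 1.0) are illustrative starting points,
    not calibrated values.
    """
    reward = 10.0 if success else -5.0
    reward -= 0.1 * duration_s        # faster completions score higher
    reward -= 1.0 * near_collisions   # penalize unsafe motion
    return reward
```

Even a crude scalar like this is enough to rank logged executions and pick few-shot examples of good plans.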

Practical approach:

  • Log each task execution:
    • Initial plan.
    • Sensor traces.
    • Decisions taken.
    • Outcomes and rewards.
  • Use logs offline to:
    • Train or fine-tune decision policies (e.g., RL or supervised learning).
    • Inform LLM prompting (“few-shot” examples of good vs bad plans).
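A simple way to capture these logs is one JSON line per execution, appended to a file. A sketch; the record layout is a suggestion:

```python
import json
import time

def log_execution(path, plan, decisions, outcome, reward):
    """Append one task execution as a JSON line for offline analysis.

    Anything you later want to mine (for few-shot prompting or policy
    training) belongs in this record; the fields shown are examples.
    """
    record = {
        "timestamp": time.time(),
        "plan": plan,
        "decisions": decisions,   # e.g. replans, clarifying questions asked
        "outcome": outcome,       # "success" | "failure"
        "reward": reward,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```

JSON Lines keeps the log append-only and trivially streamable into offline training or analysis scripts.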

The goal is not full-blown RL research, but to:

  • Build intuition for feedback-driven improvement.
  • Enable gradual refinement of behaviors over the course of the project.

4.5 Lab: Natural Language Command Execution

This lab ties together language, planning, and execution for simple command sets.

Objectives

  • Enable the robot to interpret a small set of natural language commands.
  • Map commands to task graphs or behavior trees and execute them end-to-end.

Tasks

  1. Command Set Definition
    • Choose a limited, well-defined set of commands, such as:
      • "Bring me the mug."
      • "Follow that person."
      • "Inspect the room."
  2. LLM Interface
    • Implement a node that:
      • Receives text commands (from CLI, UI, or speech-to-text).
      • Calls an LLM with prompts tailored to your schema.
      • Receives a structured plan or task graph.
  3. Plan Execution
    • Feed the plan into your behavior tree or task-execution framework.
    • Monitor progress and handle:
      • Success.
      • Failure with fallback.
  4. Evaluation
    • Run multiple trials with variations in:
      • Object positions.
      • Initial robot pose.
      • Minor language variations.
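The command pipeline the tasks describe can be sketched end to end. Here `llm` stands in for whatever LLM client you use, and `execute_plan` for your behavior-tree or task-execution entry point (both hypothetical; real code should also validate the plan as in 4.1):

```python
import json

def handle_command(text, llm, execute_plan):
    """Turn a text command into a plan via the LLM and run it.

    `llm` is any callable mapping prompt -> raw JSON string;
    `execute_plan` runs a list of step dicts and returns success.
    """
    prompt = (
        "Convert this command into a JSON plan with a 'steps' list, "
        "where each step has a 'skill' field.\nCommand: " + text
    )
    plan = json.loads(llm(prompt))  # validate/sanitize here in practice
    return execute_plan(plan["steps"])
```

Passing the LLM in as a callable also makes the node testable with a canned response, which is useful before wiring up a real model.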

Deliverables

  • LLM integration node and plan schema.
  • Behavior tree/task graph definitions for chosen commands.
  • Logs and a brief report describing:
    • How commands are parsed.
    • How failures are handled.
    • Where human clarification is still needed.

This lab showcases your humanoid’s emerging agentic behavior: it can understand high-level instructions and coordinate navigation, perception, and manipulation to carry them out.