Topic 4 — LLM-Based Decision Making & Reasoning
Large language models (LLMs) and Vision-Language Models (VLMs) can serve as powerful high-level planners for humanoid robots. This topic explains how to translate natural language instructions into structured task graphs, how to keep autonomy closed-loop with perception, and how reinforcement-style feedback can refine task execution over time.
4.1 Natural Language → Task Graph Translation
High-level instructions:
- "Bring me the red mug from the kitchen."
- "Follow the person in the blue shirt and stop two meters behind them."
- "Inspect the room and tell me what changed from yesterday."
LLM responsibilities:
- Parse the instruction:
  - Identify goals (deliver, follow, inspect).
  - Identify entities (red mug, kitchen, person, room).
  - Identify constraints (distance, order, time).
- Produce a structured plan, such as:
  - A list of steps.
  - A behavior tree skeleton.
  - Goal and sub-goal definitions.
Example plan structure:
- Task: "Deliver red mug from kitchen to user."
  - Step 1: Navigate to kitchen.
  - Step 2: Search for red mug.
  - Step 3: Pick red mug.
  - Step 4: Navigate to user location.
  - Step 5: Hand over mug.
Implementation approach:
- Define a schema for tasks:
  - A JSON or YAML structure listing steps, preconditions, and success criteria.
- Prompt the LLM to output tasks in that schema.
- Validate and sanitize LLM outputs before execution.
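The schema-plus-validation step can be sketched as follows. This is a minimal sketch: the step fields (`skill`, `params`, `success`), the allowed skill names, and the example plan are illustrative assumptions, not a fixed API.

```python
import json

# Hypothetical whitelist of skills the executor actually implements.
ALLOWED_SKILLS = {"navigate", "search", "pick", "handover"}

def validate_plan(raw):
    """Parse and sanitize an LLM-produced plan before execution."""
    plan = json.loads(raw)  # raises JSONDecodeError (a ValueError) if malformed
    if not isinstance(plan, list) or not plan:
        raise ValueError("plan must be a non-empty list of steps")
    for i, step in enumerate(plan):
        if step.get("skill") not in ALLOWED_SKILLS:
            raise ValueError(f"step {i}: unknown skill {step.get('skill')!r}")
        if not isinstance(step.get("params", {}), dict):
            raise ValueError(f"step {i}: params must be an object")
    return plan

# Example LLM output matching the "deliver red mug" plan above.
raw_output = """[
  {"skill": "navigate", "params": {"target": "kitchen"},  "success": "robot_in_kitchen"},
  {"skill": "search",   "params": {"object": "red mug"},  "success": "mug_detected"},
  {"skill": "pick",     "params": {"object": "red mug"},  "success": "mug_in_gripper"},
  {"skill": "navigate", "params": {"target": "user"},     "success": "robot_at_user"},
  {"skill": "handover", "params": {"object": "red mug"},  "success": "mug_delivered"}
]"""
plan = validate_plan(raw_output)
```

Rejecting any step whose skill is not on the whitelist keeps hallucinated capabilities from ever reaching the executor.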
4.2 Closed-Loop Autonomy with Perception
Autonomy is not a one-shot plan; it is a loop:
- Sense (perception, SLAM, state estimation).
- Think (LLM + planners).
- Act (skills and controllers).
- Check (did the action succeed?).
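A minimal skeleton of this loop, assuming `sense`, `act`, `check`, and `replan` are callables wired to your own perception stack, skill library, and LLM planner:

```python
def run_autonomy_loop(plan, sense, act, check, replan, max_steps=100):
    """Execute plan steps until done, repairing the plan on failure."""
    step_idx = 0
    for _ in range(max_steps):
        if step_idx >= len(plan):
            return True                           # every step succeeded
        state = sense()                           # Sense
        act(plan[step_idx], state)                # Act
        if check(plan[step_idx], sense()):        # Check: did the step work?
            step_idx += 1
        else:
            plan = replan(plan, step_idx, state)  # Think: repair the plan
            if not plan:
                return False                      # planner gave up; escalate
            step_idx = 0
    return False                                  # step budget exhausted
```

Note that "Think" only runs on failure here; routine step advancement never touches the LLM, which matches the macro-decision guidance below.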
LLMs can participate in this loop by:
- Re-evaluating plans when:
  - Objects are not detected where expected.
  - Paths are blocked.
  - Goals become unreachable.
- Generating clarifying questions:
  - "I cannot find a red mug in the kitchen. Should I search the living room?"
  - "The path to the storage room is blocked. Do you want me to wait or take a longer route?"
Design considerations:
- Limit how often you call the LLM:
  - Use it for macro-decisions, not low-level control.
- Provide it with:
  - A summary of the current world state (detected objects, maps, robot pose).
  - A history of recent actions and outcomes.
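One way to assemble that context is a compact textual summary appended to the prompt. A sketch, assuming simple dict/tuple representations — the field names are placeholders for whatever your perception and logging stack actually produces:

```python
def build_llm_context(detected_objects, robot_pose, recent_actions):
    """Summarize world state and action history into a prompt section."""
    lines = ["World state:"]
    for obj in detected_objects:
        lines.append(f"- {obj['label']} at {obj['position']}")
    lines.append(f"Robot pose: {robot_pose}")
    lines.append("Recent actions:")
    for action, outcome in recent_actions[-5:]:  # cap history to keep prompts short
        lines.append(f"- {action}: {outcome}")
    return "\n".join(lines)

ctx = build_llm_context(
    [{"label": "red mug", "position": (1.2, 0.4)}],
    (0.0, 0.0, 0.0),
    [("navigate(kitchen)", "success"), ("search(red mug)", "success")],
)
```

Truncating the action history is deliberate: the LLM only needs enough recent context to make the next macro-decision, not the full sensor log.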
4.3 Self-Correction and Re-Scanning
Failures are inevitable:
- Object not detected due to occlusion.
- Grasp fails due to misalignment.
- Localization drifts.
The agent must:
- Detect failure conditions (e.g., no object in gripper, target not in view).
- Trigger self-correction behaviors:
  - Re-scan the environment.
  - Adjust position or viewpoint.
  - Retry skills with slightly varied parameters.
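Retrying with slightly varied parameters can be as simple as jittering numeric values between attempts. A sketch, where `skill` is a stand-in for any skill call that returns success or failure:

```python
import random

def retry_with_variation(skill, base_params, attempts=3, jitter=0.02, rng=None):
    """Call `skill` up to `attempts` times, perturbing float parameters."""
    rng = rng or random.Random(0)  # seeded for reproducible behavior in tests
    for i in range(attempts):
        params = {
            k: v + rng.uniform(-jitter, jitter) if isinstance(v, float) else v
            for k, v in base_params.items()
        }
        if skill(params):          # skill reports True on success
            return i + 1           # number of attempts used
    return None                    # all retries failed; escalate
```

Small perturbations often recover grasps that failed by millimeters, without involving the planner at all.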
LLM involvement:
- Decide which correction strategy to use given:
  - The type of failure.
  - Task urgency and constraints.
- Update the task graph or behavior tree accordingly:
  - Add new search steps.
  - Skip or reorder tasks if necessary.
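During early development, a lookup table can stand in for the LLM's strategy choice. The failure types, skill names, and correction steps below are hypothetical; what matters is the shape of the task-graph update, splicing new steps in before the failed one:

```python
# Hypothetical failure-type → correction-steps table; an LLM would
# eventually make this choice using task urgency and constraints.
CORRECTIONS = {
    "object_not_found": [{"skill": "rescan", "params": {"sweep_deg": 90}}],
    "grasp_failed":     [{"skill": "adjust_viewpoint", "params": {}},
                         {"skill": "pick", "params": {"retry": True}}],
    "path_blocked":     [{"skill": "navigate", "params": {"allow_detour": True}}],
}

def apply_correction(plan, failed_idx, failure_type):
    """Splice correction steps into the plan before the failed step."""
    extra = CORRECTIONS.get(failure_type)
    if extra is None:
        return None  # no known fix: escalate to the user or the LLM
    return plan[:failed_idx] + extra + plan[failed_idx:]
```

Returning `None` for unknown failures gives you a clean hook for the clarifying-question behavior from Section 4.2.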
4.4 Reinforcement-Based Task Optimization
Even with good planning, performance can improve with experience.
Concepts:
- Reward signals:
  - Task success or failure.
  - Time to completion.
  - Smoothness and safety of motion (e.g., number of near-collisions).
- Policy refinement:
  - Adjusting skill parameters (speeds, thresholds).
  - Selecting better sub-goal strategies (short vs. long paths).
Practical approach:
- Log each task execution:
  - Initial plan.
  - Sensor traces.
  - Decisions taken.
  - Outcomes and rewards.
- Use logs offline to:
  - Train or fine-tune decision policies (e.g., RL or supervised learning).
  - Inform LLM prompting ("few-shot" examples of good vs. bad plans).
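A per-task JSONL log with a simple scalar reward might look like this. The reward weights are arbitrary placeholders to tune per task, and the record fields are a minimal subset of what a real run would capture:

```python
import json
import time

def compute_reward(success, duration_s, near_collisions):
    """Toy scalar reward: success bonus minus time and safety penalties."""
    return (1.0 if success else 0.0) - 0.01 * duration_s - 0.1 * near_collisions

def log_execution(path, plan, decisions, success, duration_s, near_collisions):
    """Append one task execution as a JSON record (JSONL: one per line)."""
    record = {
        "timestamp": time.time(),
        "plan": plan,
        "decisions": decisions,
        "success": success,
        "duration_s": duration_s,
        "near_collisions": near_collisions,
        "reward": compute_reward(success, duration_s, near_collisions),
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

JSONL keeps the log append-only and trivially streamable into offline training or few-shot prompt construction later.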
The goal is not full-blown RL research, but to:
- Build intuition for feedback-driven improvement.
- Enable gradual refinement of behaviors over the course of the project.
4.5 Lab: Natural Language Command Execution
This lab ties together language, planning, and execution for simple command sets.
Objectives
- Enable the robot to interpret a small set of natural language commands.
- Map commands to task graphs or behavior trees and execute them end-to-end.
Tasks
- Command Set Definition
  - Choose a limited, well-defined set of commands, such as:
    - "Bring me the mug."
    - "Follow that person."
    - "Inspect the room."
- LLM Interface
  - Implement a node that:
    - Receives text commands (from CLI, UI, or speech-to-text).
    - Calls an LLM with prompts tailored to your schema.
    - Receives a structured plan or task graph.
- Plan Execution
  - Feed the plan into your behavior tree or task-execution framework.
  - Monitor progress and handle:
    - Success.
    - Failure with fallback.
- Evaluation
  - Run multiple trials with variations in:
    - Object positions.
    - Initial robot pose.
    - Minor language variations.
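A toy evaluation harness for such trials might look like the following, where `run_command` stands in for your full language → plan → execution stack and the phrasings, position ranges, and trial count are illustrative:

```python
import random

# Hypothetical paraphrases of one command, to test minor language variation.
PHRASINGS = ["Bring me the mug.", "Please bring me the mug.",
             "Can you bring the mug over?"]

def evaluate(run_command, n_trials=20, seed=0):
    """Run randomized trials and report the overall success rate."""
    rng = random.Random(seed)  # seeded so trial conditions are reproducible
    successes = 0
    for _ in range(n_trials):
        trial = {
            "command": rng.choice(PHRASINGS),
            "object_pos": (rng.uniform(0.0, 2.0), rng.uniform(0.0, 2.0)),
            "robot_pose": (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0), 0.0),
        }
        if run_command(trial):  # True if the task completed successfully
            successes += 1
    return successes / n_trials
```

Fixing the seed makes failures reproducible, which matters more than trial count when debugging an end-to-end stack.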
Deliverables
- LLM integration node and plan schema.
- Behavior tree/task graph definitions for chosen commands.
- Logs and a brief report describing:
  - How commands are parsed.
  - How failures are handled.
  - Where human clarification is still needed.
This lab showcases your humanoid’s emerging agentic behavior: it can understand high-level instructions and coordinate navigation, perception, and manipulation to carry them out.