Topic 4 — LLM-Based Decision Making & Reasoning
Large language models (LLMs) and Vision-Language Models (VLMs) can serve as powerful high-level planners for humanoid robots. This topic explains how to translate natural language instructions into structured task graphs, how to keep autonomy closed-loop with perception, and how reinforcement-style feedback can refine task execution over time.
4.1 Natural Language → Task Graph Translation
High-level instructions:
- "Bring me the red mug from the kitchen."
- "Follow the person in the blue shirt and stop two meters behind them."
- "Inspect the room and tell me what changed from yesterday."
LLM responsibilities:
- Parse the instruction:
  - Identify goals (deliver, follow, inspect).
  - Identify entities (red mug, kitchen, person, room).
  - Identify constraints (distance, order, time).
- Produce a structured plan, such as:
  - A list of steps.
  - A behavior tree skeleton.
  - Goal and sub-goal definitions.
Example plan structure:
- Task: "Deliver red mug from kitchen to user."
  - Step 1: Navigate to kitchen.
  - Step 2: Search for red mug.
  - Step 3: Pick red mug.
  - Step 4: Navigate to user location.
  - Step 5: Hand over mug.
Implementation approach:
- Define a schema for tasks:
  - A JSON or YAML structure listing steps, preconditions, and success criteria.
- Prompt the LLM to output tasks in that schema.
- Validate and sanitize LLM outputs before execution.
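The schema-plus-validation step can be sketched as follows. This is a minimal sketch: the step fields (`skill`, `params`, `success`), the allowed skill names, and the example plan are illustrative assumptions, not a fixed API.

```python
import json

# Hypothetical whitelist of skills the executor actually implements.
ALLOWED_SKILLS = {"navigate", "search", "pick", "handover"}

def validate_plan(raw):
    """Parse and sanitize an LLM-produced plan before execution."""
    plan = json.loads(raw)  # raises JSONDecodeError (a ValueError) if malformed
    if not isinstance(plan, list) or not plan:
        raise ValueError("plan must be a non-empty list of steps")
    for i, step in enumerate(plan):
        if step.get("skill") not in ALLOWED_SKILLS:
            raise ValueError(f"step {i}: unknown skill {step.get('skill')!r}")
        if not isinstance(step.get("params", {}), dict):
            raise ValueError(f"step {i}: params must be an object")
    return plan

# Example LLM output matching the "deliver red mug" plan above.
raw_output = """[
  {"skill": "navigate", "params": {"target": "kitchen"},  "success": "robot_in_kitchen"},
  {"skill": "search",   "params": {"object": "red mug"},  "success": "mug_detected"},
  {"skill": "pick",     "params": {"object": "red mug"},  "success": "mug_in_gripper"},
  {"skill": "navigate", "params": {"target": "user"},     "success": "robot_at_user"},
  {"skill": "handover", "params": {"object": "red mug"},  "success": "mug_delivered"}
]"""
plan = validate_plan(raw_output)
```

Rejecting any step whose skill is not on the whitelist keeps hallucinated capabilities from ever reaching the executor.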
4.2 Closed-Loop Autonomy with Perception
Autonomy is not a one-shot plan; it is a loop:
- Sense (perception, SLAM, state estimation).
- Think (LLM + planners).
- Act (skills and controllers).
- Check (did the action succeed?).
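A minimal skeleton of this loop, assuming `sense`, `act`, `check`, and `replan` are callables wired to your own perception stack, skill library, and LLM planner:

```python
def run_autonomy_loop(plan, sense, act, check, replan, max_steps=100):
    """Execute plan steps until done, repairing the plan on failure."""
    step_idx = 0
    for _ in range(max_steps):
        if step_idx >= len(plan):
            return True                           # every step succeeded
        state = sense()                           # Sense
        act(plan[step_idx], state)                # Act
        if check(plan[step_idx], sense()):        # Check: did the step work?
            step_idx += 1
        else:
            plan = replan(plan, step_idx, state)  # Think: repair the plan
            if not plan:
                return False                      # planner gave up; escalate
            step_idx = 0
    return False                                  # step budget exhausted
```

Note that "Think" only runs on failure here; routine step advancement never touches the LLM, which matches the macro-decision guidance below.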
LLMs can participate in this loop by:
- Re-evaluating plans when:
  - Objects are not detected where expected.
  - Paths are blocked.
  - Goals become unreachable.
- Generating clarifying questions:
  - "I cannot find a red mug in the kitchen. Should I search the living room?"
  - "The path to the storage room is blocked. Do you want me to wait or take a longer route?"
Design considerations:
- Limit how often you call the LLM:
  - Use it for macro-decisions, not low-level control.
- Provide it with:
  - A summary of the current world state (detected objects, maps, robot pose).
  - A history of recent actions and outcomes.
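One way to assemble that context is a compact textual summary appended to the prompt. A sketch, assuming simple dict/tuple representations — the field names are placeholders for whatever your perception and logging stack actually produces:

```python
def build_llm_context(detected_objects, robot_pose, recent_actions):
    """Summarize world state and action history into a prompt section."""
    lines = ["World state:"]
    for obj in detected_objects:
        lines.append(f"- {obj['label']} at {obj['position']}")
    lines.append(f"Robot pose: {robot_pose}")
    lines.append("Recent actions:")
    for action, outcome in recent_actions[-5:]:  # cap history to keep prompts short
        lines.append(f"- {action}: {outcome}")
    return "\n".join(lines)

ctx = build_llm_context(
    [{"label": "red mug", "position": (1.2, 0.4)}],
    (0.0, 0.0, 0.0),
    [("navigate(kitchen)", "success"), ("search(red mug)", "success")],
)
```

Truncating the action history is deliberate: the LLM only needs enough recent context to make the next macro-decision, not the full sensor log.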
4.3 Self-Correction and Re-Scanning
Failures are inevitable:
- Object not detected due to occlusion.
- Grasp fails due to misalignment.
- Localization drifts.
The agent must:
- Detect failure conditions (e.g., no object in gripper, target not in view).
- Trigger self-correction behaviors:
  - Re-scan the environment.
  - Adjust position or viewpoint.
  - Retry skills with slightly varied parameters.
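Retrying with slightly varied parameters can be as simple as jittering numeric values between attempts. A sketch, where `skill` is a stand-in for any skill call that returns success or failure:

```python
import random

def retry_with_variation(skill, base_params, attempts=3, jitter=0.02, rng=None):
    """Call `skill` up to `attempts` times, perturbing float parameters."""
    rng = rng or random.Random(0)  # seeded for reproducible behavior in tests
    for i in range(attempts):
        params = {
            k: v + rng.uniform(-jitter, jitter) if isinstance(v, float) else v
            for k, v in base_params.items()
        }
        if skill(params):          # skill reports True on success
            return i + 1           # number of attempts used
    return None                    # all retries failed; escalate
```

Small perturbations often recover grasps that failed by millimeters, without involving the planner at all.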
LLM involvement:
- Decide which correction strategy to use given:
  - The type of failure.
  - Task urgency and constraints.
- Update the task graph or behavior tree accordingly:
  - Add new search steps.
  - Skip or reorder tasks if necessary.
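During early development, a lookup table can stand in for the LLM's strategy choice. The failure types, skill names, and correction steps below are hypothetical; what matters is the shape of the task-graph update, splicing new steps in before the failed one:

```python
# Hypothetical failure-type → correction-steps table; an LLM would
# eventually make this choice using task urgency and constraints.
CORRECTIONS = {
    "object_not_found": [{"skill": "rescan", "params": {"sweep_deg": 90}}],
    "grasp_failed":     [{"skill": "adjust_viewpoint", "params": {}},
                         {"skill": "pick", "params": {"retry": True}}],
    "path_blocked":     [{"skill": "navigate", "params": {"allow_detour": True}}],
}

def apply_correction(plan, failed_idx, failure_type):
    """Splice correction steps into the plan before the failed step."""
    extra = CORRECTIONS.get(failure_type)
    if extra is None:
        return None  # no known fix: escalate to the user or the LLM
    return plan[:failed_idx] + extra + plan[failed_idx:]
```

Returning `None` for unknown failures gives you a clean hook for the clarifying-question behavior from Section 4.2.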
4.4 Reinforcement-Based Task Optimization
Even with good planning, performance can improve with experience.
Concepts:
- Reward signals:
  - Task success or failure.
  - Time to completion.
  - Smoothness and safety of motion (e.g., number of near-collisions).
- Policy refinement:
  - Adjusting skill parameters (speeds, thresholds).
  - Selecting better sub-goal strategies (short vs. long paths).
Practical approach:
- Log each task execution:
  - Initial plan.
  - Sensor traces.
  - Decisions taken.
  - Outcomes and rewards.
- Use logs offline to:
  - Train or fine-tune decision policies (e.g., RL or supervised learning).
  - Inform LLM prompting ("few-shot" examples of good vs. bad plans).
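A per-task JSONL log with a simple scalar reward might look like this. The reward weights are arbitrary placeholders to tune per task, and the record fields are a minimal subset of what a real run would capture:

```python
import json
import time

def compute_reward(success, duration_s, near_collisions):
    """Toy scalar reward: success bonus minus time and safety penalties."""
    return (1.0 if success else 0.0) - 0.01 * duration_s - 0.1 * near_collisions

def log_execution(path, plan, decisions, success, duration_s, near_collisions):
    """Append one task execution as a JSON record (JSONL: one per line)."""
    record = {
        "timestamp": time.time(),
        "plan": plan,
        "decisions": decisions,
        "success": success,
        "duration_s": duration_s,
        "near_collisions": near_collisions,
        "reward": compute_reward(success, duration_s, near_collisions),
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

JSONL keeps the log append-only and trivially streamable into offline training or few-shot prompt construction later.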
The goal is not full-blown RL research, but to:
- Build intuition for feedback-driven improvement.
- Enable gradual refinement of behaviors over the course of the project.
4.5 Lab: Natural Language Command Execution
This lab ties together language, planning, and execution for simple command sets.
Objectives
- Enable the robot to interpret a small set of natural language commands.
- Map commands to task graphs or behavior trees and execute them end-to-end.
Tasks
- Command Set Definition
  - Choose a limited, well-defined set of commands, such as:
    - "Bring me the mug."
    - "Follow that person."
    - "Inspect the room."
- LLM Interface
  - Implement a node that:
    - Receives text commands (from CLI, UI, or speech-to-text).
    - Calls an LLM with prompts tailored to your schema.
    - Receives a structured plan or task graph.
- Plan Execution
  - Feed the plan into your behavior tree or task-execution framework.
  - Monitor progress and handle:
    - Success.
    - Failure with fallback.
- Evaluation
  - Run multiple trials with variations in:
    - Object positions.
    - Initial robot pose.
    - Minor language variations.
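A toy evaluation harness for such trials might look like the following, where `run_command` stands in for your full language → plan → execution stack and the phrasings, position ranges, and trial count are illustrative:

```python
import random

# Hypothetical paraphrases of one command, to test minor language variation.
PHRASINGS = ["Bring me the mug.", "Please bring me the mug.",
             "Can you bring the mug over?"]

def evaluate(run_command, n_trials=20, seed=0):
    """Run randomized trials and report the overall success rate."""
    rng = random.Random(seed)  # seeded so trial conditions are reproducible
    successes = 0
    for _ in range(n_trials):
        trial = {
            "command": rng.choice(PHRASINGS),
            "object_pos": (rng.uniform(0.0, 2.0), rng.uniform(0.0, 2.0)),
            "robot_pose": (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0), 0.0),
        }
        if run_command(trial):  # True if the task completed successfully
            successes += 1
    return successes / n_trials
```

Fixing the seed makes failures reproducible, which matters more than trial count when debugging an end-to-end stack.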
Deliverables
- LLM integration node and plan schema.
- Behavior tree/task graph definitions for chosen commands.
- Logs and a brief report describing:
  - How commands are parsed.
  - How failures are handled.
  - Where human clarification is still needed.
This lab showcases your humanoid’s emerging agentic behavior: it can understand high-level instructions and coordinate navigation, perception, and manipulation to carry them out.