Topic 4 — Testing, Evaluation & Demo Preparation
An impressive capstone is not just about code that “usually works”—it is about demonstrable reliability backed by tests, metrics, and a clear demo narrative. This topic shows you how to design experiments, collect evidence, and package everything into a compelling final presentation.
4.1 Testing Strategy for Physical AI Systems
Adopt a layered testing strategy:
- Unit-level checks:
  - Validate key algorithms (e.g., small planning utilities, filtering logic) with simple Python tests.
  - Ensure message conversions and coordinate transforms behave as expected.
- Integration tests:
  - Launch core ROS 2 nodes in a minimal world and run scripted missions.
  - Verify that end-to-end logic (perception → planning → control) functions in representative scenarios.
- Regression tests:
  - Save a few “golden” scenarios.
  - Re-run them after changes to ensure behavior hasn’t regressed unexpectedly.
The goal is not exhaustive coverage but confidence in the most critical paths.
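A unit-level check can be as small as a plain pytest-style function. The sketch below assumes a hypothetical `rotate_2d` helper standing in for your own transform utilities; the point is the shape of the test, not the specific function:

```python
import math

def rotate_2d(x, y, yaw):
    """Rotate a 2D point by a yaw angle (radians) about the origin.
    Hypothetical helper standing in for your own transform utilities."""
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * x - s * y, s * x + c * y)

def test_quarter_turn_maps_x_axis_to_y_axis():
    # A 90-degree yaw should take (1, 0) to (0, 1).
    x, y = rotate_2d(1.0, 0.0, math.pi / 2)
    assert math.isclose(x, 0.0, abs_tol=1e-9)
    assert math.isclose(y, 1.0, abs_tol=1e-9)

def test_full_turn_is_identity():
    # A full 360-degree yaw should return the original point.
    x, y = rotate_2d(0.3, -0.7, 2 * math.pi)
    assert math.isclose(x, 0.3, abs_tol=1e-9)
    assert math.isclose(y, -0.7, abs_tol=1e-9)
```

Note the use of `math.isclose` with an absolute tolerance rather than exact equality: floating-point trigonometry rarely produces exact zeros.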
4.2 Designing Experiments & Metrics
Use the success metrics from Topic 1 to design simple but meaningful experiments:
- Define 2–4 core scenarios (e.g., pick‑and‑place, room patrol, inspection loop).
- For each, record:
  - Success/failure of the task.
  - Time to completion.
  - Number of collisions or near-collisions, if applicable.
  - Number of replans, retries, or fallbacks triggered.
Log results in a small table or CSV file, then visualize or summarize them in your final report.
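One lightweight way to log these metrics is a per-trial CSV file, appended to after each run. The sketch below uses the standard library only; the field names (`scenario`, `duration_s`, etc.) are illustrative, not a required schema:

```python
import csv
import os
from dataclasses import dataclass, asdict, fields

@dataclass
class TrialResult:
    """One trial of one scenario. Field names here are illustrative."""
    scenario: str      # e.g. "pick_and_place"
    success: bool      # did the task complete?
    duration_s: float  # time to completion
    collisions: int    # collisions or near-collisions
    replans: int       # replans, retries, or fallbacks triggered

def append_result(path, result):
    """Append one trial to a CSV file, writing the header on first use."""
    names = [f.name for f in fields(TrialResult)]
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=names)
        if write_header:
            writer.writeheader()
        writer.writerow(asdict(result))
```

Calling `append_result("results.csv", TrialResult("room_patrol", True, 87.2, 0, 1))` after each run gives you a table you can later load into a spreadsheet or pandas for summarization.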
4.3 Demo Script & Narrative
Your demo should tell a clear story:
- Setup:
  - Introduce the scenario: environment, robot(s), and user role.
  - Describe the main capability in one or two sentences.
- Live Walkthrough:
  - Show how a human issues a command or configures a mission.
  - Narrate what the robot is “thinking” at each stage: perception, planning, action.
  - Point out how the system responds to small disturbances or unexpected events.
- Wrap-Up:
  - Summarize what worked, which metrics you achieved, and known limitations.
  - Briefly sketch “what you would build next” with more time or resources.
Rehearse the demo several times before the presentation, and log any failures so you can either fix them or plan workarounds.
4.4 Artifacts: Logs, Visuals & Report
Prepare artifacts that support both technical and non-technical audiences:
- Technical:
  - ROS 2 bags, log files, and plots (trajectories, success rates, etc.).
  - Architecture diagrams and data flow charts.
- Non-technical / mixed:
  - Short video clips or screenshots of key scenarios.
  - Concise slides or a summary section in your report explaining the system at a high level.
Aim for reproducibility: another person should be able to follow your instructions to run the demo themselves.
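For the report, it helps to turn raw per-trial logs into a per-scenario summary table. A minimal sketch, assuming a results CSV with (at least) `scenario`, `success`, and `duration_s` columns as logged during evaluation:

```python
import csv
from collections import defaultdict

def summarize(path):
    """Aggregate per-scenario success rate and mean completion time
    from a results CSV with columns: scenario, success, duration_s."""
    trials = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            trials[row["scenario"]].append(row)
    summary = {}
    for scenario, rows in trials.items():
        # CSV stores everything as strings, so parse booleans and floats.
        successes = [r for r in rows if r["success"] in ("True", "true", "1")]
        summary[scenario] = {
            "trials": len(rows),
            "success_rate": len(successes) / len(rows),
            "mean_duration_s": sum(float(r["duration_s"]) for r in rows) / len(rows),
        }
    return summary
```

The resulting dictionary maps cleanly onto a small table in your report (one row per scenario, columns for trial count, success rate, and mean duration).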
4.5 Mini-Lab: Dry-Run Demo & Evaluation
Goal: Execute a full dry-run of your final demo and evaluation pipeline.
Tasks
- Run through your entire demo script at least twice.
- Collect metrics for at least 2–3 scenarios and update your results table.
- Note all failure points and either:
  - Fix them, or
  - Document them as known limitations with explanations.
Deliverables
- Updated demo script and slides/notes.
- Evaluation results (tables, plots, or narrative summary).
- A list of known issues and mitigations.
Summary
Testing, evaluation, and a well-prepared demo transform your capstone from “working code” into a convincing robotics project backed by evidence. In the final topic, you will focus on documentation, reflection, and future work, ensuring your project remains useful beyond the course.