
From 'Clean the Sticky Spill' to Motor Commands: Task Decomposition With VLA++

How natural-language operator instructions become a sequenced plan: Inspect → Pick → Sort → Vacuum → Wipe → Verify.

#08 · Dev · 17 min · For: Robotics + LLM engineers
01 Architecture

Two Tiers, Not One Big Model

There is a recurring temptation to ask a single end-to-end model to take 'clean the sticky spill' and produce motor commands. We've concluded that is the wrong abstraction for service work. Operators give instructions at one level; controllers consume commands at another; the gap between them is exactly the kind of thing a small, inspectable plan should bridge.

Our architecture has two tiers: a high-level planner (an LLM with tools) that decomposes the instruction into a sequence of cabin-cleaning stages, and a low-level VLA++ policy that executes each stage in closed loop on perception and force feedback.
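The hand-off between the two tiers can be sketched as a plan of stage commands that the low-level policy consumes one at a time. This is a minimal illustration, not our production interface: `StageCommand`, `run_instruction`, and the two stub functions are hypothetical names, the stub planner returns a fixed plan instead of calling an LLM, and stopping at the first failed stage is an assumed simplification.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class StageCommand:
    """One step of the high-level plan (hypothetical schema)."""
    stage: str   # e.g. "wipe"
    target: str  # scene-graph label, e.g. "passenger seat"


# Assumed interfaces: the planner maps an instruction to a plan,
# the policy executes one stage and reports success or failure.
Planner = Callable[[str], List[StageCommand]]
Policy = Callable[[StageCommand], bool]


def run_instruction(instruction: str, planner: Planner, policy: Policy) -> List[StageCommand]:
    """Plan once, then execute stage by stage.

    Returns the stages that completed; stops at the first failure.
    """
    completed: List[StageCommand] = []
    for command in planner(instruction):
        if not policy(command):
            break
        completed.append(command)
    return completed


def stub_planner(instruction: str) -> List[StageCommand]:
    # Stand-in for the LLM planner: ignores the text and returns a fixed plan.
    target = "passenger seat"
    return [StageCommand(stage, target) for stage in ("inspect", "wipe", "verify")]


def stub_policy(command: StageCommand) -> bool:
    # Stand-in for the VLA++ policy: pretends every stage succeeds.
    return True


if __name__ == "__main__":
    for done in run_instruction("clean the sticky spill", stub_planner, stub_policy):
        print(done.stage, "->", done.target)
```

The point of the split is that the plan is a small, printable data structure: an operator or a test can inspect it before any motor command is issued.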

02 Behavior Trees

Where the LLM Stops, the Tree Starts

Inside each stage, we use behavior trees as a fallback when the policy reports low confidence. The tree's nodes are deliberately boring: 'retreat 5 cm', 'request operator confirmation', 'switch tool to wiper', 'rerun perception'. This gives us deterministic recovery behavior without asking the LLM to be a real-time controller.
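A recovery tree of this kind can be sketched in a few dozen lines. Everything here is illustrative: the `Action`, `Sequence`, and `Fallback` classes are a hand-rolled minimal tree rather than a specific library, the 0.6 confidence threshold is an assumed value, and the particular arrangement of nodes is one plausible layout, not the tree we ship.

```python
from enum import Enum
from typing import Callable, List


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"


class Action:
    """Leaf node: runs a callable and records its name in the log."""

    def __init__(self, name: str, fn: Callable[[], bool]):
        self.name = name
        self.fn = fn

    def tick(self, log: List[str]) -> Status:
        log.append(self.name)
        return Status.SUCCESS if self.fn() else Status.FAILURE


class Sequence:
    """Succeeds only if every child succeeds, in order."""

    def __init__(self, children: list):
        self.children = children

    def tick(self, log: List[str]) -> Status:
        for child in self.children:
            if child.tick(log) is Status.FAILURE:
                return Status.FAILURE
        return Status.SUCCESS


class Fallback:
    """Tries children in order and succeeds on the first success."""

    def __init__(self, children: list):
        self.children = children

    def tick(self, log: List[str]) -> Status:
        for child in self.children:
            if child.tick(log) is Status.SUCCESS:
                return Status.SUCCESS
        return Status.FAILURE


def build_recovery_tree(perception_recovers: bool) -> Fallback:
    # Actions are stubs; perception_recovers simulates the rerun outcome.
    return Fallback([
        Sequence([
            Action("retreat 5 cm", lambda: True),
            Action("rerun perception", lambda: perception_recovers),
        ]),
        Action("request operator confirmation", lambda: True),
    ])


def recover_if_unsure(confidence: float, tree: Fallback, threshold: float = 0.6) -> List[str]:
    """Tick the recovery tree only when policy confidence is below threshold."""
    log: List[str] = []
    if confidence < threshold:
        tree.tick(log)
    return log
```

With this layout, the robot first tries to help itself (back off, look again) and only escalates to the operator when that fails. The LLM is never in this loop.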

03 Grounding

Tying Language to the Scene Graph

When an operator says 'the spill on the passenger seat', the planner needs to know which seat that is in the live scene graph. We maintain a labeled scene graph from perception (driver seat, passenger seat, rear bench, floor mats, console) and ground language references through a small classifier rather than relying on the LLM to reason about cabin layout from pixels.
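To make the grounding step concrete, here is a toy version that maps an operator phrase to one of the scene-graph labels listed above. It swaps the learned classifier for a simple word-overlap score, purely as a stand-in; `ground_reference` and `SCENE_LABELS` are hypothetical names, and returning `None` on ties or no match is an assumed policy for handing ambiguous references back to the operator.

```python
import re
from typing import Optional, Set, Tuple

# Labels taken from the scene graph described above.
SCENE_LABELS: Tuple[str, ...] = (
    "driver seat", "passenger seat", "rear bench", "floor mats", "console",
)


def _tokens(text: str) -> Set[str]:
    """Lowercase alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def ground_reference(phrase: str, labels: Tuple[str, ...] = SCENE_LABELS) -> Optional[str]:
    """Return the best-matching scene-graph label, or None if ambiguous.

    Score = fraction of the label's words that appear in the phrase.
    This word-overlap scorer is a stand-in for a trained classifier.
    """
    words = _tokens(phrase)
    scores = {
        label: len(words & _tokens(label)) / len(_tokens(label))
        for label in labels
    }
    top = max(scores.values())
    best = [label for label, score in scores.items() if score == top]
    if top == 0 or len(best) > 1:
        return None  # no match, or a tie such as "the seat"
    return best[0]


if __name__ == "__main__":
    print(ground_reference("the spill on the passenger seat"))
    print(ground_reference("the seat"))
```

The useful property to keep from this sketch is the explicit "I don't know" outcome: a phrase like 'the seat' matches two nodes equally well, and the planner should ask rather than guess.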

04 Verification

Did We Actually Clean It?

Every stage ends with a verification step: a before/after visual comparison of the target region. The simplest version is a learned 'cleanliness' classifier on the cropped patch. If the verification fails, the plan re-enters the relevant stage with an escalated tool selection (e.g., from microfiber wiper to spray-and-wipe).

The honest limit: visual verification cannot detect everything (residual stickiness, smell). We complement it with operator spot-check workflows in the early deployments and treat that gap as part of the spec, not a failure.
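The verify-then-escalate loop can be sketched as a walk up a tool ladder. The two tools come from the example above; the function name `clean_until_verified`, the callback signatures, and the choice to return `None` when the ladder is exhausted (so the region can be routed to an operator spot-check) are assumptions made for illustration. The `is_clean` callback stands in for the learned cleanliness classifier.

```python
from typing import Callable, Optional, Tuple

# Escalation order from the example above: gentlest tool first.
TOOL_LADDER: Tuple[str, ...] = ("microfiber wiper", "spray-and-wipe")


def clean_until_verified(
    run_stage: Callable[[str], None],
    is_clean: Callable[[], bool],
    tools: Tuple[str, ...] = TOOL_LADDER,
) -> Optional[str]:
    """Run the stage with each tool in turn until verification passes.

    Returns the tool that succeeded, or None if every tool failed.
    """
    for tool in tools:
        run_stage(tool)
        if is_clean():
            return tool
    return None  # ladder exhausted: flag the region for an operator


if __name__ == "__main__":
    used = []

    def run_stage(tool: str) -> None:
        used.append(tool)  # a real system would invoke the policy here

    def is_clean() -> bool:
        # Simulated spill that only yields to spray-and-wipe.
        return used[-1] == "spray-and-wipe"

    print(clean_until_verified(run_stage, is_clean), used)
```

Bounding the loop by the length of the ladder keeps a stubborn stain from turning into an infinite retry.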

05 Workflow

The Six Cabin Stages

Inspect: build the scene graph.
Pick: remove large debris with hand-equivalent grasps.
Sort: separate trash from belongings; flag belongings for the operator.
Vacuum: remove loose debris from mats and seats.
Wipe: clean hard surfaces, glass, and high-touch points.
Verify: run a per-region cleanliness check and regenerate the plan if needed.

This sequence is a strong default; it is not a religion. The planner can skip any stage that the inspection pass shows to be unnecessary.
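Stage skipping can be sketched as a filter over the default sequence. The stage names come from the list above; the `STAGE_NEEDS` mapping and its finding keys (`large_debris`, `loose_items`, and so on) are hypothetical, as is the conservative rule that a stage runs whenever the inspection pass did not report on it.

```python
from typing import Dict, List, Tuple

DEFAULT_STAGES: Tuple[str, ...] = (
    "inspect", "pick", "sort", "vacuum", "wipe", "verify",
)

# Hypothetical mapping: which inspection finding justifies each stage.
# Stages without an entry (inspect, verify) always run.
STAGE_NEEDS: Dict[str, str] = {
    "pick": "large_debris",
    "sort": "loose_items",
    "vacuum": "loose_debris",
    "wipe": "surface_soil",
}


def build_plan(findings: Dict[str, bool]) -> List[str]:
    """Keep the default order, dropping stages the inspection ruled out.

    A missing finding counts as True, so an incomplete inspection
    never causes a stage to be skipped.
    """
    plan: List[str] = []
    for stage in DEFAULT_STAGES:
        need = STAGE_NEEDS.get(stage)
        if need is None or findings.get(need, True):
            plan.append(stage)
    return plan


if __name__ == "__main__":
    findings = {
        "large_debris": False,
        "loose_items": False,
        "loose_debris": True,
        "surface_soil": True,
    }
    print(build_plan(findings))
```

Skipping only removes stages; it never reorders them, which keeps the default sequence recognizable in every generated plan.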

Topics: LLM planning, task decomposition, VLA, behavior trees