From 'Clean the Sticky Spill' to Motor Commands: Task Decomposition With VLA++

01— Architecture

Two Tiers, Not One Big Model

There is a recurring temptation to ask a single end-to-end model to take 'clean the sticky spill' and produce motor commands. We've concluded that this is the wrong abstraction for service work. Operators give instructions at one level; controllers consume commands at another; the gap between them is exactly what a small, inspectable plan should bridge.

Our architecture has two tiers: a high-level planner (an LLM with tools) that decomposes the instruction into a sequence of cabin-cleaning stages, and a low-level VLA++ policy that executes each stage in closed loop on perception and force feedback.
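To make the split concrete, here is a minimal sketch of the two tiers as plain Python. Everything in it is an assumption for illustration: `Stage`, `plan_stub`, and `run_plan` are hypothetical names, the planner is a hard-coded stand-in for the LLM, and `execute_stage` is a placeholder for whatever interface the low-level policy actually exposes.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    name: str    # e.g. "wipe"
    target: str  # scene region the stage acts on


def plan_stub(instruction: str) -> List[Stage]:
    """Stand-in for the LLM planner: returns a fixed plan for illustration."""
    target = "passenger seat" if "passenger seat" in instruction else "cabin"
    return [Stage("inspect", "cabin"), Stage("wipe", target), Stage("verify", target)]


def run_plan(stages: List[Stage], execute_stage: Callable[[Stage], bool]) -> List[str]:
    """Run stages in order; stop at the first one the policy cannot complete."""
    log = []
    for stage in stages:
        ok = execute_stage(stage)  # hypothetical hook into the low-level policy
        log.append(f"{stage.name}:{'ok' if ok else 'failed'}")
        if not ok:
            break
    return log
```

Because the plan is an ordinary list of stages, it can be logged and inspected before the policy executes anything.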

02— Behavior Trees

Where the LLM Stops, the Tree Starts

Inside each stage, we use behavior trees as a fallback when the policy reports low confidence. The tree's nodes are deliberately boring: 'retreat 5 cm', 'request operator confirmation', 'switch tool to wiper', 'rerun perception'. This gives us deterministic recovery behavior without asking the LLM to be a real-time controller.
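A recovery tree of this kind can be sketched in a few lines. This is an illustrative toy, not our production tree: the node classes, the `ctx` dictionary, and the confidence threshold are all assumptions, and the sketch omits the RUNNING status that real behavior-tree libraries use for long-lived actions.

```python
from enum import Enum
from typing import Callable, List


class Status(Enum):
    SUCCESS = 1
    FAILURE = 2


class Action:
    """Leaf node: runs a function and records its name in the trace."""
    def __init__(self, name: str, fn: Callable[[dict], bool]):
        self.name, self.fn = name, fn

    def tick(self, ctx: dict) -> Status:
        ctx["trace"].append(self.name)
        return Status.SUCCESS if self.fn(ctx) else Status.FAILURE


class Sequence:
    """Succeeds only if every child succeeds, in order."""
    def __init__(self, children: List):
        self.children = children

    def tick(self, ctx: dict) -> Status:
        for child in self.children:
            if child.tick(ctx) is Status.FAILURE:
                return Status.FAILURE
        return Status.SUCCESS


class Fallback:
    """Tries children in order and succeeds on the first success."""
    def __init__(self, children: List):
        self.children = children

    def tick(self, ctx: dict) -> Status:
        for child in self.children:
            if child.tick(ctx) is Status.SUCCESS:
                return Status.SUCCESS
        return Status.FAILURE


def rerun_perception(ctx: dict) -> bool:
    # Hypothetical: take the next simulated confidence reading, if any.
    if ctx["readings"]:
        ctx["confidence"] = ctx["readings"].pop(0)
    return True


def confident(ctx: dict) -> bool:
    return ctx["confidence"] >= ctx["threshold"]


def ask_operator(ctx: dict) -> bool:
    ctx["needs_operator"] = True
    return True


def make_recovery_tree() -> Fallback:
    return Fallback([
        Sequence([Action("retreat 5 cm", lambda c: True),
                  Action("rerun perception", rerun_perception),
                  Action("confidence ok?", confident)]),
        Sequence([Action("switch tool to wiper", lambda c: True),
                  Action("rerun perception", rerun_perception),
                  Action("confidence ok?", confident)]),
        Action("request operator confirmation", ask_operator),
    ])
```

The ordering of the fallback children encodes the recovery policy: cheap, autonomous recoveries first, and the operator only when those have failed.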

03— Grounding

Tying Language to the Scene Graph

When an operator says 'the spill on the passenger seat', the planner needs to know which seat that is in the live scene graph. We maintain a labeled scene graph from perception (driver seat, passenger seat, rear bench, floor mats, console) and ground language references through a small classifier rather than relying on the LLM to reason about cabin layout from pixels.
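As a rough illustration of the grounding step, the sketch below resolves a spoken reference to a node in a labeled scene graph. It swaps the learned classifier for a simple token-overlap score, purely so the example is self-contained; the node IDs, the `ground` function, and the tie-handling rule are assumptions, not a description of the deployed system.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical scene graph: node id -> label produced by perception.
SCENE_GRAPH: Dict[str, str] = {
    "seat_0": "driver seat",
    "seat_1": "passenger seat",
    "bench_0": "rear bench",
    "mats_0": "floor mats",
    "console_0": "console",
}


def tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def ground(reference: str, scene_graph: Dict[str, str]) -> Optional[str]:
    """Return the node whose label best matches the reference, or None
    when nothing matches or the best match is ambiguous."""
    if not scene_graph:
        return None
    ref = tokens(reference)
    # Score = fraction of the label's words that appear in the reference.
    scores = {
        node: len(ref & tokens(label)) / len(tokens(label))
        for node, label in scene_graph.items()
    }
    best = max(scores, key=scores.get)
    top = scores[best]
    if top == 0:
        return None  # no label word appears in the reference
    if sum(1 for s in scores.values() if s == top) > 1:
        return None  # ambiguous, e.g. "the seat"; better to ask than to guess
    return best
```

Returning `None` on ambiguity is a deliberate choice in this sketch: a reference like 'the seat' is a case for an operator question, not a guess.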

04— Verification

Did We Actually Clean It?

Every stage ends with a verification step: a before/after visual comparison of the target region. The simplest version is a learned 'cleanliness' classifier on the cropped patch. If the verification fails, the plan re-enters the relevant stage with an escalated tool selection (e.g., from microfiber wiper to spray-and-wipe).

The honest limit: visual verification cannot detect everything (residual stickiness and smell, for example). We complement it with operator spot-check workflows in early deployments and treat that gap as part of the spec, not a failure.
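The verify-and-escalate loop described above can be sketched as follows. The tool ladder comes from the example in the text; the function name, the 0.9 threshold, and the `apply_tool` and `cleanliness` callables are hypothetical placeholders for the stage policy and the learned classifier. For brevity the sketch scores only the 'after' patch rather than comparing before and after.

```python
from typing import Callable, List, Sequence, Tuple

# Escalation order taken from the example in the text.
TOOL_LADDER = ("microfiber wiper", "spray-and-wipe")


def clean_with_escalation(
    apply_tool: Callable[[str], None],
    cleanliness: Callable[[], float],
    tools: Sequence[str] = TOOL_LADDER,
    threshold: float = 0.9,  # assumed pass mark for the classifier score
) -> Tuple[bool, List[str]]:
    """Try each tool in order, verifying after every attempt.

    Returns (passed, tools_tried). A False result means the ladder is
    exhausted and the region should be flagged for an operator.
    """
    tried: List[str] = []
    for tool in tools:
        apply_tool(tool)   # re-enter the stage with this tool
        tried.append(tool)
        if cleanliness() >= threshold:  # classifier score on the cropped patch
            return True, tried
    return False, tried
```

When the function returns `False`, the caller would hand the region to the operator spot-check workflow rather than retrying indefinitely.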

05— Workflow

The Six Cabin Stages

1. Inspect (build scene graph).
2. Pick (remove large debris by hand-equivalent grasps).
3. Sort (separate trash from belongings; flag belongings for operator).
4. Vacuum (loose debris on mats and seats).
5. Wipe (hard surfaces, glass, high-touch points).
6. Verify (per-region cleanliness check, regenerate plan if needed).

This sequence is a strong default, not a religion. The planner can skip stages that the inspection pass shows to be unnecessary.
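One way to express "default sequence, with skips" is a fixed stage order plus a table saying which inspection finding justifies each optional stage. The finding names (`large_debris`, `loose_debris`, `soiled_surfaces`) and the rule that Inspect and Verify always run are assumptions made for this sketch.

```python
from typing import Dict, List

STAGES = ["inspect", "pick", "sort", "vacuum", "wipe", "verify"]

# Hypothetical mapping: optional stage -> inspection finding that requires it.
# Stages not listed here (inspect, verify) always run in this sketch.
NEEDED_IF = {
    "pick": "large_debris",
    "sort": "large_debris",
    "vacuum": "loose_debris",
    "wipe": "soiled_surfaces",
}


def select_stages(findings: Dict[str, bool]) -> List[str]:
    """Keep the default order, dropping stages the inspection rules out.

    A missing finding defaults to True, so a stage is skipped only when
    inspection explicitly reports there is nothing for it to do.
    """
    return [
        stage for stage in STAGES
        if stage not in NEEDED_IF or findings.get(NEEDED_IF[stage], True)
    ]
```

Defaulting to "run the stage" when a finding is missing keeps the skip logic conservative: an incomplete inspection produces the full sequence rather than a shortened one.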

Topics

LLM planning, task decomposition, VLA, behavior trees
