
Latency Budgets for a Service Robot: TensorRT on Jetson Orin

What an end-to-end perception → action pipeline costs in milliseconds, and how we plan to keep the safety classifier ahead of the main policy.

#06 · Dev · 13 min · For: Edge ML engineers
01 · Budget

Where the Milliseconds Go

A useful exercise before optimizing anything: write down the budget. For a closed-loop manipulation step we target a control rate that the impedance controller can comfortably consume. The budget breaks down into camera capture, image preprocessing, perception inference, policy inference, action post-processing, and transport to the motor controller.

Each of these is bounded individually, not as an aggregate, so a single slow stage can't silently eat into the next. When a stage misses its budget, the watchdog (below) takes over.
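The per-stage bounding can be sketched as a simple check the scheduler runs each cycle. The stage names follow the breakdown above; the millisecond values are illustrative, since the post does not publish concrete numbers:

```python
# Hypothetical per-stage budgets in ms for one control step.
# Values are illustrative, not the production numbers.
STAGE_BUDGETS_MS = {
    "capture": 4.0,
    "preprocess": 2.0,
    "perception": 12.0,
    "policy": 8.0,
    "postprocess": 1.0,
    "transport": 3.0,
}

def check_stage(stage: str, elapsed_ms: float) -> bool:
    """True if the stage met its own budget. Each stage is bounded
    individually, so a slow stage cannot borrow time from the next."""
    return elapsed_ms <= STAGE_BUDGETS_MS[stage]

def over_budget(timings_ms: dict) -> list:
    """Every stage that missed its deadline this cycle; any hit here
    hands control to the watchdog."""
    return [s for s, t in timings_ms.items() if not check_stage(s, t)]
```

The point of the dict-of-bounds shape is that the aggregate budget falls out of the per-stage bounds, never the other way around.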

02 · Two Models

A Safety Classifier in Parallel

The main policy is a multi-modal transformer. The safety classifier is a small, fast network whose only job is to answer one question: is there a hand, or any unexpected human body part, inside the cabin work envelope?

It runs on a separate stream, on a separate execution context, at a higher rate than the main policy. Its veto goes directly to the controller and can preempt any in-flight motion. This is deliberately redundant with the hardware safety system; defense in depth matters when the answer to 'is a person there?' has to be wrong essentially never.
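The veto hand-off itself is the simple part. A minimal sketch, modeling only the shared flag between the high-rate safety loop and the controller (the real system runs the classifier on its own CUDA stream and TensorRT execution context; class and method names here are illustrative):

```python
import threading

class SafetyVeto:
    """Latest-wins veto shared between the safety loop and controller.

    The safety loop writes at its (higher) rate; the controller polls
    before committing each action. A set veto preempts in-flight motion.
    """

    def __init__(self):
        self._veto = threading.Event()

    def update(self, hand_in_envelope: bool) -> None:
        # Called by the safety loop after each classifier inference.
        if hand_in_envelope:
            self._veto.set()
        else:
            self._veto.clear()

    def motion_allowed(self) -> bool:
        # Polled by the controller; False preempts the current motion.
        return not self._veto.is_set()
```

Keeping the flag as an `Event` rather than a queued message means the controller always sees the latest classifier verdict, never a stale one.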

03 · Quantization

INT8 for Perception, FP16 for the Head

TensorRT INT8 is great for the perception backbone — convolutions and attention layers tolerate quantization well with a representative calibration set. We are more conservative on the action head: FP16 here, because small numerical drift in the predicted action distribution shows up as noisy motion that the controller has to filter out.

The trade is worth it. Perception is the heavy stage; quantizing it to INT8 frees enough budget to keep the action head in FP16 without missing the control rate.
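In TensorRT terms, the split is a mixed-precision build: enable INT8 globally with a calibrator, then pin the action-head layers to FP16. A sketch against the TensorRT 8.x Python API, assuming a parsed `network` and a calibrator `calib` built elsewhere; the `action_head` layer-name prefix is hypothetical:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()

config.set_flag(trt.BuilderFlag.INT8)   # perception backbone in INT8
config.set_flag(trt.BuilderFlag.FP16)   # allow FP16 where pinned
config.int8_calibrator = calib          # representative calibration set

# Pin the action head to FP16 so quantization noise does not show up
# as jitter in the predicted action distribution.
for i in range(network.num_layers):
    layer = network.get_layer(i)
    if layer.name.startswith("action_head"):  # hypothetical naming
        layer.precision = trt.float16
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

engine = builder.build_serialized_network(network, config)
```

`OBEY_PRECISION_CONSTRAINTS` stops the builder from "helpfully" re-quantizing the pinned layers when that would be faster.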

04 · Watchdog

Graceful Retract on Timeout

If any perception or policy stage misses its deadline, the watchdog issues a retract command: lift the tool a few centimeters along the surface normal, hold position, and re-issue the perception request. From the operator's perspective the robot pauses; from the system's perspective it is in a known safe state, not executing a stale plan.

This is far more useful than raising an alarm and dropping out of the control loop, which is what most off-the-shelf inference timeouts default to.
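The timeout-to-retract policy can be sketched in a few lines. This version checks the deadline after the stage returns; a production watchdog would preempt asynchronously. `stage_fn` and the command shapes are hypothetical:

```python
import time

RETRACT_CM = 3.0  # lift along the surface normal; value illustrative

def run_stage(stage_fn, deadline_ms: float) -> dict:
    """Run one perception/policy stage; on a missed deadline, emit a
    retract command instead of acting on the (now stale) result."""
    start = time.monotonic()
    result = stage_fn()
    elapsed_ms = (time.monotonic() - start) * 1000.0
    if elapsed_ms > deadline_ms:
        # Known safe state: lift, hold, and re-request perception.
        return {"cmd": "retract", "lift_cm": RETRACT_CM,
                "reissue": "perception"}
    return {"cmd": "execute", "action": result}
```

The key property is that a timeout never produces "no command": the controller always has something well-defined to execute.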

05 · Power & Thermal

Inside an Enclosed Wash Bay

Wash bays are warm and humid. The Jetson runs at a fixed power mode chosen to keep junction temperature comfortably below thermal throttling under continuous load, with airflow assisted by a sealed fan duct. We monitor `tegrastats` continuously and alert if sustained throttling is detected — a thermal-throttled inference path is a silent latency regression that will not show up in any unit test.
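The monitoring side reduces to parsing the temperature tokens out of each `tegrastats` line and comparing against an alert threshold. A sketch assuming the usual `gpu@<temp>C` token in the output; the threshold is illustrative, chosen below the module's throttling point:

```python
import re

GPU_TEMP_ALERT_C = 85.0  # illustrative; set below the throttle point

_TEMP_RE = re.compile(r"gpu@([\d.]+)C", re.IGNORECASE)

def gpu_temp_c(tegrastats_line: str):
    """GPU temperature from one tegrastats line, or None if the
    gpu@<temp>C token is absent."""
    m = _TEMP_RE.search(tegrastats_line)
    return float(m.group(1)) if m else None

def should_alert(tegrastats_line: str) -> bool:
    t = gpu_temp_c(tegrastats_line)
    return t is not None and t >= GPU_TEMP_ALERT_C
```

In practice this runs as a small daemon fed from the `tegrastats` stream, so sustained throttling surfaces as an alert rather than as a mysteriously slower inference path.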

Topics
TensorRT · Jetson Orin · latency · safety