
Segmenting 15 Interior Materials in Real Time on a Jetson Orin

Leather, alcantara, vinyl, fabric, glass, rubber mats — and why a SAM2 baseline gets fooled by reflections.

#03 · Dev · 15 min · For: CV engineers
01 Dataset

Why Off-the-Shelf Doesn't Cover Cabins

Public segmentation datasets — ADE20K, COCO-Stuff, Cityscapes — have one or two cabin-relevant classes between them. None distinguish leather from vinyl, or alcantara from cloth, at the granularity a force-controlled wiper needs.

Our taxonomy has 15 classes: leather (real and synthetic), alcantara, woven fabric, vinyl, hard plastic (matte and glossy), wood trim, brushed metal, chrome trim, glass, rubber floor mat, carpet floor mat, headliner fabric, and exposed foam (a damage class).
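The taxonomy above can be pinned down as a class-id enum. This is a minimal sketch; the identifier names and the id ordering are assumptions, not the project's actual label map.

```python
from enum import IntEnum

class Material(IntEnum):
    """15-class cabin material taxonomy (ids are illustrative)."""
    LEATHER_REAL = 0
    LEATHER_SYNTHETIC = 1
    ALCANTARA = 2
    WOVEN_FABRIC = 3
    VINYL = 4
    PLASTIC_MATTE = 5
    PLASTIC_GLOSSY = 6
    WOOD_TRIM = 7
    BRUSHED_METAL = 8
    CHROME_TRIM = 9
    GLASS = 10
    RUBBER_FLOOR_MAT = 11
    CARPET_FLOOR_MAT = 12
    HEADLINER_FABRIC = 13
    EXPOSED_FOAM = 14  # damage class
```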

Capture is done with a small handheld rig that pairs a polarized RGB camera with a depth sensor. Polarization helps separate dielectric reflections from the underlying material color, which is the single biggest source of label noise on glossy surfaces.
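One standard way a polarized camera separates reflection from body color is the degree of linear polarization (DoLP) computed from four polarizer angles: specular reflections off dielectrics are strongly polarized, while diffuse material color is not. A minimal NumPy sketch, assuming a four-angle (0/45/90/135 degree) sensor; the function name is ours, not the rig's actual pipeline:

```python
import numpy as np

def degree_of_linear_polarization(i0, i45, i90, i135):
    """DoLP per pixel from four polarizer-angle images via Stokes parameters.

    High DoLP flags pixels dominated by dielectric specular reflection,
    which can then be masked or down-weighted during labeling.
    """
    s0 = i0.astype(np.float64) + i90   # total intensity
    s1 = i0.astype(np.float64) - i90   # 0/90 difference
    s2 = i45.astype(np.float64) - i135 # 45/135 difference
    return np.sqrt(s1 ** 2 + s2 ** 2) / np.maximum(s0, 1e-6)
```

Thresholding this map gives a cheap "reflection-dominated" mask for glossy trim and glass, which is where most label noise concentrates.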

02 Model

Distilling SAM2 for Edge Inference

SAM2 is a strong teacher: prompt it with points and it returns clean instance masks. It is also too heavy to run at sensor rate on a Jetson Orin alongside everything else the robot needs to compute.

Our approach is standard distillation. We pre-segment cabin imagery with SAM2 to get instance masks, label each instance with a material class via a small classifier head and human review, then train a compact student segmentation network (a MobileViT-class backbone with a lightweight decoder) to predict the 15-class semantic map directly.
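The student's training step reduces to ordinary supervised segmentation once SAM2's instance masks have been converted to per-pixel material ids. A minimal PyTorch sketch of that step, assuming `teacher_masks` holds the reviewed pseudo-labels and `255` marks unreviewed pixels (both conventions are ours):

```python
import torch
import torch.nn.functional as F

def distill_step(student, images, teacher_masks, optimizer):
    """One training step for the compact student network.

    images:        (B, 3, H, W) float tensor
    teacher_masks: (B, H, W) int64 tensor of material class ids derived
                   from SAM2 instance masks, 255 = ignore (unreviewed)
    """
    logits = student(images)  # (B, 15, H, W) class logits
    loss = F.cross_entropy(logits, teacher_masks, ignore_index=255)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The teacher never runs on the robot; it only manufactures labels offline, so its cost is amortized across the whole dataset.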

The student does not need to match SAM2's mask quality on novel classes; it only needs to match it on the 15 classes we care about. That tradeoff is what makes real-time inference possible.

03 Quantization

INT8 Without Killing Glass IoU

Glass is the class that suffers most from naive INT8 quantization, because the model relies on subtle highlight cues that compress poorly into 8-bit activations. The fix is per-channel quantization for the decoder layers and a calibration set that is deliberately over-weighted toward cabins with prominent glass — windshields, sunroofs, and infotainment screens.
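The calibration-set weighting is simple to express in code. A sketch of the sampling step, assuming each candidate frame is tagged with a "prominent glass" flag; the 40% fraction and function name are illustrative, not the post's actual numbers:

```python
import random

def build_calibration_set(frames, glass_fraction=0.4, size=512, seed=0):
    """Assemble an INT8 calibration set over-weighted toward glass.

    frames: list of (frame, has_prominent_glass: bool) pairs.
    Returns a list of frames in which roughly `glass_fraction` contain
    windshields, sunroofs, or screens.
    """
    rng = random.Random(seed)
    glass = [f for f, has_glass in frames if has_glass]
    other = [f for f, has_glass in frames if not has_glass]
    n_glass = min(int(size * glass_fraction), len(glass))
    n_other = min(size - n_glass, len(other))
    return rng.sample(glass, n_glass) + rng.sample(other, n_other)
```

The resulting set is what the INT8 calibrator observes, so the activation ranges it picks preserve the faint highlight cues glass depends on.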

We accept a small overall mIoU drop in exchange for keeping glass IoU close to FP16. The robot can tolerate confusing two kinds of plastic; it cannot tolerate confusing glass with vinyl.

04 Closed Loop

From Segmentation to Force Setpoint

Segmentation is only useful if downstream control consumes it. The output of the model is sampled at the contact point of the wiper and used as an index into a per-material force lookup table — soft on leather, firm on glass, careful around exposed foam.
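The lookup itself is a few lines. A minimal sketch, assuming class ids index a dict of force setpoints; the specific values and the default are hypothetical, not the robot's tuned table:

```python
import numpy as np

# Illustrative per-material setpoints in newtons: soft on leather,
# firm on glass, minimal around exposed foam.
FORCE_TABLE = {0: 2.0, 10: 6.0, 14: 0.5}

def force_setpoint(seg_map, contact_uv, force_table=FORCE_TABLE, default=1.0):
    """Sample the semantic map at the wiper contact pixel and return
    the force setpoint for that material."""
    u, v = contact_uv
    material = int(seg_map[v, u])  # class id at the contact point
    return force_table.get(material, default)
```

Unknown or unlisted classes fall through to a conservative default rather than raising, since the controller must always have a valid setpoint.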

When the prediction is uncertain (low max-class probability or rapid class flicker between frames) the controller falls back to a conservative low-force regime and slows the wipe. Better a slow clean than a scratched A-pillar.
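Both uncertainty signals, low max-class probability and frame-to-frame flicker, are cheap to track. A sketch of the gating logic, with thresholds and window size chosen for illustration rather than taken from the controller:

```python
from collections import deque

class UncertaintyGate:
    """Trigger the conservative low-force regime when the prediction is
    uncertain: low max-class probability, or rapid class flicker."""

    def __init__(self, prob_threshold=0.7, window=5, max_switches=2):
        self.prob_threshold = prob_threshold
        self.max_switches = max_switches
        self.history = deque(maxlen=window)

    def is_uncertain(self, class_id, max_prob):
        self.history.append(class_id)
        h = list(self.history)
        switches = sum(a != b for a, b in zip(h, h[1:]))
        return max_prob < self.prob_threshold or switches > self.max_switches
```

The flicker count over a short sliding window catches oscillation between two plausible classes even when each individual frame looks confident.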

05 Failure Modes

Where the Model Still Gets Fooled

Three failure modes recur. Chrome trim under direct sunlight reads as glass. Sun-bleached leather drifts toward fabric. Transparent floor mats over carpet read as carpet — technically correct for vision, but wrong for the wiper, which now has to negotiate an invisible plastic layer.

We handle these with confidence thresholds and material-pair fallbacks rather than pretending the model is infallible. Honest failure modes are part of the spec.

Topics
segmentation · SAM2 · TensorRT · Jetson Orin · materials