PanoVine

Whole-Body Visuomotor Control for Soft Growing Vine Robot

Yimeng Qin*   Xiaomeng Xu*   William Heap   Aditi Oak   Shuran Song   Allison Okamura

*Equal Contributions   Equal Advising

PanoVine system overview
PanoVine features (A) a 6 m soft growing vine robot with (B) 19 cameras distributed along its body. (C) Identical action commands lead to drastically different configurations due to unpredictable buckling, hysteresis, and environment interaction. (D) A learned whole-body visuomotor policy enables diverse navigation and manipulation skills.

Abstract

Vine robots, a class of soft, growing robots, are well suited to navigating complex and confined environments thanks to their compliant bodies and self-supporting growth mechanism. However, hysteresis, tether interactions, and deformations make them difficult to predict and model, which limits conventional planning and control. In this work we present a data-driven, vision-based control framework for the first autonomous vine robot system. Our system integrates 19 cameras distributed along the robot's body to provide comprehensive feedback of both the robot state and the surrounding environment. Using this rich whole-body vision feedback, we train an end-to-end visuomotor policy from demonstrations for closed-loop autonomous control. The policy aggregates information from distributed sensing while remaining robust to inaccurate robot states and actuation. Experiments demonstrate robust navigation and manipulation in challenging scenarios—steering through branched structures, climbing slopes, traversing unsupported terrain, reaching objects precisely, and maneuvering through confined spaces and obstacles.

Robot Design

Growth and steering. The 6 m, 7-DoF robot lengthens by everting body material at the tip and steers its shape with six distributed revolute joints.
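To make the 7-DoF parameterization concrete, the sketch below treats the body as an idealized planar chain: one growth length plus six joint angles. The planar simplification, the joint locations in `JOINT_POSITIONS_M`, and the function name `tip_pose` are illustrative assumptions, not the robot's actual geometry. The model ignores buckling, hysteresis, and contact, which are the effects that make the real robot hard to predict.

```python
import math

# Hypothetical joint locations along the body, in metres from the base.
JOINT_POSITIONS_M = [0.8, 1.6, 2.4, 3.2, 4.0, 4.8]

def tip_pose(length_m, joint_angles_rad):
    """Idealized planar tip pose (x, y, heading) for a grown length and six joint angles."""
    if len(joint_angles_rad) != len(JOINT_POSITIONS_M):
        raise ValueError("expected one angle per joint")
    x = y = heading = 0.0
    travelled = 0.0
    for pos, angle in zip(JOINT_POSITIONS_M, joint_angles_rad):
        if pos >= length_m:
            break  # this joint has not been everted yet, so it cannot bend the body
        seg = pos - travelled
        x += seg * math.cos(heading)
        y += seg * math.sin(heading)
        heading += angle  # the joint rotates everything distal to it
        travelled = pos
    # Final straight segment from the last everted joint to the tip.
    seg = length_m - travelled
    x += seg * math.cos(heading)
    y += seg * math.sin(heading)
    return x, y, heading
```

In this toy model a joint only influences the tip once the robot has grown past it, which is one reason identical steering commands can produce different shapes at different growth lengths.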

Whole-Body Vision

Nineteen body-mounted RGB cameras are gradually revealed as the robot grows, collectively providing multi-perspective feedback on both the robot body and its environment.

Camera views during course navigation
Course navigation. Cameras are progressively revealed during growth and observe the branch, the obstacles, and the target.
Camera views during object reaching
Object reaching. The object becomes visible to successively more body cameras as the robot extends and steers toward it.
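The progressive reveal described above can be sketched as a visibility mask over the 19 cameras. This is a minimal illustration that assumes the cameras are evenly spaced along the 6 m body with the first one at the base; the actual spacing is not given here, and `CAMERA_POSITIONS_M` and `revealed_mask` are hypothetical names.

```python
NUM_CAMERAS = 19      # from the text
BODY_LENGTH_M = 6.0   # from the text

# Assumption: cameras are evenly spaced along the body; the real layout may differ.
CAMERA_POSITIONS_M = [BODY_LENGTH_M * i / (NUM_CAMERAS - 1) for i in range(NUM_CAMERAS)]

def revealed_mask(grown_length_m):
    """Return one boolean per camera: True if its mounting point has been everted."""
    return [pos <= grown_length_m for pos in CAMERA_POSITIONS_M]

# Example: count the cameras exposed after 2 m of growth under the spacing assumption.
num_visible = sum(revealed_mask(2.0))
```

One possible use of such a mask is to ignore the image tokens of cameras that are still inside the body; how the policy actually treats unrevealed cameras is not stated here.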

Whole-Body Visuomotor Policy

We learn an end-to-end visuomotor policy from teleoperated demonstrations. At each step it maps a history of multi-view images and proprioception to an action chunk.
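The mapping from an observation history to an action chunk can be sketched as a closed control loop. The version below is a generic receding-horizon pattern, not the authors' implementation: the history length, chunk length, number of executed actions, and the names `run_closed_loop` and `dummy_policy` are all hypothetical. Only the 7-dimensional action (six steering commands plus one growth command) comes from the text.

```python
from collections import deque

HISTORY_LEN = 2   # hypothetical number of past observations kept
CHUNK_LEN = 8     # hypothetical number of actions predicted per policy call
EXECUTE_LEN = 4   # hypothetical number of actions executed before re-planning
ACTION_DIM = 7    # six steering commands plus one growth command (from the text)

def dummy_policy(history):
    """Stand-in for the learned policy: returns CHUNK_LEN all-zero actions."""
    return [[0.0] * ACTION_DIM for _ in range(CHUNK_LEN)]

def run_closed_loop(get_observation, send_action, policy, num_steps):
    """Predict a chunk, execute a prefix of it, re-observe, and repeat."""
    history = deque(maxlen=HISTORY_LEN)  # old observations drop out automatically
    executed = 0
    while executed < num_steps:
        history.append(get_observation())
        chunk = policy(list(history))
        for action in chunk[:EXECUTE_LEN]:
            if executed >= num_steps:
                break
            send_action(action)
            executed += 1
    return executed

# Usage with placeholder I/O: observations are empty dicts, actions are collected in a list.
sent_actions = []
steps_done = run_closed_loop(lambda: {}, sent_actions.append, dummy_policy, 10)
```

Executing only a prefix of each chunk before observing again keeps the loop reactive, which matters when the same command can produce different body configurations.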

PanoVine policy architecture
Environment and robot states are observed through 19 cameras plus growth and steering sensors. Each image is encoded into a ViT class-token feature. A diffusion-transformer policy cross-attends to the vision tokens and proprioception and predicts six steering actions and one growing action.
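The cross-attention step can be illustrated with a stripped-down, single-head version in plain Python. This is a sketch of the general mechanism only: it uses identity projections, and a real diffusion transformer would add learned query, key, and value projections, multiple heads and layers, and conditioning on the denoising step. The names `cross_attend` and `softmax` and the token dimension are assumptions for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, context):
    """Single-head scaled dot-product cross-attention with identity projections."""
    dim = len(context[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every context token.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in context]
        weights = softmax(scores)
        # Weighted average of the context tokens.
        outputs.append([sum(w * v[d] for w, v in zip(weights, context)) for d in range(dim)])
    return outputs

# Context: 19 per-camera tokens plus one proprioception token (toy 4-D values).
vision_tokens = [[0.1 * i, 0.0, 1.0, -0.5] for i in range(19)]
proprio_token = [0.3, 0.2, 0.0, 1.0]
context = vision_tokens + [proprio_token]

# Queries: one token per action in the chunk being denoised (toy values).
action_queries = [[0.0, 1.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
fused = cross_attend(action_queries, context)  # one fused 4-D vector per query
```

Because every action query attends over all camera tokens at once, the policy can draw on whichever views are informative at a given moment rather than relying on a single fixed camera.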

Complex Course Navigation

The 6 m, 1.5 m-tall course chains five skills: branch selection, slope climbing, unsupported-gap traversal, obstacle avoidance, and a sharp final turn. PanoVine reaches 80% success.

Ours · autonomous policy
Autonomous rollout. The policy reactively steers through the branch, climbs the 45° slope, bends across the unsupported gap, avoids obstacles, and makes the final sharp turn to the exit.
Baseline · open-loop trajectory replay
Replay baseline (0% success). Replaying a successful demonstration open-loop collides with obstacles and falls short of the goal, confirming the course is unsolvable without closed-loop visual feedback.

Precise Object Reaching

After 2 m of growth the robot must align its tip with an object to within a small angular tolerance, across seen and unseen objects at five locations. PanoVine reaches 85% success.

Ours · multi-camera policy
Multi-camera reaching. The policy grounds its steering on the object's visual appearance across multiple body cameras, incrementally adjusting its bend as the object comes into view.
Baseline · single-camera policy
Single-camera baseline (0% success). With only the base camera, the object is occluded or leaves the field of view; the policy fails to turn toward it and grows past it.

Acknowledgments

The authors would like to thank the CHARM Lab and REALab members for their helpful discussions and feedback on the manuscript. Xiaomeng Xu is supported by the Stanford Interdisciplinary Graduate Fellowship, and Yimeng Qin is supported by the Stanford Woods Institute for the Environment. This work was supported in part by NSF Awards #2143601, #2037101, and #2132519, an Amazon Research Gift, Stanford System-X, the Stanford Woods Institute for the Environment, and the Stanford University Sustainability Accelerator. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the sponsors.

Citation

@misc{qin2026panovinewholebodyvisuomotorcontrol,
      title={PanoVine: Whole-Body Visuomotor Control for Soft Growing Vine Robot}, 
      author={Yimeng Qin and Xiaomeng Xu and William Heap and Aditi Oak and Shuran Song and Allison Okamura},
      year={2026},
      eprint={2606.22923},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2606.22923}, 
}