Coordinating Touch and Vision to Learn What Objects Look Like

Abstract

We use contemporary machine learning methods to explore Piaget's idea that active interaction across modalities may be the engine for constructing knowledge about objects. We examine modality-specific invariances as a potential mechanism by which Piaget's ideas may be implemented in practice. For example, object segmentation and pose-invariant recognition are difficult in the visual domain but trivial in the tactile/proprioceptive domain: touching an object readily delineates its physical boundaries, and we can rotate an object without modifying the proprioceptive and tactile information from our hands. This information may provide invariants that are useful for training a visual system to recognize and segment objects. We developed the instrumentation necessary to simultaneously collect tactile, proprioceptive, and visual information from a person interacting with everyday objects. We then developed a system that learns pose-invariant visual representations using proprioceptive and tactile information as the only training signal.
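The abstract does not specify the learning objective, but the idea it describes, that tactile/proprioceptive signals which stay constant while an object is rotated can supervise a visual encoder toward pose invariance, can be illustrated with a minimal contrastive sketch. The code below is an assumed illustration in PyTorch, not the authors' implementation; all names (VisualEncoder, contrastive_loss, the synthetic frame tensors) are hypothetical.

```python
# Hedged sketch: two frames of the same grasped object under different poses
# (identified by an unchanged tactile/proprioceptive reading) are treated as a
# positive pair; frames from other grasps in the batch serve as negatives.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualEncoder(nn.Module):
    """Small CNN mapping an RGB frame to a unit-norm embedding."""
    def __init__(self, embed_dim: int = 64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, embed_dim)

    def forward(self, x):
        h = self.conv(x).flatten(1)
        return F.normalize(self.fc(h), dim=1)

def contrastive_loss(z_a, z_b, temperature: float = 0.1):
    """InfoNCE-style loss: frame i in view A should match frame i in view B
    (same grasp, different pose) and mismatch every other frame in the batch."""
    logits = z_a @ z_b.t() / temperature
    targets = torch.arange(z_a.size(0))
    return F.cross_entropy(logits, targets)

# Toy training step with random tensors standing in for pairs of frames of the
# same grasped objects under different poses; real pairs would be selected via
# the tactile/proprioceptive recordings described in the abstract.
encoder = VisualEncoder()
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)

frames_pose1 = torch.randn(16, 3, 64, 64)   # batch of frames, pose 1
frames_pose2 = torch.randn(16, 3, 64, 64)   # same objects/grasps, pose 2

loss = contrastive_loss(encoder(frames_pose1), encoder(frames_pose2))
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"contrastive loss: {loss.item():.3f}")
```

Under this assumed objective, the only supervision comes from knowing that the tactile/proprioceptive signal did not change across the two views, which is consistent with the abstract's claim that touch and proprioception serve as the sole training signal.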
