In this study, we provide evidence for a cross-modal interaction between the meaning of pantomimes and the meaning of words when the visuo-spatial and perceptual information of the latter is enhanced. We recorded behavioral and electrophysiological responses in a cross-modal repetition priming paradigm. Pantomimes of objects and actions were used to prime visually presented nouns and verbs in an image formation task. Behaviorally, image formation times for words primed by a preceding gesture were shorter when the meanings of gesture and word matched than when they mismatched. Electrophysiological results confirmed the interaction between gesture and word meanings, showing an N400 distributed over the whole scalp with a peak over the left anterior region. Overall, these results support, at both the behavioral and neurophysiological levels, the idea of a tight interplay between the meaning of pantomimes and words when the perceptual information carried by words is enhanced.