The reported study examined whether the degree of spatial information conveyed by a text influences the effectiveness of multimedia presentations. It was assumed that processing spatial text contents might interfere, within the visuo-spatial sketchpad, with the execution of the eye movements associated with looking at pictures and reading. Accordingly, performance impairments were expected when spatial (rather than visual) text contents were presented along with pictures and, furthermore, when spatial text contents were presented in written rather than spoken form. Fifty-nine students were randomly assigned to the four groups of a 2 × 2 design, with text contents (visual vs. spatial) and text modality (spoken vs. written) as independent variables. Consistent with our assumptions, learners given spatial text contents showed worse recall than those given visual text contents. However, no differences were found between written and spoken presentation of spatial text contents. Implications for learning with multimedia are discussed.