Reference is the cognitive mechanism that binds real-world entities to their conceptual counterparts. Recent psycholinguistic studies using eye-tracking have shed light on how shared referentiality is established across linguistic and visual modalities. It is unclear, however, whether vision plays an active role during linguistic processing. Here, we present a language production experiment that investigates how cued sentence encoding is influenced by visual properties of naturalistic scenes, such as the amount of clutter and the number of potential actors, as well as by the animacy of the cue. The results show that clutter and number of actors correlate with longer response latencies in production and with the generation of more complex structures. Cue animacy interacts with both clutter and number of actors, demonstrating a close coupling of linguistic and visual processing in reference assignment.