Previous research has shown that both adult and infant learners can learn the first-order regularities between individual words and objects in a cross-situational learning context. The present study investigates whether, in addition to these first-order word-referent associations, adult learners are also sensitive to the second-order correlations between the phonological features of labels and the visuo-perceptual features of objects, and whether they can use such features as a cue in categorizing novel objects. Two experiments examined whether native speakers of English and Mandarin Chinese performed differently when exposed to training data that either reflected or were inconsistent with the linguistic features of their native language. We found that when the training stimuli reflected the linguistic structures of their native language, both English and Mandarin speakers were able to use the phonological features of labels as a cue in object categorization. Moreover, our results suggest bi-directional real-time interactions between learning the first-order word-referent mappings and learning the higher-order mappings between phonological features of labels and perceptual features of visual objects.