Word Segmentation Without Statistics


A critical step in language acquisition is segmenting speech into words. Prior cross-linguistic research demonstrates the importance of language-specific segmentation strategies (Gervain & Erra, 2012). English-learning infants can use statistical information to segment words in laboratory experiments (Saffran et al., 1996), but how informative such statistics are for segmenting English is unclear (Yang & Gambell, 2004; Swingley, 2005). We analyzed one corpus of 611,837 child-directed utterances (Theakston et al., 2000; MacWhinney, 2000) and one corpus of 50,776 adult-directed utterances (Pitt et al., 2007) and found that, in both corpora, a simple strategy that treats each syllable as a word would be highly effective in segmenting words. Accuracy was 69.68% and 61.7% in the child and adult corpora, respectively, segmenting 87.44% and 87.90% of the corpora. These findings should guide further investigations into the processes infants actually use to segment speech and, more broadly, into how they learn appropriate language-specific word segmentation strategies.
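
The syllable-as-word strategy described above can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The toy utterances, the function names, and the two scores (word-token precision and recall over syllable spans) are assumptions made for illustration, and they may not correspond to the accuracy and coverage figures reported above.

```python
# Hypothetical sketch: score a "every syllable is a word" segmenter.
# An utterance is a list of words; each word is a list of syllables.

def syllable_as_word(utterance):
    """Predict a segmentation in which each syllable is its own word."""
    return [[syllable] for word in utterance for syllable in word]

def word_spans(words):
    """Return the set of (start, end) syllable positions covered by each word."""
    spans = set()
    position = 0
    for word in words:
        spans.add((position, position + len(word)))
        position += len(word)
    return spans

def score(corpus):
    """Return (precision, recall) of predicted word tokens over a corpus."""
    hits = predicted = gold = 0
    for utterance in corpus:
        gold_spans = word_spans(utterance)
        predicted_spans = word_spans(syllable_as_word(utterance))
        hits += len(gold_spans & predicted_spans)
        predicted += len(predicted_spans)
        gold += len(gold_spans)
    return hits / predicted, hits / gold

# Invented toy data, not drawn from either corpus in the study.
toy_corpus = [
    [["the"], ["ba", "by"], ["is"], ["hun", "gry"]],
    [["look"], ["at"], ["that"]],
]

precision, recall = score(toy_corpus)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Under this scoring, a predicted word counts as correct only when the true word is monosyllabic, so the strategy does well to the extent that the input is dominated by one-syllable words.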
