Surprisal-based comparison between a symbolic and a connectionist model of sentence processing

Abstract

The "unlexicalized surprisal" of a word in sentence context is defined as the negative logarithm of the probability of the word's part-of-speech given the sequence of preceding parts-of-speech in the sentence. Unlexicalized surprisal is known to correlate with word reading time. Here, it is shown that this correlation grows stronger when surprisal values are estimated by a more accurate language model, indicating that readers make use of an objectively accurate probabilistic language model. In addition, surprisal values estimated by a Simple Recurrent Network (SRN) were found to correlate more strongly with reading-time data than those estimated by a Probabilistic Context-Free Grammar (PCFG). This suggests that the SRN forms a more accurate psycholinguistic model than the PCFG.
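To make the definition concrete, the following is a minimal sketch of how unlexicalized surprisal could be computed from part-of-speech sequences. It is an illustration only: it conditions on just the single preceding part-of-speech using an add-one smoothed bigram model, which is an assumed stand-in and is neither the SRN nor the PCFG compared here. The function names, the `<s>` start symbol, and the choice of base-2 logarithms are likewise assumptions made for this example.

```python
import math
from collections import Counter

START = "<s>"  # hypothetical sentence-start symbol (assumption)


def train_bigram(tag_sequences, tagset):
    """Estimate add-one smoothed P(tag | previous tag) from POS sequences.

    A bigram model is an assumed stand-in; the definition of surprisal
    conditions on the full sequence of preceding parts-of-speech.
    """
    pair_counts = Counter()
    context_counts = Counter()
    for tags in tag_sequences:
        prev = START
        for tag in tags:
            pair_counts[(prev, tag)] += 1
            context_counts[prev] += 1
            prev = tag
    num_tags = len(tagset)

    def prob(tag, prev):
        # Add-one smoothing keeps every probability strictly positive.
        return (pair_counts[(prev, tag)] + 1) / (context_counts[prev] + num_tags)

    return prob


def surprisals(tags, prob):
    """Return -log2 P(tag_t | tag_{t-1}) for each position in the sentence."""
    values = []
    prev = START
    for tag in tags:
        values.append(-math.log2(prob(tag, prev)))
        prev = tag
    return values
```

Under such a model, a part-of-speech that rarely follows its context receives a higher surprisal than a frequent one. A model that assigns higher probability to the parts-of-speech that actually occur yields lower average surprisal on held-out text, which is one way to read "more accurate" in the sense used above.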

