Predicting when words may appear: A Connectionist Model of Sentence Processing

Abstract

We introduce a connectionist version of the Syntagmatic Paradigmatic model (Dennis, 2005) and train it on subcorpora drawn from the Gigaword corpus. Decaying syntagmatic representations of the words to the left and right of a slot are used to estimate the paradigmatic associates of the word that fills it. The pattern of paradigmatic associates is then combined with an order-independent associative representation of the surrounding words to predict the word that will appear in the slot. The best performing version of the model produced a perplexity of 28.3 on a vocabulary of 5006 words, significantly lower than Good-Turing and Kneser-Ney n-gram models trained under the same conditions. Furthermore, we varied parameters and lesioned components to isolate which properties of the model are critical to its performance. Online updating of the weights (a kind of priming) allows the model to track short-term contingencies and significantly improves performance. Performance dropped when we removed the paradigmatic and associative layers, when we removed just the associative layer, and when we removed the right context from the syntagmatic and associative layers, suggesting that all of the hypothesized components of the model are crucial to its performance.
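To make the architecture concrete, the following is a minimal NumPy sketch of the two prediction pathways the abstract describes and of the perplexity measure used for evaluation. All names, dimensions, the decay rate, and the tanh nonlinearity are our assumptions rather than details of the published model, and training (including the online weight updates) is omitted.

    import numpy as np

    # Hypothetical sizes and parameters; the abstract specifies only the
    # 5006-word vocabulary.
    rng = np.random.default_rng(0)
    V, D = 5006, 200        # vocabulary size, representation dimension
    decay = 0.7             # geometric decay with distance from the slot

    E = rng.normal(0.0, 0.01, (V, D))        # word representations
    W_syn = rng.normal(0.0, 0.01, (D, D))    # syntagmatic -> paradigmatic
    W_par = rng.normal(0.0, 0.01, (D, V))    # paradigmatic -> output
    W_assoc = rng.normal(0.0, 0.01, (D, V))  # associative -> output

    def context_vector(word_ids):
        """Decaying sum of word representations; the word nearest the
        slot (first in word_ids) is weighted most heavily."""
        weights = decay ** np.arange(1, len(word_ids) + 1)
        return (weights[:, None] * E[word_ids]).sum(axis=0)

    def predict(left_ids, right_ids):
        """Distribution over the vocabulary for the word in the slot."""
        # Syntagmatic pattern: decaying left and right contexts
        # (left context reversed so the adjacent word comes first).
        syn = context_vector(left_ids[::-1]) + context_vector(right_ids)
        # Paradigmatic associates estimated from the syntagmatic pattern.
        par = np.tanh(syn @ W_syn)
        # Order-independent associative representation: unweighted sum.
        assoc = E[np.concatenate([left_ids, right_ids])].sum(axis=0)
        # Both pathways contribute to the score for every candidate word.
        logits = par @ W_par + assoc @ W_assoc
        p = np.exp(logits - logits.max())
        return p / p.sum()

    def perplexity(target_probs):
        """exp of mean negative log probability of the observed words;
        lower is better (the abstract reports 28.3)."""
        return float(np.exp(-np.mean(np.log(target_probs))))

For example, predict(np.array([12]), np.array([845])) returns a 5006-way distribution for the slot between two (hypothetical) word ids; scoring each held-out word this way and passing the resulting probabilities to perplexity yields the evaluation measure quoted above.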

