Identifying Predictive Collocations


Idioms and common multi-word expressions are often argued to be stored in the mind as chunks or fixed configurations, and therefore to be accessed faster and interpreted more easily than fully compositional word combinations. Experimental research has furthermore shown that such expressions have a specific “recognition point” at which enough information is available to access the meaning of the whole expression and to predict the remaining words of the collocation. In this paper, we propose measures for automatically identifying multi-word expressions whose first part is particularly predictive of the rest, and we evaluate these measures against human association data collected in a cloze test.
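
As an illustration only, one simple way to quantify how strongly the first word of a two-word combination predicts the second is the forward conditional probability estimated from corpus counts. The sketch below assumes a tokenized corpus and is not necessarily one of the measures proposed in this paper; the function name `forward_predictiveness` and the toy sentence are hypothetical.

```python
from collections import Counter


def forward_predictiveness(tokens):
    """Map each adjacent pair (w1, w2) to count(w1, w2) / count(w1 in first position).

    This is a plain forward conditional probability estimate, used here
    as an assumed stand-in for a predictiveness measure.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    # The last token never starts a bigram, so it is excluded from the denominator.
    firsts = Counter(tokens[:-1])
    return {pair: n / firsts[pair[0]] for pair, n in bigrams.items()}


# Hypothetical toy corpus, for illustration only.
tokens = "kick the bucket and kick the ball then fill the bucket".split()
scores = forward_predictiveness(tokens)

# Rank pairs from most to least predictive first word.
for (w1, w2), p in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{w1} {w2}: {p:.2f}")
```

In this toy corpus "kick" is always followed by "the", so that pair scores highest, while "the" is followed by "bucket" only two times out of three. A realistic setup would need a large corpus, frequency thresholds, and expressions longer than two words.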

