Learning Semantic Representations with Hidden Markov Topics Models

Abstract

In this paper, we describe a model that learns semantic representations from the distributional statistics of language. This model, however, goes beyond the common bag-of-words paradigm and infers semantic representations by taking into account the inherent sequential nature of linguistic data. The model we describe, which we refer to as a Hidden Markov Topics model, is a natural extension of the current state of the art in Bayesian bag-of-words models, i.e., the Topics model of Griffiths, Steyvers, and Tenenbaum (2007), preserving its strengths while extending its scope to incorporate more fine-grained linguistic information.
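To make the contrast with bag-of-words models concrete, the following is a minimal generative sketch of a generic HMM-over-topics process: a hidden topic state follows a Markov chain across word positions, and each state emits words from its own topic-specific distribution. This is an illustrative assumption about the general model family, not the paper's exact specification; all parameter values and variable names (`pi`, `A`, `phi`) are hypothetical.

```python
# Illustrative sketch (not the authors' exact model): topics evolve as a
# Markov chain over word positions, unlike bag-of-words topic models,
# which ignore word order entirely.
import numpy as np

rng = np.random.default_rng(0)

n_topics = 4      # number of hidden topic states (hypothetical value)
vocab_size = 50   # vocabulary size (hypothetical value)
doc_length = 20

# Markov dynamics over topics: initial distribution and transition matrix.
pi = rng.dirichlet(np.ones(n_topics))                     # P(z_1)
A = rng.dirichlet(np.ones(n_topics), size=n_topics)       # A[i, j] = P(z_t = j | z_{t-1} = i)

# Topic-specific word distributions, as in standard topic models.
phi = rng.dirichlet(np.ones(vocab_size), size=n_topics)   # phi[k, w] = P(w | topic k)

def generate_document(length):
    """Sample a word sequence: the topic at each position depends on the
    previous topic, and each word is drawn from the current topic."""
    words, topics = [], []
    z = rng.choice(n_topics, p=pi)
    for _ in range(length):
        words.append(rng.choice(vocab_size, p=phi[z]))
        topics.append(z)
        z = rng.choice(n_topics, p=A[z])  # sequential dependency on the previous topic
    return words, topics

words, topics = generate_document(doc_length)
print("topics:", topics)
print("words: ", words)
```

The key design point illustrated here is that the sequential dependency lives in the topic layer rather than in the words themselves, so the model can still exploit topic-style distributional statistics while respecting word order.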
