Constant entropy rate and related hypotheses versus real language.

Abstract

Constant entropy rate (CER) and uniform information density (UID) are two hypotheses that have been put forward to explain a wide range of linguistic phenomena. However, these hypotheses lack a concrete definition suitable for statistical research, and, to our knowledge, a direct and in-depth evaluation of them from their definitions is missing. Here we consider three operational definitions of UID: full UID (UID holding for any combination of elements making up an utterance), strong UID (UID holding for any utterance that has non-zero probability), and initial UID (strong UID holding for utterances beginning with a particular element). We then examine the logical dependencies between these hypotheses. Comparing the assumptions and predictions of these hypotheses with Hilberg's law and other statistical properties of real human language indicates that CER and related hypotheses are qualitatively different from actual language, and suggests that these hypotheses are incomplete and must be revised.
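For concreteness, the following is a minimal formal sketch of the quantities at stake, in our own notation rather than the paper's exact formalization:

```latex
% A minimal sketch (notation ours, not the paper's exact formalization).
% Let X_1, X_2, \ldots denote the successive elements of an utterance.

% Constant entropy rate (CER): the conditional entropy of the n-th element
% given its preceding context does not depend on n,
H(X_n \mid X_1, \ldots, X_{n-1}) = c \quad \text{for all } n \geq 1.

% Uniform information density (UID, informal reading): the surprisal of
% each produced element is roughly constant across positions,
-\log p(x_n \mid x_1, \ldots, x_{n-1}) \approx c .

% Hilberg's law (one common statement): in real text the conditional
% entropy decays with context length as a power law,
H(X_n \mid X_1, \ldots, X_{n-1}) \propto n^{\beta - 1}, \quad \beta \approx 1/2,
% which contradicts the constancy asserted by CER.
```

Under CER the per-element entropy curve is flat in n, whereas Hilberg's law predicts a decaying curve; the qualitative mismatch between these two shapes is what motivates the abstract's conclusion.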
