The missing baselines in arguments for the optimal efficiency of languages


I argue that linear correlations between log word frequency and lexical measures cannot be taken as evidence for a "Principle of Minimum Effort". The Principle of Maximum Entropy indicates that such relations are in fact the most probable ones to be found. To support such claims, one needs to compare the observed correlations against adequate baselines reflecting what would be expected in a purely random system. I then introduce a way of computing such baselines, and use it to show that the correlations found in a corpus are actually weaker than what one would expect by chance. If an argument were to be made on their basis, it would therefore, paradoxically, be that language is worse for communication than a random system. More appropriately, however, these results indicate that such correlations are not the best place to look for linguistic optimality.
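The core point can be illustrated with a toy simulation. The sketch below is NOT the baseline method introduced in the paper; it is a minimal, hypothetical illustration using the standard random-typing ("monkey") null model, in which characters are drawn uniformly at random and a space ends the word. Even in this purely random system, log word frequency and word length are strongly correlated, which is why an observed correlation alone is not evidence of optimization. All parameter values (`ALPHABET`, `P_SPACE`, the sample size) are arbitrary choices for the demonstration.

```python
import math
import random
from collections import Counter

random.seed(0)

# Random-typing null model: characters drawn uniformly at random,
# with a fixed probability of "hitting the space bar" ending a word.
# Arbitrary illustrative parameters, not taken from the paper.
ALPHABET = "abcdefgh"
P_SPACE = 0.2

def random_word():
    """Generate one word under the random-typing model."""
    chars = []
    while True:
        if chars and random.random() < P_SPACE:
            return "".join(chars)
        chars.append(random.choice(ALPHABET))

words = [random_word() for _ in range(50_000)]
freq = Counter(words)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

# Correlation between log frequency and word length over word types:
# strongly negative even though nothing here was optimized for effort.
logf = [math.log(c) for c in freq.values()]
lens = [len(w) for w in freq]
r = pearson(logf, lens)
print(f"baseline correlation in a random system: r = {r:.3f}")
```

A meaningful optimality claim would have to show that the correlation in a real corpus is stronger than such a chance baseline, which is exactly the comparison the paper argues is missing.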
