Tuesday, September 11, 2012

1209.1751 (Ramon Ferrer-i-Cancho et al.)

Information content versus word length in random typing    [PDF]

Ramon Ferrer-i-Cancho, Fermín Moscoso del Prado Martín
Recently, it has been claimed that a linear relationship between a measure of information content and word length is expected from word length optimization and it has been shown that this linearity is supported by a strong correlation between information content and word length in many languages (Piantadosi et al. 2011, PNAS 108, 3825-3826). Here, we study in detail some connections between this measure and standard information theory. The relationship between the measure and word length is studied for the popular random typing process where a text is constructed by pressing keys at random from a keyboard containing letters and a space behaving as a word delimiter. Although this random process does not optimize word lengths according to information content, it exhibits a linear relationship between information content and word length. The exact slope and intercept are presented for three major variants of the random typing process. A strong correlation between information content and word length can simply arise from the units making a word (e.g., letters) and not necessarily from the interplay between a word and its context as proposed by Piantadosi et al. In itself, the linear relation does not entail the results of any optimization process.
View original: http://arxiv.org/abs/1209.1751

No comments:

Post a Comment