%0 Journal Article
%T Hilberg’s Conjecture – a Challenge for Machine Learning
%A Dębowski, Łukasz
%J Schedae Informaticae
%V 2014
%R 10.4467/20838476SI.14.003.3020
%N Volume 23
%P 33-44
%K statistical language modeling, Hilberg’s conjecture, maximal repetition, grammar-based codes, Santa Fe processes
%@ 1732-3916
%D 2015
%U https://ejournals.eu/en/journal/schedae-informaticae/article/hilbergs-conjecture-a-challenge-for-machine-learning
%X We review three mathematical developments linked with Hilberg’s conjecture – a hypothesis about the power-law growth of the entropy of texts in natural language, which sets up a challenge for machine learning. First, considerations concerning maximal repetition indicate that universal codes such as the Lempel-Ziv code may fail to efficiently compress sources that satisfy Hilberg’s conjecture. Second, Hilberg’s conjecture implies the empirically observed power-law growth of vocabulary in texts. Third, Hilberg’s conjecture can be explained by the hypothesis that texts consistently describe an infinite random object.
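
For orientation, the power law referred to in the abstract is commonly written in the literature following Hilberg roughly as in the LaTeX sketch below; the exact formulation and constants vary between papers, and the symbols B, h and \beta are illustrative notation rather than values taken from this record:

  H(X_1^n) \approx B\,n^{\beta} + h\,n, \qquad 0 < \beta < 1,

where H(X_1^n) denotes the Shannon entropy of a text block of length n, h \ge 0 is the entropy rate, B > 0 is a constant, and Hilberg’s original extrapolation of Shannon’s data suggested \beta \approx 0.5.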