Studies in Polish Linguistics

Style-Markers in Authorship Attribution: A Cross-Language Study of the Authorial Fingerprint

Publication date: 15.10.2011

Studies in Polish Linguistics, Vol. 6 (2011), Issue 1, pp. 99-114


Maciej Eder
Polish Academy of Sciences, Warsaw, Poland
University of the National Education Commission, Kraków
ul. Podchorążych 2, 30-084 Kraków, Poland
The present study addresses one of the theoretical problems of computer-assisted authorship attribution, namely, the question of which traceable features of language can betray the authorial uniqueness (a stylistic fingerprint) of literary texts. A number of recent approaches show that, apart from lexical measures (especially those relying on the frequencies of the most frequent words), some other features of written language are also considerably effective as discriminators of authorial style. However, there have been no attempts to compare the attribution potential of these features. The aim of the present study, then, was to examine the effectiveness of several style-markers in authorship attribution. The style-markers chosen for the empirical investigation are those that can be retrieved from a non-lemmatized corpus of plain text files, such as the most frequent words, word bigrams, various letter sequences, and markers of different types combined in one sample. Equally important was to compare the usefulness of the chosen style-markers across four languages: English, Polish, German, and Latin. The results confirmed the high attribution effectiveness of word-based style-markers in the English corpus, whereas the alternative markers usually proved more effective in the other languages.
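The style-markers listed in the abstract (the most frequent words, word bigrams, letter sequences, and a combination of these) can all be counted directly from plain, non-lemmatized text. The Python sketch below illustrates one way of doing so; it is not the procedure used in the study. The tokenization rule (lowercased runs of Unicode letters), the letter-sequence length `char_n`, the cut-off `k`, the type-prefixed merging of the combined set, and all function names are illustrative assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters (Unicode-aware, so Polish
    or German diacritics survive); digits and punctuation are dropped.
    Assumption: this simple rule stands in for a real tokenizer."""
    return re.findall(r"[^\W\d_]+", text.lower())


def relative_freqs(items):
    """Map each item to its share of all items, so that samples of
    different lengths can be compared."""
    counts = Counter(items)
    total = sum(counts.values())
    return {item: n / total for item, n in counts.items()} if total else {}


def word_ngrams(tokens, n):
    """Consecutive word sequences of length n (n=2 gives word bigrams)."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Letter sequences of length n, taken from the tokens rejoined with
    single spaces, so word boundaries are kept as spaces."""
    s = " ".join(tokenize(text))
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def most_frequent(freqs, k):
    """The k most frequent features, by descending frequency; ties are
    broken alphabetically to make the selection deterministic."""
    return sorted(freqs, key=lambda f: (-freqs[f], f))[:k]


def extract_markers(text, k=100, char_n=3):
    """Relative frequencies of the k most frequent words, word bigrams,
    and letter n-grams in one text sample, plus their combination.
    Hypothetical helper; k and char_n are illustrative defaults."""
    tokens = tokenize(text)
    all_freqs = {
        "words": relative_freqs(tokens),
        "word_bigrams": relative_freqs(word_ngrams(tokens, 2)),
        "char_ngrams": relative_freqs(char_ngrams(text, char_n)),
    }
    selected = {
        kind: {f: freqs[f] for f in most_frequent(freqs, k)}
        for kind, freqs in all_freqs.items()
    }
    # Combined marker set (assumption): the selected features of every type
    # in one mapping, prefixed with their type so that keys cannot collide.
    selected["combined"] = {
        f"{kind}:{f}": v
        for kind, freqs in selected.items()
        for f, v in freqs.items()
    }
    return selected
```

In an attribution experiment, each text sample would be reduced to such frequency profiles, and the samples would then be compared feature by feature; the choice of distance measure or classifier is outside the scope of this sketch.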


