
Zenkov, AV (2017)

A New Statistical Method of Text Attribution

International Journal of Professional Science, 2017, No. 1, pp. 6–21.

ISSN/ISBN: 2542-1085
DOI: Not available at this time.



Abstract: A new method for the statistical analysis of texts is proposed. The frequency distribution of the first significant digits of numerals in coherent authorial Russian-language texts is considered. Benford's law is found to hold approximately for these frequencies, with a marked predominance of the digit 1. Deviations from Benford's law are statistically significant authorial peculiarities that, under certain conditions, make it possible to address the problem of authorship and to distinguish between texts by different authors. Toward the end of the sequence {1, 2, …, 8, 9}, the digit distribution is subject to strong fluctuations and is therefore unrepresentative for this purpose. The proposed approach and the conclusions are supported by examples of computer analysis of works by M. Ageyev, V. Nabokov, M. Sholokhov, N. Nekrasov, and others. The results are confirmed using the non-parametric Mann-Whitney U test and hierarchical cluster analysis.
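The core quantity in the abstract, the frequency of each first significant digit compared with Benford's law P(d) = log10(1 + 1/d), can be illustrated with a minimal Python sketch. It assumes the numerals have already been extracted from a text and converted to integers (for example, Russian number words mapped to their values); that extraction step, the function names, and the sample data are hypothetical and are not taken from the paper, whose exact procedure the abstract does not describe.

```python
import math
from collections import Counter

# Benford's law: expected probability of leading digit d is log10(1 + 1/d).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(n: int) -> int:
    """Return the first significant digit of a nonzero integer."""
    if n == 0:
        raise ValueError("0 has no significant digit")
    return int(str(abs(n))[0])


def first_digit_frequencies(numbers):
    """Relative frequency of each leading digit 1..9 among the nonzero values.

    `numbers` is assumed (hypothetically) to be the integer values of the
    numerals found in one text; zeros are skipped.
    """
    counts = Counter(leading_digit(n) for n in numbers if n != 0)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no nonzero numerals to analyse")
    return {d: counts[d] / total for d in range(1, 10)}


def benford_deviations(numbers):
    """Observed frequency minus the Benford-expected frequency, per digit."""
    freqs = first_digit_frequencies(numbers)
    return {d: freqs[d] - BENFORD[d] for d in range(1, 10)}


if __name__ == "__main__":
    # Hypothetical numeral values, for illustration only.
    sample = [1, 2, 10, 100, 3, 1000, 15, 1, 20, 5]
    for d, dev in benford_deviations(sample).items():
        print(d, round(dev, 3))
```

In this toy sample six of the ten values start with 1, so the observed frequency of the digit 1 (0.6) exceeds Benford's expectation (about 0.301). Since the abstract notes that the tail of the sequence (digits near 8 and 9) fluctuates strongly, a comparison between authors might reasonably focus on the lower digits; the abstract does not state the exact cutoff.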


BibTeX:
@article{Zenkov2017,
  AUTHOR = {Zenkov, Andrei Viacheslavovich},
  TITLE = {A New Statistical Method of Text Attribution},
  JOURNAL = {International Journal of Professional Science},
  YEAR = {2017},
  NUMBER = {1},
  PAGES = {6--21},
  URL = {http://scipro.ru/wp-content/uploads/2016/02/PS_01_2017.pdf},
}


Reference Type: Journal Article

Subject Area(s): Social Sciences, Statistics