
Gao, J, Zhao, Y and Cui, R (2020)

Research on the Applicability of Benford’s Law in Chinese Texts

Proceedings of 2nd International Conference on Artificial Intelligence and Advanced Manufacture (AIAM), Manchester, United Kingdom, pp. 13-17.

ISSN/ISBN: Not available at this time. DOI: 10.1109/AIAM50918.2020.00009



Abstract: This paper investigates the applicability of Benford's law to Chinese texts. First, a Chinese corpus was collected and word segmentation was performed. The first-digit distributions of frequency were calculated for words, low-frequency words, and single characters in Chinese texts, along with the relative entropy (Kullback-Leibler distance) between these distributions and the general Benford's law. Second, the parameter value range of the Generalized Benford's law was studied and, because Zipf's law is only applicable to large amounts of data, a statistical analysis of small-scale data was carried out. Then, the probability of the first digit of the word frequency was analyzed experimentally for single-character data to explore the applicability of the Generalized Benford's law to such data. Finally, the applicability of Benford's law to an artificially modified corpus was investigated. The results show that words and characters in Chinese texts conform to Benford's law, that Benford's law overcomes the limitation of Zipf's law on the size of the data sets, and that the Generalized Benford's law can discriminate the natural quality of a corpus, which has important practical significance for Chinese information processing.
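
The first step described in the abstract (first-digit distribution of frequencies, then relative entropy against Benford's law) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the input is assumed to be an already segmented token list (the abstract does not name a segmentation tool), and the direction of the Kullback-Leibler distance (observed relative to Benford) is an assumption.

```python
import math
from collections import Counter


def benford_pmf():
    """Standard Benford first-digit probabilities: P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit_distribution(tokens):
    """Distribution of the leading digit of each distinct token's frequency.

    `tokens` is assumed to be a pre-segmented list of words or characters.
    """
    freqs = Counter(tokens)  # token -> occurrence count
    digit_counts = Counter(int(str(n)[0]) for n in freqs.values())
    total = sum(digit_counts.values())
    return {d: digit_counts.get(d, 0) / total for d in range(1, 10)}


def kl_divergence(p, q):
    """Relative entropy D(p || q) in nats; terms with p(d) = 0 contribute 0."""
    return sum(p[d] * math.log(p[d] / q[d]) for d in p if p[d] > 0)


if __name__ == "__main__":
    # Toy input standing in for a segmented corpus (assumption for illustration).
    tokens = ["的"] * 12 + ["是"] * 3 + ["我"]
    observed = first_digit_distribution(tokens)
    print(kl_divergence(observed, benford_pmf()))
```

A smaller divergence indicates a first-digit distribution closer to Benford's law. A real corpus would need far more distinct tokens than this toy input for the comparison to be meaningful.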


Bibtex:
@INPROCEEDINGS{,
  author={Gao, Junlong and Zhao, Yahui and Cui, Rongyi},
  booktitle={2020 2nd International Conference on Artificial Intelligence and Advanced Manufacture (AIAM)},
  title={Research on the Applicability of Benford’s Law in Chinese Texts},
  year={2020},
  volume={},
  number={},
  pages={13-17},
  doi={10.1109/AIAM50918.2020.00009}
}


Reference Type: Conference Paper

Subject Area(s): Computer Science