International Journal of Accounting Information Systems, Vol. 11, No. 3, pp. 157–181.
ISSN: 1467-0895. DOI: 10.1016/j.accinf.2010.08.001
Abstract: Fraud detection has become a critical component of financial audits, and audit standards have heightened emphasis on journal entries as part of fraud detection. This paper canvasses perspectives on applying data mining techniques to journal entries. In the past, the impediment to researching journal entry data mining has been getting access to journal entry data sets, which may explain why the published research in this area is a null set. For this project, we had access to journal entry data sets for 29 different organizations. Our initial exploratory test of the data sets had interesting preliminary findings. (1) For all 29 entities, the distribution of first digits of journal dollar amounts differed from that expected by Benford's Law. (2) Unlike first digits, which are expected to have a logarithmic distribution, last digits would be expected to have a uniform distribution. Our test found that the distribution was not uniform for many of the entities. In fact, eight entities had one number whose frequency was three times more than expected. (3) We compared the number of accounts related to the top five most frequently occurring last-three-digit combinations. Four entities had a very high occurrence of the most frequent three-digit combinations that involved only a small set of accounts, one entity had a low occurrence of the most frequent three-digit combination that involved a large set of accounts, and 24 had a low occurrence of the most frequent three-digit combinations that involved a small set of accounts. In general, the first four entities would probably pose the highest risk of fraud because this pattern could indicate that the fraudster is covering up or falsifying a particular class of transactions. In the future, we will apply more data mining techniques to discover other patterns and relationships in the data sets. We also want to seed the data set with fraud indicators (e.g., pairs of accounts that would not be expected in a journal entry) and compare the sensitivity of the different data mining techniques to find these seeded indicators.
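The two digit tests mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: it assumes amounts are stored as integer cents, takes the "last digit" to be the final cent digit, and uses a plain chi-square goodness-of-fit statistic. The names `BENFORD`, `UNIFORM`, `chi_square`, and `digit_tests` are hypothetical.

```python
import math
from collections import Counter

# Benford's Law: expected share of each leading digit d in 1..9.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
# Uniform expectation for the last digit 0..9.
UNIFORM = {d: 0.1 for d in range(10)}


def first_digit(cents: int) -> int:
    """Leading digit of a non-zero amount in integer cents (assumed unit)."""
    return int(str(abs(cents))[0])


def last_digit(cents: int) -> int:
    """Final digit of an amount in integer cents (assumed unit)."""
    return abs(cents) % 10


def chi_square(digits, expected):
    """Chi-square goodness-of-fit statistic of observed digits vs. expected shares."""
    counts = Counter(digits)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no amounts to test")
    return sum((counts.get(d, 0) - n * p) ** 2 / (n * p)
               for d, p in expected.items())


def digit_tests(amounts_cents):
    """Return (first-digit statistic vs. Benford, last-digit statistic vs. uniform)."""
    nonzero = [a for a in amounts_cents if a != 0]  # zero has no leading digit
    return (chi_square(map(first_digit, nonzero), BENFORD),
            chi_square(map(last_digit, nonzero), UNIFORM))


if __name__ == "__main__":
    # Made-up amounts for illustration only; not data from the study.
    sample = [125000, 9999, 31450, 120075, 4500, 18800, 250000, 7325]
    first_stat, last_stat = digit_tests(sample)
    print(f"first-digit chi-square: {first_stat:.2f}")
    print(f"last-digit chi-square:  {last_stat:.2f}")
```

Larger statistics indicate a larger departure from the expected distribution. At the 5% level the usual chi-square critical values are about 15.51 for the first-digit test (8 degrees of freedom) and 16.92 for the last-digit test (9 degrees of freedom), though such thresholds are only meaningful with far more entries than the toy sample above.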
title = "Data mining journal entries for fraud detection: An exploratory study",
journal = "International Journal of Accounting Information Systems",
volume = "11",
number = "3",
pages = "157--181",
year = "2010",
note = "2009 Research Symposium on Information Integrity & Information Systems Assurance",
issn = "1467-0895",
doi = "http://dx.doi.org/10.1016/j.accinf.2010.08.001",
url = "http://www.sciencedirect.com/science/article/pii/S1467089510000540",
author = "Roger S. Debreceny and Glen L. Gray",
Reference Type: Journal Article
Subject Area(s): Accounting