Date of Award
1-1-2009
Language
English
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
College/School/Department
Department of Information Science
Content Description
1 online resource (xiii, 194 pages) : illustrations (some color)
Dissertation/Thesis Chair
Jagdish Gangolly
Committee Members
Sue Faerman, Ozlem Uzuner
Keywords
10-Ks, annual report, detection, fraud, qualitative content, Corporation reports, Accounting fraud, Fraud investigation
Subject Categories
Accounting | Library and Information Science | Linguistics
Abstract
High-profile cases of fraudulent financial reporting, such as those at Enron and WorldCom, have shaken public confidence in the U.S. financial reporting process and raised serious concerns about the roles of auditors, regulators, and analysts in financial reporting. To address these concerns and restore public confidence, the Sarbanes-Oxley Act (SOX) of 2002 was enacted. However, SOX has not lived up to its promise: numerous cases of fraudulent financial reporting have surfaced in the post-SOX era. So far, the major thrust of research has been on examining fraud that has already been discovered. This dissertation develops a methodology for detecting fraud proactively by examining the qualitative content of annual reports using natural language processing tools. The methodology is built on Support Vector Machines, a supervised machine learning technique. We examine both the verbal content and the presentation style of the qualitative portion of annual reports and explore linguistic features that distinguish fraudulent annual reports from non-fraudulent ones. Investigating qualitative content is important for fraud detection because the textual content of annual reports contains richer information than financial ratios, which can be easily camouflaged. This study also creates a classification metric for early prediction of fraud by examining changes in the qualitative content of annual reports across the pre-fraud, fraud, and post-fraud periods of fraud companies. What distinguishes this methodology from earlier research on fraud detection is its use of qualitative textual content in annual reports rather than quantitative financial information such as ratios, which, as discussed in the literature, have limited ability to predict fraud. Our results indicate that linguistic features are an effective means of detecting fraud.
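The approach named in the abstract, a supervised classifier that separates fraudulent from non-fraudulent report text, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the dissertation: the toy snippets are invented and carry no empirical claim about which words signal fraud, the stop list and the names `featurize`, `train_linear_svm`, and `predict` are hypothetical, and the features are plain relative word frequencies rather than the linguistic features the study examines. In place of a full SVM solver, the sketch uses subgradient descent on the L2-regularised hinge loss, with no bias term.

```python
import re

# Hypothetical stop list for this sketch; a real study would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "on", "our", "we"}


def featurize(text):
    """Map text to relative frequencies of its non-stop-word tokens."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower())
              if t not in STOP_WORDS]
    feats = {}
    for t in tokens:
        feats[t] = feats.get(t, 0.0) + 1.0 / len(tokens)
    return feats


def score(weights, feats):
    """Linear decision value w . x over sparse feature dictionaries."""
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())


def train_linear_svm(samples, labels, epochs=50, lr=0.5, lam=0.01):
    """Subgradient descent on the L2-regularised hinge loss (no bias term)."""
    weights = {}
    for _ in range(epochs):
        for feats, y in zip(samples, labels):
            margin = y * score(weights, feats)
            # Regularisation step: shrink every weight slightly.
            for k in weights:
                weights[k] *= 1.0 - lr * lam
            # Hinge-loss step: only samples inside the margin push the weights.
            if margin < 1.0:
                for k, v in feats.items():
                    weights[k] = weights.get(k, 0.0) + lr * y * v
    return weights


def predict(weights, text):
    """Return +1 (flagged as fraud-like) or -1 (not flagged)."""
    return 1 if score(weights, featurize(text)) > 0 else -1


# Invented toy snippets, labelled +1 (fraudulent) or -1 (non-fraudulent).
TRAIN = [
    ("Revenue recognition relied on aggressive estimates and unusual "
     "related party transactions.", 1),
    ("Audited results show steady growth, stable margins and routine "
     "dividends.", -1),
    ("Restated earnings reflect unusual adjustments and aggressive "
     "accruals.", 1),
    ("Steady cash flow supported routine dividends and stable "
     "operations.", -1),
]

weights = train_linear_svm([featurize(t) for t, _ in TRAIN],
                           [y for _, y in TRAIN])
print(predict(weights, "Aggressive accruals and unusual adjustments."))
```

Presentation-style measures, which the abstract also mentions, could be added as extra entries in the dictionary returned by `featurize`; the training loop would not need to change.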
Recommended Citation
Goel, Sunita, "Qualitative information in annual reports & the detection of corporate fraud : a natural language processing perspective" (2009). Legacy Theses & Dissertations (2009 - 2024). 42.
https://scholarsarchive.library.albany.edu/legacy-etd/42