ABSTRACT
This article explores the effects of parameter settings in linguistic profiling, a technique in which large numbers of counts of linguistic features are used as a text profile, which can then be compared to average profiles for groups of texts. Although the technique proves quite effective for authorship verification, with the best overall parameter settings yielding an equal error rate of 3% on a test corpus of student essays, the optimal parameters vary greatly depending on author and evaluation criterion.
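The general idea of comparing a text profile to an average group profile, and of scoring verification by equal error rate, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the article: the features (relative word frequencies over a fixed vocabulary), the distance measure (mean absolute difference), and the function names `profile`, `average_profile`, `distance`, and `equal_error_rate` are all hypothetical.

```python
from collections import Counter
from statistics import mean


def profile(text, vocabulary):
    """Relative frequency of each vocabulary item in the text (assumed feature set)."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [counts[word] / total for word in vocabulary]


def average_profile(profiles):
    """Feature-wise mean over the profiles of a group of texts."""
    return [mean(column) for column in zip(*profiles)]


def distance(p, q):
    """Mean absolute difference between two profiles (an assumed, simple measure)."""
    return sum(abs(a - b) for a, b in zip(p, q)) / len(p)


def equal_error_rate(genuine, impostor):
    """Approximate the equal error rate from two lists of distances.

    A text is accepted when its distance is <= the threshold. The sweep
    covers every observed distance and returns the mean of the false
    accept and false reject rates where the two are closest.
    """
    best_gap, best_rate = None, None
    for threshold in sorted(set(genuine) | set(impostor)):
        false_reject = sum(d > threshold for d in genuine) / len(genuine)
        false_accept = sum(d <= threshold for d in impostor) / len(impostor)
        gap = abs(false_accept - false_reject)
        if best_gap is None or gap < best_gap:
            best_gap, best_rate = gap, (false_accept + false_reject) / 2
    return best_rate
```

In this sketch, a text would be attributed to an author when the distance between its profile and that author's average profile falls below a threshold; the equal error rate is the operating point where false accepts and false rejects occur at the same rate.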