NONPARAMETRIC METHODS OF AUTHORSHIP ATTRIBUTION IN ENGLISH LITERATURE

Authors

  • D. A. Klyushin, Faculty of Computer Science and Cybernetics, Taras Shevchenko Kiev National University, Kiev, Ukraine
  • V. Yu. Mykhaylyuk, Faculty of Computer Science and Cybernetics, Taras Shevchenko Kiev National University, Kiev, Ukraine

DOI:

https://doi.org/10.17721/2706-9699.2020.1.04

Keywords:

Text Attribution, Authorship Identification, Petunin Statistics, Clustering, Nonparametric Test

Abstract

The paper compares two nonparametric methods of authorship identification in English literature. Both methods are tested with and without clustering. A method is also proposed for selecting the n-grams that best serve as markers for identifying an author. More than 800 texts by 16 authors were used for testing. The method based on the distribution density is suitable for identifying the authors of both large texts (50,000+ characters) and small ones (10,000+ characters). The method based on p-statistics is suitable only for large texts.
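
As a rough illustration of the kind of feature the abstract refers to, the following minimal sketch computes relative frequencies of overlapping character n-grams in a text. It is not the authors' implementation: the function name `ngram_frequencies`, the lowercasing, and the letters-and-spaces filter are assumptions made for this example only, and the n-gram selection procedure and the statistical tests themselves are not shown.

```python
from collections import Counter
from typing import Dict


def ngram_frequencies(text: str, n: int = 2) -> Dict[str, float]:
    """Return relative frequencies of overlapping character n-grams.

    Hypothetical helper for illustration; the normalization below
    (lowercase, letters and spaces only) is an assumption, not
    something stated in the paper.
    """
    # Normalize: lowercase, drop everything except letters and spaces.
    cleaned = "".join(ch for ch in text.lower() if ch.isalpha() or ch == " ")

    # Slide a window of width n across the cleaned text.
    grams = [cleaned[i:i + n] for i in range(len(cleaned) - n + 1)]
    total = len(grams)
    if total == 0:
        # Text shorter than n characters: no n-grams to count.
        return {}

    counts = Counter(grams)
    return {gram: count / total for gram, count in counts.items()}


# Usage: build a bigram profile for a short sample string.
profile = ngram_frequencies("the theory of the thing", n=2)
```

A profile of this kind, computed per text, gives one sample of frequencies for each candidate n-gram; samples from different texts could then be compared with a two-sample nonparametric test.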

Published

2020-07-02

How to Cite

Klyushin, D. A., & Mykhaylyuk, V. Y. (2020). NONPARAMETRIC METHODS OF AUTHORSHIP ATTRIBUTION IN ENGLISH LITERATURE. Journal of Numerical and Applied Mathematics, (1 (133)), 50–58. https://doi.org/10.17721/2706-9699.2020.1.04